Node Overview

Nodes provide a template structure for performing data extraction, processing, and API communication. The diagram below shows how a node performs a base function while exposing an interface for user-defined code, so that its behavior can be tailored.

Example node layout

In the Jupyter notebooks backing editable nodes, user-defined SQL and user-defined Python cells share a light blue background. These cells are committed to a repo and deployed to a workflow orchestrator when you run the Save Pipeline Code cell at the bottom of the notebook, or press Cmd+Shift+S (Mac) or Ctrl+Shift+S (Windows/Linux).

Key Node Characteristics

A useful way to understand nodes is to consider how they interact with other systems, what their input and output types are, and whether they contain a user-editable component. The table below shows the full list of available nodes, along with whether each node has a user-editable component.

Available Nodes with Key Characteristics

The table below lists available nodes.

  • Is Editable indicates that the node has a user-editable component.
  • Is Multi indicates that the node can accept multiple inputs.

| Category | Name | Input Types | Output Types | Is Editable | Is Multi |
|---|---|---|---|---|---|
| Analysis | Branch_Python | Table(s) and/or File(s) | NodeReturn | True | False |
| Analysis | Python | Table(s) and/or File(s) | NodeReturn | True | False |
| Analysis | RunContainer | FileAny | FileAny | True | False |
| Analysis | Transform_SQL | Table | Table | True | False |
| Analysis | Transform_py | Table or List[Table] | NodeReturn | True | False |
| Analysis | Trigger_Python | Table(s) and/or File(s) | FileAny | True | False |
| App | APINode | API | NodeReturn | True | False |
| App | AirtableExport | Table | API | True | False |
| App | AirtableImport | API | NodeReturn | False | False |
| App | Azure_Query | API | NodeReturn | True | False |
| App | Azure_Read | API | FileAny | False | False |
| App | Azure_Read_Multi | API | FileAny | False | True |
| App | Azure_Write | FileAny | API | False | False |
| App | Benchling_Api | API | NodeReturn | True | False |
| App | Benchling_Event | App | FileAny | True | False |
| App | Benchling_Read | API | NodeReturn | True | False |
| App | Benchling_Read_Object | API | NodeReturn | True | False |
| App | Benchling_Write | Table | NodeReturn | True | False |
| App | Benchling_Write_Object | Optional[FileAny or List[FileAny]] and Optional[Table or List[Table]] | NodeReturn | True | False |
| App | Coda_Write | Table | NodeReturn | True | False |
| App | ELabNext_Write | Table | NodeReturn | True | False |
| App | Load_Parquet_to_Table | API | FileAny | False | False |
| App | S3_Event | App | FileAny | True | False |
| App | S3_Read | API | FileAny | False | False |
| App | S3_Write | FileAny | API | False | False |
| App | SciNote_API | Table | NodeReturn | True | False |
| App | Smartsheet_Read | API | NodeReturn | True | False |
| App | Snowflake_Write | Table | NodeReturn | False | False |
| File | AVI_Read | FileAVI | NodeReturn | True | False |
| File | AVI_Read_Multi | Set[FileAVI] | NodeReturn | True | True |
| File | CSV_Read | FileCSV | NodeReturn | True | False |
| File | CSV_Read_Multi | Set[FileCSV] | NodeReturn | True | True |
| File | CSV_Write | Table or List[Table] | NodeReturn | True | False |
| File | Excel_Read | FileExcel | NodeReturn | True | False |
| File | Excel_Read_Multi | Set[FileExcel] | NodeReturn | True | True |
| File | Excel_Write | Table or List[Table] | NodeReturn | True | False |
| File | FCS_Extract_Load | FileFCS | NodeReturn | True | False |
| File | HDF5_Read | FileHDF5 | NodeReturn | True | False |
| File | Image_Read | FileImage | NodeReturn | True | False |
| File | Image_Read_Multi | List[FileImage] | NodeReturn | True | True |
| File | Image_Write | Table or List[Table] | NodeReturn | True | False |
| File | Input_File | FileAny | NodeReturn | True | False |
| File | Input_File_Multi | Set[FileAny] | NodeReturn | True | True |
| File | PDF_Read | FilePDF | NodeReturn | True | False |
| File | PDF_Read_Multi | Set[FilePDF] | NodeReturn | True | True |
| File | Powerpoint_Write | Table or List[Table] | NodeReturn | True | False |
| File | XML_Read | FileXML | NodeReturn | True | False |
| File | Zip_Read | FileZip | NodeReturn | True | False |
| Instrument | Agilent_HPLC_Read | File | NodeReturn | True | False |
| Instrument | Instron_Tensile_Read | FileIsTens | NodeReturn | True | False |
| Instrument | Profilometer_Read | FileHDF5 | NodeReturn | True | False |
| Instrument | Synergy_Read | FileTxt | NodeReturn | True | False |
| Instrument | Synergy_Read_Multi | List[FileTxt] | NodeReturn | True | True |
| Tag | Benchling_Tag | TagBenchling | string | False | False |
| Tag | Input_Param | string | string | False | False |

Node Categories

  • App: Accesses third-party APIs for processing; in many cases, a key exchange between the third party and Ganymede is necessary for functionality
  • Analysis: Performs Python / SQL manipulations
  • Instrument: Lab instrument-specific functions
  • File: For ETL operations that load data of a specified type into Ganymede cloud
  • Tag: For specifying parameters at flow runtime

Node Input/Output Types

Input/output types are split into the following categories:

  • NodeReturn: Object holding tables and files to return from Node. Initializing a NodeReturn object involves passing the following parameters:
    • tables_to_upload: Dictionary of Pandas DataFrames keyed by table name to store in Ganymede
    • files_to_upload: Dictionary of files (bytes objects) keyed by file name to store in Ganymede
example

Running the following in a cell in the editor notebook

import pandas as pd

# NodeReturn is assumed to be available in the editor notebook namespace
def execute():
    message = "Message to store in file"
    byte_message = bytes(message, "utf-8")

    df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

    # upload a table named 'my_table' and a file named 'my_file';
    # files are passed as bytes objects, so use byte_message rather than message
    return NodeReturn(files_to_upload={"my_file": byte_message}, tables_to_upload={"my_table": df})

execute()

returns the following summary of the NodeReturn object:

Rendered NodeReturn object

Docstrings and source code can be viewed by typing ?NodeReturn and ??NodeReturn respectively in a cell in the editor notebook.

  • Table: Tabular data retrieved from or passed to the tenant-specific Ganymede data lake. Tables are retrieved from the data lake via ANSI SQL queries and are passed to the data lake as pandas DataFrames
  • API: Access via third-party API
  • File-related inputs/outputs: File of specified type.
    • FileAVI: AVI file
    • FileCSV: CSV file
    • FileExcel: Excel file (xls, xlsx, ...)
    • FileImage: Image file (png, bmp, ...)
    • FileHDF5: HDF5 file
    • FileXML: XML file
    • FileZip: Zip file
    • FileAny: generic data file, which may be unstructured
  • TagBenchling: Benchling run tag
  • string: String parameter
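
Because files_to_upload expects bytes objects keyed by file name, text or tabular content generally has to be encoded before it is returned. The sketch below shows one way to build such a dictionary using only the Python standard library. The helper names and file names are hypothetical; the resulting dictionary is the kind of value that would be passed as files_to_upload when constructing a NodeReturn.

```python
import csv
import io
from typing import Dict, List, Sequence


def text_to_bytes(text: str) -> bytes:
    """Encode text as UTF-8 bytes."""
    return text.encode("utf-8")


def rows_to_csv_bytes(header: Sequence[str], rows: List[Sequence[object]]) -> bytes:
    """Serialize a header and rows to CSV in memory and return the bytes."""
    buffer = io.StringIO()
    # Use "\n" line endings so the output is the same on every platform
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Hypothetical file names; the dictionary maps file name -> bytes
files_to_upload: Dict[str, bytes] = {
    "notes.txt": text_to_bytes("Message to store in file"),
    "summary.csv": rows_to_csv_bytes(["col1", "col2"], [[1, 4], [2, 5], [3, 6]]),
}
```

Keeping the encoding in small helpers makes it easy to confirm that every value is a bytes object before the dictionary is handed to NodeReturn.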

Set, List, and Dict correspond to Python sets, lists, and dictionaries respectively.

Optional indicates that the input or output is optional.
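
Several nodes accept inputs such as Table or List[Table], and Benchling_Write_Object takes Optional inputs. A user-defined function can normalize these shapes up front so that the rest of the code only deals with lists. The helper below is a minimal sketch under that assumption: as_list is a hypothetical name that is not part of Ganymede, and plain strings stand in for tables or files so the snippet runs outside the notebook.

```python
from typing import List, Optional, TypeVar, Union

T = TypeVar("T")


def as_list(value: Optional[Union[T, List[T]]]) -> List[T]:
    """Normalize an optional single-or-list input into a list.

    None       -> empty list (the optional input was not provided)
    list       -> shallow copy of the list
    other item -> one-element list containing the item
    """
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)
    return [value]


# Strings stand in for Table or File inputs in this illustration
no_input = as_list(None)
single_input = as_list("table_a")
many_inputs = as_list(["table_a", "table_b"])
```

Copying the list rather than returning it directly means later changes inside the user-defined function do not alter the caller's input.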

User-editable Nodes

User-editable nodes present an interface for modifying and testing code that is executed by the workflow management system. The Jupyter notebooks backing these nodes are split into the following sections:

  • Node Description: A short blurb about the node that the user-editable function corresponds to
  • Node Input Data: For nodes that retrieve tabular data from the data lake as input, the query string in this cell specifies the query or queries that are executed; the results are presented to the user-defined function for processing.
  • User-Defined Function: The execute function within this cell processes data. The workflow management system calls the execute function within this cell during flow execution.
info

The execute function may call classes and functions found within the User-Defined Function cell.

  • Save Pipeline Code: This cell stores any changes to the Node Input Data (if present) and User-Defined Function cells.

  • Testing Section: The cells in this section can be used for testing modifications to the SQL query and user-defined Python function. This enables rapid iteration on user-defined code; after the necessary edits are made, changes can be saved by running the Save Pipeline Code cell.
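
As an illustration of how the User-Defined Function and Testing Section fit together, the sketch below defines a helper next to execute and then checks the result on a small sample table. Several parts are assumptions made for illustration: the NodeReturn dataclass is a stand-in so the snippet runs outside the editor notebook (the notebook supplies the real class), and the execute signature, parameter name, helper name, and table name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

import pandas as pd


@dataclass
class NodeReturn:
    """Stand-in for illustration only; not the class provided by the notebook."""
    tables_to_upload: Dict[str, pd.DataFrame] = field(default_factory=dict)
    files_to_upload: Dict[str, bytes] = field(default_factory=dict)


def add_total_column(df: pd.DataFrame) -> pd.DataFrame:
    """Helper defined alongside execute: return a copy with a row-wise total."""
    out = df.copy()
    out["total"] = out["col1"] + out["col2"]
    return out


def execute(df_sql_result: pd.DataFrame) -> NodeReturn:
    """Entry point called during flow execution (signature is an assumption)."""
    processed = add_total_column(df_sql_result)
    return NodeReturn(tables_to_upload={"my_table_with_total": processed})


# Testing-section style check on a small sample before saving the code
sample = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
result = execute(sample)
```

Working on a copy inside the helper leaves the input table untouched, which makes repeated test runs in the notebook behave the same way each time.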

List of Available Nodes

| Category | Name | Brief Description |
|---|---|---|
| Analysis | Branch_Python | Process data with Python and conditionally execute downstream nodes |
| Analysis | Python | Process data with Python |
| Analysis | RunContainer | Run container node |
| Analysis | Transform_SQL | SQL analysis function |
| Analysis | Transform_py | Manipulate data with Python |
| Analysis | Trigger_Python | Process data with Python and trigger subsequent flow |
| App | APINode | Generic API access node |
| App | AirtableExport | Export data from Ganymede data lake to Airtable |
| App | AirtableImport | Import data from Airtable into Ganymede data lake |
| App | Azure_Query | Query data from Azure SQL Server |
| App | Azure_Read | Read data from Azure Blob Storage |
| App | Azure_Read_Multi | Read all data from Azure Blob Storage |
| App | Azure_Write | Write data to Azure Blob Storage |
| App | Benchling_Api | Read Benchling data into data lake |
| App | Benchling_Event | Capture events from Benchling for triggering flows |
| App | Benchling_Read | Read Benchling data into data lake using run tag |
| App | Benchling_Read_Object | Read Benchling data into data lake using object ID |
| App | Benchling_Write | Write to Benchling |
| App | Benchling_Write_Object | Write object to Benchling |
| App | Coda_Write | Write Coda tables |
| App | ELabNext_Write | Create and write eLabNext entry |
| App | Load_Parquet_to_Table | Create data lake table from Parquet files |
| App | S3_Event | Capture events from AWS S3 for triggering flows |
| App | S3_Read | Ingest data into Ganymede data storage from AWS S3 storage |
| App | S3_Write | Write data to an S3 bucket |
| App | SciNote_API | Create and write SciNote entry |
| App | Smartsheet_Read | Read sheet from Smartsheet |
| App | Snowflake_Write | Sync tables in Ganymede data lake to Snowflake |
| File | AVI_Read | Read in contents of an AVI file to a table |
| File | AVI_Read_Multi | Read in contents of multiple AVI files to a table |
| File | CSV_Read | Read in contents of a CSV file |
| File | CSV_Read_Multi | Read in contents of multiple CSV files |
| File | CSV_Write | Write table to CSV file |
| File | Excel_Read | Read Excel spreadsheet |
| File | Excel_Read_Multi | Read Excel spreadsheets |
| File | Excel_Write | Write Excel spreadsheet |
| File | FCS_Extract_Load | Load FCS file to data lake |
| File | HDF5_Read | Read HDF5 data |
| File | Image_Read | Process image data; store processed images to data store |
| File | Image_Read_Multi | Process image data for multiple images; store processed images to data store |
| File | Image_Write | Process tabular data; write an image to data lake |
| File | Input_File | Read data file and process in Ganymede |
| File | Input_File_Multi | Read data files and process in Ganymede |
| File | PDF_Read | Read in contents of a PDF file to a table |
| File | PDF_Read_Multi | Read in contents of multiple PDF files to a table |
| File | Powerpoint_Write | Process tabular data; write a PowerPoint presentation to data lake |
| File | XML_Read | Read XML file into data lake |
| File | Zip_Read | Extract Zip file |
| Instrument | Agilent_HPLC_Read | Read an Agilent HPLC file of type .uv, .ms, .ch, or .bin |
| Instrument | Instron_Tensile_Read | Load .is_tens file to data lake |
| Instrument | Profilometer_Read | Read Mx Profiler data file |
| Instrument | Synergy_Read | Load Synergy text file to data lake |
| Instrument | Synergy_Read_Multi | Load multiple Synergy text files to data lake |
| Tag | Benchling_Tag | Read Benchling tag |
| Tag | Input_Param | Input parameter into flow |