Node Overview

Nodes provide template structures for data extraction, processing, and API communication. The diagram below shows how a node performs a base function while exposing an interface for user-defined code to enable tailoring.

Example node layout

In the Jupyter notebooks backing editable nodes, user-defined SQL and user-defined Python cells share a light blue background. Changes are committed to a repo and deployed to the workflow orchestrator when you click the save button in the toolbar or press Cmd+S (macOS) / Ctrl+S (Windows/Linux).

Key Node Characteristics

A useful way to characterize nodes is by how they interact with files and APIs, their input and output types, and whether they contain a user-editable component.

Available Nodes with Key Characteristics

The table below lists available nodes.

  • Is Editable indicates that the node has a user-editable component.
  • Is Multi indicates that the node can accept multiple inputs.
| Category | Name | Input Types | Output Types | Is Editable | Is Multi |
| --- | --- | --- | --- | --- | --- |
| Analysis | Branch_Python | Table(s) and/or File(s) | NodeReturn | True | False |
| Analysis | Python | Table(s) and/or File(s) | NodeReturn | True | False |
| Analysis | Trigger_Python | Table(s) and/or File(s) | FlowInputs | True | False |
| App | Benchling_Event | Event | FlowInputs or NodeReturn | True | False |
| App | Benchling_Write_Object | Table(s) and/or File(s) | NodeReturn | True | False |

Node Categories

  • App: Accesses third-party APIs for processing; in many cases, key exchange between the third party and Ganymede is necessary for functionality
  • Analysis: Performs Python / SQL manipulations
  • Instrument: Lab instrument-specific functions
  • File: For ETL operations on data of specified type into Ganymede cloud
  • Tag: For specifying parameters at flow runtime

Input and Output Types for Nodes

NodeReturn Object

Many nodes return a NodeReturn object, which contains tables and files to store in the Ganymede data lake.

Initializing a NodeReturn object involves passing the following parameters:

  • param tables_to_upload: Optional[dict[str, pd.DataFrame]] - Tables keyed by name to store in Ganymede
  • param files_to_upload: Optional[dict[str, bytes]] - Files keyed by filename to store in Ganymede
  • param if_exists: str - Whether to overwrite or append to existing tables in the Ganymede data lake. Valid values are "replace", "append", or "fail"; defaults to "replace"
  • param tables_measurement_units: Optional[dict[str, pd.DataFrame]] - If provided, the measurement units for columns; keys are table names, values are pandas DataFrames with "column_name" and "unit" as columns
  • param file_location: Optional[str] - Bucket location to write to, either "input" or "output"; only needs to be specified if files_to_upload is non-null; defaults to "output"
  • param wait_for_job: bool - Whether to wait for the write to the Ganymede data lake to complete before continuing execution to the subsequent node; defaults to False
  • param tags: Optional[dict[str, list[dict] | dict]] - Dictionary of files to tag; keys are file names, values are dictionaries (or lists of dictionaries) of keyword parameters to pass to the add_file_tag function
NodeReturn Example

The contents of a NodeReturn object can be viewed in the notebook; transposed table heads and a list of files are displayed when the object is inspected. For example, running the following code in an editor notebook

import pandas as pd

def execute():
    message = "Message to store in file"
    byte_message = bytes(message, "utf-8")

    df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

    # upload a table named 'my_table' and a file named 'my_file.txt'
    return NodeReturn(files_to_upload={"my_file.txt": byte_message}, tables_to_upload={"my_table": df})


returns the following summary of the NodeReturn object:

Rendered NodeReturn object

Docstrings and source code can be viewed by typing ?NodeReturn and ??NodeReturn respectively in a cell in the editor notebook.
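The tables_measurement_units and if_exists parameters described above are not shown in the example; the sketch below illustrates one plausible way to combine them. The NodeReturn class is normally provided by the Ganymede editor notebook environment, so the minimal stand-in defined here exists only to make the sketch runnable outside that environment; the table name and column values are hypothetical.

```python
import pandas as pd

# Minimal stand-in for the NodeReturn class that Ganymede editor notebooks
# provide; defined here only so this sketch runs outside that environment.
class NodeReturn:
    def __init__(self, tables_to_upload=None, files_to_upload=None,
                 if_exists="replace", tables_measurement_units=None):
        self.tables_to_upload = tables_to_upload or {}
        self.files_to_upload = files_to_upload or {}
        self.if_exists = if_exists
        self.tables_measurement_units = tables_measurement_units or {}

def execute():
    # Hypothetical assay results
    df = pd.DataFrame({"concentration": [0.1, 0.2], "volume": [5.0, 10.0]})

    # Units table: one row per column, with "column_name" and "unit" columns
    units = pd.DataFrame({
        "column_name": ["concentration", "volume"],
        "unit": ["mg/mL", "uL"],
    })

    return NodeReturn(
        tables_to_upload={"assay_results": df},
        tables_measurement_units={"assay_results": units},
        if_exists="append",  # append to the existing table instead of replacing it
    )
```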

NodeReturn Example with Tags

The following code demonstrates how to use the tags parameter in a NodeReturn object:

import pandas as pd

def execute():
    message = "Message to store in file"
    byte_message = bytes(message, "utf-8")

    df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

    # Tags are added to the file 'my_file.txt'.
    # Any parameter accepted by the add_file_tag function can be passed in the tags
    # parameter of the NodeReturn object; for more information on add_file_tag, see
    # the Tags page. Note that the input_file_path parameter of add_file_tag does
    # not need to be specified.
    return NodeReturn(
        files_to_upload={"my_file.txt": byte_message},
        tables_to_upload={"my_table": df},
        tags={"my_file.txt": [{"tag_type_id": "Experiment ID", "display_value": "EXP005"}]},
    )


FlowInputs object

Nodes that trigger other flows return a FlowInputs object, which specifies the inputs to the triggered flow.

Initializing a FlowInputs object involves passing the following parameters:

  • param files: Optional[List[FlowInputFile]] - Files to pass to the triggered flow
  • param params: Optional[List[FlowInputParam]] - Inputs to pass to triggered flow
  • param tags: Optional[List[Tag]] - Tags to pass to triggered flow

FlowInputFile is a dataclass used to pass file(s) into a node. It has the following attributes:

  • param node_name: str - Name of node within triggered flow to pass file(s) into
  • param param_name: str - Parameter on the node in the triggered flow that specifies the string pattern the filename must match (e.g. "csv" for the CSV_Read node)
  • param files: Dict[str, bytes] - Files to pass into node

FlowInputParam is a dataclass used to pass parameters into a node. It has the following attributes:

  • param node_name: str - Name of node within triggered flow to pass parameter(s) into
  • param param_name: str - Node parameter in the triggered flow node that is used to specify the string pattern that the parameter must match ("param" for the Input_Param node)
  • param param_value: str - Value to pass into node

Tag is a dataclass used to pass tags into a node. It has the following attributes:

  • param node_name: str - Name of node within triggered flow to pass tag(s) into
  • param display_tag: str - Value displayed in the dropdown in Ganymede UI. For Benchling_Tag nodes, this is the name of the tag displayed in the dropdown in Flow View / Flow Editor.
  • param run_tag: str - Underlying value of the tag. For Benchling_Tag nodes, this is the Benchling ID associated with the value selected in the dropdown.
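The dataclasses above can be combined into a single FlowInputs object. The sketch below mirrors the documented attributes; in a Ganymede notebook, FlowInputs, FlowInputFile, and FlowInputParam are provided by the environment, so the stand-in definitions here exist only to make the sketch self-contained. The node names, parameter names, and file contents are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Minimal stand-ins mirroring the documented dataclasses; in a Ganymede
# notebook these are provided by the environment.
@dataclass
class FlowInputFile:
    node_name: str    # node within the triggered flow to pass file(s) into
    param_name: str   # node parameter specifying the filename pattern to match
    files: Dict[str, bytes]

@dataclass
class FlowInputParam:
    node_name: str    # node within the triggered flow to pass parameter(s) into
    param_name: str   # node parameter specifying the parameter pattern to match
    param_value: str

@dataclass
class FlowInputs:
    files: Optional[List[FlowInputFile]] = None
    params: Optional[List[FlowInputParam]] = None

# Pass a CSV file into the triggered flow's CSV_Read node and a parameter
# into its Input_Param node (names are illustrative).
flow_inputs = FlowInputs(
    files=[FlowInputFile(node_name="CSV_Read", param_name="csv",
                         files={"results.csv": b"sample_id,od600\nA1,0.42\n"})],
    params=[FlowInputParam(node_name="Input_Param", param_name="param",
                           param_value="EXP005")],
)
```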

Other input/output types

Some other input and output types characteristic to nodes are:

  • Table: Tabular data retrieved from or passed to the tenant-specific Ganymede data lake. Tables are retrieved from the data lake via ANSI SQL queries and are passed to it as pandas DataFrames
  • API: access via third-party API
  • File-related inputs/outputs: File of specified type.
    • FileAVI: AVI file
    • FileCSV: CSV file
    • FileExcel: Excel file (xls, xlsx, ..)
    • FileImage: Image file (png, bmp, ..)
    • FileHDF5: HDF5 file
    • FileXML: XML file
    • FileZip: Zip file
    • FileAny: generic data file, which may be unstructured
  • TagBenchling: Benchling run tag
  • string: String parameter

Set, List, and Dict correspond to Python sets, lists, and dictionaries respectively.

Optional indicates that the input or output is optional.

User-editable Nodes

User-editable nodes present an interface for modifying and testing code that is executed by the workflow management system. These Jupyter notebooks are split into the following sections:

  • Node Description: A short blurb about the node that the user-editable function corresponds to
  • Node Input Data: For nodes that retrieve tabular data from the data lake as input, the query string in this cell specifies the query or queries that are executed, with results presented to the user-defined function for processing.
  • User-Defined Function: The execute function within this cell processes data. The workflow management system calls the execute function within this cell during flow execution.

The execute function may call classes and functions found within the User-Defined Function cell.

  • Testing Section: The cells in this section can be used for testing modifications to the SQL query and user-defined Python function. This enables rapid iteration on user-defined code; after the necessary edits are made, changes can be saved by clicking the button in the toolbar or selecting Save Commit and Deploy from the Kernel menu.
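The testing workflow described above can be sketched as follows. The input DataFrame is a hypothetical stand-in for the result of the Node Input Data query, and the threshold logic is purely illustrative; in a real node, the execute function would wrap its results in a NodeReturn object rather than returning a DataFrame directly.

```python
import pandas as pd

# Hypothetical sample standing in for the result of the Node Input Data query
df = pd.DataFrame({"sample_id": ["A1", "A2", "A3"], "od600": [0.42, 0.55, 0.61]})

def execute(df: pd.DataFrame) -> pd.DataFrame:
    # User-defined processing: flag samples above an OD600 threshold
    out = df.copy()
    out["above_threshold"] = out["od600"] > 0.5
    return out  # a real node would wrap results in a NodeReturn

# Testing-section cell: run the function on the sample and inspect the output
result = execute(df)
print(result)
```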

List of Available Nodes

| Category | Name | Brief Description |
| --- | --- | --- |
| Analysis | Branch_Python | Process data with Python and conditionally execute downstream nodes |
| Analysis | Python | Process data with Python |
| Analysis | RunContainer | Run container node |
| Analysis | Transform_SQL | SQL analysis function |
| Analysis | Transform_py | Manipulate data with Python |
| Analysis | Trigger_Python | Process data with Python and trigger subsequent flow |
| App | APINode | Generic API access node |
| App | AirtableExport | Export data from Ganymede data lake to Airtable |
| App | AirtableImport | Import data from Airtable into Ganymede data lake |
| App | Azure_Query | Query data from Azure SQL Server |
| App | Azure_Read | Read data from Azure Blob Storage |
| App | Azure_Read_Multi | Read all data from Azure Blob Storage |
| App | Azure_Write | Write data to Azure Blob Storage |
| App | Benchling_Api | Read Benchling data into data lake |
| App | Benchling_Event | Capture events from Benchling for triggering flows or saving data |
| App | Benchling_Read | Read Benchling data into data lake using run tag |
| App | Benchling_Read_Object | Read Benchling data into data lake using object ID |
| App | Benchling_Warehouse_Query | Query Benchling Warehouse from Ganymede |
| App | Benchling_Warehouse_Sync | Sync Benchling Warehouse to Ganymede |
| App | Benchling_Write | Write to Benchling |
| App | Benchling_Write_Object | Write object to Benchling |
| App | Coda_Write | Write Coda tables |
| App | ELabNext_Write | Create and write eLabNext entry |
| App | Load_Parquet_to_Table | Create data lake table from Parquet files |
| App | S3_Event | Capture events from AWS S3 for triggering flows |
| App | S3_Read | Ingest data into Ganymede data storage from AWS S3 storage |
| App | S3_Write | Write data to an S3 bucket |
| App | SciNote_API | Create and write SciNote entry |
| App | Smartsheet_Read | Read sheet from Smartsheet |
| App | Snowflake_Write | Sync tables in Ganymede data lake to Snowflake |
| File | AVI_Read | Read contents of an AVI file into a table |
| File | AVI_Read_Multi | Read contents of multiple AVI files into a table |
| File | CSV_Read | Read contents of a CSV file |
| File | CSV_Read_Multi | Read contents of multiple CSV files |
| File | CSV_Write | Write table to CSV file |
| File | Excel_Read | Read Excel spreadsheet |
| File | Excel_Read_Multi | Read Excel spreadsheets |
| File | Excel_Write | Write Excel spreadsheet |
| File | FCS_Extract_Load | Load FCS file to data lake |
| File | HDF5_Read | Read HDF5 data |
| File | Image_Read | Process image data; store processed images to data store |
| File | Image_Read_Multi | Process image data for multiple images; store processed images to data store |
| File | Image_Write | Process tabular data; write an image to data lake |
| File | Input_File | Read data file and process in Ganymede |
| File | Input_File_Multi | Read data files and process in Ganymede |
| File | PDF_Read | Read contents of a PDF file into a table |
| File | PDF_Read_Multi | Read contents of multiple PDF files into a table |
| File | Powerpoint_Write | Process tabular data; write a PowerPoint presentation to data lake |
| File | XML_Read | Read XML file into data lake |
| File | Zip_Read | Extract Zip file |
| Instrument | Instron_Tensile_Read | Load .is_tens file to data lake |
| Instrument | LCMS_Read | Read and process LCMS file in mzML format |
| Instrument | LCMS_Read_Multi | Read and process multiple LCMS files |
| Instrument | LC_Read | Read and process an Agilent Chemstation / MassStation HPLC file |
| Instrument | LC_Read_Multi | Read and process multiple Agilent Chemstation / MassStation HPLC files |
| Instrument | Profilometer_Read | Read Mx Profiler data file |
| Instrument | Synergy_Read | Load Synergy text file to data lake |
| Instrument | Synergy_Read_Multi | Load multiple Synergy text files to data lake |
| Instrument | WSP_Read | Read FlowJo WSP file into data lake |
| Tag | Benchling_Tag | Read Benchling run tag |
| Tag | Input_Param | Input parameter into flow |