
Node Overview

Nodes provide a template structure for performing data extraction, processing, and API communication. The diagram below shows how a Node performs a base function while exposing an interface for user-defined code, so that its behavior can be tailored.

Example node layout

In Jupyter notebooks, Nodes that are editable by users are marked with light blue backgrounds for user-defined SQL and Python cells. These cells can be committed to a repository and deployed to a workflow orchestrator by clicking the button in the toolbar or by pressing Cmd+Shift+S (macOS) or Ctrl+Shift+S (Windows/Linux).

Key Characteristics of Nodes

Understanding how Nodes interact with files, APIs, and their input/output types is essential for effective usage. Nodes may also have a user-editable component, which is indicated in the table below.

List of Available Nodes and Key Characteristics

The table below lists the available Nodes, along with whether each includes a user-editable component and whether it can handle multiple inputs.

  • Is Editable: Indicates that the Node includes a user-editable component.
  • Is Multi: Indicates if the Node can accept multiple inputs.

| Category | Name | Input Types | Output Types | Is Editable | Is Multi |
|---|---|---|---|---|---|
| Analysis | Branch_Python | Table(s) and/or File(s) | NodeReturn | True | False |
| Analysis | Python | Table(s) and/or File(s) | NodeReturn | True | False |
| Analysis | Transform_SQL | Table(s) | Table | True | False |
| Analysis | Trigger_Python | Table(s) and/or File(s) | FlowInputs | True | False |
| App | APINode | API | NodeReturn | True | False |
| App | Airtable_Export | Table | API | True | False |
| App | Airtable_Import | API | NodeReturn | True | False |
| App | Azure_Query | Table | NodeReturn | True | False |
| App | Azure_Read | API | FileAny | False | False |
| App | Azure_Read_Multi | API | FileAny | False | True |
| App | Azure_Write | FileAny | API | False | False |
| App | Benchling_Api | API | NodeReturn | True | False |
| App | Benchling_Event | Event | FlowInputs or NodeReturn | True | False |
| App | Benchling_Read | API | NodeReturn | True | False |
| App | Benchling_Read_Object | API | NodeReturn | True | False |
| App | Benchling_Warehouse_Query | Table | NodeReturn | True | False |
| App | Benchling_Warehouse_Sync | API | NodeReturn | True | False |
| App | Benchling_Write | Table | NodeReturn | True | False |
| App | Benchling_Write_Object | Table(s) and/or File(s) | NodeReturn | True | False |
| App | Coda_Write | Table | NodeReturn | True | False |
| App | ELabNext_Write | Table | NodeReturn | True | False |
| App | Load_Parquet_to_Table | API | FileAny | False | False |
| App | S3_Event | Event | FlowInputs | True | False |
| App | S3_Read | API | FileAny | True | False |
| App | S3_Write | FileAny | API | False | False |
| App | SciNote_API | Table | NodeReturn | True | False |
| App | Smartsheet_Read | API | NodeReturn | True | False |
| App | Snowflake_Write | Table | NodeReturn | False | False |
| App | Webhook_Event | Event | NodeReturn | True | False |
| File | AVI_Read | FileAVI | NodeReturn | True | False |
| File | AVI_Read_Multi | Set[FileAVI] | NodeReturn | True | True |
| File | CSV_Read | FileCSV | NodeReturn | True | False |
| File | CSV_Read_Multi | Set[FileCSV] | NodeReturn | True | True |
| File | CSV_Write | Table(s) | NodeReturn | True | False |
| File | Excel_Read | FileExcel | NodeReturn | True | False |
| File | Excel_Read_Multi | Set[FileExcel] | NodeReturn | True | True |
| File | Excel_Write | Table(s) | NodeReturn | True | False |
| File | HDF5_Read | FileHDF5 | NodeReturn | True | False |
| File | Image_Read | FileImage | NodeReturn | True | False |
| File | Image_Read_Multi | Set[FileImage] | NodeReturn | True | True |
| File | Image_Write | Table(s) | NodeReturn | True | False |
| File | Input_File | FileAny | NodeReturn | True | False |
| File | Input_File_Multi | Set[FileAny] | NodeReturn | True | True |
| File | PDF_Read | FilePDF | NodeReturn | True | False |
| File | PDF_Read_Multi | Set[FilePDF] | NodeReturn | True | True |
| File | Powerpoint_Write | Table(s) | NodeReturn | True | False |
| File | XML_Read | FileXML | NodeReturn | True | False |
| File | Zip_Read | FileZip | NodeReturn | True | False |
| Instrument | Instron_Tensile_Read | FileIsTens | NodeReturn | True | False |
| Instrument | LCMS_Read | File | NodeReturn | True | False |
| Instrument | LCMS_Read_Multi | File | NodeReturn | True | True |
| Instrument | LC_Read | File | NodeReturn | True | False |
| Instrument | LC_Read_Multi | File | NodeReturn | True | True |
| Instrument | Profilometer_Read | FileHDF5 | NodeReturn | True | False |
| Instrument | Synergy_Read | FileTxt | NodeReturn | True | False |
| Instrument | Synergy_Read_Multi | Set[FileTxt] | NodeReturn | True | True |
| Instrument | WSP_Read | FileWSP | NodeReturn | True | False |
| Tag | Benchling_Tag | TagBenchling | string | False | False |
| Tag | Input_Param | | string | False | False |

Node Categories

  • App: Integrates with third-party APIs for data processing; often requires key exchange between the third-party service and Ganymede.
  • Analysis: Performs data manipulations using Python or SQL.
  • Instrument: Handles data from laboratory instruments.
  • File: Conducts ETL operations on specified data types within the Ganymede cloud.
  • Tag: Defines parameters at Flow runtime.

Input and Output Types for Nodes

NodeReturn Object

Many Nodes return a NodeReturn object, which contains tables and files for storage in the Ganymede data lake.

To initialize a NodeReturn object, the following parameters can be passed:

  • param tables_to_upload: dict[str, pd.DataFrame] | None - Tables to be stored in Ganymede, keyed by name.
  • param files_to_upload: dict[str, bytes] | None - Files to be stored in Ganymede, keyed by filename.
  • param if_exists: str - String indicating whether to overwrite or append to existing tables in Ganymede data lake. Valid values are "replace", "append", or "fail"; defaults to "replace".
  • param tables_measurement_units: Optional[dict[str, pd.DataFrame]] - (If provided) Specifies the measurement units for columns; keys are table names, values are pandas DataFrames with "column_name" and "unit" as columns.
  • param file_location: str - Specifies the bucket location ("input" or "output"); required only if files_to_upload is not null, defaults to "output".
  • param wait_for_job: bool - Whether to wait for the write operation to complete before continuing execution; defaults to False.
  • param tags: dict[str, list[dict[str, str]] | dict[str, str]] | None - Dictionary of files to tag, with keys as file names and values as a dictionary of keyword parameters for the add_file_tag function. Multiple tags can be added to a single file by passing a list of add_file_tag parameter dictionaries as the value.
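
The accepted values listed above can be checked before a NodeReturn is built. The sketch below is plain Python and does not call the Ganymede SDK; check_node_return_args and its two constants are hypothetical names introduced here for illustration, and the real NodeReturn may validate its arguments differently.

```python
from typing import Dict, Optional

# Hypothetical pre-flight check for NodeReturn keyword arguments.
# Only the accepted values and defaults come from the parameter list above;
# this function is not part of the Ganymede SDK.
VALID_IF_EXISTS = {"replace", "append", "fail"}
VALID_FILE_LOCATIONS = {"input", "output"}


def check_node_return_args(
    files_to_upload: Optional[Dict[str, bytes]] = None,
    if_exists: str = "replace",
    file_location: str = "output",
) -> dict:
    """Return the keyword arguments unchanged if they look valid, else raise."""
    if if_exists not in VALID_IF_EXISTS:
        raise ValueError(
            f"if_exists must be one of {sorted(VALID_IF_EXISTS)}, got {if_exists!r}"
        )
    if file_location not in VALID_FILE_LOCATIONS:
        raise ValueError(
            f"file_location must be one of {sorted(VALID_FILE_LOCATIONS)}, got {file_location!r}"
        )
    # files_to_upload is documented as dict[str, bytes], so reject str content early.
    for name, content in (files_to_upload or {}).items():
        if not isinstance(content, bytes):
            raise TypeError(f"file {name!r} must be bytes, got {type(content).__name__}")
    return {
        "files_to_upload": files_to_upload,
        "if_exists": if_exists,
        "file_location": file_location,
    }


kwargs = check_node_return_args(
    files_to_upload={"my_file.txt": "Message to store in file".encode("utf-8")},
    if_exists="append",
)
print(kwargs["if_exists"], kwargs["file_location"])
```

Passing a plain str as file content, or a value such as "overwrite" for if_exists, raises an exception here instead of surfacing later during upload.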

NodeReturn Example

The contents of a NodeReturn object can be inspected in the notebook, where table headers and list of files are displayed. Below is an example of creating a NodeReturn object:

import pandas as pd
from ganymede_sdk.io import NodeReturn  # assumed import path, matching FlowInputs below

def execute():
    message = "Message to store in file"
    # files_to_upload expects bytes, so encode the string first
    byte_message = bytes(message, "utf-8")

    df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

    # upload a table named 'my_table' and a file named 'my_file.txt'
    return NodeReturn(
        files_to_upload={"my_file.txt": byte_message},
        tables_to_upload={"my_table": df},
    )

execute()

This code produces the following summary of the NodeReturn object:

Rendered NodeReturn object

Docstrings and source code can be viewed by typing ?NodeReturn and ??NodeReturn respectively in a cell in the editor notebook.

Example: NodeReturn Object with Tags

The following code demonstrates the use of the tags parameter in a NodeReturn object:

import pandas as pd
from ganymede_sdk.io import NodeReturn  # assumed import path, matching FlowInputs below

def execute():
    message = "Message to store in file"
    # files_to_upload expects bytes, so encode the string first
    byte_message = bytes(message, "utf-8")

    df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

    # Tags are added to the file 'my_file.txt'
    # Any parameters that can be passed into the add_file_tag function can be passed into the tags parameter
    # of the NodeReturn object. For more information on the add_file_tag function, see the Tags page.
    #
    # Note that the input_file_path parameter within the add_file_tag function does not need to be specified
    return NodeReturn(
        files_to_upload={"my_file.txt": byte_message},
        tables_to_upload={"my_table": df},
        tags={"my_file.txt": [{"tag_type_id": "Experiment ID", "display_value": "EXP005"}]},
    )

execute()
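
Because each value in tags may be either a single dictionary or a list of dictionaries, code that inspects a tags argument often normalizes both shapes into a list first. The sketch below is plain Python: normalize_tags is a hypothetical helper, not an SDK function, it makes no call to add_file_tag, and the second file name and the "EXP006" value are made up for illustration.

```python
from typing import Dict, List, Union

TagParams = Dict[str, str]
TagsArg = Dict[str, Union[TagParams, List[TagParams]]]


def normalize_tags(tags: TagsArg) -> Dict[str, List[TagParams]]:
    """Return a copy of tags in which every file name maps to a list of tag dicts."""
    normalized: Dict[str, List[TagParams]] = {}
    for file_name, value in tags.items():
        # A single dict is one tag; a list already holds several tags.
        normalized[file_name] = [value] if isinstance(value, dict) else list(value)
    return normalized


tags = {
    # Two tags on one file, passed as a list
    "my_file.txt": [
        {"tag_type_id": "Experiment ID", "display_value": "EXP005"},
        {"tag_type_id": "Experiment ID", "display_value": "EXP006"},
    ],
    # One tag on another file, passed as a bare dict
    "other_file.txt": {"tag_type_id": "Experiment ID", "display_value": "EXP005"},
}

for file_name, tag_list in normalize_tags(tags).items():
    print(file_name, len(tag_list))
```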

FlowInputs object

Nodes that trigger other Flows return a FlowInputs object, which specifies the inputs to the triggered Flow.

To initialize a FlowInputs object (available from ganymede_sdk.io), the following parameters can be passed:

  • param files: list[FlowInputFile] | None - Files to pass to the triggered Flow.
  • param params: list[FlowInputParam] | None - Parameters to pass to the triggered Flow.
  • param tags: list[Tag] | None - Tags to pass to the triggered Flow.

FlowInputFile is a dataclass used for passing files to a Node. Attributes include:

  • param node_name: str - Name of the Node within the triggered Flow to pass file(s) into.
  • param param_name: str - Node parameter in the triggered Flow Node that specifies the string pattern that the filename must match (e.g., "csv" for the CSV_Read Node).
  • param files: dict[str, bytes] - Files to pass into the Node.

FlowInputParam is a dataclass used to pass parameters into a Node. It has the following attributes:

  • param node_name: str - Name of the Node within triggered Flow to pass parameter(s) to.
  • param param_name: str - Node parameter in the triggered Flow Node that is used to specify the string pattern that the parameter must match (e.g., "param" for the Input_Param Node).
  • param param_value: str - Value to pass into Node.

Tag is a dataclass used to pass Benchling Tags into a Node, and is used exclusively with the Benchling_Tag Node. It has the following attributes:

  • param node_name: str - Name of the Node within the triggered Flow to pass tag(s) into.
  • param display_tag: str - Value displayed in the dropdown in the Ganymede UI. For Benchling_Tag Nodes, this is the name of the tag displayed in the dropdown in Flow View / Flow Editor.
  • param run_tag: str - Underlying value of the tag. For Benchling_Tag Nodes, this is the Benchling ID associated with the value selected in the dropdown.
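
To make the shape of these objects concrete, the sketch below builds a FlowInputs value carrying one file and one parameter. So that it runs outside Ganymede, it defines local stand-in dataclasses that mirror only the attributes documented above; in a notebook the real classes would be imported from ganymede_sdk.io instead. The node names, the file name, and the file contents are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Local stand-ins mirroring the documented attributes; not the SDK classes.
@dataclass
class FlowInputFile:
    node_name: str
    param_name: str
    files: Dict[str, bytes]


@dataclass
class FlowInputParam:
    node_name: str
    param_name: str
    param_value: str


@dataclass
class Tag:
    node_name: str
    display_tag: str
    run_tag: str


@dataclass
class FlowInputs:
    files: Optional[List[FlowInputFile]] = None
    params: Optional[List[FlowInputParam]] = None
    tags: Optional[List[Tag]] = None


def execute() -> FlowInputs:
    # Hypothetical file content handed to a CSV_Read Node in the triggered Flow
    csv_bytes = b"well,value\nA1,0.12\n"
    return FlowInputs(
        files=[
            FlowInputFile(node_name="CSV_Read", param_name="csv", files={"plate.csv": csv_bytes})
        ],
        params=[
            FlowInputParam(node_name="Input_Param", param_name="param", param_value="EXP005")
        ],
    )


flow_inputs = execute()
print(flow_inputs.files[0].node_name, flow_inputs.params[0].param_value)
```

The tags field is left as None here because Tag applies only when the triggered Flow contains a Benchling_Tag Node.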

Other input/output types

Some other input and output types characteristic to Nodes are:

  • Table: Tabular data retrieved from or passed to the Ganymede data lake via ANSI SQL queries or as Pandas DataFrames.
  • API: Access through third-party APIs.
  • File-related inputs/outputs: Specific file types, including:
    • FileAVI: AVI file
    • FileCSV: CSV file
    • FileExcel: Excel file (xls, xlsx, ...)
    • FileImage: Image file (png, bmp, ...)
    • FileHDF5: HDF5 file
    • FileXML: XML file
    • FileZip: Zip file
    • FileAny: generic data file, which may be unstructured
  • TagBenchling: Benchling run tag
  • string: String parameter

Python sets, lists, and dictionaries are denoted as Set, List, and Dict, respectively.

Optional indicates that the input or output is optional.

User-editable Nodes

User-editable Nodes present an interface for modifying and testing code that is executed by the workflow management system. These Jupyter notebooks are split into the following sections:

  • Node Description: A brief description of the Node's functionality.
  • Node Input Data: For Nodes that retrieve tabular data, this section specifies the SQL query used to fetch data for processing.
  • User-Defined Function: The execute function in this section processes the data. The function is called during the Flow execution.

    Info: The execute function may call classes and functions found within the User-Defined Function cell.

  • Testing Section: Cells in this section are for testing modifications to the SQL query and user-defined Python function. After making edits, save changes by clicking the button in the toolbar or selecting "Save Commit and Deploy" from the Kernel menu.
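
As a rough illustration of a User-Defined Function cell, the sketch below defines a helper class and a helper function alongside execute, which calls both. The execute signature, the row format, and the returned dictionary are placeholders chosen so the example runs anywhere; an actual Node receives its inputs from the Node Input Data section and returns the output type listed for it in the table of available Nodes (for most Nodes, a NodeReturn).

```python
from statistics import mean
from typing import Dict, List


class UnitConverter:
    """Helper class defined in the same cell as execute."""

    def __init__(self, divisor: float) -> None:
        self.divisor = divisor

    def convert(self, value: float) -> float:
        return value / self.divisor


def summarize(values: List[float]) -> Dict[str, float]:
    """Helper function defined in the same cell as execute."""
    return {"mean": mean(values), "max": max(values)}


def execute(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Placeholder entry point: converts masses from mg to g, then summarizes them."""
    converter = UnitConverter(divisor=1000.0)
    grams = [converter.convert(row["mass_mg"]) for row in rows]
    return summarize(grams)


# Equivalent of the Testing Section: call execute on sample input before deploying.
result = execute([{"mass_mg": 1000.0}, {"mass_mg": 3000.0}])
print(result)
```

Keeping helpers in the same cell as execute means they are committed and deployed together with it.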

List of Available Nodes

| Category | Name | Brief Description |
|---|---|---|
| Analysis | Branch_Python | Process data with Python and conditionally execute downstream Nodes |
| Analysis | Python | Process data with Python |
| Analysis | Transform_SQL | SQL analysis function |
| Analysis | Trigger_Python | Process data with Python and trigger subsequent Flow |
| App | APINode | Generic API access Node |
| App | Airtable_Export | Export data from Ganymede data lake to Airtable |
| App | Airtable_Import | Import data from Airtable into Ganymede data lake |
| App | Azure_Query | Query data from Azure SQL Server |
| App | Azure_Read | Read data from Azure Blob Storage |
| App | Azure_Read_Multi | Read all data from Azure Blob Storage |
| App | Azure_Write | Write data to Azure Blob Storage |
| App | Benchling_Api | Read Benchling data into data lake |
| App | Benchling_Event | Capture events from Benchling for triggering Flows or saving data |
| App | Benchling_Read | Read Benchling data into data lake using run tag |
| App | Benchling_Read_Object | Read Benchling data into data lake using object ID |
| App | Benchling_Warehouse_Query | Query Benchling Warehouse from Ganymede |
| App | Benchling_Warehouse_Sync | Sync Benchling Warehouse to Ganymede |
| App | Benchling_Write | Write to Benchling |
| App | Benchling_Write_Object | Write object to Benchling |
| App | Coda_Write | Write Coda tables |
| App | ELabNext_Write | Create and write eLabNext entry |
| App | Load_Parquet_to_Table | Create data lake table from Parquet files |
| App | S3_Event | Capture events from AWS S3 for triggering Flows |
| App | S3_Read | Ingest data into Ganymede data storage from AWS S3 storage |
| App | S3_Write | Write data to an S3 bucket |
| App | SciNote_API | Create and write SciNote entry |
| App | Smartsheet_Read | Read sheet from Smartsheet |
| App | Snowflake_Write | Sync tables in Ganymede data lake to Snowflake |
| App | Webhook_Event | Capture events from a webhook for triggering Flows |
| File | AVI_Read | Read in contents of an AVI file to a table |
| File | AVI_Read_Multi | Read in contents of multiple AVI files to a table |
| File | CSV_Read | Read in contents of a CSV file |
| File | CSV_Read_Multi | Read in contents of multiple CSV files |
| File | CSV_Write | Write table to CSV file |
| File | Excel_Read | Read Excel spreadsheet |
| File | Excel_Read_Multi | Read Excel spreadsheets |
| File | Excel_Write | Write Excel spreadsheet |
| File | HDF5_Read | Read HDF5 data |
| File | Image_Read | Process image data; store processed images to data store |
| File | Image_Read_Multi | Process image data for multiple images; store processed images to data store |
| File | Image_Write | Process tabular data; write an image to data lake |
| File | Input_File | Read data file and process in Ganymede |
| File | Input_File_Multi | Read data files and process in Ganymede |
| File | PDF_Read | Read in contents of a PDF file to a table |
| File | PDF_Read_Multi | Read in contents of multiple PDF files to a table |
| File | Powerpoint_Write | Process tabular data; write a PowerPoint presentation to data lake |
| File | XML_Read | Read XML file into data lake |
| File | Zip_Read | Extract Zip file |
| Instrument | Instron_Tensile_Read | Load .is_tens file to data lake |
| Instrument | LCMS_Read | Read and process LCMS file in mzML format |
| Instrument | LCMS_Read_Multi | Read and process multiple LCMS files |
| Instrument | LC_Read | Read and process an Agilent Chemstation / MassStation HPLC file |
| Instrument | LC_Read_Multi | Read and process multiple Agilent Chemstation / MassStation HPLC files |
| Instrument | Profilometer_Read | Read Mx Profiler data file |
| Instrument | Synergy_Read | Load Synergy text file to data lake |
| Instrument | Synergy_Read_Multi | Load multiple Synergy text files to data lake |
| Instrument | WSP_Read | Read FlowJo WSP file into data lake |
| Tag | Benchling_Tag | Read Benchling tag |
| Tag | Input_Param | Input parameter into Flow |