Node Overview
Nodes provide template structure for performing data extraction, processing, and API communication. The diagram below shows how a Node performs a base function, but exposes an interface for user-defined code to enable tailoring.
In Jupyter notebooks, Nodes that are editable by users are marked with light blue backgrounds for user-defined SQL and Python cells. These cells can be committed to a repository and deployed to a workflow orchestrator by clicking the button in the toolbar or by pressing Cmd+S (macOS) or Ctrl+S (Windows/Linux).
Key Characteristics of Nodes
Understanding how Nodes interact with files, APIs, and their input/output types is essential for effective usage. Nodes may also have a user-editable component, which is indicated in the table below.
List of Available Nodes and Key Characteristics
The table below provides a list of available Nodes, along with details on whether they include a user-editable component and if they can handle multiple inputs.
- Is Editable: Indicates that the Node includes a user-editable component.
- Is Multi: Indicates if the Node can accept multiple inputs.
Category | Name | Input Types | Output Types | Is Editable | Is Multi |
---|---|---|---|---|---|
Analysis | Branch_Python | Table(s) and/or File(s) | NodeReturn | True | False |
Analysis | Python | Table(s) and/or File(s) | NodeReturn | True | False |
Analysis | Transform_SQL | Table(s) | Table | True | False |
Analysis | Trigger_Python | Table(s) and/or File(s) | FlowInputs | True | False |
App | APINode | API | NodeReturn | True | False |
App | AirtableExport | Table | API | True | False |
App | AirtableImport | API | NodeReturn | False | False |
App | Azure_Query | Table | NodeReturn | True | False |
App | Azure_Read | API | FileAny | False | False |
App | Azure_Read_Multi | API | FileAny | False | True |
App | Azure_Write | FileAny | API | False | False |
App | Benchling_Api | API | NodeReturn | True | False |
App | Benchling_Event | Event | FlowInputs or NodeReturn | True | False |
App | Benchling_Read | API | NodeReturn | True | False |
App | Benchling_Read_Object | API | NodeReturn | True | False |
App | Benchling_Warehouse_Query | Table | NodeReturn | True | False |
App | Benchling_Warehouse_Sync | API | NodeReturn | True | False |
App | Benchling_Write | Table | NodeReturn | True | False |
App | Benchling_Write_Object | Table(s) and/or File(s) | NodeReturn | True | False |
App | Coda_Write | Table | NodeReturn | True | False |
App | ELabNext_Write | Table | NodeReturn | True | False |
App | Load_Parquet_to_Table | API | FileAny | False | False |
App | S3_Event | Event | FlowInputs | True | False |
App | S3_Read | API | FileAny | True | False |
App | S3_Write | FileAny | API | False | False |
App | SciNote_API | Table | NodeReturn | True | False |
App | Smartsheet_Read | API | NodeReturn | True | False |
App | Snowflake_Write | Table | NodeReturn | False | False |
File | AVI_Read | FileAVI | NodeReturn | True | False |
File | AVI_Read_Multi | Set[FileAVI] | NodeReturn | True | True |
File | CSV_Read | FileCSV | NodeReturn | True | False |
File | CSV_Read_Multi | Set[FileCSV] | NodeReturn | True | True |
File | CSV_Write | Table(s) | NodeReturn | True | False |
File | Excel_Read | FileExcel | NodeReturn | True | False |
File | Excel_Read_Multi | Set[FileExcel] | NodeReturn | True | True |
File | Excel_Write | Table(s) | NodeReturn | True | False |
File | FCS_Extract_Load | FileFCS | NodeReturn | True | False |
File | HDF5_Read | FileHDF5 | NodeReturn | True | False |
File | Image_Read | FileImage | NodeReturn | True | False |
File | Image_Read_Multi | Set[FileImage] | NodeReturn | True | True |
File | Image_Write | Table(s) | NodeReturn | True | False |
File | Input_File | FileAny | NodeReturn | True | False |
File | Input_File_Multi | Set[FileAny] | NodeReturn | True | True |
File | PDF_Read | FilePDF | NodeReturn | True | False |
File | PDF_Read_Multi | Set[FilePDF] | NodeReturn | True | True |
File | Powerpoint_Write | Table(s) | NodeReturn | True | False |
File | XML_Read | FileXML | NodeReturn | True | False |
File | Zip_Read | FileZip | NodeReturn | True | False |
Instrument | Instron_Tensile_Read | FileIsTens | NodeReturn | True | False |
Instrument | LCMS_Read | File | NodeReturn | True | False |
Instrument | LCMS_Read_Multi | File | NodeReturn | True | True |
Instrument | LC_Read | File | NodeReturn | True | False |
Instrument | LC_Read_Multi | File | NodeReturn | True | True |
Instrument | Profilometer_Read | FileHDF5 | NodeReturn | True | False |
Instrument | Synergy_Read | FileTxt | NodeReturn | True | False |
Instrument | Synergy_Read_Multi | Set[FileTxt] | NodeReturn | True | True |
Instrument | WSP_Read | FileWSP | NodeReturn | True | False |
Tag | Benchling_Tag | TagBenchling | string | False | False |
Tag | Input_Param | string | False | False |
Node Categories
- App: Integrates with third-party APIs for data processing; often requires key exchange between the third-party service and Ganymede.
- Analysis: Performs data manipulations using Python or SQL.
- Instrument: Handles data from laboratory instruments.
- File: Conducts ETL operations on specified data types within the Ganymede cloud.
- Tag: Defines parameters at Flow runtime.
Input and Output Types for Nodes
NodeReturn Object
Many Nodes return a NodeReturn
object, which contain tables and files for storage in the Ganymede data lake.
To initialize a NodeReturn
object, the following parameters can be passed:
- param tables_to_upload: dict[str, pd.DataFrame] | None - Tables to be stored in Ganymede, keyed by name.
- param files_to_upload: dict[str, bytes] | None - Files to be stored in Ganymede, keyed by filename.
- param if_exists: str - String indicating whether to overwrite or append to existing tables in Ganymede data lake. Valid values are "replace", "append", or "fail"; defaults to "replace".
- param tables_measurement_units: Optional[dict[str, pd.DataFrame]] - (If provided) Specifies the measurement units for columns; keys are table names, values are pandas DataFrames with "column_name" and "unit" as columns.
- param file_location: str - Specifies the bucket location ("input" or "output"); required only if files_to_upload is not null, defaults to "output".
- param wait_for_job: Whether to wait for the write operation to complete before continuing execution; defaults to False.
- param tags: dict[str, list[dict[str, str]] | dict[str, str]] | None: Dictionary of files to tag, with keys as file names and values as a dictionary of keyword parameters for the add_file_tag function. Multiple tags can be added to a single file by passing a list of add_file_tag parameters in the dictionary.
NodeReturn Example
The contents of a NodeReturn object can be inspected in the notebook, where table headers and list of files are displayed. Below is an example of creating a NodeReturn
object:
import pandas as pd
def execute():
message = "Message to store in file"
byte_message = bytes(message, "utf-8")
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
# upload a table named 'my_table' and a file named 'my_file'
return NodeReturn(files_to_upload={"my_file.txt": message}, tables_to_upload={"my_table": df})
execute()
This code produces the following summary of the NodeReturn
object:
Docstrings and source code can be viewed by typing ?NodeReturn
and ??NodeReturn
respectively in a cell in the editor notebook.
Example: NodeReturn Object with Tags
The following code demonstrates the use of the tags
parameter in a NodeReturn
object:
import pandas as pd
def execute():
message = "Message to store in file"
byte_message = bytes(message, "utf-8")
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})
# Tags are added to the file 'my_file.txt'
# Any parameters that can be passed into the add_file_tag function can be passed into the tags parameter
# of the NodeReturn object. For more information on the add_file_tag function, see the Tags page.
#
# Note that the input_file_path parameter within the add_file_tag function does not need to be specified
return NodeReturn(files_to_upload={"my_file.txt": message}, tables_to_upload={"my_table": df},
tags={"my_file.txt": [{"tag_type_id": "Experiment ID", "display_value": "EXP005"}]})
execute()
FlowInputs object
Nodes that trigger other Flows return a FlowInputs
object, which specifies the inputs to the triggered Flow.
To initialize a FlowInputs
object, use the following parameters from ganymede_sdk.io
:
- param files: list[FlowInputFile] | None - Files to pass to the triggered Flow.
- param params: list[FlowInputParam] | None - Parameters to pass to triggered Flow.
- param tags: list[Tag] | None - Tags to pass to the triggered Flow.
FlowInputFile
is a dataclass used for passing files to a Node. Attributes include:
- param node_name: str - Name of the Node within triggered Flow to pass file(s) into
- param param_name: str - Node parameter in the triggered Flow Node that specifies the string pattern that the filename must match (e.g., "csv" for the CSV_Read Node)
- param files: dict[str, bytes] - Files to pass into Node
FlowInputParam
is a dataclass used to pass parameters into a Node. It has the following attributes:
- param node_name: str - Name of the Node within triggered Flow to pass parameter(s) to.
- param param_name: str - Node parameter in the triggered Flow Node that is used to specify the string pattern that the parameter must match (e.g., "param" for the Input_Param Node).
- param param_value: str - Value to pass into Node.
Tag
is a dataclass used to pass Benchling Tags into a node, and is used exclusively with the Benchling_Tag Node. It has the following attributes:
- param node_name: str - Name of the node within triggered Flow to pass tag(s) into.
- param display_tag: str - Value displayed in the dropdown in the Ganymede UI. For Benchling_Tag Nodes, this is the name of the tag displayed in the dropdown in Flow View / Flow Editor.
- param run_tag: str - Underlying value of the tag. For Benchling_Tag Nodes, this is the Benchling ID associated with the value selected in the dropdown.
Other input/output types
Some other input and output types characteristic to Nodes are:
- Table: Tabular data retrieved from or passed to the Ganymede data lake via ANSI SQL queries or as Pandas DataFrames.
- API: Access through third-party APIs.
- File-related inputs/outputs: Specific file types, including:
- FileAVI: AVI file
- FileCSV: CSV file
- FileExcel: Excel file (xls, xlsx, ..)
- FileImage: Image file (png, bmp, ..)
- FileHDF5: HDF5 file
- FileXML: XML file
- FileZip: Zip file
- FileAny: generic data file, which may be unstructured
- TagBenchling: Benchling run tag
- string: String parameter
Python sets, lists, and dictionaries are denoted as Set, List, and Dict, respectively.
Optional indicates that the input or output is optional.
User-editable Nodes
User-editable Nodes present an interface for modifying and testing code that is executed by the workflow management system. These Jupyter notebooks are split into the following sections:
- Node Description: A brief description of the Node's functionality.
- Node Input Data: For Nodes that retrieve tabular data, this section specifies the SQL query used to fetch data for processing.
- User-Defined Function: The execute function in this section processes the data. The function is called during the Flow execution.
The execute function may call classes and functions found within the User-Defined Function cell.
- Testing Section: Cells in this section are for testing modifications to the SQL query and user-defined Python function. After making edits, save changes by clicking the button in the toolbar or selecting "Save Commit and Deploy" from the Kernel menu.
List of Available Nodes
Category | Name | Brief Description |
---|---|---|
Analysis | Branch_Python | Process data with Python and conditionally execute downstream Nodes |
Analysis | Python | Process data with python |
Analysis | Transform_SQL | SQL analysis Function |
Analysis | Trigger_Python | Process data with Python and trigger subsequent flow |
App | APINode | Generic API Access Node |
App | AirtableExport | Export data from Ganymede data lake to Airtable |
App | AirtableImport | Import data from Airtable into Ganymede data lake |
App | Azure_Query | Query data from Azure SQL Server |
App | Azure_Read | Read data from Azure Blob Storage |
App | Azure_Read_Multi | Read all data from Azure Blob Storage |
App | Azure_Write | Write data to Azure Blob storage |
App | Benchling_Api | Read Benchling data into data lake |
App | Benchling_Event | Capture events from Benchling for triggering flows or saving data |
App | Benchling_Read | Read Benchling data into data lake using run tag |
App | Benchling_Read_Object | Read Benchling data into data lake using object ID |
App | Benchling_Warehouse_Query | Query Benchling Warehouse from Ganymede |
App | Benchling_Warehouse_Sync | Sync Benchling Warehouse to Ganymede |
App | Benchling_Write | Write to Benchling |
App | Benchling_Write_Object | Write object to Benchling |
App | Coda_Write | Write Coda tables |
App | ELabNext_Write | Create and write eLabNext entry |
App | Load_Parquet_to_Table | Create data lake table from parquet files |
App | S3_Event | Capture events from AWS S3 for triggering flows |
App | S3_Read | Ingest data into Ganymede data storage from AWS S3 storage |
App | S3_Write | Write data to an S3 bucket |
App | SciNote_API | Create and write SciNote entry |
App | Smartsheet_Read | Read sheet from Smartsheet |
App | Snowflake_Write | Sync tables in Ganymede data lake to Snowflake |
File | AVI_Read | Read in contents of an AVI file to a table |
File | AVI_Read_Multi | Read in contents of multiple avi files to a table |
File | CSV_Read | Read in contents of a CSV file |
File | CSV_Read_Multi | Read in contents of multiple CSV files |
File | CSV_Write | Write table to CSV file |
File | Excel_Read | Read Excel spreadsheet |
File | Excel_Read_Multi | Read Excel spreadsheets |
File | Excel_Write | Write Excel spreadsheet |
File | FCS_Extract_Load | Load FCS file to data lake |
File | HDF5_Read | Read HDF5 data |
File | Image_Read | Process image data; store processed images to data store |
File | Image_Read_Multi | Process image data for multiple images; store processed images to data store |
File | Image_Write | Process tabular data; write an image to data lake |
File | Input_File | Read data file and process in Ganymede |
File | Input_File_Multi | Read data files and process in Ganymede |
File | PDF_Read | Read in contents of an PDF file to a table |
File | PDF_Read_Multi | Read in contents of multiple pdf files to a table |
File | Powerpoint_Write | Process tabular data; write a powerpoint presentation to data lake |
File | XML_Read | Read XML file into data lake |
File | Zip_Read | Extract Zip file |
Instrument | Instron_Tensile_Read | Load .is_tens file to data lake |
Instrument | LCMS_Read | Read and process LCMS file in mzML format |
Instrument | LCMS_Read_Multi | Read and process multiple LCMS files |
Instrument | LC_Read | Read and process an Agilent Chemstation / MassStation HPLC file |
Instrument | LC_Read_Multi | Read and process multiple Agilent Chemstation / MassStation HPLC files |
Instrument | Profilometer_Read | Read Mx Profiler data file |
Instrument | Synergy_Read | Load Synergy text file to data lake |
Instrument | Synergy_Read_Multi | Load multiple Synergy texts file to data lake |
Instrument | WSP_Read | Read FlowJo WSP file into data lake |
Tag | Benchling_Tag | Read benchling tag |
Tag | Input_Param | Input parameter into Flow |