Node Overview
Nodes provide a template structure for data extraction, processing, and API communication. The diagram below shows how a node performs a base function while exposing an interface for user-defined code, enabling tailoring.
In the Jupyter notebooks backing editable nodes, user-defined SQL and user-defined Python cells share a light blue background. These cells are committed to a repo and deployed to the workflow orchestrator upon clicking the button in the toolbar, or by pressing Cmd+Shift+S (Mac) or Ctrl+Shift+S (Windows/Linux).
Key Node Characteristics
A useful way to reason about nodes is to consider how they interact with files and APIs, their input and output types, and whether they contain a user-editable component.
Available Nodes with Key Characteristics
The table below lists the available nodes.
- Is Editable indicates that the node has a user-editable component.
- Is Multi indicates that the node can accept multiple inputs.
Category | Name | Input Types | Output Types | Is Editable | Is Multi |
---|---|---|---|---|---|
Analysis | Branch_Python | Table(s) and/or File(s) | NodeReturn | True | False |
Analysis | Python | Table(s) and/or File(s) | NodeReturn | True | False |
Analysis | RunContainer | FileAny | FileAny | True | False |
Analysis | Transform_SQL | Table(s) | Table | True | False |
Analysis | Transform_py | Table(s) | NodeReturn | True | False |
Analysis | Trigger_Python | Table(s) and/or File(s) | FlowInputs | True | False |
App | APINode | API | NodeReturn | True | False |
App | AirtableExport | Table | API | True | False |
App | AirtableImport | API | NodeReturn | False | False |
App | Azure_Query | Table | NodeReturn | True | False |
App | Azure_Read | API | FileAny | False | False |
App | Azure_Read_Multi | API | FileAny | False | True |
App | Azure_Write | FileAny | API | False | False |
App | Benchling_Api | API | NodeReturn | True | False |
App | Benchling_Event | Event | FlowInputs | True | False |
App | Benchling_Read | API | NodeReturn | True | False |
App | Benchling_Read_Object | API | NodeReturn | True | False |
App | Benchling_Warehouse_Query | Table | NodeReturn | True | False |
App | Benchling_Warehouse_Sync | API | NodeReturn | True | False |
App | Benchling_Write | Table | NodeReturn | True | False |
App | Benchling_Write_Object | Table(s) and/or File(s) | NodeReturn | True | False |
App | Coda_Write | Table | NodeReturn | True | False |
App | ELabNext_Write | Table | NodeReturn | True | False |
App | Load_Parquet_to_Table | API | FileAny | False | False |
App | S3_Event | Event | FlowInputs | True | False |
App | S3_Read | API | FileAny | False | False |
App | S3_Write | FileAny | API | False | False |
App | SciNote_API | Table | NodeReturn | True | False |
App | Smartsheet_Read | API | NodeReturn | True | False |
App | Snowflake_Write | Table | NodeReturn | False | False |
File | AVI_Read | FileAVI | NodeReturn | True | False |
File | AVI_Read_Multi | Set[FileAVI] | NodeReturn | True | True |
File | CSV_Read | FileCSV | NodeReturn | True | False |
File | CSV_Read_Multi | Set[FileCSV] | NodeReturn | True | True |
File | CSV_Write | Table(s) | NodeReturn | True | False |
File | Excel_Read | FileExcel | NodeReturn | True | False |
File | Excel_Read_Multi | Set[FileExcel] | NodeReturn | True | True |
File | Excel_Write | Table(s) | NodeReturn | True | False |
File | FCS_Extract_Load | FileFCS | NodeReturn | True | False |
File | HDF5_Read | FileHDF5 | NodeReturn | True | False |
File | Image_Read | FileImage | NodeReturn | True | False |
File | Image_Read_Multi | Set[FileImage] | NodeReturn | True | True |
File | Image_Write | Table(s) | NodeReturn | True | False |
File | Input_File | FileAny | NodeReturn | True | False |
File | Input_File_Multi | Set[FileAny] | NodeReturn | True | True |
File | PDF_Read | FilePDF | NodeReturn | True | False |
File | PDF_Read_Multi | Set[FilePDF] | NodeReturn | True | True |
File | Powerpoint_Write | Table(s) | NodeReturn | True | False |
File | XML_Read | FileXML | NodeReturn | True | False |
File | Zip_Read | FileZip | NodeReturn | True | False |
Instrument | Instron_Tensile_Read | FileIsTens | NodeReturn | True | False |
Instrument | LCMS_Read | File | NodeReturn | True | False |
Instrument | LCMS_Read_Multi | File | NodeReturn | True | True |
Instrument | LC_Read | File | NodeReturn | True | False |
Instrument | LC_Read_Multi | File | NodeReturn | True | True |
Instrument | Profilometer_Read | FileHDF5 | NodeReturn | True | False |
Instrument | Synergy_Read | FileTxt | NodeReturn | True | False |
Instrument | Synergy_Read_Multi | Set[FileTxt] | NodeReturn | True | True |
Instrument | WSP_Read | FileWSP | NodeReturn | True | False |
Tag | Benchling_Tag | TagBenchling | string | False | False |
Tag | Input_Param | | string | False | False |
Node Categories
- App: Accesses third-party APIs for processing; in many cases, a key exchange between the third party and Ganymede is necessary for functionality
- Analysis: Performs Python / SQL manipulations
- Instrument: Lab instrument-specific functions
- File: ETL operations that load data of a specified type into the Ganymede cloud
- Tag: Specifies parameters at flow runtime
Input and Output Types for Nodes
NodeReturn Object
Many nodes return a NodeReturn object, which contains tables and files to store in the Ganymede data lake.
Initializing a NodeReturn object involves passing the following parameters:
- param tables_to_upload: Optional[dict[str, pd.DataFrame]] - Tables keyed by name to store in Ganymede
- param files_to_upload: Optional[dict[str, bytes]] - Files keyed by filename to store in Ganymede
- param if_exists: str - String indicating whether to overwrite or append to existing tables in Ganymede data lake. Valid values are "replace", "append", or "fail"; defaults to "replace".
- param tables_measurement_units: Optional[dict[str, pd.DataFrame]] - if provided, the measurement units for columns; keys are table names, values are pandas DataFrames with "column_name" and "unit" as columns
- param file_location: Optional[str] - Bucket location to write to, either "input" or "output"; only needs to be specified if files_to_upload is not null. Defaults to "output".
- param wait_for_job: bool - Whether to wait for the write to the Ganymede data lake to complete before continuing execution to the subsequent node; defaults to False
- param tags: Optional[dict[str, list[dict] | dict]] - Dictionary of files to tag; keys are file names, values are dictionaries of keyword parameters to pass into the add_file_tag function
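As a sketch of how the optional parameters combine, the snippet below builds the tables_measurement_units DataFrame described above. The table name "assay" and its columns are illustrative; the commented return shows how the pieces would be passed to NodeReturn (which is available in the editor notebook environment):

```python
import pandas as pd

# Hypothetical assay table with numeric columns
assay = pd.DataFrame({"concentration": [1.2, 3.4], "volume": [10.0, 12.5]})

# One units DataFrame per table, with "column_name" and "unit" columns
assay_units = pd.DataFrame(
    {"column_name": ["concentration", "volume"], "unit": ["mg/mL", "uL"]}
)

# In an editor notebook, this would be returned from execute as:
# return NodeReturn(
#     tables_to_upload={"assay": assay},
#     tables_measurement_units={"assay": assay_units},
#     if_exists="append",  # append to the existing "assay" table
# )
```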
NodeReturn Example
The contents of a NodeReturn object can be viewed in the notebook; transposed table heads and the list of files are displayed when the object is inspected. For example, running the following code in an editor notebook
```python
import pandas as pd


def execute():
    message = "Message to store in file"
    byte_message = bytes(message, "utf-8")
    df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

    # upload a table named 'my_table' and a file named 'my_file.txt'
    return NodeReturn(
        files_to_upload={"my_file.txt": byte_message},
        tables_to_upload={"my_table": df},
    )


execute()
```
returns the following summary of the NodeReturn object:
Docstrings and source code can be viewed by typing ?NodeReturn and ??NodeReturn, respectively, in a cell in the editor notebook.
NodeReturn Example with Tags
The following code demonstrates how to use the tags
parameter in a NodeReturn
object:
```python
import pandas as pd


def execute():
    message = "Message to store in file"
    byte_message = bytes(message, "utf-8")
    df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6]})

    # Tags are added to the file 'my_file.txt'.
    # Any parameters that can be passed into the add_file_tag function can be
    # passed into the tags parameter of the NodeReturn object. For more
    # information on the add_file_tag function, see the Tags page.
    #
    # Note that the input_file_path parameter within the add_file_tag function
    # does not need to be specified.
    return NodeReturn(
        files_to_upload={"my_file.txt": byte_message},
        tables_to_upload={"my_table": df},
        tags={"my_file.txt": [{"tag_type_id": "Experiment ID", "display_value": "EXP005"}]},
    )


execute()
```
FlowInputs Object
Nodes that trigger other flows return a FlowInputs object, which specifies the inputs to the triggered flow. Initializing a FlowInputs object involves passing the following parameters, which are found in ganymede_sdk.io:
- param files: Optional[List[FlowInputFile]] - Files to pass into the triggered flow
- param params: Optional[List[FlowInputParam]] - Parameters to pass to the triggered flow
- param tags: Optional[List[Tag]] - Tags to pass to the triggered flow
FlowInputFile
is a dataclass used to pass file(s) into a node. It has the following attributes:
- param node_name: str - Name of node within triggered flow to pass file(s) into
- param param_name: str - Node parameter in the triggered flow node that specifies the string pattern the filename must match (e.g. "csv" for the CSV_Read node)
- param files: Dict[str, bytes] - Files to pass into node
FlowInputParam
is a dataclass used to pass parameters into a node. It has the following attributes:
- param node_name: str - Name of node within triggered flow to pass parameter(s) into
- param param_name: str - Node parameter in the triggered flow node that is used to specify the string pattern that the parameter must match ("param" for the Input_Param node)
- param param_value: str - Value to pass into node
Tag
is a dataclass used to pass tags into a node. It has the following attributes:
- param node_name: str - Name of node within triggered flow to pass tag(s) into
- param display_tag: str - Value displayed in the dropdown in Ganymede UI. For Benchling_Tag nodes, this is the name of the tag displayed in the dropdown in Flow View / Flow Editor.
- param run_tag: str - Underlying value of the tag. For Benchling_Tag nodes, this is the Benchling ID associated with the value selected in the dropdown.
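Putting the three dataclasses together: the sketch below defines illustrative stand-ins that mirror the attributes listed above (in a Ganymede notebook you would import the real classes from ganymede_sdk.io instead) and builds a FlowInputs payload that passes a CSV into the triggered flow's CSV_Read node and a parameter into its Input_Param node. The file name results.csv and value batch_42 are made up:

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Stand-in definitions mirroring the attributes described above;
# in a Ganymede notebook, import the real classes from ganymede_sdk.io.
@dataclass
class FlowInputFile:
    node_name: str        # node in the triggered flow receiving the file(s)
    param_name: str       # e.g. "csv" for the CSV_Read node
    files: Dict[str, bytes]


@dataclass
class FlowInputParam:
    node_name: str        # node in the triggered flow receiving the parameter
    param_name: str       # "param" for the Input_Param node
    param_value: str


@dataclass
class Tag:
    node_name: str
    display_tag: str      # value shown in the Ganymede UI dropdown
    run_tag: str          # underlying value of the tag


@dataclass
class FlowInputs:
    files: Optional[List[FlowInputFile]] = None
    params: Optional[List[FlowInputParam]] = None
    tags: Optional[List[Tag]] = None


flow_inputs = FlowInputs(
    files=[FlowInputFile("CSV_Read", "csv", {"results.csv": b"a,b\n1,2\n"})],
    params=[FlowInputParam("Input_Param", "param", "batch_42")],
)
```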
Other input/output types
Some other input and output types characteristic to nodes are:
- Table: Tabular data retrieved from or passed to tenant-specific Ganymede data lake. Tables are retrieved from Ganymede data lake via ANSI SQL queries, and are passed to Ganymede data lake as pandas DataFrames
- API: access via third-party API
- File-related inputs/outputs: File of specified type.
- FileAVI: AVI file
- FileCSV: CSV file
- FileExcel: Excel file (xls, xlsx, ...)
- FileImage: Image file (png, bmp, ...)
- FileHDF5: HDF5 file
- FileXML: XML file
- FileZip: Zip file
- FileAny: generic data file, which may be unstructured
- TagBenchling: Benchling run tag
- string: String parameter
Set, List, and Dict correspond to Python sets, lists, and dictionaries respectively.
Optional indicates that the input or output is optional.
User-editable Nodes
User-editable nodes present an interface for modifying and testing code that is executed by the workflow management system. These Jupyter notebooks are split into the following sections:
- Node Description: A short blurb about the node that the user-editable function corresponds to
- Node Input Data: For nodes that retrieve tabular data from the data lake as input, the query string in this cell specifies the query (or queries) that are executed, with results presented to the user-defined function for processing.
- User-Defined Function: The execute function within this cell processes data; the workflow management system calls it during flow execution. The execute function may call classes and functions defined within the User-Defined Function cell.
- Testing Section: The cells in this section can be used for testing modifications to the SQL query and user-defined Python function. This enables rapid iteration on user-defined code; after the necessary edits are made, changes can be saved by clicking the button in the toolbar or by selecting Save, Commit, and Deploy from the Kernel menu.
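As a minimal sketch of a User-Defined Function cell for a table-processing node: here df stands in for the table produced by the Node Input Data query, and the "signal" column is hypothetical. In a real editor notebook the function would return a NodeReturn (shown as a comment) rather than a bare DataFrame:

```python
import pandas as pd


def execute(df: pd.DataFrame) -> pd.DataFrame:
    """User-defined function called by the workflow management system.

    df stands in for the result of the Node Input Data query.
    """
    # Illustrative processing step: normalize a hypothetical "signal" column
    result = df.copy()
    result["signal_norm"] = result["signal"] / result["signal"].max()

    # In an editor notebook, this would return a NodeReturn instead, e.g.:
    # return NodeReturn(tables_to_upload={"processed_data": result})
    return result


# Testing-section style check against a small synthetic table
processed = execute(pd.DataFrame({"signal": [2.0, 4.0]}))
```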
List of Available Nodes
Category | Name | Brief Description |
---|---|---|
Analysis | Branch_Python | Process data with Python and conditionally execute downstream nodes |
Analysis | Python | Process data with Python |
Analysis | RunContainer | Run container node |
Analysis | Transform_SQL | SQL analysis function |
Analysis | Transform_py | Manipulate data with Python |
Analysis | Trigger_Python | Process data with Python and trigger subsequent flow |
App | APINode | Generic API Access Node |
App | AirtableExport | Export data from Ganymede data lake to Airtable |
App | AirtableImport | Import data from Airtable into Ganymede data lake |
App | Azure_Query | Query data from Azure SQL Server |
App | Azure_Read | Read data from Azure Blob Storage |
App | Azure_Read_Multi | Read all data from Azure Blob Storage |
App | Azure_Write | Write data to Azure Blob storage |
App | Benchling_Api | Read Benchling data into data lake |
App | Benchling_Event | Capture events from Benchling for triggering flows |
App | Benchling_Read | Read Benchling data into data lake using run tag |
App | Benchling_Read_Object | Read Benchling data into data lake using object ID |
App | Benchling_Warehouse_Query | Query Benchling Warehouse from Ganymede |
App | Benchling_Warehouse_Sync | Sync Benchling Warehouse to Ganymede |
App | Benchling_Write | Write to Benchling |
App | Benchling_Write_Object | Write object to Benchling |
App | Coda_Write | Write Coda tables |
App | ELabNext_Write | Create and write eLabNext entry |
App | Load_Parquet_to_Table | Create data lake table from Parquet files |
App | S3_Event | Capture events from AWS S3 for triggering flows |
App | S3_Read | Ingest data into Ganymede data storage from AWS S3 storage |
App | S3_Write | Write data to an S3 bucket |
App | SciNote_API | Create and write SciNote entry |
App | Smartsheet_Read | Read sheet from Smartsheet |
App | Snowflake_Write | Sync tables in Ganymede data lake to Snowflake |
File | AVI_Read | Read in contents of an AVI file to a table |
File | AVI_Read_Multi | Read in contents of multiple AVI files to a table |
File | CSV_Read | Read in contents of a CSV file |
File | CSV_Read_Multi | Read in contents of multiple CSV files |
File | CSV_Write | Write table to CSV file |
File | Excel_Read | Read Excel spreadsheet |
File | Excel_Read_Multi | Read Excel spreadsheets |
File | Excel_Write | Write Excel spreadsheet |
File | FCS_Extract_Load | Load FCS file to data lake |
File | HDF5_Read | Read HDF5 data |
File | Image_Read | Process image data; store processed images to data store |
File | Image_Read_Multi | Process image data for multiple images; store processed images to data store |
File | Image_Write | Process tabular data; write an image to data lake |
File | Input_File | Read data file and process in Ganymede |
File | Input_File_Multi | Read data files and process in Ganymede |
File | PDF_Read | Read in contents of a PDF file to a table |
File | PDF_Read_Multi | Read in contents of multiple PDF files to a table |
File | Powerpoint_Write | Process tabular data; write a PowerPoint presentation to data lake |
File | XML_Read | Read XML file into data lake |
File | Zip_Read | Extract Zip file |
Instrument | Instron_Tensile_Read | Load .is_tens file to data lake |
Instrument | LCMS_Read | Read and process LCMS file in mzML format |
Instrument | LCMS_Read_Multi | Read and process multiple LCMS files |
Instrument | LC_Read | Read and process an Agilent Chemstation / MassStation HPLC file |
Instrument | LC_Read_Multi | Read and process multiple Agilent Chemstation / MassStation HPLC files |
Instrument | Profilometer_Read | Read Mx Profiler data file |
Instrument | Synergy_Read | Load Synergy text file to data lake |
Instrument | Synergy_Read_Multi | Load multiple Synergy text files to data lake |
Instrument | WSP_Read | Read FlowJo WSP file into data lake |
Tag | Benchling_Tag | Read Benchling tag |
Tag | Input_Param | Input parameter into Flow |