
Agents for File Capture

Ganymede Agents are a class of programs that can run directly on an instrument computer in the lab. They are a combination of user-defined code and Ganymede configuration, which allows for maximum flexibility in capturing files and interacting with the Ganymede ecosystem. Agents can

  • filter and designate how to process data
  • upload files into Ganymede Cloud
  • initiate data processing pipelines, which can be triggered
    • when specific files are written to the local machine by the instrument
    • when specific files are written to a cloud storage bucket
    • on a scheduled cadence

Users specify which of the above actions to take by configuring the Agent in the Ganymede app. Doing so creates a Linux binary and a Windows executable corresponding to the specified configuration, which users can then install on instrument PCs. Once the Agent is created, users can optionally configure a user-defined Python script associated with it.

Building an Agent Overview

To create a new Agent, click the New Agent button on the Agents tab of the Connections page.

Agent Front page

 

The left side of the configuration panel is used to specify the Agent's configuration; only the Agent parameters relevant to the selected configuration are displayed. The right side of the configuration panel shows the default Python script associated with the Agent, which can be modified after Agent installation.

New Agent configuration

 

After filling out this form, click Create to start building the Windows and Linux executables. Building an executable typically takes around 10 minutes.

When the build is complete, the Agent (i.e., the configured executables) can be downloaded onto the instrument PC by selecting the corresponding Agent from the Connections tab of the Ganymede web app, and then installed.

Configuring Agents

All Agents have two required input parameters, Name and Configuration. Once the Configuration is selected, Input Parameters specific to the Configuration may appear.

The Name input is used to specify the display name for the Agent.

The Configuration input specifies the action performed by the Agent, chosen from the following options:

note

For Cloud Watcher agents that download files from Ganymede Cloud, you can specify that they watch for flow inputs (instead of the flow outputs) by adding the following to the Additional Params input box:

-v "input_bucket=true" 

Three additional input parameters are available for all configurations:

Configuration Options

Watch for files locally then run flow

Input Parameters

  • Flow Name: Flow to run upon observing new files matching specified pattern
  • Check Period (seconds): Frequency with which Agent will poll local directory for new files
  • If a discovered filename exists in storage: Whether to use the file in storage or the observed file in the local directory
  • File pattern to parameter mapping: For the selected Flow, the glob pattern to associate with the input parameter(s)
    • This option is only available when the Agent is created. Later changes to the mapping are made in the Agent notebook, not in the Agent configuration.
  • File Tags: Tags to associate with the files uploaded to Ganymede Cloud
  • Image: Image to associate with agent. A common use case is to associate an image of the instrument that the agent is running on.
  • Auto deploy code and configuration changes to Live Connections: If checked, updates to agent code will be reflected on currently installed agents.
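The Check Period setting can be pictured as a simple polling loop. The sketch below is not the Agent's implementation; it assumes that changes are detected by comparing file modification times between polls, and the helper names `snapshot` and `changed_files` are hypothetical.

```python
import os
from typing import Dict, List


def snapshot(watch_dir: str) -> Dict[str, float]:
    """Map each regular file in watch_dir to its last-modified time."""
    return {
        entry.name: entry.stat().st_mtime
        for entry in os.scandir(watch_dir)
        if entry.is_file()
    }


def changed_files(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Return names that are new, or whose modification time differs, since the last poll."""
    return sorted(
        name for name, mtime in current.items() if previous.get(name) != mtime
    )


# Hypothetical polling loop (assumption: one comparison per check period):
#
#     previous = snapshot(watch_dir)
#     while True:
#         time.sleep(check_period_seconds)
#         current = snapshot(watch_dir)
#         for name in changed_files(previous, current):
#             ...  # hand the file name to get_param_mapping
#         previous = current
```

A shorter check period picks up new files sooner at the cost of more frequent directory scans.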

User-defined code is generated to map the configured file patterns to flow parameters. There are three functions in the user-defined code that may require modification during configuration:

  • get_param_mapping: This function is called whenever a file is added or modified in the watch directory. Modify this function to specify the files you want to look for prior to triggering a flow.
  • execute: Called when all glob patterns specified by get_param_mapping have been matched. The object returned by this function specifies the inputs to the flow that is then executed.
  • fp: This function returns a function that performs pattern matching against a file path. Use this function as a template for matching files.

For example, if the instrument outputs a CSV file called "{id}_lc_output<YYYYMMDD>.csv" which is to be ingested by a CSV_Read node called "Ingest_Data", an appropriate configuration would be "{id}_lc_output*.csv" for the input box associated with the "Ingest_Data.csv" node. The corresponding user-defined code would be:

from ganymede_sdk.agent.models import TriggerFlowParams, FileWatcherResult
import re
from typing import Dict, Callable
import glob
import os


def fp(watch_dir: str, parent_dir: str, pattern: str) -> Callable[[str], bool]:
    """
    This function returns a function that performs pattern matching against a file path.
    Use this function as a template for creating your own pattern matching functions, which
    you can then use in the values of the return object in the get_param_mapping function.

    Returns
    -------
    Callable[[str], bool]
        Function that takes a file as input and returns True if the file matches the pattern.
    """

    def fp_res(x: str):
        return x in glob.glob(os.path.join(watch_dir, pattern), recursive=True)

    return fp_res


def get_param_mapping(
    watch_dir: str,
    parent_dir: str = "",
    file_name: str = "",
    modified_time: str = "",
    body: bytes = bytes(),
) -> Dict[str, Callable[[str], bool]]:
    """
    This function is called when a file is added or modified in the watch directory.
    Modify this function to capture the files you want to trigger the flow;
    the function should return a dictionary where the keys are <node name>.<param name>
    and values are functions for performing pattern matching against the target file.

    For nodes that accept multiple inputs, specify a list of functions to match against;
    each specified function should uniquely match 1 file.
    """
    # Capture the leading alphanumeric ID, stopping at the first underscore
    # (\w would also match underscores and swallow the rest of the file name)
    id_group = re.search(r"^([A-Za-z0-9]+)", file_name)
    if id_group is None:
        return {}
    id = id_group.group(1)
    return {
        "Ingest_Data.csv": fp(watch_dir, parent_dir, f"{id}_lc_output*.csv"),
    }


def execute(flow_params_fw: FileWatcherResult) -> TriggerFlowParams:
    """
    Called when all glob patterns specified by get_param_mapping have been matched.

    Parameters
    ----------
    flow_params_fw : FileWatcherResult
        Dict of FileParam objects indexed by <node name>.<param name>
    """
    return TriggerFlowParams(
        single_file_params=flow_params_fw.files,
        multi_file_params=None,
        benchling_tag=None,
        additional_params={},
    )

If a second parameter is added, such as an Excel_Read node called "Experiment_Context" that accepts files matching the pattern "{id}_context.xlsx", then the get_param_mapping function would need to be modified to include that parameter like so:

def get_param_mapping(
    watch_dir: str,
    parent_dir: str = "",
    file_name: str = "",
    modified_time: str = "",
    body: bytes = bytes(),
) -> Dict[str, Callable[[str], bool]]:
    """
    This function is called when a file is added or modified in the watch directory.
    Modify this function to capture the files you want to trigger the flow;
    the function should return a dictionary where the keys are <node name>.<param name>
    and values are functions for performing pattern matching against the target file.

    For nodes that accept multiple inputs, specify a list of functions to match against;
    each specified function should uniquely match 1 file.
    """
    # Capture the leading alphanumeric ID, stopping at the first underscore
    # (\w would also match underscores and swallow the rest of the file name)
    id_group = re.search(r"^([A-Za-z0-9]+)", file_name)
    if id_group is None:
        return {}
    id = id_group.group(1)
    return {
        # The keys in the dict below take the form "<node name>.<parameter name>"
        # For example, the default Input_File node is called "Input_File"
        # and has a parameter called "file_pattern", so the key would be
        # "Input_File.file_pattern"
        "Ingest_Data.csv": fp(watch_dir, parent_dir, f"{id}_lc_output*.csv"),
        "Experiment_Context.excel": fp(watch_dir, parent_dir, f"{id}_context.xlsx"),
    }

This not only ensures that files matching the parameters are sent to the correct flow node, but also groups together only files that share the same id. Suppose three files were ingested in the following order:

  • experiment626_lc_output072623.csv
  • experiment627_context.xlsx
  • experiment626_context.xlsx

Instead of starting the flow after the first two files, which do fulfill the parameter file patterns, the experiment ids will be grouped together so that the flow is only started when all experiment626 files are ready.
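The grouping behavior can be simulated without the Ganymede SDK. The sketch below is an illustration only: `patterns_for`, `complete_group`, and `simulate` are hypothetical helpers, and matching is done on bare file names with `fnmatchcase` rather than with the Agent's own file watcher. It assumes the id is the leading alphanumeric run of the file name.

```python
import re
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple


def patterns_for(file_name: str) -> Dict[str, str]:
    """Build the per-id glob patterns for the file that just arrived."""
    match = re.match(r"[A-Za-z0-9]+", file_name)
    if match is None:
        return {}
    file_id = match.group()
    return {
        "Ingest_Data.csv": f"{file_id}_lc_output*.csv",
        "Experiment_Context.excel": f"{file_id}_context.xlsx",
    }


def complete_group(seen: List[str], new_file: str) -> Optional[Dict[str, str]]:
    """Return the matched files if every pattern matches exactly one seen file."""
    matched: Dict[str, str] = {}
    for key, pattern in patterns_for(new_file).items():
        hits = [name for name in seen if fnmatchcase(name, pattern)]
        if len(hits) != 1:
            return None
        matched[key] = hits[0]
    return matched or None


def simulate(arrivals: List[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Replay file arrivals and record when a flow would be triggered."""
    seen: List[str] = []
    triggers: List[Tuple[str, Dict[str, str]]] = []
    for name in arrivals:
        seen.append(name)
        group = complete_group(seen, name)
        if group is not None:
            triggers.append((name, group))
    return triggers


triggers = simulate([
    "experiment626_lc_output072623.csv",
    "experiment627_context.xlsx",
    "experiment626_context.xlsx",
])
```

In this simulation the only trigger is recorded on the third arrival, once both experiment626 files are present; experiment627_context.xlsx stays unmatched until its CSV counterpart arrives.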

Whenever a file matches a glob pattern, it will be uploaded to Ganymede storage, even if it's not used in a flow. Files that are written to the watched directory can be ignored by ensuring that they do not match any of the glob patterns for the parameter inputs.

Agent configuration - File Watcher

 

Patterns can also be matched against subdirectories using * for single level subdirectories and ** for any level subdirectories.

For example, if your instrument writes out files in a directory like:

├── experiment_id_1
│   ├── configuration.xml
│   └── results.csv
└── experiment_id_2
    ├── configuration.xml
    └── results.csv

You would use parameters like */configuration.xml and */results.csv to upload the files and submit them to a flow.
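To preview what a pattern will match before configuring the Agent, the same `glob.glob(..., recursive=True)` call used by the template `fp` function can be run against a scratch directory. The sketch below builds a throwaway copy of the layout above; the extra `rerun` subdirectory and the helper names `build_tree` and `match` are hypothetical additions for illustration.

```python
import glob
import os
from typing import List


def build_tree(root: str) -> None:
    """Create the example layout, plus one deeper (hypothetical) 'rerun' folder."""
    for experiment in ("experiment_id_1", "experiment_id_2"):
        os.makedirs(os.path.join(root, experiment))
        for name in ("configuration.xml", "results.csv"):
            open(os.path.join(root, experiment, name), "w").close()
    nested = os.path.join(root, "experiment_id_2", "rerun")
    os.makedirs(nested)
    open(os.path.join(nested, "results.csv"), "w").close()


def match(root: str, pattern: str) -> List[str]:
    """Return matches relative to root, using forward slashes for readability."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(hit, root).replace(os.sep, "/") for hit in hits)


# Usage, given a temporary directory `root` populated by build_tree(root):
#   match(root, "*/results.csv")   finds results.csv one level down only
#   match(root, "**/results.csv")  also finds experiment_id_2/rerun/results.csv
```

The single-star pattern skips the nested `rerun` folder, while the double-star pattern descends to any depth.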

Documentation on the objects used in the user-defined code for these agents can be found on the Agent Data Models page.

Example use case

An instrument outputs files to a directory as it completes runs, which are processed in Ganymede Cloud.

Set a cron job to run flows periodically

Input Parameters

  • Flow Name: Flow to run on the configured schedule
  • Time Interval: Frequency and times with which to run flow, based on UTC time
  • File Tags: Tags to associate with the files uploaded to Ganymede Cloud
  • Image: Image to associate with agent. A common use case is to associate an image of the instrument that the agent is running on.
  • Auto deploy code and configuration changes to Live Connections: If checked, updates to agent code will be reflected on currently installed agents.
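Because the Time Interval is interpreted in UTC, a schedule planned in local lab time has to be converted first. The helper below is a minimal sketch using only the standard library; `local_to_utc` is a hypothetical name, and the fixed-offset approach assumes you know the site's current UTC offset (it does not track daylight saving changes).

```python
from datetime import date, datetime, time, timedelta, timezone


def local_to_utc(local_time: time, utc_offset_hours: float) -> time:
    """Convert a local wall-clock time to UTC for a fixed UTC offset."""
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; it only anchors the time for the conversion.
    local_dt = datetime.combine(date(2024, 1, 15), local_time, tzinfo=local_tz)
    return local_dt.astimezone(timezone.utc).time()


# A run intended for 09:00 at a site five hours behind UTC
utc_run_time = local_to_utc(time(9, 0), -5)
```

Here `utc_run_time` is 14:00, which is the value to enter for a 09:00 local run at that offset.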
Agent configuration - Cron

 

Example use case

A user-defined script must be run once a day to poll and capture updates from telemetry devices, which are further processed in a Ganymede flow.

Watch for files locally and upload

Input Parameters

  • Flow Name: Flow to run upon observing new files matching specified pattern
  • Check Period (seconds): Frequency with which Agent will poll local directory for new files
  • If a discovered filename exists in storage: Whether to use the file in storage or the observed file in the local directory
  • File pattern to parameter mapping: For the selected Flow, the glob pattern to associate with the input parameter(s).
  • File Tags: Tags to associate with the files uploaded to Ganymede Cloud
  • Image: Image to associate with agent. A common use case is to associate an image of the instrument that the agent is running on.
  • Auto deploy code and configuration changes to Live Connections: If checked, updates to agent code will be reflected on currently installed agents.
Agent configuration - Cron

 

Example use case

Multiple flow cytometers are used to observe cell populations for a related set of experiments, which are collated by Ganymede Agents configured to systematically capture these runs.

Watch for flow outputs then save locally

Input Parameters

  • Flow Name: Flow from which to download output files. This will autopopulate the glob pattern matching field correspondingly.
  • Glob pattern matching: the glob patterns that output files must match in order to download.
  • Image: Image to associate with agent. A common use case is to associate an image of the instrument that the agent is running on.
  • Auto deploy code and configuration changes to Live Connections: If checked, updates to agent code will be reflected on currently installed agents.
Agent configuration - Download

 

Example use case

Instructions for lab execution are generated on Ganymede Cloud and downloaded to the instrument PC for execution.

Load local files into Ganymede with a custom filter

Agent configuration - Download

 

Example use case

A selected subset of key instrument output files are captured on Ganymede Cloud.

Set a cron job to upload files periodically

Input Parameters

  • Time Interval: Frequency and times with which to upload files, based on UTC time
  • File Tags: Tags to associate with the files uploaded to Ganymede Cloud
  • Image: Image to associate with agent. A common use case is to associate an image of the instrument that the agent is running on.
  • Auto deploy code and configuration changes to Live Connections: If checked, updates to agent code will be reflected on currently installed agents.
Agent configuration - Cron

 

Example use case

A local file is modified at regular intervals and needs to be uploaded to Ganymede Cloud after each modification.

Installing Agents

To install an Agent, open the Ganymede application in a browser window and navigate to the Connections tab in the left sidebar. Select the desired Agent by name and download the relevant Windows or Linux installation file.

note

Both Linux and Windows versions of the Agent are built using x86_64 architecture. The Linux executable is Ubuntu-based and the Windows executable is Windows Server 2022.

Windows Installation

After downloading the Agent, launch the installation file to complete Agent configuration.

Windows Agent installation

 

  • Connection name is the name seen by users in the instrument computer and in the Ganymede UI
  • Variable definitions are strings in the format "var_name=var_value", which allows users to set context variables to be used in the user-defined Python
  • Label is a string that can be used to identify and group connections within Ganymede. The label is visible in the Connections UI

Variables and labels can be referenced in your user-defined code by extracting the values from kwargs:

labels: list[str] = kwargs.get('labels', [])
# -- OR --
# vars is a dict, so the fallback is an empty dict rather than a list
vars: dict[str, str] = kwargs.get('vars', {})
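As a slightly fuller illustration, the sketch below reads both values with safe defaults and shows how a "var_name=var_value" definition splits into a name and a value. The function names and the `site` variable are hypothetical; only the `labels` and `vars` keyword arguments come from the text above.

```python
from typing import Dict, List, Tuple


def parse_var_definition(definition: str) -> Tuple[str, str]:
    """Split a 'var_name=var_value' string at the first '='."""
    name, _, value = definition.partition("=")
    return name.strip(), value.strip()


def describe_connection(**kwargs) -> str:
    """Summarize the labels and variables passed to user-defined code."""
    labels: List[str] = kwargs.get("labels", [])
    variables: Dict[str, str] = kwargs.get("vars", {})
    # "site" is a hypothetical variable, e.g. set at install time as "site=boston"
    site = variables.get("site", "unknown")
    return f"site={site}; labels={', '.join(labels) or 'none'}"


summary = describe_connection(labels=["hplc"], vars={"site": "boston"})
```

Using `.get` with a default keeps the code working on connections that were installed without any variables or labels.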

For Agents that reference local directories, the directory watched is specified in the Additional Params input box:

-v "input_path=/absolute/path"

# Example Windows input path specification
# -v "input_path=C:\Users\<username>\Desktop\watch_folder"

# If the directory is in a network drive, be sure to use the UNC path or IP like so:
-v "input_path=//server/share/path"
# where `server` is the name of the server and `share` is the name of the shared folder
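Since input_path is expected to be absolute, it can help to sanity-check a candidate path before installation. The sketch below uses `pathlib`'s pure path classes, which only parse the string and never touch the filesystem; `is_absolute_input_path` is a hypothetical helper, not part of the Agent.

```python
from pathlib import PurePosixPath, PureWindowsPath


def is_absolute_input_path(path: str, windows: bool) -> bool:
    """Check whether path is absolute under Windows or POSIX path rules."""
    flavour = PureWindowsPath if windows else PurePosixPath
    return flavour(path).is_absolute()


checks = {
    "drive letter": is_absolute_input_path(r"C:\Users\me\Desktop\watch_folder", True),
    "UNC share": is_absolute_input_path("//server/share/path", True),
    "relative": is_absolute_input_path("watch_folder", True),
    "posix": is_absolute_input_path("/absolute/path", False),
}
```

Under Windows rules, both the drive-letter path and the UNC share count as absolute, while a bare folder name does not.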

Sleep status

Agents may be unable to run if the computer enters a sleep state; this is particularly relevant for cron flows. To prevent this, ensure that the computer is not set to sleep when the flow is expected to execute.

Network privileges

Because Windows services run as the Local System user, which does not have network privileges by default, you may need to run the service as a user that can access the network drive.

To do so, follow the steps below:

  1. Set Up Authentication:
    • There are two options for authenticating to a remote system:
      • Use Windows Credential Manager to store the remote user's credentials on the local system.
      • Create matching local (if cross-domain) or domain (if on the same domain, e.g. AD) user accounts on both the remote and local systems.
  2. Check Network:
    • Ensure both systems can communicate using tools like ping.
  3. Service Configuration:
    • Open services.msc on the local system.
    • Find and right-click the Ganymede service > Properties > Log On. Use the local user account.
  4. Restart Service:
    • In Services, right-click the service and select Restart.

Note: Use IP if systems are on different domains.

Linux Installation

After downloading the agent, create a systemd service file similar to what is shown below:

[Unit]
Description=Ganymede Example Agent
After=network.target
StartLimitBurst=5
StartLimitIntervalSec=10

[Service]
Type=simple
Restart=always
RestartSec=1
User=jane_doe
ExecStart=/path/to/agent/executable -l linux -l service -n CWService
StandardOutput=append:/var/log/ganymede/CWService.log
StandardError=append:/var/log/ganymede/CWService_err.log

[Install]
WantedBy=multi-user.target

  • Description: field to identify the Agent in logs
  • Restart: set to restart the Agent if it crashes
  • User: user to associate Agent runs with
  • ExecStart: path to the Agent executable
  • StandardOutput: log file for Agent output
  • StandardError: log file for Agent errors

Once the service file is created, save it to /etc/systemd/system/ and set its permissions to 644. For example, if the systemd service file were named ganymede_example_agent.service, this could be accomplished by running:

sudo chmod 644 /etc/systemd/system/ganymede_example_agent.service

To interact with this service, you can use the following commands:

# to start the Agent service
sudo systemctl start ganymede_example_agent.service

# to observe status of the Agent service
sudo systemctl status ganymede_example_agent.service

Configuring User-Defined Python

To modify user-defined code executed by the Agent, select the Agent in the Ganymede app and then click on the Code button. Doing so opens a notebook for the user to modify user-defined code, which is executed on observed files prior to transfer.

Agent update configuration

 

Agent User-Defined Python

 

Previously built Agents remain available for download in Ganymede in the History tab.

Maintaining Agents

Maintaining Agents - configuration

Viewing Logs

Logs can be found on the Logs tab for each Agent. These logs contain status check-ins and information regarding Agent activity.

Agent logs

Monitoring Agent Connections

Agents are capable of communicating with Ganymede Cloud to upload files and run flows. These communications can be monitored by observing status updates and logs in the Ganymede web app UI.

The Connections page shows an overview of all connections that have ever been made and their latest status in addition to other metadata made available from their last status ping.

Agent Status

 

The Agent page displays a list of all connections instantiated from a specific Agent. This provides the user quick and easy access to the running executables associated with a specific Agent configuration.

Agent Connection Status

 

Connection Status

Agents send a heartbeat message to Ganymede Cloud every 30 seconds to report their status. The status of an Agent can be one of the following:

  • Live: The Agent is currently running and communicating with Ganymede Cloud
  • Disconnected: Ganymede Cloud has not received a status update from the Agent in the last 65 seconds
  • Shutdown: The Agent was intentionally shut down; some potential reasons are if a user shut down the computer, manually stopped the Agent service, or if a known exception occurred for the Agent.
  • Deprecated: The Agent has been disabled in Ganymede UI as described in the Updating Agents section.
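The relationship between the 30-second heartbeat and the 65-second threshold can be expressed as a small timestamp comparison. This is a sketch of the rule as described above, not Ganymede's server-side logic; `connection_status` is a hypothetical function, and treating the threshold as strictly "more than 65 seconds" is an assumption. Shutdown and Deprecated are set by explicit events, so they are not modeled here.

```python
from datetime import datetime, timedelta, timezone

HEARTBEAT_INTERVAL = timedelta(seconds=30)
DISCONNECT_AFTER = timedelta(seconds=65)


def connection_status(last_heartbeat: datetime, now: datetime) -> str:
    """Classify a connection as Live or Disconnected from its last heartbeat."""
    if now - last_heartbeat > DISCONNECT_AFTER:
        return "Disconnected"
    return "Live"


now = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
recent = connection_status(now - timedelta(seconds=20), now)
one_missed = connection_status(now - timedelta(seconds=45), now)
stale = connection_status(now - timedelta(seconds=90), now)
```

With these values a connection stays Live after a single missed heartbeat (45 seconds of silence) and only becomes Disconnected once the 65-second window has passed.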

Logging

You can create custom logs that will be displayed in the Ganymede UI and in local log files by using the logger keyword argument in your user-defined processor code. For example:

import os
from typing import Optional
from ganymede_sdk.agent.models import TriggerFlowParams

def execute(**kwargs) -> Optional[TriggerFlowParams]:  # type: ignore
    # if there is no logger, the default logger is print
    logger = kwargs.get("logger", print)
    logger(f"Local files: {os.listdir('./')}")
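Because the logger is an ordinary callable, user-defined code can be exercised locally by passing in a stand-in that collects messages. The sketch below is SDK-free and for local testing only: this `execute` is a simplified stand-in that returns nothing, and using `list.append` as the logger is an assumption about how you might test, not a Ganymede feature.

```python
import os
from typing import Callable, List


def execute(**kwargs) -> None:
    """Simplified stand-in for user-defined code that logs the working directory."""
    logger: Callable[[str], None] = kwargs.get("logger", print)
    logger(f"Local files: {sorted(os.listdir('.'))}")


# Collect log messages in a list instead of printing them
captured: List[str] = []
execute(logger=captured.append)
```

After the call, `captured` holds the single message that would otherwise have been sent to the Agent's logs.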

Updating Agents

To modify Agent settings after creation, select the desired Agent in the Connections tab and edit its configuration. Upon clicking Update, the Agent will be rebuilt and then made available for download.

Agent update configuration

 

Agents can also be archived or disabled.

Archived Agents cannot be updated, but the Agent's connections can still communicate with Ganymede Cloud. All associated connections for these Agents will continue to run, but the Agent can no longer be modified. Archived agents can be restored to an active state by selecting the desired Agent and clicking on the Restore Agent link.

Disabled Agents cannot be updated, and their connections can no longer communicate with Ganymede Cloud.

Viewing Build History

Each iteration of Agent build can be viewed in the History tab of the Agent. This view provides context for each change, either in the form of a log of the configuration change or through a custom commit message from the Agent notebook.

Agent build history

 

Configuration differences between two Agent builds can be viewed for audit or debugging purposes by clicking

button.

Agent build history detail

Uninstalling Agents

Windows

The Agent can be uninstalled and associated service removed through the “Add or Remove Programs” panel from the Control Panel.

Agent uninstall

After uninstalling the Agent, the Ganymede folder will remain in the Program Files directory. This folder can be deleted if desired.

Linux

To uninstall the Agent, stop the systemd service associated with the Agent and remove the service file from /etc/systemd/system/.