Skip to main content

Tagging Files

What are File Tags?

File tags are a way to organize files within Ganymede. These tags are specified in user-defined code and can be filtered for in the file browser. Tags can be used to systematically capture organize files; for example, they can be used to capture associate samples, run purpose, experiment ID, experiment step, and any other characteristic relevant to that file.

Once tagged, files can be filtered for in the file browser.

Tagged Files

 

To configure a file tag, configure the file tag type if necessary and reference the tag type in user-defined code. The following sections describe how to configure file tag types and tag files.

Configuring File Tag Types

Prior to tagging files, a file tag type must be configured. This is done by navigating to the Manage Tag Types section of the Files page.

Manage Tag Types

 

To create a new tag type, click the

Create Tag Type
button in the upper right corner of the page. This will open a modal where the attributes of a tag type can be specified.

Create Tag Type

 

For each tag type, the following attributes can be specified:

  • Tag Type Name: The name of the tag type.
  • Description: A description of the tag type.
  • Has URL: Indicates that the value of the tag type must be a URL (which would be rendered as a link in the file browser).
  • Allow Multiple: Indicates that multiple tags of the specified type can be applied to a single file.
  • Fill: Background color of tags with this type.
  • Text: Text color of tags with this type.

The strict mode setting, if disabled, allows admins to delete or modify tags. Tags may only be deleted if they are not applied to any files; the delete button is grayed out if tag type deletion is not permitted.

Tagging Files

Files can be tagged in user-defined code within flows and agents, but using slightly different methods. Within flows, files are tagged by passing the path to the file into the add_file_tag function. Within agents, files are tagged by passing the FileParam object into the add_file_tag_to_fileparam function. The FileParam object holds the file that the Agent submits to Ganymede storage (for initiating a flow if the agent is configured to do so).

The full set of methods available for interacting with tags can be found on the File Tag module in the SDK documentation.

Tagging Files in Flows

To tag a file within a flow, use the add_file_tag function in the ganymede_sdk.file_tag module. This function takes the following arguments:

  • param input_file_path: str - The name of the file to tag. The file path can be obtained from the keys in the dictionary returned by the retrieve_files for Ganymede class or by calling the get_gcs_uri method.
  • param tag_type_id: str - The name of the tag type to apply to the file.
  • param display_value: str - Value of the tag to display in the file browser.
  • param tag_id: Optional[str] - Optional tag ID that can be used to reference tags in code.
  • param url: Optional[str] - URL to associate with the tag, if applicable.
  • param bucket: Optional[str] - The bucket to tag the file in; either "input" or "output". If not specified, the full path to the file in storage must be provided.
from typing import Dict

from ganymede_sdk.file_tag import add_file_tag
from ganymede_sdk.io import NodeReturn


def execute(file_data: Dict[str, bytes], ganymede_context) -> NodeReturn:
entry_name = "test_entry"

for filename in file_data.keys():
add_file_tag(
input_file_path=filename,
tag_type_id="benchling_entry_id",
display_value=entry_name,
bucket="input",
)

File Tags can also be added to files returned in the NodeReturn object:

from typing import Dict

from ganymede_sdk.file_tag import add_file_tag
from ganymede_sdk.io import NodeReturn


def execute(file_data: Dict[str, bytes], ganymede_context) -> NodeReturn:
entry_name = "test_entry"
file_data_bytes = list(file_data.values())[0]

return NodeReturn(
files_to_upload={"output_filename.csv": file_data_bytes},

# Parameters for the add_file_tag function are passed in the tags parameter
#
# Note that the input_file_path parameter within the add_file_tag function does not need to be specified
tags={
"output_filename.csv": {
"tag_type_id": "benchling_entry_id",
"display_value": entry_name
}
}
)

Tagging Files in Agents

To tag a file delivered by an Agent to Ganymede, you can configure an agent to tag any files submitted from it using the File tags parameter in Agent configuration.

Tagging can also be performed programmatically using the add_file_tag_to_fileparam function in the ganymede_sdk.agent module. This function takes the following arguments:

  • param file_param: Union[FileParam, MultiFileParam] - The FileParam object to tag. FileParam objects hold a single file, while MultiFileParam objects hold multiple files (for use in nodes that take in multiple files). If a MultiFileParam object is specified, the tag is applied to all files within the object.
  • param tag_type_id: str - The name of the tag type to apply to the file.
  • param display_value: str - Value of the tag to display in the file browser.
  • param tag_id: str - Optional tag ID that can be used to reference tags in code.
  • param url: str - URL to associate with the tag, if applicable.
from ganymede_sdk.agent import (
FileParam,
FileWatcherResult,
MultiFileParam,
TriggerFlowParams,
add_file_tag_to_fileparam,
)

def execute(flow_params_fw: FileWatcherResult, **kwargs) -> TriggerFlowParams:
"""
Called when all glob patterns specified by get_param_mapping have been matched.

Parameters
----------
flow_params_fw : FileWatcherResult
Dict of FileParam objects indexed by <node name>.<param name>
"""
single_file_params = {}
multi_file_params = {}

labels = kwargs.get("labels", [])
var = kwargs.get("vars", []).get("input_path", "")

for param, files in flow_params_fw.files.items():
if isinstance(files, FileParam):
tagged_file_param = add_file_tag_to_fileparam(files, "lab-agent", var)
for label in labels:
tagged_file_param = add_file_tag_to_fileparam(tagged_file_param, "lab-agent" , label)
single_file_params[param] = tagged_file_param
else:
multi_file_params[param] = MultiFileParam.from_file_param(files)
info

Files are passed to and from Virtualization environments using agents, so files passed to and from virtualization environments can be tagged as well.

Note that virtualization stores a json file in the C:/Program Files/Ganymede/ directory that contains information about the session, including the input files and the current session id.

import json
from pathlib import Path

from ganymede_sdk.agent import FileWatcherResult, NoOpFileTagParams, add_file_tag_to_fileparam
from ganymede_sdk.file_tag import get_file_tags

SESSION_JSON_PATH = Path("C:/Program Files/Ganymede/session.json")

# Virtualization agent example for that tagging all files sent to Ganymede cloud storage with a notebook entry ID.
def execute(flow_params_fw: FileWatcherResult, **kwargs) -> NoOpFileTagParams:
file_params = list(flow_params_fw.files.values())

with open(SESSION_JSON_PATH, "r") as f:
session_json = f.read()

session_dict = json.loads(session_json)
input_files = session_dict["input_files"]
entry_tags = []

# A tag type defined in this tenant
NOTEBOOK_ENTRY_TAG_TYPE_ID = "notebook_entry_id"

for file in input_files:
tags = get_file_tags(file)
if tags is None:
continue
for tag in tags:
if tag.tag_type_id == NOTEBOOK_ENTRY_TAG_TYPE_ID:
entry_tags.append(tag)

if len(entry_tags):
entry_tag = entry_tags[0]

for file_param in file_params:
add_file_tag_to_fileparam(
file_param=file_param,
tag_type_id=NOTEBOOK_ENTRY_TAG_TYPE_ID,
display_value=entry_tag.display_value,
url=entry_tag.url,
)

return NoOpFileTagParams(files=file_params)