Analysis Notebooks

Analysis notebooks are Jupyter notebooks that provide access to data uploaded to the Ganymede Cloud data lake as part of a flow run. A fresh notebook instance includes templates for retrieving data and saving the notebook. To open a fresh notebook instance, click the Analysis button in the header of the Flow Editor page and select default. The image below shows an example of what this notebook contains:

[Image: Ganymede Notebook]

The first two cells in the image provide templates for validating and querying data, while the last cell saves the notebook to Git.

Info: To save the notebook under a different name, modify the dest key (i.e., replace new_notebook with the desired notebook name) and execute the cell.

Installing Python packages

A list of installed packages can be retrieved by running:

!pip freeze --local

Additional packages can be installed by running pip from a notebook cell. For example, the following command installs a number of analytics and plotting packages:

!pip install scikit-learn seaborn matplotlib pandas_gbq
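Before installing, it can help to check which packages are already present. The sketch below uses only the Python standard library; missing_packages is a hypothetical helper name, not part of the Ganymede SDK, and it looks packages up by distribution name (the name passed to pip), which can differ from the import name.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names in `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when no such distribution exists.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Distribution names as they would be passed to pip install.
wanted = ["scikit-learn", "seaborn", "matplotlib", "pandas_gbq"]
print(missing_packages(wanted))
```

Only the names that are printed would then need to be passed to pip install.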

Loading data from Ganymede data lake

Tables in the environment can be accessed via the Ganymede SDK. For example, the code below lists available tables in the environment:

from ganymede_sdk import Ganymede

g = Ganymede()
tables = g.list_tables()
for table in tables:
    print(table.table_id)
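When an environment holds many tables, the listing can be narrowed by name. The following is a minimal sketch that assumes only what the example above shows, namely that each listed table exposes a table_id attribute. table_ids_matching is a hypothetical helper, and the stand-in objects exist only so the sketch runs without the SDK.

```python
from types import SimpleNamespace


def table_ids_matching(tables, substring):
    """Return sorted table IDs that contain `substring`, ignoring case."""
    needle = substring.lower()
    return sorted(t.table_id for t in tables if needle in t.table_id.lower())


# Stand-in objects with hypothetical table names; in a notebook, pass the
# result of g.list_tables() instead.
example_tables = [
    SimpleNamespace(table_id="demo_table"),
    SimpleNamespace(table_id="plate_reader_results"),
    SimpleNamespace(table_id="Demo_summary"),
]
print(table_ids_matching(example_tables, "demo"))
```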

A query can be run by assigning a SQL snippet that follows ANSI SQL syntax to the query_sql variable and passing it to the retrieve_sql method of the Ganymede object. The results are returned as a pandas DataFrame. Before running the query, a dry run is also available through the dry_run method in Ganymede's query module. For example, the commands below return the results of the query provided in query_sql.

from ganymede_sdk import Ganymede

g = Ganymede()
query_sql = 'select * from ganymede_demo.demo_table'
g.retrieve_sql(query_sql)
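For wide tables, selecting named columns instead of every column keeps the returned data smaller. The sketch below builds such a query string in plain Python. build_select is a hypothetical helper, and the column names sample_id and value are placeholders rather than columns known to exist in ganymede_demo.demo_table.

```python
import re

# Accepts dotted identifiers such as dataset.table; anything else is rejected
# so that arbitrary text cannot be spliced into the query string.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def build_select(table, columns=None):
    """Build a SELECT statement for `table`, using '*' when no columns are given."""
    cols = list(columns or [])
    for name in [table] + cols:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    column_list = ", ".join(cols) if cols else "*"
    return f"select {column_list} from {table}"


# Hypothetical column names, for illustration only.
query_sql = build_select("ganymede_demo.demo_table", ["sample_id", "value"])
print(query_sql)
```

The resulting string can be used as query_sql in the same way as a hand-written query.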

Saving notebooks

The final cell contains code that commits the notebook to the HEAD of the GitHub repository containing the stored Flow. The src entry in the files dictionary specifies the location of the notebook within the repo, and the dest entry specifies the name that the notebook is committed under.

from ganymede_sdk.notebook import save

files = [{'src': 'default', 'dest': 'notebook_to_save_to'}]
save(files)
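The files list can also be assembled programmatically, which makes it easier to catch an empty or repeated destination name before committing. This is a minimal sketch in plain Python. build_files is a hypothetical helper, qc_analysis is a placeholder name, and whether save accepts more than one entry at a time is an assumption not confirmed here.

```python
def build_files(dest_by_src):
    """Turn a {src: dest} mapping into the list-of-dicts shape used by save."""
    files = []
    seen = set()
    for src, dest in dest_by_src.items():
        if not dest:
            raise ValueError(f"empty destination name for {src!r}")
        if dest in seen:
            raise ValueError(f"duplicate destination name: {dest!r}")
        seen.add(dest)
        files.append({"src": src, "dest": dest})
    return files


# 'default' is the src used in the template cell; the dest name is a placeholder.
files = build_files({"default": "qc_analysis"})
print(files)
```

The resulting list has the same shape as the files variable in the template cell and can be passed to save in its place.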