Analysis Notebooks
Analysis notebooks offer a scratch space for analyzing data in Ganymede Cloud. A fresh notebook instantiation has templates to retrieve data and save notebooks. To access a fresh notebook instance, go to the
page, click the Analysis button in the header, and select default. The image below shows an example of what this notebook would contain:
The first 2 cells in the image provide templates for validating and querying data, while the last cell enables the notebook to be saved to git.
To save the notebook under a different name, modify the dest key (i.e., replace new_notebook with the desired notebook name) and execute the cell.
Installing Python packages
A list of available packages can be retrieved by running
!pip freeze --local
Additional packages can be installed by running pip from a notebook cell. For example, the following command installs several analytics and plotting packages:
!pip install scikit-learn seaborn matplotlib pandas_gbq
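Before installing, it can be useful to check which of the desired packages are already present. The following is a minimal sketch using only the Python standard library; the helper name missing_packages is hypothetical and is not part of the Ganymede SDK.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            # Raises PackageNotFoundError when the distribution is absent.
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Package names taken from the install command above.
wanted = ["scikit-learn", "seaborn", "matplotlib", "pandas_gbq"]
print(missing_packages(wanted))
```

Only the names reported as missing then need to be passed to pip install.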
Loading data from Ganymede data lake
Tables in the environment can be accessed via the Ganymede SDK. For example, the code below lists available tables:
from ganymede_sdk import Ganymede
g = Ganymede()
tables = g.list_tables()
for table in tables:
    print(table.table_id)
To run a query, assign an ANSI SQL snippet to the query_sql variable and pass it to the retrieve_sql method from Ganymede's query module. The results are returned as a pandas DataFrame. Below is an example of retrieving query results.
from ganymede_sdk import Ganymede
g = Ganymede()
# SQL to run against the data lake
query_sql = 'select * from ganymede_demo.demo_table'
# Returns the query results as a pandas DataFrame
df = g.retrieve_sql(query_sql)
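When exploring an unfamiliar table, it may help to fetch only a few rows first. The sketch below builds such a query string; preview_query is a hypothetical helper, not an SDK function, and it assumes the backing engine accepts a LIMIT clause (widely supported, though not part of the ANSI standard).

```python
import re

# Hypothetical helper for illustration; not part of the Ganymede SDK.
# Accepts dotted identifiers such as "dataset.table".
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def preview_query(table, limit=10):
    """Build a SELECT that returns at most `limit` rows from `table`."""
    if not _IDENTIFIER.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"select * from {table} limit {limit}"


query_sql = preview_query("ganymede_demo.demo_table", limit=5)
print(query_sql)
# In a notebook with the SDK available, the string would then be run with:
# g.retrieve_sql(query_sql)
```

Validating the table name before interpolating it keeps malformed input out of the query text.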
Saving notebooks
The final cell contains code that commits the notebook to the HEAD of the GitHub repository containing the stored Flow. The src entry in each dictionary of the files list specifies the location of the notebook within the repository, and the dest entry specifies the name under which the notebook is committed.
from ganymede_sdk.notebook import save
files = [{'src': 'default', 'dest': 'notebook_to_save_to'}]
save(files)
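The files list can also be built programmatically, which avoids typos in the src and dest keys. This is a sketch only: build_save_entries is a hypothetical helper, and whether save accepts more than one entry at a time is an assumption not confirmed here.

```python
def build_save_entries(name_map):
    """Turn a {src: dest} mapping into the list-of-dicts shape used by the save cell."""
    entries = []
    for src, dest in name_map.items():
        if not src or not dest:
            raise ValueError("src and dest must both be non-empty")
        entries.append({"src": src, "dest": dest})
    return entries


files = build_save_entries({"default": "notebook_to_save_to"})
print(files)
# In a notebook with the SDK available, the list would then be passed on:
# from ganymede_sdk.notebook import save
# save(files)
```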