Getting Started

This short tutorial introduces the very basics of atoti. We encourage you to copy this notebook into your workspace, run it, and modify it.

Session and imports

A session holds a connection to a Java subprocess where all the data is stored and the computation takes place.

Start by importing atoti and creating your session:

[1]:
import atoti as tt

session = tt.create_session()
Welcome to atoti 0.4.0!

By using this community edition, you agree with the license available at https://www.atoti.io/eula.
Browse the official documentation at https://docs.atoti.io.
Join the community at https://www.atoti.io/register.

You can hide this message by setting the ATOTI_HIDE_EULA_MESSAGE environment variable to True.
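The welcome message names an environment variable that hides it. A minimal sketch of setting it from Python follows. That the variable has to be set before atoti is imported is an assumption here, not something the message states.

```python
import os

# Set the variable named in the welcome message.
# Assumption: atoti reads it at import or session creation time,
# so it is set here before either of those happens.
os.environ["ATOTI_HIDE_EULA_MESSAGE"] = "True"

# In a real notebook, the import and session creation would follow:
# import atoti as tt
# session = tt.create_session()

hide_message = os.environ.get("ATOTI_HIDE_EULA_MESSAGE")
print(hide_message)
```

Setting the variable in the shell that launches Jupyter should have the same effect and keeps the notebook itself unchanged.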

Data sources and stores

Several data sources (CSV, Parquet, pandas…) can be used to load data into atoti stores. They are all described in the “Data sources” part of the tutorial; this Getting Started only uses CSV files.

When loading a source into a store, you must specify one or more key columns.

We recommend cleaning the data before loading it into a store (for example, in pandas or directly in your CSV file).
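As an illustration of the kind of cleaning meant here, the sketch below uses only Python's standard `csv` module on a small, made-up CSV. The data, the `clean_rows` helper, and the choice to keep the last row for a duplicated key are all assumptions made for this example; how atoti itself treats duplicate keys is not covered here.

```python
import csv
import io

# Hypothetical raw data: stray whitespace, a missing key, a duplicated key.
RAW_CSV = """ID,Country,Quantity
1, France ,1000
2,UK,2000
,China,500
2,UK,2500
"""


def clean_rows(text, key):
    """Strip whitespace, drop rows without a key, keep the last row per key."""
    rows_by_key = {}
    for row in csv.DictReader(io.StringIO(text)):
        row = {name: value.strip() for name, value in row.items()}
        if not row[key]:
            continue  # a row without a key value cannot be identified
        rows_by_key[row[key]] = row  # a later duplicate replaces the earlier row
    return list(rows_by_key.values())


cleaned = clean_rows(RAW_CSV, key="ID")
for row in cleaned:
    print(row)
```

After cleaning, every remaining row has a non-empty, unique `ID`, which is what a key column needs.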

[2]:
first_store = session.read_csv("data/example.csv", keys=["ID"], store_name="First")

You can retrieve the first rows of the store with head:

[3]:
first_store.head()
[3]:
    Date        Continent  Country  City      Color  Quantity  Price
ID
1   2019-01-01  Europe     France   Paris     red      1000.0  500.0
2   2019-01-02  Europe     France   Lyon      red      2000.0  400.0
3   2019-01-05  Europe     France   Paris     blue     3000.0  420.0
4   2018-01-01  Europe     France   Bordeaux  blue     1500.0  480.0
5   2019-01-01  Europe     UK       London    green    3000.0  460.0

You can view a store’s columns with columns:

[4]:
first_store.columns
[4]:
['ID', 'Date', 'Continent', 'Country', 'City', 'Color', 'Quantity', 'Price']

References

A reference is a link between two stores.

You can specify the column mapping of the reference or use the default one, which maps the columns that have the same name in both stores:

[5]:
capitals_store = session.read_csv(
    "data/capitals.csv", keys=["Country name"], store_name="Capitals"
)
[6]:
capitals_store.head()
[6]:
              Capital
Country name
France        Paris
UK            London
China         Beijing
India         Dehli
[7]:
first_store.join(capitals_store, mapping={"Country": "Country name"})
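To make the role of the mapping concrete, here is a plain-Python sketch of what following a reference amounts to conceptually: each row of the first store is matched to the row of the target store whose mapped columns hold the same values. The `follow_reference` helper is hypothetical and is not how atoti implements references; the rows are a subset of the two head() outputs above.

```python
# Rows copied from the two head() outputs above (a subset of the columns).
first_rows = [
    {"ID": 1, "Country": "France", "City": "Paris"},
    {"ID": 5, "Country": "UK", "City": "London"},
]
capitals_rows = [
    {"Country name": "France", "Capital": "Paris"},
    {"Country name": "UK", "Capital": "London"},
]


def follow_reference(rows, target_rows, mapping):
    """Attach to each row the target row whose mapped columns match."""
    source_cols = list(mapping)
    target_cols = [mapping[col] for col in source_cols]
    # Index the target rows by the values of their mapped columns.
    index = {tuple(t[c] for c in target_cols): t for t in target_rows}
    enriched = []
    for row in rows:
        match = index.get(tuple(row[c] for c in source_cols), {})
        # Copy only the columns that the source row does not already carry.
        extra = {k: v for k, v in match.items() if k not in target_cols}
        enriched.append({**row, **extra})
    return enriched


joined = follow_reference(first_rows, capitals_rows, {"Country": "Country name"})
for row in joined:
    print(row)
```

With the mapping {"Country": "Country name"}, each row gains a Capital value looked up through its Country.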

Cubes

A cube can be defined on a store.

All the non-numerical columns of the store and of the referenced stores are converted to single-level dimensions. Default measures are created from the numerical columns.

[8]:
cube = session.create_cube(first_store, "FirstCube")
[9]:
m = cube.measures
cube.query()
[9]:
   Price.MEAN  Price.SUM  Quantity.MEAN  Quantity.SUM  contributors.COUNT
0       428.0     4280.0         2270.0       22700.0                  10
[10]:
lvl = cube.levels
h = cube.hierarchies
h
[10]:
  • Dimensions
    • Hierarchies
      • Capital
        1. Capital
      • City
        1. City
      • Color
        1. Color
      • Continent
        1. Continent
      • Country
        1. Country
      • Date
        1. Date
      • ID
        1. ID
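The split between hierarchies and default measures can be sketched in plain Python. The example below uses only the five rows shown by first_store.head(), so its totals differ from the cube output above, which covers all 10 rows. Treating the key column ID as a hierarchy even though it is numerical is an inference from the listing above, and the helper names are hypothetical.

```python
from numbers import Number

COLUMNS = ["ID", "Date", "Continent", "Country", "City", "Color", "Quantity", "Price"]
# The five rows shown by first_store.head() (a subset of the store).
DATA = [
    (1, "2019-01-01", "Europe", "France", "Paris", "red", 1000.0, 500.0),
    (2, "2019-01-02", "Europe", "France", "Lyon", "red", 2000.0, 400.0),
    (3, "2019-01-05", "Europe", "France", "Paris", "blue", 3000.0, 420.0),
    (4, "2018-01-01", "Europe", "France", "Bordeaux", "blue", 1500.0, 480.0),
    (5, "2019-01-01", "Europe", "UK", "London", "green", 3000.0, 460.0),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]


def split_columns(rows, keys):
    """Key and non-numerical columns become hierarchies; the rest feed measures."""
    hierarchies, numeric = [], []
    for name, value in rows[0].items():
        if name in keys or not isinstance(value, Number):
            hierarchies.append(name)
        else:
            numeric.append(name)
    return hierarchies, numeric


def default_measures(rows, numeric):
    """Compute SUM and MEAN per numerical column, plus a row count."""
    measures = {"contributors.COUNT": len(rows)}
    for name in numeric:
        total = sum(row[name] for row in rows)
        measures[f"{name}.SUM"] = total
        measures[f"{name}.MEAN"] = total / len(rows)
    return measures


hierarchies, numeric = split_columns(rows, keys=["ID"])
measures = default_measures(rows, numeric)
print(hierarchies)
print(measures)
```

Capital does not appear in this sketch because it comes from the referenced Capitals store rather than from the rows used here.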

Measures

New custom measures can be added to your cube.

You will learn more about what is possible in the measure tutorial.

[11]:
m["Half quantity"] = tt.agg.sum(first_store["Quantity"]) / 2
cube.query(m["Half quantity"])
[11]:
   Half quantity
0        11350.0
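A measure is a formula rather than a stored number, so the same definition can be evaluated at different levels of detail. The plain-Python sketch below applies the "sum of Quantity divided by 2" formula to the five rows shown by first_store.head(), once over all rows and once per country. It only illustrates the idea; the helper is hypothetical and the figures differ from the cube output above, which covers all 10 rows.

```python
from collections import defaultdict

# (Country, Quantity) pairs from the five rows shown by first_store.head().
facts = [
    ("France", 1000.0),
    ("France", 2000.0),
    ("France", 3000.0),
    ("France", 1500.0),
    ("UK", 3000.0),
]


def half_quantity(quantities):
    """Same formula as the measure: sum of Quantity divided by 2."""
    return sum(quantities) / 2


# Group the quantities by country so the formula can be applied per group.
by_country = defaultdict(list)
for country, quantity in facts:
    by_country[country].append(quantity)

grand_total = half_quantity(quantity for _, quantity in facts)
per_country = {country: half_quantity(qs) for country, qs in by_country.items()}

print(grand_total)
print(per_country)
```

Because the formula is linear, the per-country values add up to the grand total; that would not hold for every measure (a mean, for instance).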

Interactive visualization

There are two ways to build interactive visualizations: with the JupyterLab extension and with the atoti application.

JupyterLab extension

When our JupyterLab extension is installed, you can build interactive widgets right there in your notebook:

cube.visualize()

atoti Application

This is our dashboarding suite. Its URL can be retrieved from the session:

[12]:
session.url
[12]:
'http://localhost:32889'