Basics
Getting started
From CSV to Cube
In this part of the tutorial, you will create your first cube from a CSV file and learn multidimensional concepts such as cubes, dimensions, hierarchies and measures.
Let’s start by creating a session:
[1]:
import atoti as tt
session = tt.create_session()
We can now load the data from a CSV file into an in-memory table called a store:
[2]:
sales_store = session.read_csv("data/sales.csv", keys=["Sale ID"])
Let’s have a look at the loaded data, which contains a company’s sales:
[3]:
sales_store.head()
[3]:
Sale ID | Date | Shop | Product | Quantity | Unit price | Amount |
---|---|---|---|---|---|---|
S000000000 | 2020-06-12 | shop_0 | TAB_0 | 1.0 | 210.0 | 210.0 |
S000000001 | 2020-06-11 | shop_1 | TAB_1 | 1.0 | 300.0 | 300.0 |
S000000002 | 2020-06-10 | shop_2 | CHA_2 | 2.0 | 60.0 | 120.0 |
S000000003 | 2020-06-09 | shop_3 | BED_3 | 1.0 | 150.0 | 150.0 |
S000000004 | 2020-06-08 | shop_4 | BED_4 | 3.0 | 300.0 | 900.0 |
We will come back to stores in detail later. For now, we will use the one we have to build a cube:
[4]:
cube = session.create_cube(sales_store)
That’s it, you have created your first cube! But what exactly is a cube, and how do you use it?
Multidimensional concepts
A cube is a multidimensional view of some data, making it easy to explore, aggregate, filter and compare. It’s called a cube because each attribute of the data can be represented as a dimension of the cube.
The axes of the cube are called hierarchies. The purpose of multidimensional analysis is to visualize numeric indicators at specific coordinates of the cube. These indicators are called measures. An example of a measure would be the quantity of products sold.
We can list the hierarchies in our cube:
[5]:
# Aliasing the hierarchies property to a shorter variable name because we will use it a lot.
h = cube.hierarchies
h
[5]:
- Dimensions
  - Hierarchies
    - Date
      - Date
    - Product
      - Product
    - Sale ID
      - Sale ID
    - Shop
      - Shop
The cube has automatically created a hierarchy for each non-numeric field: “Date”, “Product”, “Sale ID” and “Shop”.
You can see that the hierarchies are grouped into dimensions. Here we have a single dimension, the default one, called “Hierarchies”, but we will later group hierarchies related to the same concept into the same dimension.
Hierarchies are also made of levels. Levels of the same hierarchy are attributes with a parent-child relationship. For instance, a city belongs to a country so “Country” and “City” could be the two levels of a “Geography” hierarchy.
At the moment, to keep it simple, we only have single-level hierarchies.
[6]:
lvl = cube.levels
Let’s have a look at the measures of the cube that have been inferred from the data:
[7]:
m = cube.measures
m
[7]:
- Measures
  - Amount.MEAN
    - formatter: DOUBLE[#,###.00]
  - Amount.SUM
    - formatter: DOUBLE[#,###.00]
  - Quantity.MEAN
    - formatter: DOUBLE[#,###.00]
  - Quantity.SUM
    - formatter: DOUBLE[#,###.00]
  - Unit price.MEAN
    - formatter: DOUBLE[#,###.00]
  - Unit price.SUM
    - formatter: DOUBLE[#,###.00]
  - contributors.COUNT
    - formatter: None
The cube has automatically created the sum and mean aggregations for all the numeric fields of the dataset.
Note that a measure isn’t a single result number, it’s more a formula that can be evaluated for any coordinates of the cube.
For instance, we can see the grand total of “Quantity.SUM”, which means summing the sold quantities over the whole dataset:
[8]:
cube.query(m["Quantity.SUM"])
[8]:
 | Quantity.SUM |
---|---|
0 | 8,077.00 |
But we can also dice the cube to get the quantity for each “Shop”, which means taking one slice of the cube for each “Shop”:
[9]:
cube.query(m["Quantity.SUM"], levels=lvl["Shop"])
[9]:
Shop | Quantity.SUM |
---|---|
shop_0 | 202.00 |
shop_1 | 202.00 |
shop_10 | 203.00 |
shop_11 | 203.00 |
shop_12 | 201.00 |
shop_13 | 202.00 |
shop_14 | 202.00 |
shop_15 | 202.00 |
shop_16 | 201.00 |
shop_17 | 204.00 |
shop_18 | 202.00 |
shop_19 | 202.00 |
shop_2 | 202.00 |
shop_20 | 201.00 |
shop_21 | 201.00 |
shop_22 | 201.00 |
shop_23 | 203.00 |
shop_24 | 203.00 |
shop_25 | 201.00 |
shop_26 | 202.00 |
shop_27 | 202.00 |
shop_28 | 202.00 |
shop_29 | 201.00 |
shop_3 | 201.00 |
shop_30 | 204.00 |
shop_31 | 202.00 |
shop_32 | 202.00 |
shop_33 | 201.00 |
shop_34 | 201.00 |
shop_35 | 201.00 |
shop_36 | 203.00 |
shop_37 | 203.00 |
shop_38 | 201.00 |
shop_39 | 202.00 |
shop_4 | 204.00 |
shop_5 | 202.00 |
shop_6 | 202.00 |
shop_7 | 201.00 |
shop_8 | 201.00 |
shop_9 | 201.00 |
We can slice on a single “Shop”:
[10]:
cube.query(
    m["Quantity.SUM"],
    condition=lvl["Shop"] == "shop_0",
)
[10]:
 | Quantity.SUM |
---|---|
0 | 202.00 |
We can dice along 2 different axes and take the quantity per product and date.
[11]:
cube.query(m["Quantity.SUM"], levels=[lvl["Date"], lvl["Product"]])
[11]:
Date | Product | Quantity.SUM |
---|---|---|
2020-05-14 | BED_24 | 8.00 |
2020-05-14 | BED_25 | 4.00 |
2020-05-14 | BED_26 | 6.00 |
2020-05-14 | BED_27 | 4.00 |
2020-05-14 | BED_3 | 2.00 |
... | ... | ... |
2020-06-12 | TSH_52 | 6.00 |
2020-06-12 | TSH_53 | 4.00 |
2020-06-12 | TSH_7 | 3.00 |
2020-06-12 | TSH_8 | 5.00 |
2020-06-12 | TSH_9 | 3.00 |
1830 rows × 1 columns
We can even combine these operations to slice on one hierarchy and dice on the two others:
[12]:
cube.query(
    m["Quantity.SUM"],
    levels=[lvl["Date"], lvl["Product"]],
    condition=lvl["Shop"] == "shop_0",
)
[12]:
Date | Product | Quantity.SUM |
---|---|---|
2020-05-23 | BED_24 | 1.00 |
2020-05-23 | BED_26 | 1.00 |
2020-05-23 | BED_3 | 1.00 |
2020-05-23 | BED_4 | 1.00 |
2020-05-23 | BED_46 | 1.00 |
... | ... | ... |
2020-06-12 | TSH_51 | 2.00 |
2020-06-12 | TSH_52 | 1.00 |
2020-06-12 | TSH_53 | 1.00 |
2020-06-12 | TSH_7 | 1.00 |
2020-06-12 | TSH_9 | 1.00 |
125 rows × 1 columns
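To make the slice and dice vocabulary more concrete, here is a minimal plain-Python sketch of the same idea. It is only an analogy and not how atoti evaluates queries: the rows are made-up toy data, and the quantity_sum helper with its levels and condition parameters is hypothetical, merely mimicking the shape of the queries above.

```python
from collections import defaultdict

# Toy fact rows (hypothetical values, not the tutorial's dataset).
rows = [
    {"Date": "2020-06-11", "Shop": "shop_0", "Product": "TAB_0", "Quantity": 1.0},
    {"Date": "2020-06-11", "Shop": "shop_1", "Product": "TAB_0", "Quantity": 2.0},
    {"Date": "2020-06-12", "Shop": "shop_0", "Product": "CHA_2", "Quantity": 3.0},
    {"Date": "2020-06-12", "Shop": "shop_1", "Product": "CHA_2", "Quantity": 1.0},
]


def quantity_sum(rows, levels=(), condition=None):
    """Evaluate a 'Quantity.SUM'-like measure (hypothetical helper).

    levels: column names to dice on (one result per distinct combination).
    condition: optional predicate on a row, used to slice.
    """
    totals = defaultdict(float)
    for row in rows:
        if condition is not None and not condition(row):
            continue  # Row is outside the requested slice.
        coordinates = tuple(row[level] for level in levels)
        totals[coordinates] += row["Quantity"]
    return dict(totals)


# Grand total: no level, no condition.
grand_total = quantity_sum(rows)
# Dice: one value per shop.
per_shop = quantity_sum(rows, levels=["Shop"])
# Slice: keep a single shop.
shop_0_only = quantity_sum(rows, condition=lambda row: row["Shop"] == "shop_0")
# Slice on one attribute and dice on another.
per_date_in_shop_0 = quantity_sum(
    rows, levels=["Date"], condition=lambda row: row["Shop"] == "shop_0"
)
```

The same function answers every question; only the requested coordinates change. This is the sense in which a measure is a formula rather than a single number.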
First visualization
So far we have used cube.query, which returns a table as a pandas DataFrame, but a better way to visualize multidimensional data is a pivot table. With atoti’s JupyterLab extension, you can build advanced, interactive visualizations such as pivot tables and charts directly in your notebook by calling cube.visualize().
This will create a widget and open the atoti tab on the left with tools to manipulate the widget.
Let’s start by creating a pivot table:
- Run cube.visualize().
- Select “Pivot table”.
- In the left panel, click on a measure such as “Amount.SUM” to add it.
- Click on a hierarchy such as “Date” to get the amount per date.
- Drag and drop another hierarchy such as “Product” to the “Columns” section to get the amount sold per day and per product.
[13]:
cube.visualize()
Open the notebook in JupyterLab with the atoti extension installed and enabled to start editing this widget.
We can also create a first chart representing the evolution of the amount sold per day:
- Run cube.visualize().
- Select “Chart”.
- Click on the “Amount.SUM” measure to add it to the chart.
- Click on the “Date” hierarchy to add it to the chart.
[14]:
cube.visualize()
Drilldown and filters
Multidimensional analysis is meant to be done from top to bottom: start by visualizing the indicators at the top level, then drill down to explain the top figures in more detail.
For instance, we can visualize some measures per date, then drill down on the “Shop” hierarchy for a specific date, then see the products sold by a specific shop on this date.
Using the previous cube representation, this is like zooming in further and further on a part of the cube.
[15]:
cube.visualize()
It’s also very easy to filter on a hierarchy when building widgets. Let’s apply a filter on a chart:
[16]:
cube.visualize()
Dashboarding application
Being able to quickly build widgets inside a notebook without coding is nice to rapidly explore the data, iterate on your model and share some results. However, to provide richer insights, dashboards are even better. That’s why atoti comes with a web application that can be accessed outside of the notebook and where widgets can be laid out to form dashboards.
The URL of this application can be retrieved like this:
[17]:
session.url
[17]:
'http://localhost:45697'
It’s possible to publish widgets built in the notebook to the application by right-clicking on them and selecting “Publish widget in app”. They will then be available in the “Saved widgets” section.
Enriching the cube
In the previous section, you have learned how to create a basic cube and manipulate it. We will now enrich this cube with additional attributes and more interesting measures.
Join
Currently, we have very limited information about our products: only the ID. We can load a CSV containing more details into a new store:
[18]:
products_store = session.read_csv("data/products.csv", keys=["Product"])
Note that a store can have a set of keys. These keys are the columns which make each row unique. Here, it’s the product ID.
If you try to insert a new row with the same keys as an existing row, it will override the existing one.
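The overriding behaviour of keys can be pictured with a plain Python dictionary. This is only an analogy and not atoti’s implementation; the upsert helper and the sample rows below are hypothetical:

```python
def upsert(table, row, keys):
    """Insert a row into a keyed table, replacing any row with the same keys.

    Hypothetical helper: the table is modelled as a dict from key tuple to row.
    """
    table[tuple(row[key] for key in keys)] = row


products = {}
upsert(products, {"Product": "TAB_0", "Purchase price": 190.0}, keys=["Product"])
upsert(products, {"Product": "TAB_1", "Purchase price": 280.0}, keys=["Product"])
# Same key as the first row: the existing row is replaced, not duplicated.
upsert(products, {"Product": "TAB_0", "Purchase price": 180.0}, keys=["Product"])
```

After the three insertions the table still holds two rows, and the row for “TAB_0” carries the latest purchase price.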
[19]:
products_store.head()
[19]:
Product | Category | Sub category | Size | Purchase price | Color | Brand |
---|---|---|---|---|---|---|
TAB_0 | Furniture | Table | 1m80 | 190.0 | black | Basic |
TAB_1 | Furniture | Table | 2m40 | 280.0 | white | Mega |
CHA_2 | Furniture | Chair | N/A | 48.0 | blue | Basic |
BED_3 | Furniture | Bed | Single | 127.0 | red | Mega |
BED_4 | Furniture | Bed | Double | 252.0 | brown | Basic |
This store contains the category, subcategory, size, color, purchase price and brand of the product. Both stores have a “Product” field we can use to join them.
Note that this is a database-like join and not a pandas-like join. The details from products_store won’t be inlined into sales_store. Instead, this just declares a reference between these two stores that the cube can use to provide more analytical axes.
[20]:
sales_store.join(products_store, mapping={"Product": "Product"})
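The difference between a reference and an inlined join can be sketched in plain Python. This is a simplified illustration under assumed toy data, not atoti’s internals; the attribute helper is hypothetical:

```python
# Toy fact table and referenced table (hypothetical values).
sales = [
    {"Sale ID": "S0", "Product": "TAB_0", "Quantity": 1.0},
    {"Sale ID": "S1", "Product": "CHA_2", "Quantity": 2.0},
    {"Sale ID": "S2", "Product": "TAB_0", "Quantity": 3.0},
]
products = {
    "TAB_0": {"Category": "Furniture", "Sub category": "Table"},
    "CHA_2": {"Category": "Furniture", "Sub category": "Chair"},
}


def attribute(sale, name):
    """Resolve an attribute through the reference instead of copying it into sales."""
    if name in sale:
        return sale[name]
    return products[sale["Product"]][name]


# Aggregate the quantity along an attribute that only exists in the products table.
quantity_per_sub_category = {}
for sale in sales:
    sub_category = attribute(sale, "Sub category")
    quantity_per_sub_category[sub_category] = (
        quantity_per_sub_category.get(sub_category, 0.0) + sale["Quantity"]
    )
```

The sales rows are left untouched, yet the product attributes are available as extra analytical axes.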
You can visualize the structure of the whole datastore:
[21]:
session.stores.schema
[21]:
The new columns have been automatically added to the cube as hierarchies:
[22]:
h
[22]:
- Dimensions
  - Hierarchies
    - Brand
      - Brand
    - Category
      - Category
    - Color
      - Color
    - Date
      - Date
    - Product
      - Product
    - Sale ID
      - Sale ID
    - Shop
      - Shop
    - Size
      - Size
    - Sub category
      - Sub category
You can use them directly in a new widget. For instance, let’s create a bar chart to visualize the mean price per subcategory of product:
[23]:
cube.visualize()
We can also make a donut chart to see how all the sales are distributed between brands:
[24]:
cube.visualize()
In a similar way, we can enrich the data about the “Shops”:
[25]:
shops_store = session.read_csv("data/shops.csv", keys=["Shop ID"])
shops_store.head()
[25]:
Shop ID | City | State or region | Country | Shop size |
---|---|---|---|---|
shop_0 | New York | New York | USA | big |
shop_1 | Los Angeles | California | USA | medium |
shop_2 | San Diego | California | USA | medium |
shop_3 | San Jose | California | USA | medium |
shop_4 | San Francisco | California | USA | small |
[26]:
sales_store.join(shops_store, mapping={"Shop": "Shop ID"})
session.stores.schema
[26]:
We can now plot the evolution of the sales per country over time:
[27]:
cube.visualize()
New measures
So far we have only used the default measures which are basic aggregations of the numeric columns. We can add new custom measures to our cube.
Max
We’ll start with a simple aggregation taking the maximum price of the sales store:
[28]:
m["Max price"] = tt.agg.max(sales_store["Unit price"])
This new measure is directly available:
[29]:
cube.query(m["Max price"])
[29]:
 | Max price |
---|---|
0 | 440.00 |
[30]:
cube.query(m["Max price"], levels=lvl["Category"])
[30]:
Category | Max price |
---|---|
Cloth | 60.00 |
Furniture | 440.00 |
Margin
Now that the purchase price of each product is available from the products store, we can compute the margin.
To do that, we start by computing the cost, which is the quantity sold multiplied by the purchase price, summed over all the products.
Note the use of the origin scope, which instructs the aggregation to perform the multiplication for each product and then sum the results.
[31]:
cost = tt.agg.sum(
    m["Quantity.SUM"] * products_store["Purchase price"],
    scope=tt.scope.origin(lvl["Product"]),
)
[32]:
m["Margin"] = m["Amount.SUM"] - cost
We can also define the margin rate, which is the ratio of the margin to the sold amount:
[33]:
m["Margin rate"] = m["Margin"] / m["Amount.SUM"]
[34]:
cube.query(m["Margin"], m["Margin rate"], levels=lvl["Product"])
[34]:
Product | Margin | Margin rate |
---|---|---|
BED_24 | 3,082.00 | .15 |
BED_25 | 6,336.00 | .16 |
BED_26 | 8,060.00 | .16 |
BED_27 | 8,580.00 | .15 |
BED_3 | 3,036.00 | .15 |
... | ... | ... |
TSH_52 | 520.00 | .17 |
TSH_53 | 396.00 | .12 |
TSH_7 | 393.00 | .15 |
TSH_8 | 264.00 | .10 |
TSH_9 | 390.00 | .14 |
61 rows × 2 columns
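The arithmetic behind the cost, margin and margin rate can be sketched in plain Python. The figures below are made-up toy values, and the code only illustrates the order of operations (multiply per product, then sum); it is not atoti code:

```python
# Toy inputs (hypothetical values): aggregated quantity and amount per product,
# and the purchase price of each product.
quantity_sum = {"TAB_0": 10.0, "CHA_2": 20.0}
amount_sum = {"TAB_0": 2100.0, "CHA_2": 1200.0}
purchase_price = {"TAB_0": 190.0, "CHA_2": 48.0}

# Multiply at the product level first, then sum the per-product results.
cost = sum(quantity_sum[product] * purchase_price[product] for product in quantity_sum)

total_amount = sum(amount_sum.values())
margin = total_amount - cost
margin_rate = margin / total_amount
```

The multiplication has to happen per product because there is no single purchase price to multiply the total quantity by once several products are aggregated together.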
Let’s use this margin rate to do a “Top 10” filter to see the products with the best rate.
Note that you don’t need to put the rate measure and the product level in the pivot table to apply the filter.
[35]:
cube.visualize("10 most profitable products")
Cumulative sum over time
A cumulative sum is the partial sum of the data up to the current value. For instance, a cumulative sum over time can be used to show how some measure changes over time.
[36]:
m["Cumulative amount"] = tt.agg.sum(
    m["Amount.SUM"], scope=tt.scope.cumulative(lvl["Date"])
)
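As a point of comparison, the same running total can be sketched with the standard library. The amounts per date are made-up values, and this is only an illustration of what a cumulative sum computes:

```python
from itertools import accumulate

# Toy daily amounts (hypothetical values).
amount_per_date = {"2020-06-10": 120.0, "2020-06-11": 300.0, "2020-06-12": 210.0}

# ISO dates sort chronologically as strings.
dates = sorted(amount_per_date)
# Each date gets the sum of its own amount and all the previous ones.
cumulative_amount = dict(zip(dates, accumulate(amount_per_date[date] for date in dates)))
```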
[37]:
cube.visualize()
Average per store
Aggregations can also be combined. For instance, we can sum the amount within each “Shop”, then take the average of these sums to see how much a shop sells on average:
[38]:
m["Average amount per shop"] = tt.agg.mean(
    m["Amount.SUM"], scope=tt.scope.origin(lvl["Shop"])
)
[39]:
cube.query(m["Average amount per shop"])
[39]:
 | Average amount per shop |
---|---|
0 | 24,036.58 |
[40]:
cube.query(m["Average amount per shop"], levels=lvl["Sub category"])
[40]:
Sub category | Average amount per shop |
---|---|
Bed | 12,728.88 |
Chair | 601.50 |
Hoodie | 1,403.20 |
Shoes | 3,184.50 |
Table | 5,023.50 |
Tshirt | 1,095.00 |
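The two-step aggregation (sum per shop, then mean over shops) can be sketched in plain Python on made-up sales. It also shows why the result differs from a plain mean over the individual sales:

```python
from collections import defaultdict
from statistics import mean

# Toy sales as (shop, amount) pairs (hypothetical values).
sales = [
    ("shop_0", 210.0),
    ("shop_0", 90.0),
    ("shop_1", 300.0),
    ("shop_2", 120.0),
    ("shop_2", 30.0),
]

# Step 1: sum the amount within each shop.
amount_per_shop = defaultdict(float)
for shop, amount in sales:
    amount_per_shop[shop] += amount

# Step 2: average the per-shop totals.
average_amount_per_shop = mean(amount_per_shop.values())

# For comparison: the mean over individual sales is a different indicator.
average_amount_per_sale = mean(amount for _, amount in sales)
```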
Fact-level operations
[41]:
sales_store.head()
[41]:
Sale ID | Date | Shop | Product | Quantity | Unit price | Amount |
---|---|---|---|---|---|---|
S000000019 | 2020-05-24 | shop_19 | SHO_19 | 2.0 | 60.0 | 120.0 |
S000000039 | 2020-06-03 | shop_39 | SHO_39 | 1.0 | 60.0 | 60.0 |
S000000058 | 2020-05-15 | shop_18 | SHO_58 | 2.0 | 60.0 | 120.0 |
S000000074 | 2020-05-29 | shop_34 | HOO_13 | 1.0 | 48.0 | 48.0 |
S000000084 | 2020-05-19 | shop_4 | CHA_23 | 2.0 | 60.0 | 120.0 |
As you can see, our data already contains an “Amount” column which is equal to “Quantity” * “Unit price”. It’s good practice to do this kind of fact-level preprocessing outside atoti: it avoids redoing these computations on every row for every query, which would slow all queries down. That way, atoti only has to do what it’s best at: aggregation.
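As a minimal sketch of such preprocessing, assuming a raw file that only had “Quantity” and “Unit price” columns (the inline CSV content below is made up), the “Amount” column could be computed once with the standard csv module before the data is loaded:

```python
import csv
import io

# Hypothetical raw data without the "Amount" column.
raw = io.StringIO("Sale ID,Quantity,Unit price\nS0,1,210.0\nS1,2,60.0\n")

rows = []
for row in csv.DictReader(raw):
    # Fact-level computation done once per row, before any aggregation.
    row["Amount"] = float(row["Quantity"]) * float(row["Unit price"])
    rows.append(row)
```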
Multilevel hierarchies
So far, all our hierarchies have had a single level, but it’s best to group attributes with a parent-child relationship in the same hierarchy.
For example, we can group the “Category”, “Sub category” and “Product” levels into a “Product” hierarchy:
[42]:
h["Product"] = [lvl["Category"], lvl["Sub category"], lvl["Product"]]
And let’s remove the old hierarchies:
[43]:
del h["Category"]
del h["Sub category"]
[44]:
h
[44]:
- Dimensions
  - Hierarchies
    - Brand
      - Brand
    - City
      - City
    - Color
      - Color
    - Country
      - Country
    - Date
      - Date
    - Product
      - Category
      - Sub category
      - Product
    - Sale ID
      - Sale ID
    - Shop
      - Shop
    - Shop size
      - Shop size
    - Size
      - Size
    - State or region
      - State or region
We can do the same with “Country”, “State or region” and “City” to build a “Geography” hierarchy.
Note that instead of using existing levels, you can also define a hierarchy from the store fields its levels will be based on:
[45]:
h["Geography"] = [
    shops_store["Country"],
    shops_store["State or region"],
    shops_store["City"],
]
del h["Country"]
del h["State or region"]
del h["City"]
As we are restructuring the hierarchies, let’s use this opportunity to also change the dimensions.
A dimension groups hierarchies related to the same concept.
To keep it simple here, we will simply move the new “Geography” hierarchy to its own dimension:
[46]:
h["Geography"].dimension = "Location"
h
[46]:
- Dimensions
  - Hierarchies
    - Brand
      - Brand
    - Color
      - Color
    - Date
      - Date
    - Product
      - Category
      - Sub category
      - Product
    - Sale ID
      - Sale ID
    - Shop
      - Shop
    - Shop size
      - Shop size
    - Size
      - Size
  - Location
    - Geography
      - Country
      - State or region
      - City
With that, we can define new measures taking advantage of the multilevel structure. For instance, we can create a measure indicating how much a product contributes to its subcategory:
[47]:
m["Parent category amount"] = tt.parent_value(m["Amount.SUM"], on=h["Product"])
[48]:
m["Percent of parent amount"] = m["Amount.SUM"] / m["Parent category amount"]
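The idea of dividing a value by the value of its parent can be sketched in plain Python. The amounts are made-up toy values and the amount_sum and percent_of_parent helpers are hypothetical; members are written as paths in a (“Category”, “Sub category”, “Product”) hierarchy:

```python
# Toy amounts per product, keyed by the full path in the hierarchy
# (hypothetical values).
amount = {
    ("Furniture", "Table", "TAB_0"): 300.0,
    ("Furniture", "Table", "TAB_1"): 100.0,
    ("Furniture", "Chair", "CHA_2"): 200.0,
}


def amount_sum(member):
    """Sum the amount of every product located under the given member path."""
    return sum(
        value for path, value in amount.items() if path[: len(member)] == member
    )


def percent_of_parent(member):
    """Ratio between the amount of a member and the amount of its parent."""
    parent = member[:-1]  # Drop the deepest level to move one level up.
    return amount_sum(member) / amount_sum(parent)


product_share = percent_of_parent(("Furniture", "Table", "TAB_0"))
sub_category_share = percent_of_parent(("Furniture", "Table"))
```

Because the parent is always one level above the queried member, the same formula gives the share of a product in its subcategory and the share of a subcategory in its category.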
[49]:
cube.visualize()
Polishing the cube
Deleting or hiding measures
Some measures have been automatically created from numeric fields but are not useful. For instance, “Unit price.SUM” does not really make sense as we never want to sum the unit prices. We can delete it:
[50]:
del m["Unit price.SUM"]
Other measures have been used while building the project only as intermediary steps but are not useful to the end users in the application. We can hide them from the UI (they will remain accessible in Python):
[51]:
m["Parent category amount"].visible = False
Measure folders
Measures can be rearranged into folders.
[52]:
for measure in [
    m["Amount.MEAN"],
    m["Amount.SUM"],
    m["Average amount per shop"],
    m["Cumulative amount"],
    m["Percent of parent amount"],
]:
    measure.folder = "Amount"
[53]:
m
[53]:
- Measures
  - 📁 Amount
    - Amount.MEAN
      - formatter: DOUBLE[#,###.00]
    - Amount.SUM
      - formatter: DOUBLE[#,###.00]
    - Average amount per shop
      - formatter: DOUBLE[#,###.00]
    - Cumulative amount
      - formatter: DOUBLE[#,###.00]
    - Percent of parent amount
      - formatter: DOUBLE[#,###.00]
  - Margin
    - formatter: DOUBLE[#,###.00]
  - Margin rate
    - formatter: DOUBLE[#,###.00]
  - Max price
    - formatter: DOUBLE[#,###.00]
  - Purchase price.VALUE
    - formatter: DOUBLE[#,###.00]
  - Quantity.MEAN
    - formatter: DOUBLE[#,###.00]
  - Quantity.SUM
    - formatter: DOUBLE[#,###.00]
  - Unit price.MEAN
    - formatter: DOUBLE[#,###.00]
  - contributors.COUNT
    - formatter: None
Measure formatters
Some measures can be formatted for a nicer display. Classic examples are changing the number of decimals or adding a percent or currency symbol.
Let’s do this for our percent of parent amount and margin rate:
Before
[54]:
cube.query(m["Percent of parent amount"], m["Margin rate"], levels=lvl["Category"])
[54]:
Category | Percent of parent amount | Margin rate |
---|---|---|
Cloth | .24 | .23 |
Furniture | .76 | .13 |
[55]:
m["Percent of parent amount"].formatter = "DOUBLE[0.00%]"
m["Margin rate"].formatter = "DOUBLE[0.00%]"
After
[56]:
cube.query(m["Percent of parent amount"], m["Margin rate"], levels=lvl["Category"])
[56]:
Category | Percent of parent amount | Margin rate |
---|---|---|
Cloth | 23.64% | 22.56% |
Furniture | 76.36% | 13.49% |
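For readers more familiar with Python’s own format strings, the effect of these formatters is roughly comparable to the sketch below. The two helper functions are hypothetical and the equivalence is only approximate: atoti’s formatter patterns are not Python format specifications.

```python
def as_percent(value):
    """Render a ratio as a percentage with two decimals."""
    return f"{value:.2%}"


def with_thousands(value):
    """Render a number with a thousands separator and two decimals."""
    return f"{value:,.2f}"


cloth_share = as_percent(0.2364)
grand_total = with_thousands(8077)
```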
Simulations
Simulations are a way to compare several scenarios and do what-if analysis. This is very powerful as it helps to understand how changing the source data or a piece of the model impacts the key indicators.
In atoti, the data model is made of measures chained together. A simulation can be seen as changing one part of the model, either its source data or one of its measure definitions, and then evaluating how it impacts the following measures.
Source simulation
Let’s start by changing the source. With pandas or Spark, if you want to compare two results for different versions of the input dataset, you have to reapply all the transformations to your dataset. With atoti, you only have to provide the new data and all the measures will be automatically available for both versions of the data.
We will create a new scenario using pandas to modify the original dataset.
[57]:
import pandas as pd
For instance, we can simulate what would happen if we had managed to purchase some products at a cheaper price.
[58]:
products_df = pd.read_csv("data/products.csv")
products_df.head()
[58]:
 | Product | Category | Sub category | Size | Purchase price | Color | Brand |
---|---|---|---|---|---|---|---|
0 | TAB_0 | Furniture | Table | 1m80 | 190.0 | black | Basic |
1 | TAB_1 | Furniture | Table | 2m40 | 280.0 | white | Mega |
2 | CHA_2 | Furniture | Chair | NaN | 48.0 | blue | Basic |
3 | BED_3 | Furniture | Bed | Single | 127.0 | red | Mega |
4 | BED_4 | Furniture | Bed | Double | 252.0 | brown | Basic |
[59]:
better_prices = {
"TAB_0": 180.0,
"TAB_1": 250.0,
"CHA_2": 40.0,
"BED_3": 110.0,
"BED_4": 210.0,
}
[60]:
for product, purchase_price in better_prices.items():
    products_df.loc[
        products_df["Product"] == product, "Purchase price"
    ] = purchase_price
products_df.head()
[60]:
 | Product | Category | Sub category | Size | Purchase price | Color | Brand |
---|---|---|---|---|---|---|---|
0 | TAB_0 | Furniture | Table | 1m80 | 180.0 | black | Basic |
1 | TAB_1 | Furniture | Table | 2m40 | 250.0 | white | Mega |
2 | CHA_2 | Furniture | Chair | NaN | 40.0 | blue | Basic |
3 | BED_3 | Furniture | Bed | Single | 110.0 | red | Mega |
4 | BED_4 | Furniture | Bed | Double | 210.0 | brown | Basic |
We can now load this new dataframe into a new scenario of the products store.
[61]:
products_store.scenarios["Cheaper purchase prices"].load_pandas(products_df)
The session now has two scenarios, and the only differences between them are the rows corresponding to the products with better prices. Everything else is shared between the scenarios and has not been duplicated: source scenarios in atoti are memory-efficient.
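The idea of a scenario that only stores its differences from the base data can be pictured with collections.ChainMap from the standard library. This is an analogy under assumed toy prices (the “HOO_5” entry is made up), not a description of how atoti stores scenarios internally:

```python
from collections import ChainMap

# Base purchase prices (toy values; "HOO_5" is a hypothetical product).
base_prices = {"TAB_0": 190.0, "TAB_1": 280.0, "CHA_2": 48.0, "HOO_5": 30.0}

# The scenario only stores the rows that differ from the base.
cheaper_overrides = {"TAB_0": 180.0, "TAB_1": 250.0}

# Lookups check the overrides first and fall back to the shared base data.
cheaper_scenario = ChainMap(cheaper_overrides, base_prices)
```

Reading cheaper_scenario["TAB_0"] returns the overridden price, while cheaper_scenario["CHA_2"] is served from the base data, which is neither copied nor modified.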
Using the “Source Simulation” hierarchy we can display the margin of the scenario and compare it to the base case.
[62]:
cube.visualize()
Note that all the existing measures are immediately available on the new data. For instance, the margin rate still exists, and we can see that in this scenario we would have a better margin for the Furniture products.
[63]:
cube.visualize()
Measure simulations
The other simulation technique is to change the value of a measure for some coordinates. The value of the simulated measure can be multiplied by some factor, increased by a fixed amount, or completely replaced.
When creating the simulation, you can choose at which granularity the modification applies. For instance, we can create a simulation of the quantity and amount measures per country. Doing that, we can answer questions such as “What happens if there is a crisis in France and we sell 20% less?”
[64]:
quantity_simulation = cube.setup_simulation(
    "Country Simulation",
    levels=[lvl["Country"]],
    multiply=[m["Quantity.SUM"], m["Amount.SUM"]],
)
Let’s add a first scenario to this simulation:
[65]:
france_crisis = quantity_simulation.scenarios["France crisis"]
A scenario is like a store where you can add new parameters to configure the scenario. In this scenario, we decided that France sells 20% less, so let’s multiply the quantity and the amount by 80% for France only.
[66]:
france_crisis += ("France", 0.80, 0.80)
france_crisis.head()
[66]:
Country | Country Simulation_Quantity.SUM_multiply | Country Simulation_Amount.SUM_multiply | Priority |
---|---|---|---|
France | 0.8 | 0.8 | 0.0 |
Let’s query the cube using the new “Country Simulation” hierarchy to compare the quantity and amount between the base case and our new scenario:
[67]:
cube.query(
    m["Quantity.SUM"],
    m["Amount.SUM"],
    levels=[lvl["Country Simulation"], lvl["Country"]],
)
[67]:
Country Simulation | Country | Quantity.SUM | Amount.SUM |
---|---|---|---|
Base | France | 3,027.00 | 358,042.00 |
Base | USA | 5,050.00 | 603,421.00 |
France crisis | France | 2,421.60 | 286,433.60 |
France crisis | USA | 5,050.00 | 603,421.00 |
A measure simulation can be seen as something intercepting the regular aggregation flow to override the value of one or more measures for the specified coordinates.
Here for example, as the amount has been modified, the measures depending on it such as the cumulative amount are also impacted.
[68]:
cube.query(m["Cumulative amount"], levels=[lvl["Country Simulation"], lvl["Country"]])
[68]:
Country Simulation | Country | Cumulative amount |
---|---|---|
Base | France | 358,042.00 |
Base | USA | 603,421.00 |
France crisis | France | 286,433.60 |
France crisis | USA | 603,421.00 |
The current measure simulation can be illustrated like this:
Note that you are not limited to a single scenario:
[69]:
quantity_simulation.scenarios["US boost"] += ("USA", 1.15, 1.15)
[70]:
cube.query(m["Quantity.SUM"], levels=[lvl["Country Simulation"], lvl["Country"]])
[70]:
Country Simulation | Country | Quantity.SUM |
---|---|---|
Base | France | 3,027.00 |
Base | USA | 5,050.00 |
France crisis | France | 2,421.60 |
France crisis | USA | 5,050.00 |
US boost | France | 3,027.00 |
US boost | USA | 5,807.50 |
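The numbers in this table can be reproduced with a few lines of plain Python. This is only a sketch of the multiplication logic, not of atoti’s simulation engine: the base quantities are taken from the query results above, and the simulated_quantity helper is hypothetical.

```python
# Base quantities per country, as returned by the queries above.
base_quantity = {"France": 3027.0, "USA": 5050.0}

# Each scenario only lists the coordinates it overrides, with their factor.
scenarios = {
    "Base": {},
    "France crisis": {"France": 0.80},
    "US boost": {"USA": 1.15},
}


def simulated_quantity(scenario, country):
    """Apply the scenario's factor, defaulting to 1.0 when nothing is overridden."""
    factor = scenarios[scenario].get(country, 1.0)
    return base_quantity[country] * factor


# Rounded to two decimals, like the formatted query results.
results = {
    (scenario, country): round(simulated_quantity(scenario, country), 2)
    for scenario in scenarios
    for country in base_quantity
}
```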
The two scenarios can be visualized in the same widget:
[71]:
cube.visualize()
Finally, we can even combine the different simulations (the source one and the measure one) to create a matrix of scenarios:
[72]:
cube.visualize()
Summary
In this tutorial, you have learned all the basics needed to build a project with atoti, from multidimensional concepts to powerful simulations. We now encourage you to try the library with your own data!