Measures

Default measures

Let’s quickly create a cube from a CSV file. You can see the value of your measures at the top level of the cube by calling cube.query():

[1]:
import atoti as tt

session = tt.create_session()
store = session.read_csv("data/example.csv", keys=["ID"], store_name="First store")
cube = session.create_cube(store, "FirstCube")
lvl = cube.levels
m = cube.measures
hier = cube.hierarchies
cube.query()
[1]:
Price.AVG Price.SUM Quantity.AVG Quantity.SUM contributors.COUNT
0 428.0 4280.0 2270.0 22700.0 10

SUM and AVG aggregations are automatically created for numeric columns.

Calling cube.query() will display the value of the measure at the top level. It’s also possible to specify levels to split the cube:

[2]:
cube.query(m["Quantity.SUM"], levels=[lvl["Continent"], lvl["Country"]])
[2]:
Quantity.SUM
Continent Country
Asia China 6000.0
India 3700.0
Europe France 7500.0
UK 5500.0

You can also filter on levels:

[3]:
cube.query(
    m["Quantity.SUM"],
    levels=[lvl["Continent"], lvl["Country"]],
    condition=(lvl["Country"] == "France"),
)
[3]:
Quantity.SUM
Continent Country
Europe France 7500.0

Aggregation functions

The available aggregation functions are:

[4]:
from inspect import getmembers, isfunction

[member[0] for member in getmembers(tt.agg) if isfunction(member[1])]
[4]:
['_agg',
 '_count',
 '_vector',
 'array_percentile',
 'avg',
 'count_distinct',
 'long',
 'max',
 'median',
 'min',
 'percentile',
 'prod',
 'short',
 'single_value',
 'sqrt',
 'square_sum',
 'std',
 'stop',
 'sum',
 'variance']
[5]:
m["Quantity.MIN"] = tt.agg.min(store["Quantity"])
m["Quantity.MAX"] = tt.agg.max(store["Quantity"])
m["Quantity.LONG"] = tt.agg.long(store["Quantity"])
m["Quantity.SHORT"] = tt.agg.short(store["Quantity"])
cube.query(levels=[lvl["Country"]])
[5]:
Price.AVG Price.SUM Quantity.AVG Quantity.LONG Quantity.MAX Quantity.MIN Quantity.SHORT Quantity.SUM contributors.COUNT
Country
China 380.0 760.0 3000.0 6000.0 4000.0 2000.0 0.0 6000.0 2
France 450.0 1800.0 1875.0 7500.0 3000.0 1000.0 0.0 7500.0 4
India 380.0 760.0 1850.0 3700.0 2200.0 1500.0 0.0 3700.0 2
UK 480.0 960.0 2750.0 5500.0 3000.0 2500.0 0.0 5500.0 2
[6]:
m["Quantity running sum"] = tt.agg.sum(
    m["Quantity.SUM"], scope=tt.scope.cumulative(lvl["ID"])
)
cube.query(m["Quantity running sum"], levels=[lvl["ID"]])
[6]:
Quantity running sum
ID
1 1000.0
2 3000.0
3 6000.0
4 7500.0
5 10500.0
6 13000.0
7 15000.0
8 19000.0
9 21200.0
10 22700.0
[7]:
m["Distinct City"] = tt.agg.count_distinct(store["City"])
[8]:
m["Percentile 95"] = tt.agg.percentile(store["Quantity"], percentile_value=0.95)
m["Percentile 20"] = tt.agg.percentile(store["Quantity"], percentile_value=0.20)
cube.query(m["Percentile 95"], m["Percentile 20"], levels=[lvl["Continent"]])
[8]:
Percentile 95 Percentile 20
Continent
Asia 3730.0 1800.0
Europe 3000.0 1500.0

Statistics aggregations

[9]:
m["Quantity variance"] = tt.agg.variance(store["Quantity"], mode="sample")
m["Quantity standard deviation"] = tt.agg.std(store["Quantity"], mode="sample")
cube.query(m["Quantity variance"], m["Quantity standard deviation"])
[9]:
Quantity variance Quantity standard deviation
0 784555.555556 885.751407

Delete a measure

Any measure can be removed from the cube using del

[10]:
"Quantity running sum" in m
[10]:
True
[11]:
del m["Quantity running sum"]
"Quantity running sum" in m
[11]:
False

Different aggregations per level

It’s possible to aggregate a column differently depending on whether we are above or below a certain level.

For example, you can sum a quantity for each country and then take the average over all the countries:

[12]:
m["Average per country"] = tt.agg.avg(
    m["Quantity.SUM"], scope=tt.scope.origin("Country")
)
cube.query(m["Average per country"], levels=[lvl["Country"]])
[12]:
Average per country
Country
China 6000.0
France 7500.0
India 3700.0
UK 5500.0
[13]:
cube.query(m["Average per country"])
[13]:
Average per country
0 5675.0

Calculated measures

Measures can be combined together to build more complex measures:

[14]:
m["Turnover"] = tt.agg.sum(m["Quantity.SUM"] * store["Price"])
cube.query(levels=[lvl["Country"]])
[14]:
Average per country Distinct City Percentile 20 Percentile 95 Price.AVG Price.SUM Quantity standard deviation Quantity variance Quantity.AVG Quantity.LONG Quantity.MAX Quantity.MIN Quantity.SHORT Quantity.SUM Turnover contributors.COUNT
Country
China 6000.0 2 2400.0 3900.0 380.0 760.0 1414.213562 2.000000e+06 3000.0 6000.0 4000.0 2000.0 0.0 6000.0 2220000.0 2
France 7500.0 3 1300.0 2850.0 450.0 1800.0 853.912564 7.291667e+05 1875.0 7500.0 3000.0 1000.0 0.0 7500.0 3280000.0 4
India 3700.0 2 1640.0 2165.0 380.0 760.0 494.974747 2.450000e+05 1850.0 3700.0 2200.0 1500.0 0.0 3700.0 1392000.0 2
UK 5500.0 1 2600.0 2975.0 480.0 960.0 353.553391 1.250000e+05 2750.0 5500.0 3000.0 2500.0 0.0 5500.0 2630000.0 2

Multidimensional Functions

Parent Value

[15]:
# Create multi-level hierarchy :
lvl = cube.levels
cube.hierarchies["Geography"] = {
    "continent": lvl["Continent"],
    "country": lvl["Country"],
    "city": lvl["City"],
}
[16]:
help(tt.parent_value)
Help on function parent_value in module atoti._functions.multidimensional:

parent_value(measure: 'MeasureOrName', on_hierarchies: 'Optional[Union[HierarchyOrName, List[HierarchyOrName]]]' = None, top_value: 'Union[object, Measure]' = None)
    Create the a parent value measure.

    Args:
        measure: the measure to take the parent value of.
        on_hierarchies: The hierarchies to take the parent on.
        top_value: The value to take at the top level, either a measure or a constant.
    Returns:
        The parent value measure.

[17]:
m["Parent quantity"] = tt.parent_value(
    m["Quantity.SUM"], on_hierarchies=hier["Geography"]
)

We can check that the value on countries is equal to the Quantity.SUM on their continent :

[18]:
cube.query(m["Parent quantity"], levels=[lvl["continent"], lvl["country"]])
[18]:
Parent quantity
continent country
Asia China 9700.0
India 9700.0
Europe France 13000.0
UK 13000.0
[19]:
cube.query(m["Quantity.SUM"], levels=[lvl["continent"]])
[19]:
Quantity.SUM
continent
Asia 9700.0
Europe 13000.0

Aggregate siblings

Siblings are all the children of the same member. For instance France and UK are children of Europe so France and UK are siblings.

[20]:
m["Sum siblings"] = tt.agg.sum(
    m["Quantity.SUM"], scope=tt.scope.siblings(hier["Geography"])
)
cube.query(m["Sum siblings"], levels=[lvl["country"], lvl["city"]])
[20]:
Sum siblings
continent country city
Asia China Beijing 6000.0
HongKong 6000.0
India Dehli 3700.0
Mumbai 3700.0
Europe France Bordeaux 7500.0
Lyon 7500.0
Paris 7500.0
UK London 5500.0
[21]:
m["Average siblings"] = tt.agg.avg(
    m["Quantity.SUM"], scope=tt.scope.siblings(hier["Geography"])
)
cube.query(m["Average siblings"], levels=[lvl["country"], lvl["city"]])
[21]:
Average siblings
continent country city
Asia China Beijing 3000.0
HongKong 3000.0
India Dehli 1850.0
Mumbai 1850.0
Europe France Bordeaux 2500.0
Lyon 2500.0
Paris 2500.0
UK London 5500.0

Shift

shift allows to get the value of previous or next members on a level

[22]:
m["next quantity"] = tt.shift(m["Quantity.SUM"], on=lvl["Date"], period=1)
cube.query(m["next quantity"], levels=[lvl["Date"]])
[22]:
next quantity
Date
2018-01-01 8000.0
2019-01-01 4000.0
2019-01-02 7000.0
[23]:
cube.query(m["Quantity.SUM"], levels=[lvl["Date"]])
[23]:
Quantity.SUM
Date
2018-01-01 3700.0
2019-01-01 8000.0
2019-01-02 4000.0
2019-01-05 7000.0

Filter

filter allows you to only express the measure where certain conditions are met.
You can only use conditions on levels with filter. If you want to use conditions involving measures you should use where.

You can combine conditions with the & bitwise operator.

[24]:
m["Filtered Quantity.SUM"] = tt.filter(
    m["Quantity.SUM"], (lvl["City"] == "Paris") & (lvl["Color"] == "red")
)
cube.query(
    m["Quantity.SUM"], m["Filtered Quantity.SUM"], levels=[lvl["Country"], lvl["Color"]]
)
[24]:
Quantity.SUM Filtered Quantity.SUM
Country Color
China blue 2000.0 NaN
green 4000.0 NaN
France blue 4500.0 NaN
red 3000.0 1000.0
India blue 1500.0 NaN
red 2200.0 NaN
UK green 3000.0 NaN
red 2500.0 NaN

Where

where defines an if-then-else measure whose value depends on certain conditions. It is similar to numpy’s where function.

[25]:
# Label large amounts
m["Large quantities"] = tt.where(m["Quantity.SUM"] >= 3000, "large", "small")

# Double up UK's quantities
m["Double UK"] = tt.where(
    lvl["Country"] == "UK", 2 * m["Quantity.SUM"], m["Quantity.SUM"]
)

cube.query(
    m["Quantity.SUM"],
    m["Large quantities"],
    m["Double UK"],
    levels=[lvl["Country"], lvl["Color"]],
)
[25]:
Quantity.SUM Large quantities Double UK
Country Color
China blue 2000.0 small 2000.0
green 4000.0 large 4000.0
France blue 4500.0 large 4500.0
red 3000.0 large 3000.0
India blue 1500.0 small 1500.0
red 2200.0 small 2200.0
UK green 3000.0 large 6000.0
red 2500.0 small 5000.0

At

at takes the value of the given measure shifted on one of the level to the given value.

The value can be a literal value or the value of another level

[26]:
m["Blue Quantity"] = tt.at(m["Quantity.SUM"], {lvl["Color"]: "blue"})
cube.query(m["Quantity.SUM"], m["Blue Quantity"], levels=[lvl["Color"]])
[26]:
Quantity.SUM Blue Quantity
Color
blue 8000.0 8000.0
green 7000.0 8000.0
red 7700.0 8000.0

The measure is null if the level is not expressed

[27]:
cube.query(m["Quantity.SUM"], m["Blue Quantity"])
[27]:
Quantity.SUM Blue Quantity
0 22700.0 None

In a multilevel hierarchy, only the shifted level is changed. For instance if we shift the country to “France”, [Europe, UK] is shifted to [Europe,France] and [Asia,China] is shifted to [Asia,France] which does not exist.

[28]:
m["France Quantity Fail"] = tt.at(m["Quantity.SUM"], {lvl["country"]: "France"})
cube.query(
    m["Quantity.SUM"],
    m["France Quantity Fail"],
    levels=[lvl["continent"], lvl["country"]],
)
[28]:
Quantity.SUM France Quantity Fail
continent country
Asia China 6000.0 NaN
India 3700.0 NaN
Europe France 7500.0 7500.0
UK 5500.0 7500.0

The expected value can be obtained by shifting both levels to the requested values as follow.

[29]:
m["France Quantity"] = tt.at(
    m["Quantity.SUM"], {lvl["country"]: "France", lvl["continent"]: "Europe"}
)
cube.query(
    m["Quantity.SUM"], m["France Quantity"], levels=[lvl["continent"], lvl["country"]]
)
[29]:
Quantity.SUM France Quantity
continent country
Asia China 6000.0 7500.0
India 3700.0 7500.0
Europe France 7500.0 7500.0
UK 5500.0 7500.0

Other Functions

Mathematical functions

round, floor, and ceil: closest, lower, and upper rounding of double values

[30]:
m["round"] = tt.round(m["Quantity.SUM"] / 1000) * 1000
m["floor"] = tt.floor(m["Quantity.SUM"] / 1000) * 1000
m["ceil"] = tt.ceil(m["Quantity.SUM"] / 1000) * 1000
cube.query(
    m["Quantity.SUM"], m["floor"], m["ceil"], m["round"], levels=[lvl["Country"]]
)
[30]:
Quantity.SUM floor ceil round
Country
China 6000.0 6000 6000 6000
France 7500.0 7000 8000 8000
India 3700.0 3000 4000 4000
UK 5500.0 5000 6000 6000

exp: Exponential is the Euler’s number e raised to the power of a double value.

[31]:
m["Zero"] = 0.0
m["One"] = tt.exp(m["Zero"])

log: the natural (base e) logarithm

[32]:
m["Log Quantity"] = tt.log(m["Quantity.SUM"])
cube.query(m["Log Quantity"], levels=[lvl["Country"]])
[32]:
Log Quantity
Country
China 8.699515
France 8.922658
India 8.216088
UK 8.612503

log10: the base 10 logarithm

[33]:
m["Log10 Quantity"] = tt.log10(m["Quantity.SUM"])
cube.query(m["Log10 Quantity"], levels=[lvl["Country"]])
[33]:
Log10 Quantity
Country
China 3.778151
France 3.875061
India 3.568202
UK 3.740363

abs: the absolute value of a given measure

[34]:
m["Negative"] = -1 * m["Quantity.SUM"]
m["abs"] = tt.abs(m["Negative"])
cube.query(m["abs"])
[34]:
abs
0 22700.0

sin: the sinus value of a given measure

[35]:
m["Sinus Quantity"] = tt.sin(m["Quantity.SUM"])
cube.query(m["Sinus Quantity"], levels=[lvl["Country"]])
[35]:
Sinus Quantity
Country
China -0.427720
France -0.851236
India -0.714666
UK 0.800864

cos: the cosinus value of a given measure

[36]:
m["Cosinus Quantity"] = tt.cos(m["Quantity.SUM"])
cube.query(m["Cosinus Quantity"], levels=[lvl["Country"]])
[36]:
Cosinus Quantity
Country
China 0.903912
France -0.524783
India 0.699466
UK -0.598846

tan: the tangent value of a given measure

[37]:
m["Tangent Quantity"] = tt.tan(m["Quantity.SUM"])
cube.query(m["Tangent Quantity"], levels=[lvl["Country"]])
[37]:
Tangent Quantity
Country
China -0.473187
France 1.622071
India -1.021730
UK -1.337344

pow: the calculated power value of a given measure

[38]:
m["Power int Quantity"] = tt.pow(m["Quantity.SUM"], 2)
cube.query(m["Power int Quantity"], levels=[lvl["Country"]])
[38]:
Power int Quantity
Country
China 36000000.0
France 56250000.0
India 13690000.0
UK 30250000.0
[39]:
m["Power float Quantity"] = tt.pow(m["Quantity.SUM"], 0.5)
cube.query(m["Power float Quantity"], levels=[lvl["Country"]])
[39]:
Power float Quantity
Country
China 77.459667
France 86.602540
India 60.827625
UK 74.161985
[40]:
m["Power measure Quantity"] = tt.pow(m["Quantity.SUM"], m["Quantity.SUM"])
cube.query(m["Power measure Quantity"], levels=[lvl["Country"]])
[40]:
Power measure Quantity
Country
China Infinity
France Infinity
India Infinity
UK Infinity
[41]:
m["Power neg Quantity"] = tt.pow(m["Quantity.SUM"], -3)
cube.query(m["Power neg Quantity"], levels=[lvl["Country"]])
[41]:
Power neg Quantity
Country
China 4.629630e-12
France 2.370370e-12
India 1.974217e-11
UK 6.010518e-12

sqrt: the square root value of a given measure

[42]:
m["Sqrt Quantity"] = tt.sqrt(m["Quantity.SUM"])
cube.query(m["Sqrt Quantity"], levels=[lvl["Country"]])
[42]:
Sqrt Quantity
Country
China 77.459667
France 86.602540
India 60.827625
UK 74.161985

Time functions

date_diff is the number of days between 2 dates:

[43]:
from datetime import datetime

lvl = cube.levels
m["Days from today"] = tt.date_diff(datetime.now().date(), lvl["Date"])
cube.query(m["Days from today"], levels=[lvl["Date"]])
[43]:
Days from today
Date
2018-01-01 -830
2019-01-01 -465
2019-01-02 -464
2019-01-05 -461

Measure Folder

Measures can be put into folders to group them in the UI.

[44]:
for name in [
    "Quantity.SUM",
    "Quantity.AVG",
    "Quantity.LONG",
    "Quantity.MAX",
    "Quantity.MIN",
    "Quantity.SHORT",
]:
    m[name].folder = "Quantity"
[45]:
m["Quantity.SUM"].folder
[45]:
'Quantity'

Measure formatter

Measures have formatters describing how to represent them as strings. These formatters can be modified in Python or directly in the UI. Here are a few examples of what is possible, more documentation on how to write an MDX formatter can be found on docs.microsoft.com.

[46]:
# Convert to thousands
m["Quantity.SUM"].formatter = "DOUBLE[#,###,K]"

# Add a unit
m["Price.SUM"].formatter = "DOUBLE[#,###.00€]"

# Add more decimal figures
m["Quantity.AVG"].formatter = "DOUBLE[#,###.0000]"

# Display as percent
m["Percent of parent"] = m["Quantity.SUM"] / m["Parent quantity"]
m["Percent of parent"].formatter = "DOUBLE[#,###.00%]"

Launcher