Measures¶
Default measures¶
Let’s quickly create a cube from a CSV file. You can see the value of your measures at the top level of the cube by calling cube.query()
:
[1]:
import atoti as tt
session = tt.create_session()
store = session.read_csv("data/example.csv", keys=["ID"], store_name="First store")
cube = session.create_cube(store, "FirstCube")
lvl = cube.levels
m = cube.measures
hier = cube.hierarchies
cube.query()
[1]:
Price.AVG | Price.SUM | Quantity.AVG | Quantity.SUM | contributors.COUNT | |
---|---|---|---|---|---|
0 | 428.0 | 4280.0 | 2270.0 | 22700.0 | 10 |
SUM
and AVG
aggregations are automatically created for numeric columns.
Calling cube.query()
will display the value of the measure at the top level. It’s also possible to specify levels to split the cube:
[2]:
cube.query(m["Quantity.SUM"], levels=[lvl["Continent"], lvl["Country"]])
[2]:
Quantity.SUM | ||
---|---|---|
Continent | Country | |
Asia | China | 6000.0 |
India | 3700.0 | |
Europe | France | 7500.0 |
UK | 5500.0 |
You can also filter on levels:
[3]:
cube.query(
m["Quantity.SUM"],
levels=[lvl["Continent"], lvl["Country"]],
condition=(lvl["Country"] == "France"),
)
[3]:
Quantity.SUM | ||
---|---|---|
Continent | Country | |
Europe | France | 7500.0 |
Aggregation functions¶
The available aggregation functions are:
[4]:
from inspect import getmembers, isfunction
[member[0] for member in getmembers(tt.agg) if isfunction(member[1])]
[4]:
['_agg',
'_count',
'_vector',
'array_percentile',
'avg',
'count_distinct',
'long',
'max',
'median',
'min',
'percentile',
'prod',
'short',
'single_value',
'sqrt',
'square_sum',
'std',
'stop',
'sum',
'variance']
[5]:
m["Quantity.MIN"] = tt.agg.min(store["Quantity"])
m["Quantity.MAX"] = tt.agg.max(store["Quantity"])
m["Quantity.LONG"] = tt.agg.long(store["Quantity"])
m["Quantity.SHORT"] = tt.agg.short(store["Quantity"])
cube.query(levels=[lvl["Country"]])
[5]:
Price.AVG | Price.SUM | Quantity.AVG | Quantity.LONG | Quantity.MAX | Quantity.MIN | Quantity.SHORT | Quantity.SUM | contributors.COUNT | |
---|---|---|---|---|---|---|---|---|---|
Country | |||||||||
China | 380.0 | 760.0 | 3000.0 | 6000.0 | 4000.0 | 2000.0 | 0.0 | 6000.0 | 2 |
France | 450.0 | 1800.0 | 1875.0 | 7500.0 | 3000.0 | 1000.0 | 0.0 | 7500.0 | 4 |
India | 380.0 | 760.0 | 1850.0 | 3700.0 | 2200.0 | 1500.0 | 0.0 | 3700.0 | 2 |
UK | 480.0 | 960.0 | 2750.0 | 5500.0 | 3000.0 | 2500.0 | 0.0 | 5500.0 | 2 |
[6]:
m["Quantity running sum"] = tt.agg.sum(
m["Quantity.SUM"], scope=tt.scope.cumulative(hier["ID"])
)
cube.query(m["Quantity running sum"], levels=[lvl["ID"]])
[6]:
Quantity running sum | |
---|---|
ID | |
1 | 1000.0 |
2 | 3000.0 |
3 | 6000.0 |
4 | 7500.0 |
5 | 10500.0 |
6 | 13000.0 |
7 | 15000.0 |
8 | 19000.0 |
9 | 21200.0 |
10 | 22700.0 |
[7]:
m["Distinct City"] = tt.agg.count_distinct(store["City"])
[8]:
m["Percentile 95"] = tt.agg.percentile(store["Quantity"], percentile_value=0.95)
m["Percentile 20"] = tt.agg.percentile(store["Quantity"], percentile_value=0.20)
cube.query(m["Percentile 95"], m["Percentile 20"], levels=[lvl["Continent"]])
[8]:
Percentile 95 | Percentile 20 | |
---|---|---|
Continent | ||
Asia | 3730.0 | 1800.0 |
Europe | 3000.0 | 1500.0 |
Statistics aggregations
[9]:
m["Quantity variance"] = tt.agg.variance(store["Quantity"], mode="sample")
m["Quantity standard deviation"] = tt.agg.std(store["Quantity"], mode="sample")
cube.query(m["Quantity variance"], m["Quantity standard deviation"])
[9]:
Quantity variance | Quantity standard deviation | |
---|---|---|
0 | 784555.555556 | 885.751407 |
Delete a measure¶
Any measure can be removed from the cube using del
[10]:
"Quantity running sum" in m
[10]:
True
[11]:
del m["Quantity running sum"]
"Quantity running sum" in m
[11]:
False
Different aggregations per level¶
It’s possible to aggregate a column differently depending on whether we are above or below a certain level.
For example, you can sum a quantity for each country and then take the average over all the countries:
[12]:
m["Average per country"] = tt.agg.avg(
m["Quantity.SUM"], scope=tt.scope.origin("Country")
)
cube.query(m["Average per country"], levels=[lvl["Country"]])
[12]:
Average per country | |
---|---|
Country | |
China | 6000.0 |
France | 7500.0 |
India | 3700.0 |
UK | 5500.0 |
[13]:
cube.query(m["Average per country"])
[13]:
Average per country | |
---|---|
0 | 5675.0 |
Calculated measures¶
Measures can be combined together to build more complex measures:
[14]:
m["Turnover"] = tt.agg.sum(m["Quantity.SUM"] * store["Price"])
cube.query(levels=[lvl["Country"]])
[14]:
Average per country | Distinct City | Percentile 20 | Percentile 95 | Price.AVG | Price.SUM | Quantity standard deviation | Quantity variance | Quantity.AVG | Quantity.LONG | Quantity.MAX | Quantity.MIN | Quantity.SHORT | Quantity.SUM | Turnover | contributors.COUNT | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Country | ||||||||||||||||
China | 6000.0 | 2 | 2400.0 | 3900.0 | 380.0 | 760.0 | 1414.213562 | 2.000000e+06 | 3000.0 | 6000.0 | 4000.0 | 2000.0 | 0.0 | 6000.0 | 2220000.0 | 2 |
France | 7500.0 | 3 | 1300.0 | 2850.0 | 450.0 | 1800.0 | 853.912564 | 7.291667e+05 | 1875.0 | 7500.0 | 3000.0 | 1000.0 | 0.0 | 7500.0 | 3280000.0 | 4 |
India | 3700.0 | 2 | 1640.0 | 2165.0 | 380.0 | 760.0 | 494.974747 | 2.450000e+05 | 1850.0 | 3700.0 | 2200.0 | 1500.0 | 0.0 | 3700.0 | 1392000.0 | 2 |
UK | 5500.0 | 1 | 2600.0 | 2975.0 | 480.0 | 960.0 | 353.553391 | 1.250000e+05 | 2750.0 | 5500.0 | 3000.0 | 2500.0 | 0.0 | 5500.0 | 2630000.0 | 2 |
Multidimensional Functions¶
Parent Value¶
[15]:
# Create multi-level hierarchy :
lvl = cube.levels
cube.hierarchies["Geography"] = {
"continent": lvl["Continent"],
"country": lvl["Country"],
"city": lvl["City"],
}
[16]:
help(tt.parent_value)
Help on function parent_value in module atoti._functions.multidimensional:
parent_value(measure: 'MeasureOrName', on_hierarchies: 'Optional[Union[HierarchyOrName, List[HierarchyOrName]]]' = None, top_value: 'Union[object, Measure]' = None)
Create the a parent value measure.
Args:
measure: the measure to take the parent value of.
on_hierarchies: The hierarchies to take the parent on.
top_value: The value to take at the top level, either a measure or a constant.
Returns:
The parent value measure.
[17]:
m["Parent quantity"] = tt.parent_value(
m["Quantity.SUM"], on_hierarchies=hier["Geography"]
)
We can check that the value on countries is equal to the Quantity.SUM
on their continent :
[18]:
cube.query(m["Parent quantity"], levels=[lvl["continent"], lvl["country"]])
[18]:
Parent quantity | ||
---|---|---|
continent | country | |
Asia | China | 9700.0 |
India | 9700.0 | |
Europe | France | 13000.0 |
UK | 13000.0 |
[19]:
cube.query(m["Quantity.SUM"], levels=[lvl["continent"]])
[19]:
Quantity.SUM | |
---|---|
continent | |
Asia | 9700.0 |
Europe | 13000.0 |
Aggregate siblings¶
Siblings are all the children of the same member. For instance France and UK are children of Europe so France and UK are siblings.
[20]:
m["Sum siblings"] = tt.agg.sum(
m["Quantity.SUM"], scope=tt.scope.siblings(hier["Geography"])
)
cube.query(m["Sum siblings"], levels=[lvl["country"], lvl["city"]])
[20]:
Sum siblings | |||
---|---|---|---|
continent | country | city | |
Asia | China | Beijing | 6000.0 |
HongKong | 6000.0 | ||
India | Dehli | 3700.0 | |
Mumbai | 3700.0 | ||
Europe | France | Bordeaux | 7500.0 |
Lyon | 7500.0 | ||
Paris | 7500.0 | ||
UK | London | 5500.0 |
[21]:
m["Average siblings"] = tt.agg.avg(
m["Quantity.SUM"], scope=tt.scope.siblings(hier["Geography"])
)
cube.query(m["Average siblings"], levels=[lvl["country"], lvl["city"]])
[21]:
Average siblings | |||
---|---|---|---|
continent | country | city | |
Asia | China | Beijing | 3000.0 |
HongKong | 3000.0 | ||
India | Dehli | 1850.0 | |
Mumbai | 1850.0 | ||
Europe | France | Bordeaux | 2500.0 |
Lyon | 2500.0 | ||
Paris | 2500.0 | ||
UK | London | 5500.0 |
Shift¶
shift
allows to get the value of previous or next members on a level
[22]:
m["next quantity"] = tt.shift(m["Quantity.SUM"], on=lvl["Date"], period=1)
cube.query(m["next quantity"], levels=[lvl["Date"]])
[22]:
next quantity | |
---|---|
Date | |
2018-01-01 | 8000.0 |
2019-01-01 | 4000.0 |
2019-01-02 | 7000.0 |
[23]:
cube.query(m["Quantity.SUM"], levels=[lvl["Date"]])
[23]:
Quantity.SUM | |
---|---|
Date | |
2018-01-01 | 3700.0 |
2019-01-01 | 8000.0 |
2019-01-02 | 4000.0 |
2019-01-05 | 7000.0 |
Filter¶
filter
allows you to only express the measure where certain conditions are met.filter
. If you want to use conditions involving measures you should use where
.You can combine conditions with the &
bitwise operator.
[24]:
m["Filtered Quantity.SUM"] = tt.filter(
m["Quantity.SUM"], (lvl["City"] == "Paris") & (lvl["Color"] == "red")
)
cube.query(
m["Quantity.SUM"], m["Filtered Quantity.SUM"], levels=[lvl["Country"], lvl["Color"]]
)
[24]:
Quantity.SUM | Filtered Quantity.SUM | ||
---|---|---|---|
Country | Color | ||
China | blue | 2000.0 | NaN |
green | 4000.0 | NaN | |
France | blue | 4500.0 | NaN |
red | 3000.0 | 1000.0 | |
India | blue | 1500.0 | NaN |
red | 2200.0 | NaN | |
UK | green | 3000.0 | NaN |
red | 2500.0 | NaN |
Where¶
where
defines an if-then-else measure whose value depends on certain conditions. It is similar to numpy’s where function.
[25]:
# Label large amounts
m["Large quantities"] = tt.where(m["Quantity.SUM"] >= 3000, "large", "small")
# Double up UK's quantities
m["Double UK"] = tt.where(
lvl["Country"] == "UK", 2 * m["Quantity.SUM"], m["Quantity.SUM"]
)
cube.query(
m["Quantity.SUM"],
m["Large quantities"],
m["Double UK"],
levels=[lvl["Country"], lvl["Color"]],
)
[25]:
Quantity.SUM | Large quantities | Double UK | ||
---|---|---|---|---|
Country | Color | |||
China | blue | 2000.0 | small | 2000.0 |
green | 4000.0 | large | 4000.0 | |
France | blue | 4500.0 | large | 4500.0 |
red | 3000.0 | large | 3000.0 | |
India | blue | 1500.0 | small | 1500.0 |
red | 2200.0 | small | 2200.0 | |
UK | green | 3000.0 | large | 6000.0 |
red | 2500.0 | small | 5000.0 |
At¶
at
takes the value of the given measure shifted on one of the level to the given value.
The value can be a literal value or the value of another level
[26]:
m["Blue Quantity"] = tt.at(m["Quantity.SUM"], {lvl["Color"]: "blue"})
cube.query(m["Quantity.SUM"], m["Blue Quantity"], levels=[lvl["Color"]])
[26]:
Quantity.SUM | Blue Quantity | |
---|---|---|
Color | ||
blue | 8000.0 | 8000.0 |
green | 7000.0 | 8000.0 |
red | 7700.0 | 8000.0 |
The measure is null if the level is not expressed
[27]:
cube.query(m["Quantity.SUM"], m["Blue Quantity"])
[27]:
Quantity.SUM | Blue Quantity | |
---|---|---|
0 | 22700.0 | None |
In a multilevel hierarchy, only the shifted level is changed. For instance if we shift the country to “France”, [Europe, UK] is shifted to [Europe,France] and [Asia,China] is shifted to [Asia,France] which does not exist.
[28]:
m["France Quantity Fail"] = tt.at(m["Quantity.SUM"], {lvl["country"]: "France"})
cube.query(
m["Quantity.SUM"],
m["France Quantity Fail"],
levels=[lvl["continent"], lvl["country"]],
)
[28]:
Quantity.SUM | France Quantity Fail | ||
---|---|---|---|
continent | country | ||
Asia | China | 6000.0 | NaN |
India | 3700.0 | NaN | |
Europe | France | 7500.0 | 7500.0 |
UK | 5500.0 | 7500.0 |
The expected value can be obtained by shifting both levels to the requested values as follow.
[29]:
m["France Quantity"] = tt.at(
m["Quantity.SUM"], {lvl["country"]: "France", lvl["continent"]: "Europe"}
)
cube.query(
m["Quantity.SUM"], m["France Quantity"], levels=[lvl["continent"], lvl["country"]]
)
[29]:
Quantity.SUM | France Quantity | ||
---|---|---|---|
continent | country | ||
Asia | China | 6000.0 | 7500.0 |
India | 3700.0 | 7500.0 | |
Europe | France | 7500.0 | 7500.0 |
UK | 5500.0 | 7500.0 |
Other Functions¶
Mathematical functions¶
round
, floor
, and ceil
: closest, lower, and upper rounding of double values
[30]:
m["round"] = tt.round(m["Quantity.SUM"] / 1000) * 1000
m["floor"] = tt.floor(m["Quantity.SUM"] / 1000) * 1000
m["ceil"] = tt.ceil(m["Quantity.SUM"] / 1000) * 1000
cube.query(
m["Quantity.SUM"], m["floor"], m["ceil"], m["round"], levels=[lvl["Country"]]
)
[30]:
Quantity.SUM | floor | ceil | round | |
---|---|---|---|---|
Country | ||||
China | 6000.0 | 6000 | 6000 | 6000 |
France | 7500.0 | 7000 | 8000 | 8000 |
India | 3700.0 | 3000 | 4000 | 4000 |
UK | 5500.0 | 5000 | 6000 | 6000 |
exp
: Exponential is the Euler’s number e
raised to the power of a double value.
[31]:
m["Zero"] = 0.0
m["One"] = tt.exp(m["Zero"])
log
: the natural (base e
) logarithm
[32]:
m["Log Quantity"] = tt.log(m["Quantity.SUM"])
cube.query(m["Log Quantity"], levels=[lvl["Country"]])
[32]:
Log Quantity | |
---|---|
Country | |
China | 8.699515 |
France | 8.922658 |
India | 8.216088 |
UK | 8.612503 |
log10
: the base 10 logarithm
[33]:
m["Log10 Quantity"] = tt.log10(m["Quantity.SUM"])
cube.query(m["Log10 Quantity"], levels=[lvl["Country"]])
[33]:
Log10 Quantity | |
---|---|
Country | |
China | 3.778151 |
France | 3.875061 |
India | 3.568202 |
UK | 3.740363 |
abs
: the absolute value of a given measure
[34]:
m["Negative"] = -1 * m["Quantity.SUM"]
m["abs"] = tt.abs(m["Negative"])
cube.query(m["abs"])
[34]:
abs | |
---|---|
0 | 22700.0 |
sin
: the sinus value of a given measure
[35]:
m["Sinus Quantity"] = tt.sin(m["Quantity.SUM"])
cube.query(m["Sinus Quantity"], levels=[lvl["Country"]])
[35]:
Sinus Quantity | |
---|---|
Country | |
China | -0.427720 |
France | -0.851236 |
India | -0.714666 |
UK | 0.800864 |
cos
: the cosinus value of a given measure
[36]:
m["Cosinus Quantity"] = tt.cos(m["Quantity.SUM"])
cube.query(m["Cosinus Quantity"], levels=[lvl["Country"]])
[36]:
Cosinus Quantity | |
---|---|
Country | |
China | 0.903912 |
France | -0.524783 |
India | 0.699466 |
UK | -0.598846 |
tan
: the tangent value of a given measure
[37]:
m["Tangent Quantity"] = tt.tan(m["Quantity.SUM"])
cube.query(m["Tangent Quantity"], levels=[lvl["Country"]])
[37]:
Tangent Quantity | |
---|---|
Country | |
China | -0.473187 |
France | 1.622071 |
India | -1.021730 |
UK | -1.337344 |
pow
: the calculated power value of a given measure
[38]:
m["Power int Quantity"] = tt.pow(m["Quantity.SUM"], 2)
cube.query(m["Power int Quantity"], levels=[lvl["Country"]])
[38]:
Power int Quantity | |
---|---|
Country | |
China | 36000000.0 |
France | 56250000.0 |
India | 13690000.0 |
UK | 30250000.0 |
[39]:
m["Power float Quantity"] = tt.pow(m["Quantity.SUM"], 0.5)
cube.query(m["Power float Quantity"], levels=[lvl["Country"]])
[39]:
Power float Quantity | |
---|---|
Country | |
China | 77.459667 |
France | 86.602540 |
India | 60.827625 |
UK | 74.161985 |
[40]:
m["Power measure Quantity"] = tt.pow(m["Quantity.SUM"], m["Quantity.SUM"])
cube.query(m["Power measure Quantity"], levels=[lvl["Country"]])
[40]:
Power measure Quantity | |
---|---|
Country | |
China | Infinity |
France | Infinity |
India | Infinity |
UK | Infinity |
[41]:
m["Power neg Quantity"] = tt.pow(m["Quantity.SUM"], -3)
cube.query(m["Power neg Quantity"], levels=[lvl["Country"]])
[41]:
Power neg Quantity | |
---|---|
Country | |
China | 4.629630e-12 |
France | 2.370370e-12 |
India | 1.974217e-11 |
UK | 6.010518e-12 |
sqrt
: the square root value of a given measure
[42]:
m["Sqrt Quantity"] = tt.sqrt(m["Quantity.SUM"])
cube.query(m["Sqrt Quantity"], levels=[lvl["Country"]])
[42]:
Sqrt Quantity | |
---|---|
Country | |
China | 77.459667 |
France | 86.602540 |
India | 60.827625 |
UK | 74.161985 |
Time functions¶
date_diff
is the number of days between 2 dates:
[43]:
from datetime import datetime
lvl = cube.levels
m["Days from today"] = tt.date_diff(datetime.now().date(), lvl["Date"])
cube.query(m["Days from today"], levels=[lvl["Date"]])
[43]:
Days from today | |
---|---|
Date | |
2018-01-01 | -822 |
2019-01-01 | -457 |
2019-01-02 | -456 |
2019-01-05 | -453 |
Measure Folder¶
Measures can be put into folders to group them in the UI.
[44]:
for name in [
"Quantity.SUM",
"Quantity.AVG",
"Quantity.LONG",
"Quantity.MAX",
"Quantity.MIN",
"Quantity.SHORT",
]:
m[name].folder = "Quantity"
[45]:
m["Quantity.SUM"].folder
[45]:
'Quantity'
Measure formatter¶
Measures have formatters describing how to represent them as strings. These formatters can be modified in Python or directly in the UI. Here are a few examples of what is possible, more documentation on how to write an MDX formatter can be found on docs.microsoft.com.
[46]:
# Convert to thousands
m["Quantity.SUM"].formatter = "DOUBLE[#,###,K]"
# Add a unit
m["Price.SUM"].formatter = "DOUBLE[#,###.00€]"
# Add more decimal figures
m["Quantity.AVG"].formatter = "DOUBLE[#,###.0000]"
# Display as percent
m["Percent of parent"] = m["Quantity.SUM"] / m["Parent quantity"]
m["Percent of parent"].formatter = "DOUBLE[#,###.00%]"