atoti.cube module#
- class atoti.cube.CubeContext#
Manage the shared context of a cube.
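The context behaves like a mutable mapping from context value names to their values. A minimal sketch, assuming a cube already exists (the supported keys are listed under atoti.Cube.shared_context below):

>>> cube.shared_context["queriesTimeLimit"] = 60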
- class atoti.Cube#
Cube of an atoti.Session.
- property aggregate_providers: MutableMapping[str, AggregateProvider]#
- property aggregates_cache: AggregatesCache#
Aggregates cache of the cube.
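A hedged sketch of tuning the cache; the capacity attribute is assumed to match atoti's AggregatesCache API, where a negative value disables caching:

>>> # Cache up to 100 intermediate aggregate results (assumed semantics).
>>> cube.aggregates_cache.capacity = 100
>>> # A negative capacity disables the cache entirely.
>>> cube.aggregates_cache.capacity = -1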
- create_date_hierarchy(name, *, column, levels={'Year': 'y', 'Month': 'M', 'Day': 'd'})#
Create a multilevel date hierarchy based on a date column.
The new levels are created by matching a date pattern. Here is a non-exhaustive list of patterns that can be used:
Pattern  Description                  Type     Examples
y        Year                         Integer  2001, 2005, 2020
yyyy     4-digit year                 String   "2001", "2005", "2020"
M        Month of the year (1-based)  Integer  1, 5, 12
MM       2-digit month                String   "01", "05", "12"
d        Day of the month             Integer  1, 15, 30
dd       2-digit day of the month     String   "01", "15", "30"
w        Week number                  Integer  1, 12, 51
Q        Quarter                      Integer  1, 2, 3, 4
QQQ      Quarter prefixed with Q      String   "Q1", "Q2", "Q3", "Q4"
H        Hour of day (0-23)           Integer  0, 12, 23
HH       2-digit hour of day          String   "00", "12", "23"
m        Minute of hour               Integer  0, 30, 59
mm       2-digit minute of hour       String   "00", "30", "59"
s        Second of minute             Integer  0, 5, 55
ss       2-digit second of minute     String   "00", "05", "55"
- Parameters
name (str) – The name of the hierarchy to create.
column (Column) – The date column from which the levels will be derived.
levels (Mapping[str, str]) – The mapping from the names of the levels to create to their date patterns.
Example
>>> from datetime import date
>>> df = pd.DataFrame(
...     columns=["Date", "Quantity"],
...     data=[
...         (date(2020, 1, 10), 150.0),
...         (date(2020, 1, 20), 240.0),
...         (date(2019, 3, 17), 270.0),
...         (date(2019, 12, 12), 200.0),
...     ],
... )
>>> table = session.read_pandas(
...     df, keys=["Date"], table_name="create_date_hierarchy example"
... )
>>> cube = session.create_cube(table)
>>> l, m = cube.levels, cube.measures
>>> cube.create_date_hierarchy("Date parts", column=table["Date"])
>>> cube.query(
...     m["Quantity.SUM"],
...     include_totals=True,
...     levels=[l["Year"], l["Month"], l["Day"]],
... )
                 Quantity.SUM
Year  Month Day
Total                  860.00
2019                   470.00
      3                270.00
            17         270.00
      12               200.00
            12         200.00
2020                   390.00
      1                390.00
            10         150.00
            20         240.00
The full date can also be added back as the last level of the hierarchy:
>>> h = cube.hierarchies
>>> h["Date parts"] = {**h["Date parts"].levels, "Date": table["Date"]}
>>> cube.query(
...     m["Quantity.SUM"],
...     include_totals=True,
...     levels=[l["Date parts", "Date"]],
... )
                            Quantity.SUM
Year  Month Day Date
Total                             860.00
2019                              470.00
      3                           270.00
            17                    270.00
                2019-03-17        270.00
      12                          200.00
            12                    200.00
                2019-12-12        200.00
2020                              390.00
      1                           390.00
            10                    150.00
                2020-01-10        150.00
            20                    240.00
                2020-01-20        240.00
Data inserted into the table after the hierarchy creation will be automatically hierarchized:
>>> table += (date(2021, 8, 30), 180.0)
>>> cube.query(
...     m["Quantity.SUM"],
...     include_totals=True,
...     levels=[l["Date parts", "Date"]],
...     filter=l["Year"] == "2021",
... )
                            Quantity.SUM
Year  Month Day Date
Total                             180.00
2021                              180.00
      8                           180.00
            30                    180.00
                2021-08-30        180.00
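The levels argument controls which of the patterns above are used and how the created levels are named. A hedged sketch reusing the table from this example; the hierarchy and level names are arbitrary:

>>> cube.create_date_hierarchy(
...     "Date quarters",
...     column=table["Date"],
...     levels={"Year": "y", "Quarter": "QQQ", "Month": "M"},
... )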
- create_parameter_hierarchy_from_column(name, column)#
Create a single-level hierarchy which dynamically takes its members from a column.
- Parameters
name (str) – The name of the hierarchy and its single level.
column (Column) – The column from which the hierarchy members are taken.
Example
>>> df = pd.DataFrame(
...     {
...         "Seller": ["Seller_1", "Seller_1", "Seller_2", "Seller_2"],
...         "ProductId": ["aBk3", "ceJ4", "aBk3", "ceJ4"],
...         "Price": [2.5, 49.99, 3.0, 54.99],
...     }
... )
>>> table = session.read_pandas(df, table_name="Seller")
>>> cube = session.create_cube(table)
>>> l, m = cube.levels, cube.measures
>>> cube.create_parameter_hierarchy_from_column(
...     "Competitor", table["Seller"]
... )
>>> m["Price"] = tt.agg.single_value(table["Price"])
>>> m["Competitor price"] = tt.at(
...     m["Price"], l["Seller"] == l["Competitor"]
... )
>>> cube.query(
...     m["Competitor price"],
...     levels=[l["Seller"], l["ProductId"]],
... )
                   Competitor price
Seller   ProductId
Seller_1 aBk3                  2.50
         ceJ4                 49.99
Seller_2 aBk3                  2.50
         ceJ4                 49.99
>>> cube.query(
...     m["Competitor price"],
...     levels=[l["Seller"], l["ProductId"]],
...     filter=l["Competitor"] == "Seller_2",
... )
                   Competitor price
Seller   ProductId
Seller_1 aBk3                  3.00
         ceJ4                 54.99
Seller_2 aBk3                  3.00
         ceJ4                 54.99
- create_parameter_hierarchy_from_members(name, members, *, data_type=None, index_measure_name=None)#
Create a single-level hierarchy with the given members.
It can be used as a parameter hierarchy in advanced analyses.
- Parameters
name (str) – The name of the hierarchy and its single level.
members (Collection[Any]) – The members of the hierarchy.
data_type (Optional[DataType]) – The type with which the members will be stored. Automatically inferred by default.
index_measure_name (Optional[str]) – The name of the indexing measure to create for this hierarchy, if any.
Example
>>> df = pd.DataFrame(
...     {
...         "Seller": ["Seller_1", "Seller_2", "Seller_3"],
...         "Prices": [
...             [2.5, 49.99, 3.0, 54.99],
...             [2.6, 50.99, 2.8, 57.99],
...             [2.99, 44.99, 3.6, 59.99],
...         ],
...     }
... )
>>> table = session.read_pandas(df, table_name="Seller prices")
>>> cube = session.create_cube(table)
>>> l, m = cube.levels, cube.measures
>>> cube.create_parameter_hierarchy_from_members(
...     "ProductID",
...     ["aBk3", "ceJ4", "aBk5", "ceJ9"],
...     index_measure_name="Product index",
... )
>>> m["Prices"] = tt.agg.single_value(table["Prices"])
>>> m["Product price"] = m["Prices"][m["Product index"]]
>>> cube.query(
...     m["Product price"],
...     levels=[l["Seller"], l["ProductID"]],
... )
                   Product price
Seller   ProductID
Seller_1 aBk3               2.50
         aBk5               3.00
         ceJ4              49.99
         ceJ9              54.99
Seller_2 aBk3               2.60
         aBk5               2.80
         ceJ4              50.99
         ceJ9              57.99
Seller_3 aBk3               2.99
         aBk5               3.60
         ceJ4              44.99
         ceJ9              59.99
- create_parameter_simulation(name, *, measures=None, levels=(), base_scenario_name='Base')#
Create a parameter simulation and its associated measures.
- Parameters
name (str) – The name of the simulation. This is also the name of the corresponding table that will be created.
measures (Optional[Mapping[str, Optional[Union[float, int, Iterable[int], Iterable[float]]]]]) – The mapping from the names of the created measures to their default value.
levels (Iterable[Level]) – The levels to simulate on.
base_scenario_name (str) – The name of the base scenario.
- Return type
Table
Example
>>> sales_table = session.read_csv(
...     f"{TUTORIAL_RESOURCES}/sales.csv",
...     table_name="Sales",
...     keys=["Sale ID"],
... )
>>> shops_table = session.read_csv(
...     f"{TUTORIAL_RESOURCES}/shops.csv",
...     table_name="Shops",
...     keys=["Shop ID"],
... )
>>> sales_table.join(
...     shops_table, sales_table["Shop"] == shops_table["Shop ID"]
... )
>>> cube = session.create_cube(sales_table)
>>> l, m = cube.levels, cube.measures
Creating a parameter simulation on one level:
>>> country_simulation = cube.create_parameter_simulation(
...     "Country simulation",
...     measures={"Country parameter": 1.0},
...     levels=[l["Country"]],
... )
>>> country_simulation += ("France crash", "France", 0.8)
>>> country_simulation.head()
                      Country parameter
Scenario     Country
France crash France                 0.8
- France crash is the name of the scenario.
- France is the coordinate at which the value will be changed.
- 0.8 is the value the Country parameter measure will have in this scenario.
>>> m["Unparametrized turnover"] = tt.agg.sum( ... sales_table["Unit price"] * sales_table["Quantity"] ... ) >>> m["Turnover"] = tt.agg.sum( ... m["Unparametrized turnover"] * m["Country parameter"], ... scope=tt.OriginScope(l["Country"]), ... ) >>> cube.query(m["Turnover"], levels=[l["Country simulation"]]) Turnover Country simulation Base 961,463.00 France crash 889,854.60
Drill down to the Country level for more details:
>>> cube.query(
...     m["Unparametrized turnover"],
...     m["Country parameter"],
...     m["Turnover"],
...     levels=[l["Country simulation"], l["Country"]],
... )
                           Unparametrized turnover Country parameter    Turnover
Country simulation Country
Base               France               358,042.00              1.00  358,042.00
                   USA                  603,421.00              1.00  603,421.00
France crash       France               358,042.00               .80  286,433.60
                   USA                  603,421.00              1.00  603,421.00
Creating a parameter simulation on multiple levels:
>>> size_simulation = cube.create_parameter_simulation(
...     "Size simulation",
...     measures={"Size parameter": 1.0},
...     levels=[l["Country"], l["Shop size"]],
... )
>>> size_simulation += (
...     "Going local",
...     None,  # ``None`` serves as a wildcard matching any member value.
...     "big",
...     0.8,
... )
>>> size_simulation += ("Going local", "USA", "small", 1.2)
>>> m["Turnover"] = tt.agg.sum(
...     m["Unparametrized turnover"]
...     * m["Country parameter"]
...     * m["Size parameter"],
...     scope=tt.OriginScope(l["Country"], l["Shop size"]),
... )
>>> cube.query(
...     m["Turnover"],
...     levels=[l["Size simulation"], l["Shop size"]],
... )
                            Turnover
Size simulation Shop size
Base            big       120,202.00
                medium    356,779.00
                small     484,482.00
Going local     big        96,161.60
                medium    356,779.00
                small     547,725.20
When several rules contain None, the one where the first None appears last takes precedence.
>>> size_simulation += ("Going France and Local", "France", None, 2)
>>> size_simulation += ("Going France and Local", None, "small", 10)
>>> cube.query(
...     m["Unparametrized turnover"],
...     m["Turnover"],
...     levels=[l["Country"], l["Shop size"]],
...     filter=l["Size simulation"] == "Going France and Local",
... )
                  Unparametrized turnover      Turnover
Country Shop size
France  big                     47,362.00     94,724.00
        medium                 142,414.00    284,828.00
        small                  168,266.00    336,532.00
USA     big                     72,840.00     72,840.00
        medium                 214,365.00    214,365.00
        small                  316,216.00  3,162,160.00
Creating a parameter simulation without levels:
>>> crisis_simulation = cube.create_parameter_simulation(
...     "Global Simulation",
...     measures={"Global parameter": 1.0},
... )
>>> crisis_simulation += ("Global Crisis", 0.9)
>>> m["Turnover"] = m["Unparametrized turnover"] * m["Global parameter"]
>>> cube.query(m["Turnover"], levels=[l["Global Simulation"]])
                     Turnover
Global Simulation
Base               961,463.00
Global Crisis      865,316.70
Creating a parameter simulation with multiple measures:
>>> multi_parameter_simulation = cube.create_parameter_simulation(
...     "Price And Quantity",
...     measures={
...         "Price parameter": 1.0,
...         "Quantity parameter": 1.0,
...     },
... )
>>> multi_parameter_simulation += ("Price Up Quantity Down", 1.2, 0.8)
>>> m["Simulated Price"] = (
...     tt.agg.single_value(sales_table["Unit price"])
...     * m["Price parameter"]
... )
>>> m["Simulated Quantity"] = (
...     tt.agg.single_value(sales_table["Quantity"])
...     * m["Quantity parameter"]
... )
>>> m["Turnover"] = tt.agg.sum_product(
...     m["Simulated Price"],
...     m["Simulated Quantity"],
...     scope=tt.OriginScope(l["Sale ID"]),
... )
>>> cube.query(m["Turnover"], levels=[l["Price And Quantity"]])
                          Turnover
Price And Quantity
Base                    961,463.00
Price Up Quantity Down  923,004.48
- explain_query(*measures, filter=None, include_empty_rows=False, include_totals=False, levels=(), scenario='Base', timeout=datetime.timedelta(seconds=30), **kwargs)#
Run the query but, instead of returning its result, return an explanation of how it was executed, containing a summary, global timings, and the query plan with all the retrievals.
See also
query() for the roles of the parameters.
- Return type
QueryAnalysis
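A minimal sketch, assuming the Prices cube defined in the query() examples below; the returned QueryAnalysis can then be inspected:

>>> analysis = cube.explain_query(
...     m["Price.SUM"],
...     levels=[l["Country"]],
...     filter=l["Continent"] == "Europe",
... )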
- property hierarchies: _LocalHierarchies#
Hierarchies of the cube.
- property levels: LevelsT#
Levels of the cube.
- property measures: _LocalMeasures#
Measures of the cube.
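These three mappings are commonly aliased at the start of a script. A minimal sketch, assuming a cube already exists:

>>> h, l, m = cube.hierarchies, cube.levels, cube.measures
>>> measure_names = sorted(m)  # the mappings expose their names as keys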
- query(*measures, context={}, filter=None, include_empty_rows=False, include_totals=False, levels=(), mode='pretty', scenario='Base', timeout=datetime.timedelta(seconds=30), **kwargs)#
Query the cube to retrieve the value of the passed measures on the given levels.
In JupyterLab with the atoti-jupyterlab plugin installed, query results can be converted to interactive widgets with the Convert to Widget Below action, available in the command palette or by right-clicking on the representation of the returned DataFrame.
- Parameters
measures (BaseMeasure) – The measures to query.
filter (Optional[QueryFilter]) –
The filtering condition. Only conditions based on level equality with a string are supported; they can be combined with & or expressed with isin(), as shown in the examples below.
Examples
>>> df = pd.DataFrame(
...     columns=["Continent", "Country", "Currency", "Price"],
...     data=[
...         ("Europe", "France", "EUR", 200.0),
...         ("Europe", "Germany", "EUR", 150.0),
...         ("Europe", "United Kingdom", "GBP", 120.0),
...         ("America", "United states", "USD", 240.0),
...         ("America", "Mexico", "MXN", 270.0),
...     ],
... )
>>> table = session.read_pandas(
...     df,
...     keys=["Continent", "Country", "Currency"],
...     table_name="Prices",
... )
>>> cube = session.create_cube(table)
>>> del cube.hierarchies["Continent"]
>>> del cube.hierarchies["Country"]
>>> cube.hierarchies["Geography"] = [
...     table["Continent"],
...     table["Country"],
... ]
>>> cube.measures["American Price"] = tt.where(
...     cube.levels["Continent"] == "America",
...     cube.measures["Price.SUM"],
... )
>>> h, l, m = cube.hierarchies, cube.levels, cube.measures
>>> cube.query(
...     m["Price.SUM"],
...     levels=[l["Country"]],
...     filter=l["Continent"] == "Europe",
... )
                         Price.SUM
Continent Country
Europe    France            200.00
          Germany           150.00
          United Kingdom    120.00
>>> cube.query(
...     m["Price.SUM"],
...     levels=[l["Country"], l["Currency"]],
...     filter=(
...         (l["Continent"] == "Europe")
...         & (l["Currency"] == "EUR")
...     ),
... )
                           Price.SUM
Continent Country Currency
Europe    France  EUR         200.00
          Germany EUR         150.00
>>> cube.query(
...     m["Price.SUM"],
...     levels=[l["Country"]],
...     filter=h["Geography"].isin(
...         ("America",), ("Europe", "Germany")
...     ),
... )
                        Price.SUM
Continent Country
America   Mexico           270.00
          United states    240.00
Europe    Germany          150.00
include_empty_rows (bool) –
Whether to keep the rows where all the requested measures have no value.
Example
>>> cube.query(
...     m["American Price"],
...     levels=[l["Continent"]],
...     include_empty_rows=True,
... )
          American Price
Continent
America           510.00
Europe
include_totals (bool) –
Whether to query the grand total and subtotals and keep them in the returned DataFrame. Totals can be useful but they make the DataFrame harder to work with since its index will have some empty values.
Example
>>> cube.query(
...     m["Price.SUM"],
...     levels=[l["Country"], l["Currency"]],
...     include_totals=True,
... )
                                    Price.SUM
Continent Country        Currency
Total                                  980.00
America                                510.00
          Mexico                       270.00
                         MXN           270.00
          United states                240.00
                         USD           240.00
Europe                                 470.00
          France                       200.00
                         EUR           200.00
          Germany                      150.00
                         EUR           150.00
          United Kingdom               120.00
                         GBP           120.00
levels (Iterable[BaseLevel]) – The levels to split on. If None, the value of the measures at the top of the cube is returned.
scenario (str) – The scenario to query.
timeout (timedelta) – The amount of time the query execution can take before being aborted.
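Example
A hedged sketch passing a longer timeout as a datetime.timedelta:
>>> from datetime import timedelta
>>> cube.query(
...     m["Price.SUM"],
...     levels=[l["Continent"]],
...     timeout=timedelta(minutes=2),
... )
          Price.SUM
Continent
America      510.00
Europe       470.00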
mode (Literal['pretty', 'raw']) –
The query mode.
"pretty"
is best for queries returning small results:A
QueryResult
will be returned and its rows will be sorted according to the level order.
>>> cube.query(
...     m["Price.SUM"],
...     levels=[l["Continent"]],
...     mode="pretty",
... )
          Price.SUM
Continent
America      510.00
Europe       470.00
"raw"
is best for benchmarks or large exports:A faster and more efficient endpoint reducing the data transfer from Java to Python will be used.
A classic
pandas.DataFrame
will be returned.include_totals="True"
will not be allowed.The Convert to Widget Below action provided by the
atoti-jupyterlab
plugin will not be available.
>>> cube.query(
...     m["Price.SUM"],
...     levels=[l["Continent"]],
...     mode="raw",
... )
  Continent  Price.SUM
0    Europe      470.0
1   America      510.0
context – Context values to use when executing the query. Defaults to atoti.Cube.shared_context.
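Example
A hedged sketch overriding a context value for a single query:
>>> cube.query(
...     m["Price.SUM"],
...     levels=[l["Continent"]],
...     context={"queriesTimeLimit": 60},
... )
          Price.SUM
Continent
America      510.00
Europe       470.00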
- Return type
QueryResult in "pretty" mode, pandas.DataFrame in "raw" mode
- property schema: Any#
Schema of the cube's tables, rendered as an SVG image in IPython, and returned as a path to the image otherwise.
Note
This requires Graphviz to be installed.
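A minimal sketch; in IPython the image is displayed inline, elsewhere a path to the generated file is returned:

>>> cube.schema  # doctest: +SKIP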
- property shared_context: CubeContext#
Context values shared by all the users.
Context values can also be set at query time, and per user, directly from the UI. The values in the shared context are the default ones for all the users.
queriesTimeLimit
The number of seconds after which a running query is cancelled and its resources reclaimed. Set to -1 to remove the limit. Defaults to 30 seconds.
queriesResultLimit.intermediateLimit
The maximum number of point locations for a single intermediate result. This works as a safeguard to prevent queries from consuming too much memory, which is especially useful when going to production with several simultaneous users on the same server. Set to -1 to remove the limit. Defaults to 1_000_000 if the atoti-plus plugin is enabled, and to no limit otherwise.
queriesResultLimit.transientLimit
Similar to intermediateLimit but across all the intermediate results of the same query. Set to -1 to remove the limit. Defaults to 10_000_000 if the atoti-plus plugin is enabled, and to no limit otherwise.
Example
>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> table = session.read_pandas(
...     df, keys=["City"], table_name="shared_context example"
... )
>>> cube = session.create_cube(table)
>>> cube.shared_context["queriesTimeLimit"] = 60
>>> cube.shared_context["queriesResultLimit.intermediateLimit"] = 1000000
>>> cube.shared_context["queriesResultLimit.transientLimit"] = 10000000
>>> cube.shared_context
{'queriesTimeLimit': '60', 'queriesResultLimit.transientLimit': '10000000', 'queriesResultLimit.intermediateLimit': '1000000'}