atoti.cube module#

class atoti.cube.CubeContext#

Manage the shared context of a cube.

class atoti.Cube#

Cube of a atoti.Session.

property aggregate_providers: MutableMapping[str, AggregateProvider]#
property aggregates_cache: AggregatesCache#

Aggregates cache of the cube.

create_date_hierarchy(name, *, column, levels={'Year': 'y', 'Month': 'M', 'Day': 'd'})#

Create a multilevel date hierarchy based on a date column.

The new levels are created by matching a date pattern. Here is a non-exhaustive list of patterns that can be used:

Pattern

Description

Type

Examples

y

Year

Integer

2001, 2005, 2020

yyyy

4-digits year

String

"2001", "2005", "2020"

M

Month of the year (1 based)

Integer

1, 5, 12

MM

2-digits month

String

"01", "05", "12"

d

Day of the month

Integer

1, 15, 30

dd

2-digits day of the month

String

"01", "15", "30"

w

Week number

Integer

1, 12, 51

Q

Quarter

Integer

1, 2, 3, 4

QQQ

Quarter prefixed with Q

String

"Q1", "Q2", "Q3", "Q4"

H

Hour of day (0-23)

Integer

0, 12, 23

HH

2-digits hour of day

String

"00", "12", "23"

m

Minute of hour

Integer

0, 30, 59

mm

2-digits minute of hour

String

"00", "30", "59"

s

Second of minute

Integer

0, 5, 55

ss

2-digits second of minute

String

"00", "05", "55"

Parameters
  • name (str) – The name of the hierarchy to create.

  • column (Column) – A table column containing a date or a datetime.

  • levels (Mapping[str, str]) – The mapping from the names of the levels to the patterns from which they will be created.

Example

>>> from datetime import date
>>> df = pd.DataFrame(
...     columns=["Date", "Quantity"],
...     data=[
...         (date(2020, 1, 10), 150.0),
...         (date(2020, 1, 20), 240.0),
...         (date(2019, 3, 17), 270.0),
...         (date(2019, 12, 12), 200.0),
...     ],
... )
>>> table = session.read_pandas(
...     df, keys=["Date"], table_name="create_date_hierarchy example"
... )
>>> cube = session.create_cube(table)
>>> l, m = cube.levels, cube.measures
>>> cube.create_date_hierarchy("Date parts", column=table["Date"])
>>> cube.query(
...     m["Quantity.SUM"],
...     include_totals=True,
...     levels=[l["Year"], l["Month"], l["Day"]],
... )
                Quantity.SUM
Year  Month Day
Total                 860.00
2019                  470.00
      3               270.00
            17        270.00
      12              200.00
            12        200.00
2020                  390.00
      1               390.00
            10        150.00
            20        240.00

The full date can also be added back as the last level of the hierarchy:

>>> h = cube.hierarchies
>>> h["Date parts"] = {**h["Date parts"].levels, "Date": table["Date"]}
>>> cube.query(
...     m["Quantity.SUM"],
...     include_totals=True,
...     levels=[l["Date parts", "Date"]],
... )
                           Quantity.SUM
Year  Month Day Date
Total                            860.00
2019                             470.00
      3                          270.00
            17                   270.00
                2019-03-17       270.00
      12                         200.00
            12                   200.00
                2019-12-12       200.00
2020                             390.00
      1                          390.00
            10                   150.00
                2020-01-10       150.00
            20                   240.00
                2020-01-20       240.00

Data inserted into the table after the hierarchy creation will be automatically hierarchized:

>>> table += (date(2021, 8, 30), 180.0)
>>> cube.query(
...     m["Quantity.SUM"],
...     include_totals=True,
...     levels=[l["Date parts", "Date"]],
...     filter=l["Year"] == "2021",
... )
                           Quantity.SUM
Year  Month Day Date
Total                            180.00
2021                             180.00
      8                          180.00
            30                   180.00
                2021-08-30       180.00
create_parameter_hierarchy_from_column(name, column)#

Create a single-level hierarchy which dynamically takes its members from a column.

Parameters
  • name (str) – Name given to the created dimension, hierarchy and its single level.

  • column (Column) – Column from which to take members.

Example

>>> df = pd.DataFrame(
...     {
...         "Seller": ["Seller_1", "Seller_1", "Seller_2", "Seller_2"],
...         "ProductId": ["aBk3", "ceJ4", "aBk3", "ceJ4"],
...         "Price": [2.5, 49.99, 3.0, 54.99],
...     }
... )
>>> table = session.read_pandas(df, table_name="Seller")
>>> cube = session.create_cube(table)
>>> l, m = cube.levels, cube.measures
>>> cube.create_parameter_hierarchy_from_column(
...     "Competitor", table["Seller"]
... )
>>> m["Price"] = tt.agg.single_value(table["Price"])
>>> m["Competitor price"] = tt.at(
...     m["Price"], l["Seller"] == l["Competitor"]
... )
>>> cube.query(
...     m["Competitor price"],
...     levels=[l["Seller"], l["ProductId"]],
... )
                   Competitor price
Seller   ProductId
Seller_1 aBk3                  2.50
         ceJ4                 49.99
Seller_2 aBk3                  2.50
         ceJ4                 49.99
>>> cube.query(
...     m["Competitor price"],
...     levels=[l["Seller"], l["ProductId"]],
...     filter=l["Competitor"] == "Seller_2",
... )
                   Competitor price
Seller   ProductId
Seller_1 aBk3                  3.00
         ceJ4                 54.99
Seller_2 aBk3                  3.00
         ceJ4                 54.99
create_parameter_hierarchy_from_members(name, members, *, data_type=None, index_measure_name=None)#

Create a single-level hierarchy with the given members.

It can be used as a parameter hierarchy in advanced analyzes.

Parameters
  • name (str) – The name of hierarchy and its single level.

  • members (Collection[Any]) – The members of the hierarchy.

  • data_type (Optional[DataType]) – The type with which the members will be stored. Automatically inferred by default.

  • index_measure_name (Optional[str]) – The name of the indexing measure to create for this hierarchy, if any.

Example

>>> df = pd.DataFrame(
...     {
...         "Seller": ["Seller_1", "Seller_2", "Seller_3"],
...         "Prices": [
...             [2.5, 49.99, 3.0, 54.99],
...             [2.6, 50.99, 2.8, 57.99],
...             [2.99, 44.99, 3.6, 59.99],
...         ],
...     }
... )
>>> table = session.read_pandas(df, table_name="Seller prices")
>>> cube = session.create_cube(table)
>>> l, m = cube.levels, cube.measures
>>> cube.create_parameter_hierarchy_from_members(
...     "ProductID",
...     ["aBk3", "ceJ4", "aBk5", "ceJ9"],
...     index_measure_name="Product index",
... )
>>> m["Prices"] = tt.agg.single_value(table["Prices"])
>>> m["Product price"] = m["Prices"][m["Product index"]]
>>> cube.query(
...     m["Product price"],
...     levels=[l["Seller"], l["ProductID"]],
... )
                   Product price
Seller   ProductID
Seller_1 aBk3               2.50
         aBk5               3.00
         ceJ4              49.99
         ceJ9              54.99
Seller_2 aBk3               2.60
         aBk5               2.80
         ceJ4              50.99
         ceJ9              57.99
Seller_3 aBk3               2.99
         aBk5               3.60
         ceJ4              44.99
         ceJ9              59.99
create_parameter_simulation(name, *, measures=None, levels=(), base_scenario_name='Base')#

Create a parameter simulation and its associated measures.

Parameters
Return type

Table

Example

>>> sales_table = session.read_csv(
...     f"{TUTORIAL_RESOURCES}/sales.csv",
...     table_name="Sales",
...     keys=["Sale ID"],
... )
>>> shops_table = session.read_csv(
...     f"{TUTORIAL_RESOURCES}/shops.csv",
...     table_name="Shops",
...     keys=["Shop ID"],
... )
>>> sales_table.join(shops_table, mapping={"Shop": "Shop ID"})
>>> cube = session.create_cube(sales_table)
>>> l, m = cube.levels, cube.measures

Creating a parameter simulation on one level:

>>> country_simulation = cube.create_parameter_simulation(
...     "Country simulation",
...     measures={"Country parameter": 1.0},
...     levels=[l["Country"]],
... )
>>> country_simulation += ("France crash", "France", 0.8)
>>> country_simulation.head()
                      Country parameter
Scenario     Country
France crash France                 0.8
  • France crash is the name of the scenario.

  • France is the coordinate at which the value will be changed.

  • 0.8 is the value the Country parameter measure will have in this scenario.

>>> m["Unparametrized turnover"] = tt.agg.sum(
...     sales_table["Unit price"] * sales_table["Quantity"]
... )
>>> m["Turnover"] = tt.agg.sum(
...     m["Unparametrized turnover"] * m["Country parameter"],
...     scope=tt.OriginScope(l["Country"]),
... )
>>> cube.query(m["Turnover"], levels=[l["Country simulation"]])
                      Turnover
Country simulation
Base                961,463.00
France crash        889,854.60

Drilldown to the Country level for more details:

>>> cube.query(
...     m["Unparametrized turnover"],
...     m["Country parameter"],
...     m["Turnover"],
...     levels=[l["Country simulation"], l["Country"]],
... )
                           Unparametrized turnover Country parameter    Turnover
Country simulation Country
Base               France               358,042.00              1.00  358,042.00
                   USA                  603,421.00              1.00  603,421.00
France crash       France               358,042.00               .80  286,433.60
                   USA                  603,421.00              1.00  603,421.00

Creating a parameter simulation on multiple levels:

>>> size_simulation = cube.create_parameter_simulation(
...     "Size simulation",
...     measures={"Size parameter": 1.0},
...     levels=[l["Country"], l["Shop size"]],
... )
>>> size_simulation += (
...     "Going local",
...     None,  # ``None`` serves as a wildcard matching any member value.
...     "big",
...     0.8,
... )
>>> size_simulation += ("Going local", "USA", "small", 1.2)
>>> m["Turnover"] = tt.agg.sum(
...     m["Unparametrized turnover"]
...     * m["Country parameter"]
...     * m["Size parameter"],
...     scope=tt.OriginScope(l["Country"], l["Shop size"]),
... )
>>> cube.query(
...     m["Turnover"],
...     levels=[l["Size simulation"], l["Shop size"]],
... )
                             Turnover
Size simulation Shop size
Base            big        120,202.00
                medium     356,779.00
                small      484,482.00
Going local     big         96,161.60
                medium     356,779.00
                small      547,725.20

When several rules contain None, the one where the first None appears last takes precedence.

>>> size_simulation += ("Going France and Local", "France", None, 2)
>>> size_simulation += ("Going France and Local", None, "small", 10)
>>> cube.query(
...     m["Unparametrized turnover"],
...     m["Turnover"],
...     levels=[l["Country"], l["Shop size"]],
...     filter=l["Size simulation"] == "Going France and Local",
... )
                  Unparametrized turnover      Turnover
Country Shop size
France  big                     47,362.00     94,724.00
        medium                 142,414.00    284,828.00
        small                  168,266.00    336,532.00
USA     big                     72,840.00     72,840.00
        medium                 214,365.00    214,365.00
        small                  316,216.00  3,162,160.00

Creating a parameter simulation without levels:

>>> crisis_simulation = cube.create_parameter_simulation(
...     "Global Simulation",
...     measures={"Global parameter": 1.0},
... )
>>> crisis_simulation += ("Global Crisis", 0.9)
>>> m["Turnover"] = m["Unparametrized turnover"] * m["Global parameter"]
>>> cube.query(m["Turnover"], levels=[l["Global Simulation"]])
                     Turnover
Global Simulation
Base               961,463.00
Global Crisis      865,316.70

Creating a parameter simulation with multiple measures:

>>> multi_parameter_simulation = cube.create_parameter_simulation(
...     "Price And Quantity",
...     measures={
...         "Price parameter": 1.0,
...         "Quantity parameter": 1.0,
...     },
... )
>>> multi_parameter_simulation += ("Price Up Quantity Down", 1.2, 0.8)
>>> m["Simulated Price"] = (
...     tt.agg.single_value(sales_table["Unit price"])
...     * m["Price parameter"]
... )
>>> m["Simulated Quantity"] = (
...     tt.agg.single_value(sales_table["Quantity"])
...     * m["Quantity parameter"]
... )
>>> m["Turnover"] = tt.agg.sum_product(
...     m["Simulated Price"],
...     m["Simulated Quantity"],
...     scope=tt.OriginScope(l["Sale ID"]),
... )
>>> cube.query(m["Turnover"], levels=[l["Price And Quantity"]])
                          Turnover
Price And Quantity
Base                    961,463.00
Price Up Quantity Down  923,004.48
explain_query(*measures, filter=None, include_empty_rows=False, include_totals=False, levels=(), scenario='Base', timeout=datetime.timedelta(seconds=30), **kwargs)#

Run the query but, instead of returning its result, return the explanation of how it was executed containing a summary, global timings, and the query plan with all the retrievals.

See also

query() for the roles of the parameters.

Return type

QueryAnalysis

property hierarchies: _LocalHierarchies#

Hierarchies of the cube.

property levels: LevelsT#

Levels of the cube.

property measures: _LocalMeasures#

Measures of the cube.

property name: str#

Name of the cube.

query(*measures, context={}, filter=None, include_empty_rows=False, include_totals=False, levels=(), mode='pretty', scenario='Base', timeout=datetime.timedelta(seconds=30), **kwargs)#

Query the cube to retrieve the value of the passed measures on the given levels.

In JupyterLab with the atoti-jupyterlab plugin installed, query results can be converted to interactive widgets with the Convert to Widget Below action available in the command palette or by right clicking on the representation of the returned Dataframe.

Parameters
  • measures (BaseMeasure) – The measures to query.

  • filter (Optional[QueryFilter]) –

    The filtering condition. Only conditions on level equality with a string are supported.

    Examples

    >>> df = pd.DataFrame(
    ...     columns=["Continent", "Country", "Currency", "Price"],
    ...     data=[
    ...         ("Europe", "France", "EUR", 200.0),
    ...         ("Europe", "Germany", "EUR", 150.0),
    ...         ("Europe", "United Kingdom", "GBP", 120.0),
    ...         ("America", "United states", "USD", 240.0),
    ...         ("America", "Mexico", "MXN", 270.0),
    ...     ],
    ... )
    >>> table = session.read_pandas(
    ...     df,
    ...     keys=["Continent", "Country", "Currency"],
    ...     table_name="Prices",
    ... )
    >>> cube = session.create_cube(table)
    >>> del cube.hierarchies["Continent"]
    >>> del cube.hierarchies["Country"]
    >>> cube.hierarchies["Geography"] = [
    ...     table["Continent"],
    ...     table["Country"],
    ... ]
    >>> cube.measures["American Price"] = tt.where(
    ...     cube.levels["Continent"] == "America",
    ...     cube.measures["Price.SUM"],
    ... )
    >>> h, l, m = cube.hierarchies, cube.levels, cube.measures
    
    >>> cube.query(
    ...     m["Price.SUM"],
    ...     levels=[l["Country"]],
    ...     filter=l["Continent"] == "Europe",
    ... )
                             Price.SUM
    Continent Country
    Europe    France            200.00
              Germany           150.00
              United Kingdom    120.00
    
    >>> cube.query(
    ...     m["Price.SUM"],
    ...     levels=[l["Country"], l["Currency"]],
    ...     filter=(
    ...         (l["Continent"] == "Europe")
    ...         & (l["Currency"] == "EUR")
    ...     ),
    ... )
                               Price.SUM
    Continent Country Currency
    Europe    France  EUR         200.00
              Germany EUR         150.00
    
    >>> cube.query(
    ...     m["Price.SUM"],
    ...     levels=[l["Country"]],
    ...     filter=h["Geography"].isin(
    ...         ("America",), ("Europe", "Germany")
    ...     ),
    ... )
                            Price.SUM
    Continent Country
    America   Mexico           270.00
              United states    240.00
    Europe    Germany          150.00
    

  • include_empty_rows (bool) –

    Whether to keep the rows where all the requested measures have no value.

    Example

    >>> cube.query(
    ...     m["American Price"],
    ...     levels=[l["Continent"]],
    ...     include_empty_rows=True,
    ... )
              American Price
    Continent
    America           510.00
    Europe
    

  • include_totals (bool) –

    Whether to query the grand total and subtotals and keep them in the returned DataFrame. Totals can be useful but they make the DataFrame harder to work with since its index will have some empty values.

    Example

    >>> cube.query(
    ...     m["Price.SUM"],
    ...     levels=[l["Country"], l["Currency"]],
    ...     include_totals=True,
    ... )
                                      Price.SUM
    Continent Country        Currency
    Total                                980.00
    America                              510.00
              Mexico                     270.00
                             MXN         270.00
              United states              240.00
                             USD         240.00
    Europe                               470.00
              France                     200.00
                             EUR         200.00
              Germany                    150.00
                             EUR         150.00
              United Kingdom             120.00
                             GBP         120.00
    

  • levels (Iterable[BaseLevel]) – The levels to split on. If None, the value of the measures at the top of the cube is returned.

  • scenario (str) – The scenario to query.

  • timeout (timedelta) – The amount of time the query execution can take before aborting it.

  • mode (Literal['pretty', 'raw']) –

    The query mode.

    • "pretty" is best for queries returning small results:

      • A QueryResult will be returned and its rows will be sorted according to the level order.

      >>> cube.query(
      ...     m["Price.SUM"],
      ...     levels=[l["Continent"]],
      ...     mode="pretty",
      ... )
                Price.SUM
      Continent
      America      510.00
      Europe       470.00
      
    • "raw" is best for benchmarks or large exports:

      • A faster and more efficient endpoint reducing the data transfer from Java to Python will be used.

      • A classic pandas.DataFrame will be returned.

      • include_totals="True" will not be allowed.

      • The Convert to Widget Below action provided by the atoti-jupyterlab plugin will not be available.

      >>> cube.query(
      ...     m["Price.SUM"],
      ...     levels=[l["Continent"]],
      ...     mode="raw",
      ... )
        Continent  Price.SUM
      0    Europe      470.0
      1   America      510.0
      

  • context (Mapping[str, Any]) –

    Context values to use when executing the query.

    Defaults to atoti.Cube.shared_context.

Return type

DataFrame

property schema: Any#

Schema of the cube’s tables, as an SVG image in IPython, as a path to the image otherwise.

Note

This requires Graphviz to be installed.

property shared_context: CubeContext#

Context values shared by all the users.

Context values can also be set at query time, and per user, directly from the UI. The values in the shared context are the default ones for all the users.

  • queriesTimeLimit

    The number of seconds after which a running query is cancelled and its resources reclaimed. Set to -1 to remove the limit. Defaults to 30 seconds.

  • queriesResultLimit.intermediateLimit

    The limit number of point locations for a single intermediate result. This works as a safe-guard to prevent queries from consuming too much memory, which is especially useful when going to production with several simultaneous users on the same server. Set to -1 to remove the limit.

    Defaults to 1_000_000 if the atoti-plus plugin is enabled, and to no limit otherwise.

  • queriesResultLimit.transientLimit

    Similar to intermediateLimit but across all the intermediate results of the same query. Set to -1 to remove the limit.

    Defaults to 10_000_000 if the atoti-plus plugin is enabled, and to no limit otherwise.

Example

>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> table = session.read_pandas(
...     df, keys=["City"], table_name="shared_context example"
... )
>>> cube = session.create_cube(table)
>>> cube.shared_context["queriesTimeLimit"] = 60
>>> cube.shared_context["queriesResultLimit.intermediateLimit"] = 1000000
>>> cube.shared_context["queriesResultLimit.transientLimit"] = 10000000
>>> cube.shared_context
{'queriesTimeLimit': '60', 'queriesResultLimit.transientLimit': '10000000', 'queriesResultLimit.intermediateLimit': '1000000'}