atoti.AggregateProvider

class atoti.AggregateProvider

An aggregate provider pre-aggregates some table columns up to certain levels.

If a step of a query uses a subset of the aggregate provider’s levels and measures, the provider will speed up the query.

An aggregate provider uses additional memory to store the intermediate aggregates. The more levels and measures are added, the more memory it requires.
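The trade-off described above can be illustrated with a small, library-independent sketch. It is a conceptual model only, not how Atoti implements providers: the helper names `build_preaggregate` and `query` are hypothetical, and the rows simply mirror the columns used in the example on this page. Sums are stored once per combination of levels, and a query on a subset of those levels is answered from the stored sums instead of rescanning the rows.

```python
from collections import defaultdict

# Toy rows mirroring the columns used in the example on this page.
ROWS = [
    {"Seller": "Seller_1", "ProductId": "aBk3", "Price": 2.5},
    {"Seller": "Seller_1", "ProductId": "ceJ4", "Price": 49.99},
    {"Seller": "Seller_2", "ProductId": "aBk3", "Price": 3.0},
    {"Seller": "Seller_2", "ProductId": "ceJ4", "Price": 54.99},
]


def build_preaggregate(rows, levels, value_column):
    """Sum value_column once per distinct combination of the given levels."""
    store = defaultdict(float)
    for row in rows:
        store[tuple(row[level] for level in levels)] += row[value_column]
    return dict(store)


def query(store, store_levels, query_levels):
    """Answer a SUM query from the store.

    query_levels must be a subset of store_levels, otherwise the
    pre-aggregate cannot serve the query and a KeyError is raised.
    """
    missing = set(query_levels) - set(store_levels)
    if missing:
        raise KeyError(f"levels not covered by the pre-aggregate: {sorted(missing)}")
    positions = [store_levels.index(level) for level in query_levels]
    result = defaultdict(float)
    for key, total in store.items():
        # Roll the stored sums up to the coarser query levels.
        result[tuple(key[p] for p in positions)] += total
    return dict(result)


STORE_LEVELS = ["Seller", "ProductId"]
store = build_preaggregate(ROWS, STORE_LEVELS, "Price")
per_seller = query(store, STORE_LEVELS, ["Seller"])
```

In this toy model the store holds one entry per (Seller, ProductId) pair, so each additional level or measure increases the number of stored values, which matches the memory cost noted above.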

Example

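>>> import atoti as tt
>>> import pandas as pd
>>> # `session` is assumed to be an already started Atoti session.
>>> df = pd.DataFrame(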
...     {
...         "Seller": ["Seller_1", "Seller_1", "Seller_2", "Seller_2"],
...         "ProductId": ["aBk3", "ceJ4", "aBk3", "ceJ4"],
...         "Price": [2.5, 49.99, 3.0, 54.99],
...     }
... )
>>> table = session.read_pandas(df, table_name="Seller")
>>> cube = session.create_cube(table)
>>> l, m = cube.levels, cube.measures
>>> cube.aggregate_providers.update(
...     {
...         "Seller provider": tt.AggregateProvider(
...             key="bitmap",
...             levels=[l["Seller"], l["ProductId"]],
...             measures=[m["Price.SUM"]],
...             filter=l["ProductId"] == "ceJ4",
...             partitioning="hash4(Seller)",
...         )
...     }
... )
filter: Condition[LevelIdentifier, Literal['eq', 'isin'], Constant, Literal['and'] | None] | None = None

Only compute and provide aggregates matching this condition.

The levels used in the condition do not have to be part of this provider’s levels.
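As a rough illustration of a condition on a level that is not among the provider's levels, the sketch below pre-aggregates by Seller only while keeping just the rows for one ProductId. This is a plain-Python model under stated assumptions, not Atoti's implementation: `build_filtered_preaggregate` and the predicate are hypothetical names introduced for the example.

```python
# Toy rows mirroring the columns used in the example on this page.
ROWS = [
    {"Seller": "Seller_1", "ProductId": "aBk3", "Price": 2.5},
    {"Seller": "Seller_1", "ProductId": "ceJ4", "Price": 49.99},
    {"Seller": "Seller_2", "ProductId": "aBk3", "Price": 3.0},
    {"Seller": "Seller_2", "ProductId": "ceJ4", "Price": 54.99},
]


def build_filtered_preaggregate(rows, levels, value_column, predicate):
    """Sum value_column per combination of levels, keeping only matching rows."""
    store = {}
    for row in rows:
        if not predicate(row):
            continue  # Rows outside the condition are never aggregated.
        key = tuple(row[level] for level in levels)
        store[key] = store.get(key, 0.0) + row[value_column]
    return store


# The condition uses ProductId, while the stored levels contain only Seller.
filtered_store = build_filtered_preaggregate(
    ROWS,
    levels=["Seller"],
    value_column="Price",
    predicate=lambda row: row["ProductId"] == "ceJ4",
)
```

In this model the stored sums are only meaningful for queries restricted to the same condition, which is why the filter narrows both what is computed and what can be served.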

key: Literal['bitmap', 'leaf'] = 'leaf'

The key of the provider, which selects its type: 'bitmap' or 'leaf'.

The 'bitmap' provider is generally faster but also uses more memory.

levels: Sequence[HasIdentifier[LevelIdentifier] | LevelIdentifier] = ()

The levels to build the provider on.

measures: Sequence[HasIdentifier[MeasureIdentifier] | MeasureIdentifier]

The measures to build the provider on.

partitioning: str | None = None

The partitioning of the provider.

Defaults to the partitioning of the cube’s base table.
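The "hash4(Seller)" value used in the example reads as hash partitioning on the Seller column across 4 partitions. The sketch below models that idea in plain Python. It is an assumption-laden illustration: the hash function Atoti actually uses is not stated on this page, so `zlib.crc32` is a stand-in, and `partition_index` and `partition_rows` are hypothetical helpers.

```python
import zlib
from collections import defaultdict

# Toy rows mirroring the columns used in the example on this page.
ROWS = [
    {"Seller": "Seller_1", "ProductId": "aBk3", "Price": 2.5},
    {"Seller": "Seller_1", "ProductId": "ceJ4", "Price": 49.99},
    {"Seller": "Seller_2", "ProductId": "aBk3", "Price": 3.0},
    {"Seller": "Seller_2", "ProductId": "ceJ4", "Price": 54.99},
]


def partition_index(value, partition_count):
    """Map a value to a partition with a deterministic hash (crc32 is a stand-in)."""
    return zlib.crc32(str(value).encode("utf-8")) % partition_count


def partition_rows(rows, column, partition_count):
    """Group rows by the partition index of their value in the given column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_index(row[column], partition_count)].append(row)
    return dict(partitions)


partitions = partition_rows(ROWS, column="Seller", partition_count=4)
```

Whatever the hash function, the property that matters is that all rows sharing a Seller value land in the same partition, so per-seller aggregates can be computed within a single partition.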