atoti package

Submodules

atoti.agg module

atoti.agg.count_distinct(measure, *, scope=None)

Return a measure equal to the distinct count of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.long(measure, *, scope=None)

Return a measure equal to the sum of the positive values of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.max(measure, *, scope=None)

Return a measure equal to the maximum of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.max_member(measure, level)

Return a measure equal to the member maximizing the passed measure on the given level.

When multiple members maximize the passed measure, the first one (according to the comparator of the given level) is returned.

Example

Considering this dataset:

Continent      City      Price
Europe         Paris     200.0
Europe         Berlin    150.0
Europe         London    240.0
North America  New York  270.0

And this measure:

m["City with max price"] = atoti.agg.max_member(m["Price"], lvl["City"])

Then, at the given level, the measure is equal to the current member of the City level:

cube.query(m["Price"], m["City with max price"], levels=lvl["City"])

City      Price  City with max price
Paris     200.0  Paris
Berlin    150.0  Berlin
London    240.0  London
New York  270.0  New York

At a level above it, the measure is equal to the city of each continent with the maximum price:

cube.query(m["City with max price"], levels=lvl["Continent"])

Continent      City with max price
Europe         London
North America  New York

At the top level, the measure is equal to the city with the maximum price across all continents:

cube.query(m["City with max price"])

City with max price
New York

Parameters
Return type

Measure
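The behaviour described above can be mimicked in plain Python. The following is only a sketch of the semantics, not atoti's implementation: it assumes one row per city, and `comparator_key` is a hypothetical stand-in for the comparator of the level, used to break ties.

```python
# Rows of the example dataset: (continent, city, price).
ROWS = [
    ("Europe", "Paris", 200.0),
    ("Europe", "Berlin", 150.0),
    ("Europe", "London", 240.0),
    ("North America", "New York", 270.0),
]


def max_member(rows, comparator_key=lambda city: city):
    """Return the city with the highest price among the given rows.

    When several cities share the highest price, the first one according
    to comparator_key (a hypothetical stand-in for the level comparator)
    is returned.
    """
    best_price = max(price for _, _, price in rows)
    candidates = [city for _, city, price in rows if price == best_price]
    return min(candidates, key=comparator_key)


def max_member_per_continent(rows):
    """Group the rows by continent and apply max_member to each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    return {continent: max_member(group) for continent, group in groups.items()}


print(max_member_per_continent(ROWS))  # {'Europe': 'London', 'North America': 'New York'}
print(max_member(ROWS))  # New York
```

Replacing `max` with `min` when computing `best_price` gives the corresponding sketch for a minimizing member.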

atoti.agg.mean(measure, *, scope=None)

Return a measure equal to the mean of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.median(measure, *, scope=None)

Return a measure equal to the median of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.min(measure, *, scope=None)

Return a measure equal to the minimum of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.min_member(measure, level)

Return a measure equal to the member minimizing the passed measure on the given level.

When multiple members minimize the passed measure, the first one (according to the comparator of the given level) is returned.

Example

Considering this dataset:

Continent      City      Price
Europe         Paris     200.0
Europe         Berlin    150.0
Europe         London    240.0
North America  New York  270.0

And this measure:

m["City with min price"] = atoti.agg.min_member(m["Price"], lvl["City"])

Then, at the given level, the measure is equal to the current member of the City level:

cube.query(m["Price"], m["City with min price"], levels=lvl["City"])

City      Price  City with min price
Paris     200.0  Paris
Berlin    150.0  Berlin
London    240.0  London
New York  270.0  New York

At a level above it, the measure is equal to the city of each continent with the minimum price:

cube.query(m["City with min price"], levels=lvl["Continent"])

Continent      City with min price
Europe         Berlin
North America  New York

At the top level, the measure is equal to the city with the minimum price across all continents:

cube.query(m["City with min price"])

City with min price
Berlin

Parameters
atoti.agg.prod(measure, *, scope=None)

Return a measure equal to the product of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.quantile(measure, q, *, mode='inc', interpolation='linear', scope=None)

Return a measure equal to the requested quantile of the passed measure across the specified scope.

Here is how to obtain the same behaviour as these standard quantile calculation methods:

  • R-1: mode="centered" and interpolation="lower"

  • R-2: mode="centered" and interpolation="midpoint"

  • R-3: mode="simple" and interpolation="nearest"

  • R-4: mode="simple" and interpolation="linear"

  • R-5: mode="centered" and interpolation="linear"

  • R-6 (similar to Excel’s PERCENTILE.EXC): mode="exc" and interpolation="linear"

  • R-7 (similar to Excel’s PERCENTILE.INC): mode="inc" and interpolation="linear"

  • R-8 and R-9 are not supported

The formulae given for the calculation of the quantile index assume a 1-based indexing system.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure to get the quantile of.

  • q (Union[float, Measure]) – The quantile to take. Must be between 0 and 1. For instance, 0.95 is the 95th percentile and 0.5 is the median.

  • mode (Literal['simple', 'centered', 'inc', 'exc']) –

    The method used to calculate the index of the quantile. Available options are, when searching for the q quantile of a vector X:

    • simple: len(X) * q

    • centered: len(X) * q + 0.5

    • exc: (len(X) + 1) * q

    • inc: (len(X) - 1) * q + 1

  • interpolation (Literal['linear', 'higher', 'lower', 'nearest', 'midpoint']) –

    If the quantile index is not an integer, the interpolation decides what value is returned. The different options are, considering a quantile index k with i < k < j (i and j being consecutive integers) for a sorted vector X:

    • linear: v = X[i] + (X[j] - X[i]) * (k - i)

    • lower: v = X[i]

    • higher: v = X[j]

    • nearest: v = X[i] or v = X[j] depending on which of i or j is closest to k

    • midpoint: v = (X[i] + X[j]) / 2

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure
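To make the mode and interpolation formulas concrete, here is a minimal plain-Python sketch that applies them to a list of numbers. It is an illustration under stated assumptions rather than atoti's implementation: the helper names are hypothetical, indices falling outside [1, n] are clamped to the ends of the vector, and ties in "nearest" resolve to the lower neighbour.

```python
import math


def fractional_index(n, q, mode="inc"):
    """Return the 1-based, possibly fractional, index of the q quantile."""
    if mode == "simple":
        k = n * q
    elif mode == "centered":
        k = n * q + 0.5
    elif mode == "exc":
        k = (n + 1) * q
    elif mode == "inc":
        k = (n - 1) * q + 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Assumption: indices falling outside [1, n] are clamped to the ends.
    return min(max(k, 1.0), float(n))


def quantile(values, q, mode="inc", interpolation="linear"):
    """Compute the q quantile of values with the documented formulas."""
    x = sorted(values)
    k = fractional_index(len(x), q, mode)
    i, j = math.floor(k), math.ceil(k)
    low, high = x[i - 1], x[j - 1]  # convert 1-based indices to 0-based
    if i == j:
        return low  # integer index: no interpolation needed
    if interpolation == "linear":
        return low + (high - low) * (k - i)
    if interpolation == "lower":
        return low
    if interpolation == "higher":
        return high
    if interpolation == "nearest":
        # Assumption: an exact tie resolves to the lower neighbour.
        return low if k - i <= j - k else high
    if interpolation == "midpoint":
        return (low + high) / 2
    raise ValueError(f"unknown interpolation: {interpolation}")


data = [10, 20, 30, 40]
print(quantile(data, 0.25))              # R-7: 17.5
print(quantile(data, 0.25, mode="exc"))  # R-6: 12.5
```

For this data, the two printed values agree with `statistics.quantiles(data, n=4)` from the Python standard library using `method="inclusive"` and `method="exclusive"` respectively.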

atoti.agg.short(measure, *, scope=None)

Return a measure equal to the sum of the negative values of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure
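The long and short aggregations split a sum by sign. A small plain-Python sketch of that idea, with hypothetical function names and sample values (not atoti code):

```python
def long_sum(values):
    """Sum of the strictly positive values."""
    return sum(v for v in values if v > 0)


def short_sum(values):
    """Sum of the strictly negative values."""
    return sum(v for v in values if v < 0)


values = [5.0, -2.0, 3.0, -7.0]
print(long_sum(values))   # 8.0
print(short_sum(values))  # -9.0
```

By construction, the two results add up to the plain sum of the values.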

atoti.agg.square_sum(measure, *, scope=None)

Return a measure equal to the sum of the squares of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.std(measure, *, mode='sample', scope=None)

Return a measure equal to the standard deviation of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure to get the standard deviation of.

  • mode (Literal['sample', 'population']) –

    One of the supported modes:

    • The sample standard deviation, similar to Excel’s STDEV.S, is \(\sqrt{\frac{\sum_{i=1}^{n} (X_i - m)^{2}}{n - 1}}\) where m is the sample mean and n the size of the sample. Use this mode if the data represents a sample of the population.

    • The population standard deviation, similar to Excel’s STDEV.P, is \(\sqrt{\frac{\sum_{i=1}^{n} (X_i - m)^{2}}{n}}\) where m is the mean of the X_i elements and n the size of the population. Use this mode if the data represents the entire population.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure
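The two modes differ only in the divisor. The sketch below spells the formulas out in plain Python and compares them with the standard library's `statistics` module; the function names and sample data are illustrative, not part of atoti.

```python
import math
import statistics


def variance(values, mode="sample"):
    """Variance with divisor n - 1 ("sample") or n ("population")."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    if mode == "sample":
        return squared_deviations / (n - 1)
    if mode == "population":
        return squared_deviations / n
    raise ValueError(f"unknown mode: {mode}")


def std(values, mode="sample"):
    """Standard deviation: the square root of the matching variance."""
    return math.sqrt(variance(values, mode))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(std(data, mode="population"))  # 2.0
print(math.isclose(std(data), statistics.stdev(data)))  # True
```

The same choice of divisor applies to the variance, which is the standard deviation before taking the square root.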

atoti.agg.sum(measure, *, scope=None)

Return a measure equal to the sum of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure or store column to aggregate.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.agg.sum_product(*factors, scope=None)

Return a measure equal to the sum product aggregation of the passed factors across the specified scope.

Example

Considering this dataset with, for each day and product, the sold amount and the product price:

Date        Product ID  Category  Price  Amount  Array
2020-01-01  001         TV        300.0  5.0     [10,15]
2020-01-02  001         TV        200.0  1.0     [5,15]
2020-01-01  002         Computer  900.0  2.0     [2,3]
2020-01-02  002         Computer  800.0  3.0     [10,20]
2020-01-01  003         TV        500.0  2.0     [3,10]

To compute the turnover:

m["turnover"] = atoti.agg.sum_product(store["Price"], store["Amount"])

To compute the turnover per category:

cube.query(m["turnover"], levels=lvl["Category"])

It returns:

Category  turnover
TV        2700
Computer  4200

Sum product is also optimized for operations on vectors:

m["array sum product"] = atoti.agg.sum_product(store["Amount"], store["Array"])

cube.query(m["array sum product"]) returns [95.0, 176.0].

Parameters
  • factors (Union[Measure, MeasureConvertible]) – The columns, measures or levels to multiply together before summing.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure
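The figures in the example above can be checked by hand. This plain-Python sketch recomputes them from the dataset rows; it only illustrates the arithmetic of a sum product and does not use atoti, and the function names are hypothetical.

```python
from collections import defaultdict

# (date, product_id, category, price, amount, array)
ROWS = [
    ("2020-01-01", "001", "TV", 300.0, 5.0, [10, 15]),
    ("2020-01-02", "001", "TV", 200.0, 1.0, [5, 15]),
    ("2020-01-01", "002", "Computer", 900.0, 2.0, [2, 3]),
    ("2020-01-02", "002", "Computer", 800.0, 3.0, [10, 20]),
    ("2020-01-01", "003", "TV", 500.0, 2.0, [3, 10]),
]


def turnover_by_category(rows):
    """Sum price * amount over the rows of each category."""
    totals = defaultdict(float)
    for _, _, category, price, amount, _ in rows:
        totals[category] += price * amount
    return dict(totals)


def amount_times_array(rows):
    """Sum amount * array element-wise over all the rows."""
    result = [0.0] * len(rows[0][5])
    for *_, amount, array in rows:
        for position, element in enumerate(array):
            result[position] += amount * element
    return result


print(turnover_by_category(ROWS))  # {'TV': 2700.0, 'Computer': 4200.0}
print(amount_times_array(ROWS))    # [95.0, 176.0]
```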

atoti.agg.var(measure, *, mode='sample', scope=None)

Return a measure equal to the variance of the passed measure across the specified scope.

Parameters
  • measure (Union[Measure, MeasureConvertible]) – The measure to get the variance of.

  • mode (Literal['sample', 'population']) –

    One of the supported modes:

    • The sample variance, similar to Excel’s VAR.S, is \(\frac{\sum_{i=1}^{n} (X_i - m)^{2}}{n - 1}\) where m is the sample mean and n the size of the sample. Use this mode if the data represents a sample of the population.

    • The population variance, similar to Excel’s VAR.P, is \(\frac{\sum_{i=1}^{n} (X_i - m)^{2}}{n}\) where m is the mean of the X_i elements and n the size of the population. Use this mode if the data represents the entire population.

  • scope (Optional[Scope]) – The scope of the aggregation. When None is specified, the natural aggregation scope is used: it contains all the data in the cube whose coordinates match those of the currently evaluated member.

Return type

Measure

atoti.aggregates_cache module

class atoti.aggregates_cache.AggregatesCache(_java_api, _cube)

Bases: object

The aggregates cache associated with a cube.

property capacity

Capacity of the cache.

It is the number of {location: measure} pairs of all the aggregates that can be stored.

A strictly negative value will disable caching.

A zero value will enable sharing but no caching. This means that queries will share their computations if they are executed at the same time, but the aggregated values will not be stored to be retrieved later.

Return type

int

atoti.array module

atoti.array.len(measure)

Return a measure equal to the number of elements of the passed array measure.

Return type

Measure

atoti.array.max(measure)

Return a measure equal to the maximum element of the passed array measure.

The max of an empty array is None.

Return type

Measure

atoti.array.mean(measure)

Return a measure equal to the mean of all the elements of the passed array measure.

The mean of an empty array is 0.

Return type

Measure

atoti.array.min(measure)

Return a measure equal to the minimum element of the passed array measure.

The min of an empty array is None.

Return type

Measure

atoti.array.n_greatest(measure, n)

Return an array measure containing the n greatest elements of the passed array measure.

Return type

Measure

atoti.array.n_greatest_indices(measure, n)

Return an array measure containing the indices of the n greatest elements of the passed array measure.

Example

The following example creates a measure that returns the indices of the 3 greatest elements of an array:

m["array"] = [400, 200, 100, 500, 300]
m["3 greatest indices"] = atoti.array.n_greatest_indices(m["array"], 3)

This measure will return [3, 0, 4] because the greatest values are 500 at index 3, then 400 at index 0 and 300 at index 4.

Return type

Measure

atoti.array.n_lowest(measure, n)

Return an array measure containing the n lowest elements of the passed array measure.

Return type

Measure

atoti.array.n_lowest_indices(measure, n)

Return an array measure containing the indices of the n lowest elements of the passed array measure.

Example

The following example creates a measure that returns the indices of the 3 lowest elements of an array:

m["array"] = [400, 200, 100, 500, 300]
m["3 lowest indices"] = atoti.array.n_lowest_indices(m["array"], 3)

This measure will return [2, 1, 4] because the lowest values are 100 at index 2, then 200 at index 1 and 300 at index 4.

Return type

Measure
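The index results quoted in the two examples above can be reproduced with a short plain-Python sketch. It is not atoti code; the helper names are hypothetical and the order of indices for equal elements is an assumption (original order is kept).

```python
def n_greatest_indices(array, n):
    """Indices of the n greatest elements, greatest first."""
    order = sorted(range(len(array)), key=lambda i: array[i], reverse=True)
    return order[:n]


def n_lowest_indices(array, n):
    """Indices of the n lowest elements, lowest first."""
    order = sorted(range(len(array)), key=lambda i: array[i])
    return order[:n]


array = [400, 200, 100, 500, 300]
print(n_greatest_indices(array, 3))  # [3, 0, 4]
print(n_lowest_indices(array, 3))    # [2, 1, 4]
```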

atoti.array.negative_values(measure)

Return a measure where all the elements > 0 of the passed array measure are replaced by 0.

Return type

Measure

atoti.array.nth_greatest(measure, n)

Return a measure equal to the n-th greatest element of the passed array measure.

Return type

Measure

atoti.array.nth_lowest(measure, n)

Return a measure equal to the n-th lowest element of the passed array measure.

Return type

Measure

atoti.array.positive_values(measure)

Return a measure where all the elements < 0 of the passed array measure are replaced by 0.

Return type

Measure
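As a plain-Python illustration of the two element-wise filters (hypothetical helper names, not atoti code): positive_values keeps the elements above zero, negative_values keeps the elements below zero, and everything else becomes 0.

```python
def positive_values(array):
    """Replace every element that is not strictly positive by 0."""
    return [x if x > 0 else 0.0 for x in array]


def negative_values(array):
    """Replace every element that is not strictly negative by 0."""
    return [x if x < 0 else 0.0 for x in array]


array = [3.0, -1.5, 0.0, -4.0, 2.0]
print(positive_values(array))  # [3.0, 0.0, 0.0, 0.0, 2.0]
print(negative_values(array))  # [0.0, -1.5, 0.0, -4.0, 0.0]
```

Adding the two results element by element gives back the original array.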

atoti.array.prefix_sum(measure)

Return an array measure in which each element is the sum of the elements of the passed array measure up to and including that position.

Example

If an array has the following values: [2.0, 1.0, 0.0, 3.0], the returned array will be: [2.0, 3.0, 3.0, 6.0].

Return type

Measure
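The example corresponds to a running (cumulative) sum, which the Python standard library expresses with `itertools.accumulate`. The wrapper below is a hypothetical plain-Python equivalent, not atoti code.

```python
from itertools import accumulate


def prefix_sum(array):
    """Running sum: element i is the sum of array[0] through array[i]."""
    return list(accumulate(array))


print(prefix_sum([2.0, 1.0, 0.0, 3.0]))  # [2.0, 3.0, 3.0, 6.0]
```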

atoti.array.prod(measure)

Return a measure equal to the product of all the elements of the passed array measure.

The product of an empty array is 1.

Return type

Measure

atoti.array.quantile(measure, q, *, mode='inc', interpolation='linear')

Return a measure equal to the requested quantile of the elements of the passed array measure.

Here is how to obtain the same behaviour as these standard quantile calculation methods:

  • R-1: mode="centered" and interpolation="lower"

  • R-2: mode="centered" and interpolation="midpoint"

  • R-3: mode="simple" and interpolation="nearest"

  • R-4: mode="simple" and interpolation="linear"

  • R-5: mode="centered" and interpolation="linear"

  • R-6 (similar to Excel’s PERCENTILE.EXC): mode="exc" and interpolation="linear"

  • R-7 (similar to Excel’s PERCENTILE.INC): mode="inc" and interpolation="linear"

  • R-8 and R-9 are not supported

The formulae given for the calculation of the quantile index assume a 1-based indexing system.

Parameters
  • measure (Measure) – The measure to get the quantile of.

  • q (Union[float, Measure]) – The quantile to take. Must be between 0 and 1. For instance, 0.95 is the 95th percentile and 0.5 is the median.

  • mode (Literal['simple', 'centered', 'inc', 'exc']) –

    The method used to calculate the index of the quantile. Available options are, when searching for the q quantile of a vector X:

    • simple: len(X) * q

    • centered: len(X) * q + 0.5

    • exc: (len(X) + 1) * q

    • inc: (len(X) - 1) * q + 1

  • interpolation (Literal['linear', 'higher', 'lower', 'nearest', 'midpoint']) –

    If the quantile index is not an integer, the interpolation decides what value is returned. The different options are, considering a quantile index k with i < k < j (i and j being consecutive integers) for a sorted vector X:

    • linear: v = X[i] + (X[j] - X[i]) * (k - i)

    • lower: v = X[i]

    • higher: v = X[j]

    • nearest: v = X[i] or v = X[j] depending on which of i or j is closest to k

    • midpoint: v = (X[i] + X[j]) / 2

Return type

Measure

atoti.array.quantile_index(measure, q, *, mode='inc', interpolation='lower')

Return a measure equal to the index of the requested quantile of the elements of the passed array measure.

Parameters
  • measure (Measure) – The measure to get the quantile of.

  • q (Union[float, Measure]) – The quantile to take. Must be between 0 and 1. For instance, 0.95 is the 95th percentile and 0.5 is the median.

  • mode (Literal['simple', 'centered', 'inc', 'exc']) –

    The method used to calculate the index of the quantile. Available options are, when searching for the q quantile of a vector X:

    • simple: len(X) * q

    • centered: len(X) * q + 0.5

    • exc: (len(X) + 1) * q

    • inc: (len(X) - 1) * q + 1

  • interpolation (Literal['higher', 'lower', 'nearest']) –

    If the quantile index is not an integer, the interpolation decides what value is returned. The different options are, considering a quantile index k with i < k < j (i and j being consecutive integers) for the original vector X and the sorted vector Y:

    • lower: the index in X of Y[i]

    • higher: the index in X of Y[j]

    • nearest: the index in X of Y[i] or Y[j] depending on which of i or j is closest to k

Return type

Measure
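The difference from a plain quantile is that the result is a position in the original, unsorted vector. The sketch below illustrates this in plain Python under several assumptions that the text does not confirm: the returned index is 0-based, out-of-range quantile indices are clamped to [1, n], and ties in "nearest" resolve to the lower neighbour. It is not atoti's implementation.

```python
import math


def quantile_index(x, q, mode="inc", interpolation="lower"):
    """Return the index in x of the element sitting at the q quantile."""
    n = len(x)
    formulas = {
        "simple": n * q,
        "centered": n * q + 0.5,
        "exc": (n + 1) * q,
        "inc": (n - 1) * q + 1,
    }
    # Assumption: the 1-based quantile index is clamped to [1, n].
    k = min(max(formulas[mode], 1.0), float(n))
    i, j = math.floor(k), math.ceil(k)
    if interpolation == "lower":
        position = i
    elif interpolation == "higher":
        position = j
    elif interpolation == "nearest":
        # Assumption: an exact tie resolves to the lower neighbour.
        position = i if k - i <= j - k else j
    else:
        raise ValueError(f"unknown interpolation: {interpolation}")
    # order[p - 1] is the index in x of the p-th smallest element (Y[p]).
    order = sorted(range(n), key=lambda index: x[index])
    return order[position - 1]


x = [400, 200, 100, 500, 300]
print(quantile_index(x, 0.5))  # 4, the position of the median 300
```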

atoti.array.sort(measure, *, ascending=True)

Return an array measure with the elements of the passed array measure sorted.

Parameters
  • measure (Measure) – The array measure to sort.

  • ascending (bool) – When set to False, the first value will be the greatest.

Return type

Measure

atoti.array.std(measure, *, mode='sample')

Return a measure equal to the standard deviation of the elements of the passed array measure.

Parameters
  • measure (Measure) – The measure to get the standard deviation of.

  • mode (Literal['sample', 'population']) –

    One of the supported modes:

    • The sample standard deviation, similar to Excel’s STDEV.S, is \(\sqrt{\frac{\sum_{i=1}^{n} (X_i - m)^{2}}{n - 1}}\) where m is the sample mean and n the size of the sample. Use this mode if the data represents a sample of the population.

    • The population standard deviation, similar to Excel’s STDEV.P, is \(\sqrt{\frac{\sum_{i=1}^{n} (X_i - m)^{2}}{n}}\) where m is the mean of the X_i elements and n the size of the population. Use this mode if the data represents the entire population.

Return type

Measure

atoti.array.sum(measure)

Return a measure equal to the sum of all the elements of the passed array measure.

The sum of an empty array is 0.

Return type

Measure

atoti.array.var(measure, *, mode='sample')

Return a measure equal to the variance of the elements of the passed array measure.

Parameters
  • measure (Measure) – The measure to get the variance of.

  • mode (Literal['sample', 'population']) –

    One of the supported modes:

    • The sample variance, similar to Excel’s VAR.S, is \(\frac{\sum_{i=1}^{n} (X_i - m)^{2}}{n - 1}\) where m is the sample mean and n the size of the sample. Use this mode if the data represents a sample of the population.

    • The population variance, similar to Excel’s VAR.P, is \(\frac{\sum_{i=1}^{n} (X_i - m)^{2}}{n}\) where m is the mean of the X_i elements and n the size of the population. Use this mode if the data represents the entire population.

Return type

Measure

atoti.column module

class atoti.column.Column(name, data_type, _store)

Bases: atoti.measure.MeasureConvertible

Column of a Store.

data_type: DataType

The type of the elements in the column.

name: str

The name of the column.

atoti.comparator module

class atoti.comparator.Comparator(_name, _first_members)

Bases: object

Level comparator.

atoti.comparator.first_members(*members)

Create a level comparator with the given first members.

Example:

atoti.comparator.first_members("gold", "silver", "bronze")

Return type

Comparator
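One way to picture such a comparator is as a sort key that ranks the listed members first, in the order given. The sketch below is plain Python with a hypothetical helper name; how the remaining members are ordered is an assumption (natural order here) that the text does not state.

```python
def first_members_key(*first):
    """Build a sort key placing the given members first, in that order."""
    rank = {member: position for position, member in enumerate(first)}

    def key(member):
        if member in rank:
            return (0, rank[member], "")
        # Assumption: members not listed follow in their natural order.
        return (1, 0, member)

    return key


medals = ["bronze", "copper", "gold", "silver", "aluminium"]
print(sorted(medals, key=first_members_key("gold", "silver", "bronze")))
# ['gold', 'silver', 'bronze', 'aluminium', 'copper']
```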

atoti.copy_tutorial module

atoti.cube module

class atoti.cube.Cube(java_api, name, base_store, session)

Bases: atoti._local_cube.ALocalCube

Cube of a Session.

property aggregates_cache

Aggregates cache of the cube.

Return type

AggregatesCache

create_static_parameter_hierarchy(name, members, *, data_type=None, index_measure=None, indices=None, store_name=None)

Create an arbitrary single-level static hierarchy with the given members.

It can be used as a parameter hierarchy in advanced analyses.

Parameters
  • name (str) – The name of the hierarchy and its single level.

  • members (Sequence[Any]) – The members of the hierarchy.

  • data_type (Optional[DataType]) – The type with which the members will be stored. Automatically inferred by default.

  • index_measure (Optional[str]) – The name of the indexing measure to create for this hierarchy, if any.

  • indices (Optional[Sequence[int]]) – The custom indices for each member in the new hierarchy. They are used when accessing a member through the index_measure. Defaults to range(len(members)).

  • store_name (Optional[str]) – The name of the store backing the parameter hierarchy. Defaults to the passed name argument.

create_store_column_parameter_hierarchy(name, column)

Create a single level static hierarchy which takes its members from a column.

explain_query(*measures, levels=None, condition=None, scenario='Base', timeout=30)

Run the query but return an explanation of the query instead of the result.

The explanation contains a summary, global timings and the query plan with all the retrievals.

Parameters
  • measures (~_Measure) – The measures to query. If None, all the measures are queried.

  • levels (Union[~_Level, Sequence[~_Level], None]) – The levels to split on. If None, the value of the measures at the top of the cube is returned.

  • condition (Union[LevelCondition, MultiCondition, LevelIsInCondition, HierarchyIsInCondition, None]) –

    The filtering condition. Only conditions on level equality with a string are supported. For instance:

    • lvl["Country"] == "France"

    • (lvl["Country"] == "USA") & (lvl["Currency"] == "USD")

  • scenario (str) – The scenario to query.

  • timeout (int) – The query timeout in seconds.

Return type

QueryAnalysis

Returns

The query explanation.

property hierarchies

Hierarchies of the cube.

Return type

~_Hierarchies

property levels

Levels of the cube.

Return type

~_Levels

property measures

Measures of the cube.

Return type

~_Measures

property name

Name of the cube.

Return type

str

query(*measures, levels=None, condition=None, scenario='Base', timeout=30)

Query the cube to get the value of some measures.

The value of the measures is given on all the members of the given levels.

Parameters
  • measures (~_Measure) – The measures to query. If None, all the measures are queried.

  • levels (Union[~_Level, Sequence[~_Level], None]) – The levels to split on. If None, the value of the measures at the top of the cube is returned.

  • condition (Union[LevelCondition, MultiCondition, LevelIsInCondition, HierarchyIsInCondition, None]) –

    The filtering condition. Only conditions on level equality with a string are supported. For instance:

    • lvl["Country"] == "France"

    • (lvl["Country"] == "USA") & (lvl["Currency"] == "USD")

  • scenario (str) – The scenario to query.

  • timeout (int) – The query timeout in seconds.

Return type

QueryResult

Returns

The resulting DataFrame.

property schema

Schema of the cube’s stores as an SVG graph.

Note

Graphviz is required to display the graph. It can be installed with Conda: conda install graphviz or by following the download instructions.

Return type

Any

Returns

An SVG image in IPython and a Path to the SVG file otherwise.

setup_simulation(name, *, base_scenario='Base', levels=None, multiply=None, replace=None, add=None)

Create a simulation store for the given measures.

Simulations can have as many scenarios as desired.

The same measure cannot be passed to more than one of the multiply, replace and add methods.

Parameters
Return type

Simulation

Returns

The simulation on which scenarios can be made.

property shared_context

Context values shared by all the users.

Context values can also be set at query time, and per user, directly from the UI. The values in the shared context are the default ones for all the users.

  • queriesTimeLimit

    The number of seconds after which a running query is cancelled and its resources reclaimed. Set to -1 to remove the limit. Defaults to 30s.

  • queriesResultLimit.intermediateSize

    The maximum number of point locations for a single intermediate result. This works as a safeguard to prevent queries from consuming too much memory, which is especially useful when going to production with several simultaneous users on the same server. Set to -1 to use the maximum limit. In atoti, the maximum limit is the default while in Atoti+ it defaults to 1000000.

  • queriesResultLimit.transientSize

    Similar to intermediateSize but across all the intermediate results of the same query. Set to -1 to use the maximum limit. In atoti, the maximum limit is the default while in Atoti+ it defaults to 10000000.

Example

>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> store = session.read_pandas(
...     df, keys=["City"], store_name="SharedContext"
... )
>>> cube = session.create_cube(store)
>>> cube.shared_context["queriesTimeLimit"] = 60
>>> cube.shared_context["queriesResultLimit.intermediateSize"] = 1000000
>>> cube.shared_context["queriesResultLimit.transientSize"] = 10000000
>>> cube.shared_context
queriesTimeLimit: 60
queriesResultLimit.intermediateSize: 1000000
queriesResultLimit.transientSize: 10000000

Return type

CubeContext

property simulations

Simulations of the cube.

Return type

Simulations

class atoti.cube.CubeContext(_java_api, _cube)

Bases: MutableMapping[str, str]

clear() → None. Remove all items from D.
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D’s values
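All the methods listed above are inherited from the standard `MutableMapping` abstract base class, which derives them from five primitive operations. The sketch below shows that mechanism with a hypothetical dict-backed class; it is not the real CubeContext, and converting values to strings is an assumption based on the `MutableMapping[str, str]` base.

```python
from collections.abc import MutableMapping


class DictBackedContext(MutableMapping):
    """Hypothetical stand-in: a mapping defined by five primitives."""

    def __init__(self):
        self._values = {}

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value):
        # Assumption: values are stored as strings.
        self._values[key] = str(value)

    def __delitem__(self, key):
        del self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)


context = DictBackedContext()
context["queriesTimeLimit"] = 60
# update(), get(), pop(), setdefault() and the views come from MutableMapping.
context.update({"queriesResultLimit.intermediateSize": 1000000})
print(dict(context.items()))
```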

atoti.cubes module

class atoti.cubes.Cubes(_java_api, _cubes=<factory>)

Bases: MutableMapping[str, atoti.cube.Cube]

Manage the cubes of the session.

clear() → None. Remove all items from D.
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k]
If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v
In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D’s values

atoti.exceptions module

Custom atoti exceptions.

They disguise the unhelpful Py4J stack traces occurring when Java throws an exception. If any other exception is raised by the code inside the custom hook, it is processed normally.

exception atoti.exceptions.AtotiException

Bases: Exception

The generic atoti exception class.

All exceptions which inherit from this class will be treated differently when raised. However, this exception is still handled by the default excepthook.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception atoti.exceptions.AtotiJavaException(message, java_traceback, java_exception)

Bases: atoti.exceptions.AtotiException

Exception thrown when Py4J throws a Java exception.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception atoti.exceptions.AtotiNetworkException

Bases: atoti.exceptions.AtotiException

Exception thrown when Py4J throws a network exception.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception atoti.exceptions.AtotiPy4JException

Bases: atoti.exceptions.AtotiException

Exception thrown when Py4J throws a Py4JError.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception atoti.exceptions.NoCubeStartedException

Bases: Exception

Exception thrown when an action requires a cube to be started but it is not.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

atoti.hierarchies module

class atoti.hierarchies.Hierarchies(_java_api, _cube)

Bases: atoti._mappings.DelegateMutableMapping[Tuple[str, str], atoti.hierarchy.Hierarchy]

Manage the hierarchies.

clear() → None. Remove all items from D.
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items()

Return a set-like object providing a view on the items.

Return type

AbstractSet[Tuple[~_Key, ~_Value]]

keys()

Return a set-like object providing a view on the keys.

Return type

AbstractSet[~_Key]

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair

as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values()

Return an object providing a view on the values.

Return type

ValuesView[~_Value]

atoti.hierarchies.convert_key(key)

Get the dimension and hierarchy from the key.

Return type

Tuple[Optional[str], str]

atoti.hierarchies.multiple_hierarchies_error(key, hierarchies)

Get the error to raise when multiple hierarchies match the key.

Return type

KeyError
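
The two helpers above suggest that a hierarchy can be looked up either by its name alone or by a (dimension, hierarchy) pair, with a KeyError when the name alone is ambiguous. A plain-Python sketch of that idea follows. It is an assumption about the behavior, not atoti's implementation; find and the sample hierarchies are hypothetical:

```python
from typing import Dict, Optional, Tuple, Union

Key = Union[str, Tuple[str, str]]


def convert_key(key: Key) -> Tuple[Optional[str], str]:
    """Split a key into (dimension, hierarchy); the dimension may be omitted."""
    if isinstance(key, str):
        return None, key
    dimension, hierarchy = key
    return dimension, hierarchy


def find(hierarchies: Dict[Tuple[str, str], str], key: Key) -> str:
    """Look up a hierarchy, raising KeyError when the key is missing or ambiguous."""
    dimension, name = convert_key(key)
    matches = [
        full_key
        for full_key in hierarchies
        if full_key[1] == name and dimension in (None, full_key[0])
    ]
    if not matches:
        raise KeyError(key)
    if len(matches) > 1:
        raise KeyError(f"Multiple hierarchies match {key!r}: {matches}")
    return hierarchies[matches[0]]


# Hypothetical hierarchies keyed by (dimension, hierarchy name).
hierarchies = {
    ("Geography", "City"): "geography city hierarchy",
    ("Shipping", "City"): "shipping city hierarchy",
    ("Time", "Date"): "date hierarchy",
}
print(find(hierarchies, "Date"))
print(find(hierarchies, ("Shipping", "City")))
```

In this sketch, find(hierarchies, "City") raises a KeyError because two dimensions hold a hierarchy with that name.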

atoti.hierarchy module

class atoti.hierarchy.Hierarchy(_name, _levels, _dimension, _slicing, _cube, _java_api, _visible)

Bases: object

Hierarchy of a Cube.

property dimension

Name of the dimension of the hierarchy.

Return type

str

isin(*member_paths)

Return a condition to check that the hierarchy is on one of the given members.

Considering hierarchy_1 containing level_1 and level_2, hierarchy_1.isin((a, x), (b,)) is equivalent to ((level_1 == a) & (level_2 == x)) | (level_1 == b).

Example

Considering a “Geography” hierarchy containing two levels “Country” and “City”, and this measure:

measures["Price in USA/Paris and Germany"] = atoti.filter(
    measures["Price"],
    hierarchies["Geography"].isin(("USA", "Paris"), ("Germany", ))
)

The behavior is the following one:

Country   City       Price   measures["Price in USA/Paris and Germany"]
France    Paris      200.0
Germany   Berlin     150.0   150.0
UK        London     240.0
USA       New York   270.0
USA       Paris      500.0   500.0

Parameters

member_paths – One or more member paths, each expressed as a tuple, on which the hierarchy should be. Each element of a tuple corresponds to a level of the hierarchy, from the shallowest to the deepest.

Return type

HierarchyIsInCondition
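
The matching rule can be modelled in plain Python as a prefix test on the member coordinates. This is only a sketch of the semantics described above, not atoti's implementation; on_any_member is a hypothetical helper and the rows reuse the example dataset:

```python
from typing import Sequence, Tuple


def on_any_member(
    coordinates: Sequence[str], member_paths: Sequence[Tuple[str, ...]]
) -> bool:
    """Return True when the coordinates start with one of the member paths."""
    return any(
        tuple(coordinates[: len(path)]) == tuple(path) for path in member_paths
    )


rows = [
    ("France", "Paris", 200.0),
    ("Germany", "Berlin", 150.0),
    ("UK", "London", 240.0),
    ("USA", "New York", 270.0),
    ("USA", "Paris", 500.0),
]
paths = [("USA", "Paris"), ("Germany",)]

# Keep the price where the (Country, City) coordinates match a path.
filtered = [
    price if on_any_member((country, city), paths) else None
    for country, city, price in rows
]
print(filtered)  # [None, 150.0, None, None, 500.0]
```

A one-element path such as ("Germany",) matches every city of that country, while ("USA", "Paris") only matches that exact city.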

property levels

Levels of the hierarchy.

Return type

Mapping[str, Level]

property name

Name of the hierarchy.

Return type

str

property slicing

Whether the hierarchy is slicing or not.

Return type

bool

property visible

Whether the hierarchy is visible or not.

Return type

bool

atoti.level module

class atoti.level.Level(_name, _column_name, _data_type, _hierarchy=None, _comparator=None)

Bases: atoti.measure.MeasureConvertible

Level of a Hierarchy.

property comparator

Comparator of the level.

Return type

Optional[Comparator]

property data_type

Type of the level members.

Return type

DataType

property dimension

Name of the dimension holding the level.

Return type

str

property hierarchy

Name of the hierarchy holding the level.

Return type

str

isin(*members)

Return a condition to check that the level is on one of the given members.

lvl["x"].isin("a", "b") is equivalent to lvl["x"] == "a" OR lvl["x"] == "b".

Parameters

members (Any) – One or more members on which the level should be.

Example

>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("Berlin", 150.0),
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> store = session.read_pandas(df, keys=["City"], store_name="Cities")
>>> cube = session.create_cube(store)
>>> lvl, m = cube.levels, cube.measures
>>> m["Price.SUM in Berlin"] = tt.filter(
...     m["Price.SUM"], lvl["City"].isin("Berlin")
... )
>>> m["Price.SUM in London and Paris"] = tt.filter(
...     m["Price.SUM"], lvl["City"].isin("London", "Paris")
... )
>>> cube.query(
...     m["Price.SUM"],
...     m["Price.SUM in Berlin"],
...     m["Price.SUM in London and Paris"],
...     levels=lvl["City"],
... )
        Price.SUM Price.SUM in Berlin Price.SUM in London and Paris
City
Berlin      150.00              150.00                          <NA>
London      240.00                <NA>                        240.00
New York    270.00                <NA>                          <NA>
Paris       200.00                <NA>                        200.00

Return type

LevelIsInCondition

property name

Name of the level.

Return type

str

atoti.levels module

class atoti.levels.Levels(_hierarchies)

Bases: atoti._base_levels.BaseLevels[atoti.level.Level, atoti.hierarchies.Hierarchies]

Flat representation of all the levels in the cube.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
values() → an object providing a view on D’s values

atoti.logs module

class atoti.logs.Logs(lines)

Bases: object

Lines of logs.

lines: Collection[str]

Lines of logs.

atoti.measure module

class atoti.measure.Measure

Bases: abc.ABC

A measure is a mostly-numeric data value, computed on demand for aggregation purposes.

Measures can be compared to other objects, such as a literal value, an atoti.level.Level, or another measure. The returned measure represents the outcome of the comparison and this measure can be used as a condition. If the measure’s value is None when evaluating a condition, the returned value will be False.

Example

>>> df = pd.DataFrame(
...     columns=["Id", "Value", "Threshold"],
...     data=[
...         (0, 1.0, 5.0),
...         (1, 2.0, None),
...         (2, 3.0, 3.0),
...         (3, 4.0, None),
...         (4, 5.0, 1.0),
...     ],
... )
>>> store = session.read_pandas(df, store_name="Thresholds", keys=["Id"])
>>> cube = session.create_cube(store)
>>> lvl, m = cube.levels, cube.measures
>>> m["Condition"] = m["Value.SUM"] > m["Threshold.SUM"]
>>> cube.query(m["Condition"], levels=lvl["Id"])
        Condition
Id
0   false
1   false
2   false
3   false
4    true
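
The None rule explains why rows 1 and 3 are false in the example above even though their values are positive. A plain-Python model of that comparison (a sketch of the described semantics, not atoti code; greater_than is a hypothetical helper):

```python
from typing import Optional


def greater_than(value: Optional[float], threshold: Optional[float]) -> bool:
    """Compare two optional values; a missing operand makes the condition False."""
    if value is None or threshold is None:
        return False
    return value > threshold


# (Id, Value, Threshold) rows from the example above.
rows = [(0, 1.0, 5.0), (1, 2.0, None), (2, 3.0, 3.0), (3, 4.0, None), (4, 5.0, 1.0)]
conditions = {
    row_id: greater_than(value, threshold) for row_id, value, threshold in rows
}
print(conditions)  # {0: False, 1: False, 2: False, 3: False, 4: True}
```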

class atoti.measure.MeasureConvertible

Bases: abc.ABC

Instances of this class can be converted to measures.

atoti.measures module

class atoti.measures.Measures(_java_api, _cube)

Bases: atoti._mappings.DelegateMutableMapping[str, atoti.named_measure.NamedMeasure]

Manage the measures.

clear() → None. Remove all items from D.
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items()

Return a set-like object providing a view on the items.

Return type

AbstractSet[Tuple[~_Key, ~_Value]]

keys()

Return a set-like object providing a view on the keys.

Return type

AbstractSet[~_Key]

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair

as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values()

Return an object providing a view on the values.

Return type

ValuesView[~_Value]

atoti.named_measure module

class atoti.named_measure.NamedMeasure(_name, _data_type, _cube, _java_api, _folder=None, _formatter=None, _visible=True, _description=None)

Bases: atoti.measure.Measure

A named measure is a measure that has been published to the cube.

property data_type

Type of the measure members.

Return type

DataType

property description

Description of the measure.

Return type

Optional[str]

property folder

Folder of the measure.

It can be changed by assigning a new value to the property (None to clear it).

Return type

Optional[str]

property formatter

Formatter of the measure.

It can be changed by assigning a new value to the property (None to clear it).

Examples

  • DOUBLE[0.00%] for percentages

  • DOUBLE[#,###] to remove decimals

  • DOUBLE[$#,##0.00] for dollars

  • DATE[yyyy-MM-dd HH:mm:ss] for datetimes

The spec for the pattern between the DATE or DOUBLE’s brackets is the one from Microsoft Analysis Services. The formatter only impacts how the measure is displayed: derived measures are still computed from the unformatted value. To round a measure, use atoti.math.round() instead.

atoti provides an extra formatter for array measures:
  • ARRAY['|';1:3] this formatter allows you to choose the separator to use (| in this example), and the slice of the array to display.

Return type

Optional[str]

property name

Name of the measure.

Return type

str

property visible

Whether the measure is visible or not.

It can be toggled by assigning a new boolean value to the property.

Return type

bool

atoti.report module

Reports of data loaded into stores.

Each store has a global loading_report made of several individual loading reports.

When an error occurs while loading data, a warning is displayed. These warnings can be disabled like this:

import logging
logging.getLogger("atoti.loading").setLevel("ERROR")

class atoti.report.LoadingReport(name, source, loaded, errors, duration, error_messages)

Bases: object

Report about the loading of a single file or operation.

duration: int

Duration of the loading in milliseconds.

error_messages: List[str]

Messages of the errors.

errors: int

Number of errors.

loaded: int

Number of loaded lines.

name: str

Name of the loaded file or operation.

source: str

Source used to load the data.

class atoti.report.StoreReport(store_name, reports)

Bases: object

Report about the data loaded into a store.

It is made of several LoadingReport.

property error_messages

Error messages.

Return type

List[str]

reports: List[atoti.report.LoadingReport]

Reports of individual loadings.

store_name: str

property total_errors

Total number of errors.

Return type

int

property total_loaded

Total number of loaded rows.

Return type

int
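
The relationship between the two report classes can be sketched with stand-in dataclasses. This assumes the totals are plain sums over the individual reports, which the descriptions above suggest but do not state; the file names, source label and error messages are made up for illustration:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LoadingReport:
    """Stand-in with the documented attributes of atoti.report.LoadingReport."""

    name: str
    source: str
    loaded: int
    errors: int
    duration: int
    error_messages: List[str] = field(default_factory=list)


@dataclass
class StoreReport:
    """Stand-in aggregating several loading reports."""

    store_name: str
    reports: List[LoadingReport]

    @property
    def total_loaded(self) -> int:
        return sum(report.loaded for report in self.reports)

    @property
    def total_errors(self) -> int:
        return sum(report.errors for report in self.reports)

    @property
    def error_messages(self) -> List[str]:
        return [
            message for report in self.reports for message in report.error_messages
        ]


report = StoreReport(
    "Cities",
    [
        LoadingReport("cities_1.csv", "CSV", loaded=100, errors=0, duration=12),
        LoadingReport(
            "cities_2.csv",
            "CSV",
            loaded=98,
            errors=2,
            duration=15,
            error_messages=["bad price on line 7", "bad price on line 31"],
        ),
    ],
)
print(report.total_loaded, report.total_errors)  # 198 2
```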

atoti.sampling module

Sampling modes describe how data is loaded into stores.

atoti can handle very large volumes of data while still providing fast answers to queries. However, loading a large amount of data during the modeling phase of the application is rarely a good idea because creating stores, joins, cubes, hierarchies and measures are all operations that take more time when there is more data.

atoti speeds up the modeling process by incorporating an automated sampling mechanism.

For instance, datasets can be automatically sampled on their first lines while working on the model and then switched to the full dataset when the project is ready to be shared with other users.

By reducing the amount of data, sampling is a way to have immediate feedback for each cell run in a notebook and keep the modeling phase as snappy as possible.

As a rule of thumb:

  • sampling is always recommended while building a project.

  • load_all_data() should be called as late as possible.

atoti.sampling.FULL = SamplingMode(name='full', parameters=[])

Load all the data in all the stores.

class atoti.sampling.SamplingMode(name, parameters)

Bases: atoti.config._utils.Configuration

Mode of source loading.

name: str

Name of the sampling mode.

parameters: List[Any]

Sampling parameters (number of lines, number of files, …).

atoti.sampling.first_files(limit)

Mode to load only the first files of the source.

Parameters

limit (int) – The maximum number of files to read.

Return type

SamplingMode

atoti.sampling.first_lines(limit)

Mode to load only the first lines of the source.

Parameters

limit (int) – The maximum number of lines to read.

Return type

SamplingMode
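
Conceptually, sampling on the first lines means stopping after a fixed number of lines instead of reading the whole source. The sketch below shows that idea with the standard library only. take_first_lines is a hypothetical helper, and whether atoti counts a header line towards the limit is not stated here, so the sketch simply counts raw lines:

```python
import io
from itertools import islice
from typing import Iterable, List


def take_first_lines(lines: Iterable[str], limit: int) -> List[str]:
    """Read at most `limit` lines, leaving the rest of the source untouched."""
    return list(islice(lines, limit))


source = io.StringIO(
    "City,Price\nBerlin,150.0\nLondon,240.0\nNew York,270.0\nParis,200.0\n"
)
sample = take_first_lines(source, 3)
print(sample)  # ['City,Price\n', 'Berlin,150.0\n', 'London,240.0\n']
```

Because islice is lazy, the remaining lines are never read, which is what keeps the modeling phase fast on large sources.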

atoti.session module

class atoti.session.Session(name, *, config, **kwargs)

Bases: atoti._local_session.ALocalSession

Holds a connection to the Java gateway.

close()

Close this session and free all the associated resources.

Return type

None

property closed

Return whether the session is closed or not.

Return type

bool

create_cube(base_store, name=None, *, mode='auto')

Create a cube based on the passed store.

Parameters
  • base_store (Store) – The cube’s base store.

  • name (Optional[str]) – The name of the created cube. Defaults to the name of the base store.

  • mode (Literal['auto', 'manual', 'no_measures']) –

    The cube creation mode:

    • auto: Creates hierarchies for every non-numeric column, and measures for every numeric column.

    • manual: Does not create any hierarchy or measure (except for the count).

    • no_measures: Creates the hierarchies like auto but does not create any measures.

    For stores with hierarchized_columns specified, these will be converted into hierarchies regardless of the cube creation mode.

See also

Hierarchies and measures created by a join().

Return type

Cube
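
The three creation modes can be summarized by a small plain-Python model. It is a sketch under simplifying assumptions: plan_cube and NUMERIC_TYPES are hypothetical names, Python types stand in for store data types, and both the count measure and hierarchized_columns are left out:

```python
from typing import Dict, List, Tuple

NUMERIC_TYPES = (int, float)


def plan_cube(
    column_types: Dict[str, type], mode: str = "auto"
) -> Tuple[List[str], List[str]]:
    """Return (columns turned into hierarchies, columns given measures)."""
    numeric = [
        name for name, kind in column_types.items() if issubclass(kind, NUMERIC_TYPES)
    ]
    non_numeric = [name for name in column_types if name not in numeric]
    if mode == "auto":
        return non_numeric, numeric
    if mode == "no_measures":
        return non_numeric, []
    if mode == "manual":
        return [], []
    raise ValueError(f"Unknown mode: {mode!r}")


columns = {"City": str, "Date": str, "Price": float, "Quantity": int}
print(plan_cube(columns))                 # (['City', 'Date'], ['Price', 'Quantity'])
print(plan_cube(columns, "no_measures"))  # (['City', 'Date'], [])
print(plan_cube(columns, "manual"))       # ([], [])
```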

create_scenario(name, *, origin='Base')

Create a new source scenario in the datastore.

Parameters
  • name (str) – The name of the scenario.

  • origin (str) – The scenario to fork.

create_store(types, store_name, *, keys=None, partitioning=None, sampling_mode=None, hierarchized_columns=None)

Create a store from a schema.

Parameters
  • types (Mapping[str, DataType]) – Types for all columns of the store. This defines the columns which will be expected in any future data loaded into the store.

  • store_name (str) – The name of the store to create.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • sampling_mode (Optional[SamplingMode]) – The sampling mode. Defaults to this session’s one.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store
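
To illustrate what a description such as hash4(country) means, the sketch below assigns each row to one of 4 partitions from a hash of its country value. zlib.crc32 is used as a stand-in hash (an assumption: the hash function used by atoti is not described here), and partition_of and split are hypothetical helpers:

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def partition_of(value: str, partitions: int = 4) -> int:
    """Map a value to a partition index with a deterministic hash."""
    return zlib.crc32(value.encode("utf-8")) % partitions


def split(
    rows: List[Dict[str, object]], column: str, partitions: int = 4
) -> Dict[int, List[Dict[str, object]]]:
    """Group rows by the partition of the given column's value."""
    buckets: Dict[int, List[Dict[str, object]]] = defaultdict(list)
    for row in rows:
        buckets[partition_of(str(row[column]), partitions)].append(row)
    return dict(buckets)


rows = [
    {"country": "France", "price": 480.0},
    {"country": "India", "price": 360.0},
    {"country": "France", "price": 500.0},
    {"country": "UK", "price": 960.0},
    {"country": "China", "price": 410.0},
]
buckets = split(rows, "country")
for index, bucket in sorted(buckets.items()):
    print(index, [row["country"] for row in bucket])
```

Rows sharing the same country always land in the same partition, which is the property that matters for this kind of partitioning.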

property cubes

Cubes of the session.

Return type

Cubes

delete_scenario(scenario)

Delete the source scenario with the provided name if it exists.

Return type

None

endpoint(route, *, method='GET')

Create a custom endpoint at f"{session.url}/atoti/pyapi/{route}".

The decorated function must take three arguments with types HttpRequest, User and Session (in that order, as in the example) and return a response body as a Python data structure that can be converted to JSON. DELETE, POST, and PUT requests can have a body but it must be JSON.

Path parameters can be configured by wrapping their name in curly braces in the route.

Example:

@session.endpoint("simple_get")
def callback(request: HttpRequest, user: User, session: Session):
    return "something that will be in response.data"


@session.endpoint("simple_post/{store_name}", method="POST")
def callback(request: HttpRequest, user: User, session: Session):
    return request.path_parameters.store_name

Parameters
  • route (str) – The path suffix after /atoti/pyapi/. For instance, if custom/search is passed, a request to /atoti/pyapi/custom/search?query=test#results will match. The route should not contain the query (?) or fragment (#).

  • method (Literal['POST', 'GET', 'PUT', 'DELETE']) – The HTTP method the request must be using to trigger this endpoint.

Return type

Any
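
How a route with path parameters in curly braces can be matched against a request path is sketched below with the standard library. This is an illustration of the idea only, not the routing code used by atoti; match_route is a hypothetical helper:

```python
import re
from typing import Dict, Optional


def match_route(route: str, path: str) -> Optional[Dict[str, str]]:
    """Match a request path against a route template with {name} placeholders."""
    pattern = ""
    position = 0
    for placeholder in re.finditer(r"\{(\w+)\}", route):
        # Literal text before the placeholder must match exactly.
        pattern += re.escape(route[position : placeholder.start()])
        # The placeholder captures one path segment under its own name.
        pattern += f"(?P<{placeholder.group(1)}>[^/]+)"
        position = placeholder.end()
    pattern += re.escape(route[position:])
    found = re.fullmatch(pattern, path)
    return found.groupdict() if found else None


# The template is a plain string, so the braces reach the matcher untouched.
print(match_route("simple_post/{store_name}", "simple_post/Cities"))
print(match_route("simple_get", "simple_get"))
print(match_route("simple_post/{store_name}", "simple_get"))
```

The first call yields {'store_name': 'Cities'}, the second an empty dictionary, and the third None because the path does not fit the template.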

property excel_url

URL of the Excel endpoint.

To connect to the session in Excel, create a new connection to Analysis Services. Use this URL for the server field and choose to connect with “User Name and Password”:

  • Without authentication, leave these fields blank.

  • With Basic authentication, fill them with your username and password.

  • Other authentication types (such as Auth0) are not supported by Excel.

Return type

str

explain_mdx_query(mdx, *, timeout=30)

Explain an MDX query.

Parameters
  • mdx (str) – The MDX SELECT query to execute.

  • timeout (int) – The query timeout in seconds.

Return type

QueryAnalysis

export_translations_template(path)

Export a template containing all translatable values in the session’s cubes.

Parameters

path (Union[str, Path]) – The path at which to write the template.

load_all_data()

Trigger the full loading of the data.

Calling this method will change the sampling mode to atoti.sampling.FULL which triggers the loading of all the data. All subsequent loads, including new stores, will not be sampled.

When building a project, this method should be called as late as possible.

property logs_path

Path to the session logs file.

Return type

Path

logs_tail(n=20)

Return the n last lines of the logs or all the lines if n <= 0.

Return type

Logs
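
The rule on n can be modelled with a bounded deque. This is a sketch of the described behavior, not the actual implementation; tail is a hypothetical helper:

```python
from collections import deque
from typing import Iterable, List


def tail(lines: Iterable[str], n: int = 20) -> List[str]:
    """Return the last n lines, or every line when n <= 0."""
    if n <= 0:
        return list(lines)
    # A deque with maxlen keeps only the most recent n items.
    return list(deque(lines, maxlen=n))


log_lines = ["start", "load store", "create cube", "query"]
print(tail(log_lines, 2))  # ['create cube', 'query']
print(tail(log_lines, 0))  # ['start', 'load store', 'create cube', 'query']
```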

property name

Name of the session.

Return type

str

property port

Port on which the session is exposed.

Can be set in SessionConfiguration.

Return type

int

query_mdx(mdx, *, timeout=30)

Execute an MDX query and return its result as a pandas DataFrame.

Resulting cells representing totals are ignored: they are not part of the returned DataFrame. Members for which all the measures are None are ignored too.

Example

An MDX query that would be displayed as this pivot table:

Country         Total Price.SUM   2018-01-01   2019-01-01   2019-01-02   2019-01-05
                                  Price.SUM    Price.SUM    Price.SUM    Price.SUM
Total Country   4,280.00          840.00       1,860.00     810.00       770.00
China           760.00                                      410.00       350.00
France          1,800.00          480.00       500.00       400.00       420.00
India           760.00            360.00       400.00
UK              960.00                         960.00

will return this DataFrame:

Date         Country   Price.SUM
2019-01-02   China     410.0
2019-01-05   China     350.0
2018-01-01   France    480.0
2019-01-01   France    500.0
2019-01-02   France    400.0
2019-01-05   France    420.0
2018-01-01   India     360.0
2019-01-01   India     400.0
2019-01-01   UK        960.0

Parameters
  • mdx (str) – The MDX SELECT query to execute.

  • timeout (int) – The query timeout in seconds.

Return type

QueryResult
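
The flattening described above (totals dropped, empty cells skipped) can be reproduced in plain Python on the example data. This is a sketch of the rule rather than the code behind query_mdx; to_rows and the TOTAL marker are hypothetical, and the "Total Price.SUM" column is already left out of the input:

```python
from typing import Dict, List, Optional, Tuple

TOTAL = "Total"  # Hypothetical marker for the grand-total member.

dates = ["2018-01-01", "2019-01-01", "2019-01-02", "2019-01-05"]
# One list of Price.SUM cells per country, aligned with `dates`.
pivot: Dict[str, List[Optional[float]]] = {
    TOTAL: [840.0, 1860.0, 810.0, 770.0],
    "China": [None, None, 410.0, 350.0],
    "France": [480.0, 500.0, 400.0, 420.0],
    "India": [360.0, 400.0, None, None],
    "UK": [None, 960.0, None, None],
}


def to_rows(
    pivot: Dict[str, List[Optional[float]]], dates: List[str]
) -> List[Tuple[str, str, float]]:
    """Flatten a pivot into (Date, Country, Price.SUM) rows."""
    rows = []
    for country, values in pivot.items():
        if country == TOTAL:
            continue  # Cells representing totals are ignored.
        for date, value in zip(dates, values):
            if value is not None:  # Members without any value are ignored.
                rows.append((date, country, value))
    return rows


for row in to_rows(pivot, dates):
    print(row)
```

The nine rows printed match the DataFrame shown above, in the same order.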

read_csv(path, *, keys=None, store_name=None, in_all_scenarios=True, sep=None, encoding='utf-8', process_quotes=None, partitioning=None, types=None, watch=False, array_sep=None, sampling_mode=None, hierarchized_columns=None)

Read a CSV file into a store.

Parameters
  • path (Union[str, Path]) –

    The path to the CSV file or directory to load.

    If a path pointing to a directory is provided, all of the files with the .csv extension in the directory and subdirectories will be loaded into the same store and, as such, they are all expected to share the same schema.

    .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported.

    The path can contain glob parameters (e.g. path/to/directory/**.*.csv) and will be expanded correctly. Be careful: when using glob expressions in paths, all files which match the expression will be loaded, regardless of their extension. When the provided path is a directory, the default glob parameter of **.csv is used.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (Optional[str]) – The name of the store to create. Defaults to the final component of the given path.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • sep (Optional[str]) – The delimiter to use. If None, the separator will automatically be detected.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (Optional[bool]) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      • Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double-quotes within a field will be treated as any regular character, following Excel’s behavior. In this mode, it is expected that fields are not enclosed in double quotes. It is also not possible to have a line break inside a field.

    • None: The behavior will be inferred from the first lines of the CSV file.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • types (Optional[Mapping[str, DataType]]) – Types for some or all columns of the store. Types for unspecified columns will be inferred from the first 1,000 lines.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well. Any non-CSV files added to the directory will be ignored.

  • array_sep (Optional[str]) – The delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.

  • sampling_mode (Optional[SamplingMode]) – The sampling mode. Defaults to this session’s one.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the CSV file(s).
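
The difference between the two process_quotes behaviors can be illustrated with Python's csv module, used here as an analogy only since atoti's own CSV parser is not involved. The default reader follows the quoting rules of the CSV specification, while csv.QUOTE_NONE treats double quotes as regular characters:

```python
import csv
import io

text = 'name,quote\n"Smith, J.","He said ""hi"""\n'

# Quotes processed: enclosing quotes are removed and "" becomes ".
processed = list(csv.reader(io.StringIO(text)))
print(processed[1])  # ['Smith, J.', 'He said "hi"']

# Quotes not processed: every comma separates fields and quotes are kept.
unprocessed = list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONE))
print(len(unprocessed[1]))  # 3
```

With quotes left unprocessed, the comma inside "Smith, J." splits the name into two fields, which is why that mode expects fields that are not enclosed in double quotes.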

read_numpy(array, columns, store_name, *, keys=None, in_all_scenarios=True, partitioning=None, hierarchized_columns=None, **kwargs)

Read a NumPy 2D array into a new store.

Parameters
  • array (ndarray) – The NumPy 2D ndarray to read the data from.

  • columns (Sequence[str]) – The names to use for the store’s columns. They must be in the same order as the values in the NumPy array.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (str) – The name of the store to create.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the array.

read_pandas(dataframe, store_name, *, keys=None, in_all_scenarios=True, partitioning=None, types=None, hierarchized_columns=None, **kwargs)

Read a pandas DataFrame into a store.

All the named indices of the DataFrame are included into the store. Multilevel columns are flattened into a single string name.

Parameters
  • dataframe (DataFrame) – The DataFrame to load.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (str) – The name of the store to create.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • types (Optional[Mapping[str, DataType]]) – Types for some or all columns of the store. Types for unspecified columns will be inferred.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the DataFrame.
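
The three hierarchized_columns behaviors listed above can be modelled as follows. This is a simplified sketch: columns_to_hierarchize is a hypothetical helper, Python types stand in for store data types, and the effect of the cube creation mode and of partial joins is ignored:

```python
from typing import Collection, Dict, List, Optional


def columns_to_hierarchize(
    column_types: Dict[str, type],
    hierarchized_columns: Optional[Collection[str]] = None,
) -> List[str]:
    """Return the columns that would be converted into hierarchies."""
    if hierarchized_columns is None:
        # None: fall back to every non-numeric column.
        return [
            name
            for name, kind in column_types.items()
            if not issubclass(kind, (int, float))
        ]
    # Empty or non-empty collection: only the listed columns, if any.
    return [name for name in column_types if name in hierarchized_columns]


columns = {"City": str, "Year": int, "Price": float}
print(columns_to_hierarchize(columns))            # ['City']
print(columns_to_hierarchize(columns, []))        # []
print(columns_to_hierarchize(columns, ["Year"]))  # ['Year']
```

The last call shows the main use of the parameter: turning a numeric column such as a year into a hierarchy.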

read_parquet(path, *, keys=None, store_name=None, in_all_scenarios=True, partitioning=None, sampling_mode=None, watch=False, hierarchized_columns=None)

Read a Parquet file into a store.

Parameters
  • path (Union[str, Path]) – The path to the Parquet file or directory. If the path points to a directory, all the files in the directory and subdirectories will be loaded into the store and, as such, are expected to have the same schema as the store and to be Parquet files.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (Optional[str]) – The name of the store to create. Defaults to the final component of the given path.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • sampling_mode (Optional[SamplingMode]) – The sampling mode. Defaults to this session’s one.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the Parquet file(s).

read_spark(dataframe, store_name, *, keys=None, in_all_scenarios=True, partitioning=None, hierarchized_columns=None)

Read a Spark DataFrame into a store.

Parameters
  • dataframe – The DataFrame to load.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (str) – The name of the store to create.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the DataFrame.
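As an illustration of a partitioning description such as hash4(country), the following minimal sketch assigns rows to 4 partitions from a hash of the country value. The helper names and the use of CRC32 are assumptions made for the example; atoti's actual hash function and internal layout are not described in this reference.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, List


def partition_index(value: str, partitions: int) -> int:
    """Map a value to a partition index with a deterministic hash.

    CRC32 is only an illustrative choice, not atoti's hash function.
    """
    return zlib.crc32(value.encode("utf-8")) % partitions


def split_rows(
    rows: List[Dict[str, Any]], key: str, partitions: int
) -> Dict[int, List[Dict[str, Any]]]:
    """Group rows by the partition index of their key column."""
    buckets: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        buckets[partition_index(row[key], partitions)].append(row)
    return dict(buckets)


rows = [
    {"country": "France", "city": "Paris"},
    {"country": "France", "city": "Lyon"},
    {"country": "Germany", "city": "Berlin"},
    {"country": "Japan", "city": "Tokyo"},
]
# Hypothetical equivalent of "hash4(country)": 4 partitions keyed on country.
buckets = split_rows(rows, key="country", partitions=4)
```

Rows sharing the same country always land in the same partition, which is the property the partitioning description relies on.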

read_sql(url, query, *, username, password, driver=None, store_name, keys=None, partitioning=None, types=None, hierarchized_columns=None)

Create a store from the result of the passed SQL query.

Note

This method requires the atoti-sql plugin.

Parameters
  • url (Union[Path, str]) –

    The URL of the database. For instance:

    • mysql:localhost:7777/example

    • h2:/home/user/database/file/path

  • query (str) – A SQL query whose result is used to build a store.

  • username (str) – The username used to connect to the database.

  • password (str) – The password used to connect to the database.

  • driver (Optional[str]) – The JDBC driver used to load the data. If None, the driver is inferred from the URL. Drivers can be found in the atoti_sql.drivers module.

  • store_name (str) – The name of the store to create.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • types (Optional[Mapping[str, DataType]]) – Types for some or all columns of the store. Types for unspecified columns will be inferred from the SQL types.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Example

>>> store = session.read_sql(
...     f"h2:{RESOURCES}/h2-database",
...     "SELECT * FROM MYTABLE;",
...     username="root",
...     password="pass",
...     store_name="Cities",
...     keys=["ID"],
... )
Return type

Store

property scenarios

Collection of source scenarios of the session.

Return type

Collection[str]

property stores

Stores of the session.

Return type

Stores

property url

Public URL of the session.

Can be set in SessionConfiguration.

Return type

str

visualize(name=None)

Display an atoti widget to explore the session interactively.

Note

This method requires the atoti-jupyterlab plugin.

The widget state will be stored in the cell metadata. This state should not have to be edited but, if desired, it can be found in JupyterLab by opening the “Notebook tools” sidebar and expanding the “Advanced Tools” section.

Parameters

name (Optional[str]) – The name to give to the widget.

wait()

Wait for the underlying server subprocess to terminate.

This will prevent the Python process from exiting.

Return type

None

atoti.simulation module

class atoti.simulation.Scenario(name, _simulation, _java_api)

Bases: object

A scenario for a simulation.

append(*rows)

Add one or multiple rows to the scenario.

If a row with the same keys already exists in the scenario, it will be overridden by the passed one.

Parameters

rows (Union[Tuple[Any, …], Mapping[str, Any]]) –

The rows to add. Rows can either be:

  • Tuples of values in the correct order.

  • Column name to value mappings.

All rows must share the same shape.

property columns

Columns of the scenario.

Return type

Sequence[str]

property columns_without_priority

Columns of the scenario (Priority column excluded).

Return type

Sequence[str]

head(n=5)

Return the first n rows of the scenario as a pandas DataFrame.

Return type

DataFrame

load_csv(path, *, sep=',')

Load a CSV into this scenario.

The expected columns are columns or columns_without_priority.

The name of the scenario is automatically added before the row is added to the simulation store.

If the value of a column is left empty (None) on a given row, it will be treated as a wildcard value, meaning that it will match all the values of the corresponding column when performing the simulation.

Parameters
  • path (Union[Path, str]) – The path to the CSV file.

  • sep (Optional[str]) – The CSV separator character. If None, it is inferred by pandas.
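The wildcard rule described above can be pictured with a small sketch. The function name and the row representation are hypothetical and only model the matching logic, not how atoti evaluates simulations.

```python
from typing import Any, Mapping, Optional


def matches(scenario_row: Mapping[str, Optional[Any]], fact: Mapping[str, Any]) -> bool:
    """Return True when every non-empty value of the scenario row equals the fact's value.

    A None value acts as a wildcard and matches any value of that column.
    """
    return all(
        expected is None or fact.get(column) == expected
        for column, expected in scenario_row.items()
    )


# City is left empty, so the rule applies to every city in France.
rule = {"Country": "France", "City": None}

paris = {"Country": "France", "City": "Paris"}
berlin = {"Country": "Germany", "City": "Berlin"}

applies_to_paris = matches(rule, paris)
applies_to_berlin = matches(rule, berlin)
```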

load_pandas(dataframe, **kwargs)

Load a pandas DataFrame into this scenario.

The expected columns are columns or columns_without_priority.

The name of the scenario is automatically added before the row is added to the simulation store.

If the value of a column is left empty (None), then it will be treated as a wildcard value. i.e. it will match all the values of the corresponding column when performing the simulation.

Parameters

dataframe (DataFrame) – The DataFrame to load.

name: str

Name of the scenario.

class atoti.simulation.Simulation(_name, _levels, _multiply, _replace, _add, _base_scenario, _cube, _java_api)

Bases: object

Represents a simulation.

property columns

Columns of the simulation.

Return type

Sequence[str]

property columns_without_priority

Columns of the simulation (Priority column excluded).

Return type

Sequence[str]

head(n=5)

Return the first n rows of the simulation as a pandas DataFrame.

Return type

DataFrame

property levels

Levels of the simulation.

Return type

Sequence[Level]

load_csv(path, *, sep=None, encoding='utf-8', process_quotes=True, watch=False, array_sep=None)

Load a CSV into this simulation.

The expected columns are columns or columns_without_priority.

The value provided for simulation_name is the name of the scenario the values will be loaded into.

If the value of a column is left empty (None), it will be treated as a wildcard value, meaning that it will match all the values of the corresponding column when performing the simulation.

Parameters
  • path (Union[Path, str]) –

    The path to the CSV file or directory to load.

    If a path pointing to a directory is provided, all of the files with the .csv extension in the directory and subdirectories will be loaded into the same store and, as such, they are all expected to share the same schema.

    .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported.

    The path can contain glob parameters (e.g. path/to/directory/**.*.csv) and will be expanded correctly. Be careful, when using glob expressions in paths, all files which match the expression will be loaded, regardless of their extension. When the provided path is a directory, the default glob parameter of **.csv is used.

  • sep (Optional[str]) – The delimiter to use. If None, the separator will automatically be detected.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (bool) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      • Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double-quotes within a field will be treated as any regular character, following Excel’s behavior. In this mode, it is expected that fields are not enclosed in double quotes. It is also not possible to have a line break inside a field.

    • None: The behavior will be inferred from the first lines of the CSV file.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well. Any non-CSV files added to the directory will be ignored.

  • array_sep (Optional[str]) – The delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.
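The difference between the two process_quotes modes can be reproduced with Python's standard csv module. This is only an analogy under the assumption that the standard quoting rules apply; it does not call atoti's CSV parser.

```python
import csv
import io

# A field enclosed in double quotes, containing a comma and escaped double quotes.
raw = 'id,comment\n1,"She said ""hello"", then left"\n'

# Quotes processed as in the CSV specification: the enclosing quotes are
# removed, "" becomes ", and the comma inside the field is preserved.
processed = list(csv.reader(io.StringIO(raw)))

# Quotes treated as regular characters: every comma splits a field and the
# double quotes stay in the values.
unprocessed = list(csv.reader(io.StringIO(raw), quoting=csv.QUOTE_NONE))

print(processed[1])
print(unprocessed[1])
```

With quote processing the second line yields two fields; without it, the comma inside the quoted text splits the line into three fields.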

load_pandas(dataframe, **kwargs)

Load a pandas DataFrame into this simulation.

The expected columns are columns or columns_without_priority.

The value provided for simulation_name is the name of the scenario the values will be loaded into.

If the value of a column is left empty (None), it will be treated as a wildcard value, meaning that it will match all the values of the corresponding column when performing the simulation.

Parameters

dataframe (DataFrame) – The DataFrame to load.

property measure_columns

Measure columns of the simulation.

Return type

List[str]

property name

Name of the simulation.

Return type

str

property scenarios

Scenarios of the simulation.

Return type

SimulationScenarios

class atoti.simulation.SimulationScenarios(_simulation)

Bases: MutableMapping[str, atoti.simulation.Scenario]

Manage the scenarios of a simulation.

clear() → None. Remove all items from D.
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair

as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D’s values

atoti.simulations module

class atoti.simulations.Simulations(_java_api, _simulations=<factory>)

Bases: MutableMapping[str, atoti.simulation.Simulation]

Manage the simulations.

clear() → None. Remove all items from D.
get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair

as a 2-tuple; but raise KeyError if D is empty.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D’s values

atoti.store module

class atoti.store.Store(_name, _java_api, _scenario='Base', _columns=<factory>)

Bases: object

Represents a single store.

append(*rows, in_all_scenarios=False)

Add one or multiple rows to the store.

If a row with the same keys already exists in the store, it will be overridden by the passed one.

Parameters
  • rows (Union[Tuple[Any, …], Mapping[str, Any]]) –

    The rows to add. Rows can either be:

    • Tuples of values in the correct order.

    • Column name to value mappings.

    All rows must share the same shape.

  • in_all_scenarios (bool) – Whether or not the data should be loaded into all of the store’s scenarios or only the current one.
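The override-by-key behavior of append can be modeled with a toy in-memory class. KeyedRows is a hypothetical name used only for this sketch; it accepts both tuple rows and column-to-value mappings, and it ignores scenarios entirely.

```python
from collections.abc import Mapping
from typing import Any, Dict, List, Sequence, Tuple


class KeyedRows:
    """Toy model of a keyed store (not atoti's implementation)."""

    def __init__(self, columns: Sequence[str], keys: Sequence[str]) -> None:
        self.columns = list(columns)
        self.keys = list(keys)
        self._rows: Dict[Tuple[Any, ...], Dict[str, Any]] = {}

    def append(self, *rows: Any) -> None:
        for row in rows:
            if isinstance(row, Mapping):
                # Column name to value mapping.
                record = {column: row[column] for column in self.columns}
            else:
                # Tuple of values in column order.
                if len(row) != len(self.columns):
                    raise ValueError("Tuple rows must provide one value per column.")
                record = dict(zip(self.columns, row))
            key = tuple(record[name] for name in self.keys)
            # A row with the same key replaces the existing one.
            self._rows[key] = record

    def rows(self) -> List[Dict[str, Any]]:
        return list(self._rows.values())


cities = KeyedRows(columns=["ID", "City", "Price"], keys=["ID"])
cities.append((1, "Paris", 200.0), (2, "Berlin", 150.0))
cities.append({"ID": 1, "City": "Paris", "Price": 210.0})
```

After the second call the model still holds two rows, and the row with ID 1 carries the new price.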

property columns

Columns of the store.

Return type

Sequence[str]

drop(*coordinates, in_all_scenarios=False)

Delete rows where the values for each column match those specified.

Each set of coordinates can only contain one value for each column. To specify multiple values for one column, multiple mappings must be passed.

Parameters
  • coordinates (Mapping[str, Any]) – Mappings between store columns and values. Rows which match the provided mappings will be deleted from the store.

  • in_all_scenarios (bool) – Whether or not the rows should be dropped on all of the store’s scenarios or just the current one.
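The matching rule of drop can be sketched on plain dictionaries. The drop function below is a hypothetical stand-in working on a list of rows; it shows why several mappings are needed to remove several values of the same column.

```python
from typing import Any, Dict, List, Mapping


def drop(rows: List[Dict[str, Any]], *coordinates: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Return the rows that match none of the coordinate mappings.

    A row matches a mapping when it has the given value for every column
    listed in that mapping.
    """

    def is_match(row: Dict[str, Any], mapping: Mapping[str, Any]) -> bool:
        return all(row.get(column) == value for column, value in mapping.items())

    return [row for row in rows if not any(is_match(row, m) for m in coordinates)]


rows = [
    {"Continent": "Europe", "City": "Paris"},
    {"Continent": "Europe", "City": "Berlin"},
    {"Continent": "Europe", "City": "London"},
    {"Continent": "North America", "City": "New York"},
]

# One mapping per value: two cities means two mappings.
remaining = drop(rows, {"City": "Paris"}, {"City": "London"})
```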

head(n=5)

Return the first n rows of the store as a pandas DataFrame.

Return type

DataFrame

join(other, *, mapping=None)

Define a reference between this store and another.

There are two different possible situations when creating references:

  • All the key columns of the destination store are mapped: this is a normal reference.

  • Only some of the key columns of the destination store are mapped: this is a partial reference:

    • The columns from the source store used in the mapping must be attached to hierarchies.

    • The un-mapped key columns of the destination store will be converted into hierarchies.

Depending on the cube creation mode, the join will also generate different hierarchies and measures:

  • manual: The un-mapped keys of the destination store will become hierarchies.

  • no_measures: All of the non-numeric columns from the destination store, as well as those containing integers, will be converted into hierarchies. No measures will be created in this mode.

  • auto: The same hierarchies will be created as in the no_measures mode. Additionally, columns containing numeric values, or arrays, except for columns which contain only integers, will be converted into measures.

Parameters
  • other (Store) – The other store to reference.

  • mapping (Optional[Mapping[str, str]]) – The column mapping of the reference. Defaults to the columns with the same names in the two stores.
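The distinction between a normal and a partial reference can be expressed as a short check. The helper names are hypothetical, and the mapping direction (source column name to destination column name) is an assumption made for this sketch.

```python
from typing import List, Mapping, Sequence


def unmapped_keys(destination_keys: Sequence[str], mapping: Mapping[str, str]) -> List[str]:
    """Return the key columns of the destination store absent from the mapping.

    The mapping is assumed to go from source column to destination column.
    """
    mapped = set(mapping.values())
    return [key for key in destination_keys if key not in mapped]


def reference_kind(destination_keys: Sequence[str], mapping: Mapping[str, str]) -> str:
    """Classify a reference as "normal" (all keys mapped) or "partial"."""
    return "partial" if unmapped_keys(destination_keys, mapping) else "normal"


destination_keys = ["Country", "City"]

full = reference_kind(destination_keys, {"Country": "Country", "Town": "City"})
partial = reference_kind(destination_keys, {"Country": "Country"})
# In the partial case, these are the columns that would become hierarchies.
to_hierarchize = unmapped_keys(destination_keys, {"Country": "Country"})
```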

property keys

Names of the key columns of the store.

Return type

Sequence[str]

load_csv(path, *, sep=None, encoding='utf-8', process_quotes=True, in_all_scenarios=False, truncate=False, watch=False, array_sep=None)

Load a CSV into this scenario.

Parameters
  • path (Union[Path, str]) –

    The path to the CSV file or directory to load.

    If a path pointing to a directory is provided, all of the files with the .csv extension in the directory and subdirectories will be loaded into the same store and, as such, they are all expected to share the same schema.

    .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported.

    The path can contain glob parameters (e.g. path/to/directory/**.*.csv) and will be expanded correctly. Be careful, when using glob expressions in paths, all files which match the expression will be loaded, regardless of their extension. When the provided path is a directory, the default glob parameter of **.csv is used.

  • sep (Optional[str]) – The delimiter to use. If None, the separator will automatically be detected.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (bool) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      • Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double-quotes within a field will be treated as any regular character, following Excel’s behavior. In this mode, it is expected that fields are not enclosed in double quotes. It is also not possible to have a line break inside a field.

    • None: The behavior will be inferred from the first lines of the CSV file.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well. Any non-CSV files added to the directory will be ignored.

  • array_sep (Optional[str]) – The delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.

load_kafka(bootstrap_server, topic, *, group_id, batch_duration=1000, consumer_config=None, deserializer=KafkaDeserializer(name='io.atoti.loading.kafka.impl.serialization.JsonDeserializer'))

Consume a Kafka topic and stream its records in the store.

Note

This method requires the atoti-kafka plugin.

The records’ key deserializer defaults to StringDeserializer.

Parameters
  • bootstrap_server (str) – host[:port] that the consumer should contact to bootstrap initial cluster metadata.

  • topic (str) – Topic to subscribe to.

  • group_id (str) – The name of the consumer group to join.

  • batch_duration (int) – Milliseconds spent batching received records before publishing them to the store. If 0, received records are immediately published to the store. Must not be negative.

  • consumer_config (Optional[Mapping[str, str]]) – Mapping containing optional parameters to set up the KafkaConsumer. The list of available parameters can be found in the Kafka consumer configuration documentation.

  • deserializer (KafkaDeserializer) – Deserialize Kafka records’ value to atoti store rows. Use atoti_kafka.create_deserializer() to create custom ones.

load_pandas(dataframe, *, in_all_scenarios=False, truncate=False, **kwargs)

Load a pandas DataFrame into this scenario.

Parameters
  • dataframe (DataFrame) – The DataFrame to load.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

load_parquet(path, *, in_all_scenarios=False, truncate=False, watch=False)

Load a Parquet file into this scenario.

Parameters
  • path (Union[Path, str]) – The path to the Parquet file or directory. If the path points to a directory, all the files in the directory and subdirectories will be loaded into the store and, as such, are expected to have the same schema as the store and to be Parquet files.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well.

load_spark(dataframe, *, in_all_scenarios=False, truncate=False)

Load a Spark DataFrame into this scenario.

Parameters
  • dataframe – The dataframe to load.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

load_sql(url, query, *, username, password, driver=None, in_all_scenarios=False, truncate=False)

Load the result of the passed SQL query into the store.

Note

This method requires the atoti-sql plugin.

Parameters
  • url (Union[Path, str]) –

    The URL of the database. For instance:

    • mysql:localhost:7777/example

    • h2:/home/user/database/file/path

  • query (str) – A SQL query whose result is used to build a store.

  • username (str) – The username used to connect to the database.

  • password (str) – The password used to connect to the database.

  • driver (Optional[str]) – The JDBC driver used to load the data. If None, the driver is inferred from the URL. Drivers can be found in the atoti_sql.drivers module.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

property loading_report

Store loading report.

Return type

StoreReport

property name

Name of the store.

Return type

str

property scenario

Scenario on which the store is.

Return type

ScenarioName (a NewType of str)

property scenarios

All the scenarios the store can be on.

Return type

StoreScenarios

property shape

Shape of the store.

Return type

Mapping[str, int]

property source_simulation_enabled

Whether source simulations are enabled on the store.

Return type

bool

class atoti.store.StoreScenarios(_java_api, _store)

Bases: object

Scenarios of a store.

load_csv(scenario_directory_path, *, sep=None, encoding='utf-8', process_quotes=True, truncate=False, watch=False, array_sep=None, pattern=None, base_scenario_directory='Base')

Load multiple CSV files into the store while automatically generating scenarios.

Loads the data from a directory into multiple scenarios, creating them as necessary, based on the directory’s structure. The contents of each sub-directory of the provided path will be loaded into a scenario with the same name. Here is an example of a valid directory structure:

ScenarioStore
├── Base
│   └── base_data.csv
├── Scenario1
│   └── scenario1_data.csv
└── Scenario2
    └── scenario2_data.csv

With this structure:

  • The contents of the Base directory are loaded into the base scenario.

  • Two new scenarios are created: Scenario1 and Scenario2, containing respectively the data from scenario1_data.csv and scenario2_data.csv.

Parameters
  • scenario_directory_path (Union[Path, str]) – The path pointing to the directory containing all of the scenarios.

  • sep (Optional[str]) – The delimiter to use. If None, the separator will automatically be detected.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (bool) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      • Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double-quotes within a field will be treated as any regular character, following Excel’s behavior. In this mode, it is expected that fields are not enclosed in double quotes. It is also not possible to have a line break inside a field.

    • None: The behavior will be inferred from the first lines of the CSV file.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well. Any non-CSV files added to the directory will be ignored.

  • array_sep (Optional[str]) – The delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.

  • pattern (Optional[str]) – A glob pattern used to specify which files to load in each scenario directory. If no pattern is provided, all files with the .csv extension will be loaded by default.

  • base_scenario_directory (str) – The data from a scenario directory with this name will be loaded into the base scenario and not a new scenario with the original name of the directory.
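The mapping from sub-directories to scenarios can be sketched with pathlib. The function files_by_scenario is hypothetical: it only lists which files would go to which scenario, and it assumes the base scenario is named "Base", as suggested by the default scenario of a store.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def files_by_scenario(
    root: Path, pattern: str = "*.csv", base_scenario_directory: str = "Base"
) -> Dict[str, List[str]]:
    """List the files of each sub-directory under the scenario they would feed."""
    scenarios: Dict[str, List[str]] = {}
    for directory in sorted(p for p in root.iterdir() if p.is_dir()):
        # The designated directory feeds the base scenario instead of a
        # new scenario named after the directory.
        name = "Base" if directory.name == base_scenario_directory else directory.name
        scenarios[name] = sorted(f.name for f in directory.glob(pattern))
    return scenarios


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "ScenarioStore"
    for directory, file_name in [
        ("Base", "base_data.csv"),
        ("Scenario1", "scenario1_data.csv"),
        ("Scenario2", "scenario2_data.csv"),
    ]:
        (root / directory).mkdir(parents=True)
        (root / directory / file_name).write_text("id,value\n1,10\n")
    layout = files_by_scenario(root)
```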

atoti.stores module

class atoti.stores.Stores(java_api, mapping)

Bases: atoti._mappings.ImmutableMapping[str, atoti.store.Store]

Manage the stores.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
property schema

Datastore schema as an SVG graph.

Note

Graphviz is required to display the graph. It can be installed with Conda: conda install graphviz or by following the download instructions.

Return type

Any

Returns

An SVG image in IPython and a Path to the SVG file otherwise.

values() → an object providing a view on D’s values

atoti.type module

class atoti.type.DataType(java_type, nullable)

Bases: object

atoti Type.

java_type: str

Name of the associated Java literal type.

nullable: bool

Whether the objects of this type can be None.

Elements within array types cannot be None and must share the same scalar type.

atoti.type.local_date(java_format, nullable=False)

Create a date type with the given Java date format.

atoti.type.local_date_time(java_format, nullable=False)

Create a datetime type with the given Java datetime format.

Module contents

atoti.at(measure, coordinates)

Return a measure equal to the passed measure at some other coordinates of the cube.

Parameters
  • measure (Measure) – The measure to take at other coordinates.

  • coordinates (Mapping[Level, Any]) –

    A {level_to_shift_on: value_to_shift_to} mapping. Values can either be:

    • A literal matching an existing member of the key level:

      # Return the value of Quantity for France on each member of the Country level.
      atoti.at(m["Quantity"], {lvl["Country"]: "France"})
      
    • Another level whose current member the key level will be shifted to:

      # Return the value of Quantity for the current member
      # of the Target Country and Target City levels.
      atoti.at(m["Quantity"], {
          lvl["Country"]: lvl["Target Country"],
          lvl["City"]: lvl["Target City"],
      })
      

      If this other level is not expressed, the shifting will not be done.

atoti.create_session(self, name='Unnamed', *, config=None, **kwargs)

Create a session.

Parameters
  • name (str) – The name of the session.

  • config (Union[SessionConfiguration, Path, str, None]) –

    The session configuration regrouping all the aspects of the session that might change depending on where it is deployed. It can be passed either as:

    • A Python object created with atoti.config.create_config().

    • A path to a YAML file, enabling the config to be changed without modifying the project’s code. Environment variables can be referenced (even recursively) in this file:

      >>> yaml_config = '''
      ... url_pattern: ${{ env.SOME_ENVIRONMENT_VARIABLE }}
      ... '''
      

Return type

Session

atoti.date_diff(from_date, to_date, *, unit='days')

Return a measure equal to the difference between two dates.

If one of the dates is N/A then None is returned.

Parameters
  • from_date (Union[Measure, MeasureConvertible, date, datetime]) – The first date measure or object.

  • to_date (Union[Measure, MeasureConvertible, date, datetime]) – The second date measure or object.

  • unit (Literal[‘seconds’, ‘minutes’, ‘hours’, ‘days’, ‘weeks’, ‘months’, ‘years’]) – The difference unit. Seconds, minutes and hours are only allowed if the dates contain time information.

Example

>>> df = pd.DataFrame(
...     columns=["From", "To"],
...     data=[
...         ("2020-01-01", "2020-01-02"),
...         ("2020-02-01", "2020-02-21"),
...         ("2020-03-20", None),
...         ("2020-05-15", "2020-04-15"),
...     ],
... )
>>> store = session.read_pandas(df, store_name="Dates")
>>> cube = session.create_cube(store)
>>> lvl, m = cube.levels, cube.measures
>>> m["Diff"] = tt.date_diff(lvl["From"], lvl["To"])
>>> cube.query(
...     m["Diff"], m["contributors.COUNT"], levels=[lvl["From"], lvl["To"]]
... )
                    Diff contributors.COUNT
From       To
2020-01-01 2020-01-02     1                  1
2020-02-01 2020-02-21    20                  1
2020-03-20 N/A         <NA>                  1
2020-05-15 2020-04-15   -30                  1
Return type

Measure
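The day differences of the example above can be checked with the standard datetime module. day_diff is a hypothetical helper covering only the default 'days' unit, with None standing in for the N/A date.

```python
from datetime import date
from typing import List, Optional, Tuple


def day_diff(from_date: Optional[date], to_date: Optional[date]) -> Optional[int]:
    """Return the number of days from from_date to to_date, or None if a date is missing."""
    if from_date is None or to_date is None:
        return None
    return (to_date - from_date).days


pairs: List[Tuple[Optional[date], Optional[date]]] = [
    (date(2020, 1, 1), date(2020, 1, 2)),
    (date(2020, 2, 1), date(2020, 2, 21)),
    (date(2020, 3, 20), None),
    (date(2020, 5, 15), date(2020, 4, 15)),
]
diffs = [day_diff(start, end) for start, end in pairs]
print(diffs)  # [1, 20, None, -30]
```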

atoti.date_shift(measure, on, offset, *, method='exact')

Return a measure equal to the passed measure shifted to another date.

Parameters
  • measure (Measure) – The measure to shift.

  • on (Hierarchy) – The hierarchy to shift on. Only hierarchies with a single level of type date (or datetime) are supported. If one of the members of the hierarchy is N/A, its shifted value will always be None.

  • offset (str) – The offset of the form xxDxxWxxMxxQxxY to shift by. Only the D, W, M, Q, and Y offset aliases are supported. Offset aliases have the same meaning as Pandas’.

  • method (Literal[‘exact’, ‘previous’, ‘following’, ‘interpolate’]) –

    Determine the value to use when there is no member at the shifted date:

    • exact: None.

    • previous: Value at the previous existing date.

    • following: Value at the following existing date.

    • interpolate: Linear interpolation of the values at the previous and following existing dates:

      Example:

      m2 = atoti.date_shift("m1", on=h["date"], offset="1M", method="interpolate")
      

      date        m1  m2     explanation
      2000-01-05  15  10.79  linear interpolation of 2000-02-03’s 10 and 2000-03-03’s 21 for 2000-02-05
      2000-02-03  10  21     exact match at 2000-03-03: no need to interpolate
      2000-03-03  21  9.73   linear interpolation of 2000-03-03’s 21 and 2000-04-05’s 9 for 2000-04-03
      2000-04-05  9          no record after 2000-04-05: cannot interpolate

Return type

Measure
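The interpolate method can be sketched as a linear interpolation between the two existing dates surrounding the shifted date. The function below is hypothetical and assumes the weighting is proportional to the number of calendar days, which this reference does not state; it takes the already shifted date as input and may not reproduce every figure of the table above.

```python
from bisect import bisect_left
from datetime import date
from typing import Dict, Optional


def interpolate_at(values: Dict[date, float], target: date) -> Optional[float]:
    """Return the value at target, interpolating linearly between neighbors if needed."""
    if target in values:
        # Exact match: no need to interpolate.
        return values[target]
    dates = sorted(values)
    position = bisect_left(dates, target)
    if position == 0 or position == len(dates):
        # No record before or after the target: cannot interpolate.
        return None
    previous, following = dates[position - 1], dates[position]
    weight = (target - previous).days / (following - previous).days
    return values[previous] + weight * (values[following] - values[previous])


m1 = {
    date(2000, 1, 5): 15.0,
    date(2000, 2, 3): 10.0,
    date(2000, 3, 3): 21.0,
    date(2000, 4, 5): 9.0,
}

# 2000-03-03 shifted by one month is 2000-04-03, between 2000-03-03 and 2000-04-05.
shifted = interpolate_at(m1, date(2000, 4, 3))
print(round(shifted, 2))  # 9.73
```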

atoti.filter(measure, condition)

Return a filtered measure.

The new measure is equal to the passed one where the condition is True and to None elsewhere.

Different types of conditions are supported:

  • Levels compared to literals of the same type:

    lvl["city"] == "Paris"
    lvl["date"] > datetime.date(2020,1,1)
    lvl["age"] <= 18
    
  • A conjunction of conditions using the & operator:

    (lvl["source"] == lvl["destination"]) & (lvl["city"] == "Paris")
    
Parameters
Return type

Measure

atoti.open_query_session(self, url, name=None, *, auth=None)

Open an existing session to query it.

This can be used to connect to:

  • Other sessions running in another atoti process.

  • ActivePivot cubes built with a classic Java project, if version >= 5.7.0.

Parameters
Return type

QuerySession

atoti.parent_value(measure, on, *, apply_filters=False, total_value=None, degrees=None)

Return a measure equal to the passed measure at the parent member on the given hierarchies.

Example

Measure definitions:

# The two definitions of m1 are equivalent.
m1 = parent_value(m["Quantity.SUM"], h["Date"])
m1 = parent_value(m["Quantity.SUM"], h["Date"], degrees={h["Date"]: 1})
m2 = parent_value(m["Quantity.SUM"], h["Date"], degrees={h["Date"]: 3})
m3 = parent_value(m["Quantity.SUM"], h["Date"], degrees={h["Date"]: 3}, total_value=m["Quantity.SUM"])
m4 = parent_value(m["Quantity.SUM"], h["Date"], degrees={h["Date"]: 3}, total_value=m["Other.SUM"])

Considering a non-slicing hierarchy Date with three levels Year, Month and Day:

Year  Month  Day  Quantity.SUM  Other.SUM  m1   m2    m3   m4
----  -----  ---  ------------  ---------  ---  ----  ---  ----
2019              75            1000       110  null  110  1500
      7           35            750        75   null  110  1500
             1    15            245        35   110   110  110
             2    20            505        35   110   110  110
      6           40            250        75   null  110  1500
             1    25            115        40   110   110  110
             2    15            135        40   110   110  110
2018              35            500        110  null  110  1500
      7           15            200        35   null  110  1500
             1    5             55         15   110   110  110
             2    10            145        15   110   110  110
      6           20            300        35   null  110  1500
             1    15            145        20   110   110  110
             2    5             155        20   110   110  110

Considering a slicing hierarchy Date with three levels Year, Month and Day:

Year  Month  Day  Quantity.SUM  Other.SUM  m1   m2    m3   m4
----  -----  ---  ------------  ---------  ---  ----  ---  ----
2019              75            1000       75   null  75   1000
      7           35            750        75   null  75   1000
             1    15            245        35   75    75   75
             2    20            505        35   75    75   75
      6           40            250        75   null  75   1000
             1    25            115        40   75    75   75
             2    15            135        40   75    75   75
2018              35            500        35   null  35   500
      7           15            200        35   null  35   500
             1    5             55         15   35    35   35
             2    10            145        15   35    35   35
      6           20            300        35   null  35   500
             1    15            145        20   35    35   35
             2    5             155        20   35    35   35

Parameters

See also

atoti.total() to take the value at the top level member on each given hierarchy.

Return type

Measure
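
The non-slicing table above can be reproduced with a small pure-Python sketch. It only models the behavior visible in the example and is not atoti's implementation: ROWS, measure and this standalone parent_value are hypothetical names, members are identified by their path from the top of the hierarchy, and total_value is assumed (from the m3 and m4 columns) to be evaluated at the top member.

```python
# (Year, Month, Day) -> (Quantity.SUM, Other.SUM), taken from the table above.
ROWS = {
    (2019, 7, 1): (15, 245), (2019, 7, 2): (20, 505),
    (2019, 6, 1): (25, 115), (2019, 6, 2): (15, 135),
    (2018, 7, 1): (5, 55), (2018, 7, 2): (10, 145),
    (2018, 6, 1): (15, 145), (2018, 6, 2): (5, 155),
}


def measure(column):
    """Build a function summing one column of ROWS under a member."""
    def at(path):
        return sum(v[column] for leaf, v in ROWS.items() if leaf[:len(path)] == path)
    return at


quantity_sum = measure(0)
other_sum = measure(1)


def parent_value(measure_fn, path, degrees=1, total_value=None):
    """Value of measure_fn `degrees` levels above the member at `path`."""
    if degrees <= len(path):
        return measure_fn(path[:len(path) - degrees])
    # Not enough ancestors: fall back to total_value at the top member.
    return None if total_value is None else total_value(())


print(parent_value(quantity_sum, (2019, 7, 1)))             # 35   (m1, day row)
print(parent_value(quantity_sum, (2019,)))                  # 110  (m1, year row)
print(parent_value(quantity_sum, (2019, 7), degrees=3))     # None (m2, month row)
print(parent_value(quantity_sum, (2019, 7), degrees=3,
                   total_value=other_sum))                  # 1500 (m4, month row)
print(parent_value(quantity_sum, (2019, 7, 1), degrees=3,
                   total_value=other_sum))                  # 110  (m4, day row)
```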

atoti.rank(measure, hierarchy, ascending=True, apply_filters=True)

Return a measure equal to the rank of a hierarchy’s members according to a reference measure.

Members with equal values are further ranked using the level comparator.

Example:

m2 = atoti.rank(m1, hierarchy["date"])

Year  Month  Day  m1  m2  Comments
----  -----  ---  --  --  --------
2000              90  1
      01          25  2
             01   15  1
             02   10  2
      02          50  1
             01   30  1   same value as 2000/02/05 but this member comes first
             03   20  3
             05   30  2   same value as 2000/02/01 but this member comes last
      04          15  3
             05   5   2
             05   10  1

Parameters
  • measure (Measure) – The measure on which the ranking is done.

  • hierarchy (Hierarchy) – The hierarchy containing the members to rank.

  • ascending (bool) – When set to False, the 1st place goes to the member with the greatest value.

  • apply_filters (bool) – When True, query filters will be applied before ranking members. When False, query filters will be applied after the ranking, resulting in “holes” in the ranks.

Return type

Measure
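
The tie-breaking rule can be sketched in plain Python. This is an illustration only: rank_members is a hypothetical helper, and it assumes members are ranked among their siblings, as the example table suggests. The ranks in that table give 1st place to the greatest value, so the call below passes ascending=False to reproduce the ranks of the days of 2000/02.

```python
def rank_members(members, ascending=True):
    """Rank sibling members.

    `members` is a list of (name, value) pairs, already sorted by the
    level comparator, which is what breaks ties between equal values.
    """
    # sorted() is stable, even with reverse=True, so members with equal
    # values keep their comparator order.
    ordered = sorted(members, key=lambda pair: pair[1], reverse=not ascending)
    return {name: position for position, (name, _) in enumerate(ordered, start=1)}


days_of_february = [("01", 30), ("03", 20), ("05", 30)]
print(rank_members(days_of_february, ascending=False))
# {'01': 1, '05': 2, '03': 3}
```

Days 01 and 05 share the value 30, and day 01 is ranked first because it comes first in the level's order.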

atoti.shift(measure, on, *, offset=1)

Return a measure equal to the passed measure shifted to another member.

Parameters
  • measure (Measure) – The measure to shift.

  • on (Level) – The level to shift on.

  • offset (int) – The number of members to shift by.

Return type

Measure

atoti.total(measure, *hierarchies)

Return a measure equal to the passed measure at the top level member on each given hierarchy.

The filters on these hierarchies are ignored.

If the hierarchy is not slicing, total is equal to the value for all the members. If the hierarchy is slicing, total is equal to the value on the first level.

Example

Considering a hierarchy Date with three levels Year, Month and Day. In the first case Date is not slicing. In the second case Date is slicing.

Year  Month  Day  Price  total(Price) NON SLICING  total(Price) SLICING
----  -----  ---  -----  ------------------------  --------------------
2019              75.0   110.0                     75.0
      7           35.0   110.0                     75.0
             1    15.0   110.0                     75.0
             2    20.0   110.0                     75.0
      6           40.0   110.0                     75.0
             1    25.0   110.0                     75.0
             2    15.0   110.0                     75.0
2018              35.0   110.0                     35.0
      7           15.0   110.0                     35.0
             1    5.0    110.0                     35.0
             2    10.0   110.0                     35.0
      6           20.0   110.0                     35.0
             1    15.0   110.0                     35.0
             2    5.0    110.0                     35.0

Parameters
  • measure (Measure) – The measure to take the total of.

  • hierarchies (Hierarchy) – The hierarchies on which to find the top-level member.

Return type

Measure
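
The difference between the slicing and non-slicing cases can be sketched in plain Python on the example data. PRICES, price and this standalone total are hypothetical names used for illustration; the sketch models the described behavior on a single hierarchy and is not atoti's implementation.

```python
# (Year, Month, Day) -> Price, taken from the example table.
PRICES = {
    (2019, 7, 1): 15.0, (2019, 7, 2): 20.0,
    (2019, 6, 1): 25.0, (2019, 6, 2): 15.0,
    (2018, 7, 1): 5.0, (2018, 7, 2): 10.0,
    (2018, 6, 1): 15.0, (2018, 6, 2): 5.0,
}


def price(path):
    """Sum of Price under the member identified by its path from the top."""
    return sum(v for leaf, v in PRICES.items() if leaf[:len(path)] == path)


def total(path, slicing=False):
    """Price at the top-level member of the Date hierarchy."""
    if not slicing:
        return price(())  # value for all the members
    return price(path[:1])  # value on the first level (Year)


print(total((2019, 7, 1)))                # 110.0
print(total((2019, 7, 1), slicing=True))  # 75.0
print(total((2018, 6), slicing=True))     # 35.0
```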

atoti.value(column, *, levels=None)

Return a measure equal to the value of the given store column.

By default, the measure will be None if the levels corresponding to the store keys are not expressed. This can be changed by specifying another collection of levels above which the measure will be None. If all members of a level have the same value, then this value will propagate to the parent level in the query.

Example

Considering this dataset with, for each region and product (as store keys), the product price:

RegionId  ProductId  Price
--------  ---------  -----
R1        P1         10
R1        P2         10
R2        P1         10
R2        P2         8

The Product Price measure can be defined and behaves like this:

m["Product Price"] = atoti.value(store["Price"])

RegionId  ProductId  Product Price
--------  ---------  -------------
ALL
R1
          P1         10
          P2         10
R2
          P1         10
          P2         8

To propagate similar values to the RegionId level when applicable, the measure can instead be defined as follows:

m["Product Price"] = atoti.value(store["Price"], levels=[lvl["RegionId"]])

RegionId  ProductId  Product Price
--------  ---------  -------------
ALL
R1                   10
          P1         10
          P2         10
R2
          P1         10
          P2         8

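With this definition, if all products of a region have the same price then the region inherits that price. Note that the opposite is not true: a product does not inherit a price shared by all regions, because the RegionId level is not expressed: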

ProductId  RegionId  Product Price
---------  --------  -------------
ALL
P1
           R1        10
           R2        10
P2
           R1        10
           R2        8

Finally, to propagate the value on all levels when possible, pass an empty collection to levels:

m["Product Price"] = atoti.value(store["Price"], levels=[])

ProductId  RegionId  Product Price
---------  --------  -------------
ALL
P1                   10
           R1        10
           R2        10
P2
           R1        10
           R2        8

Parameters
  • column (Column) – The store column to aggregate.

  • levels (Optional[Collection[Level]]) – The levels that must be expressed for this measure to possibly be non-null.

Return type

Measure
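
The propagation rule shown in the tables above can be sketched in plain Python. ROWS, KEYS and this standalone value function are hypothetical names; the sketch only models the behavior of the example (a value appears when every required level is expressed and all matching rows agree) and is not atoti's implementation.

```python
ROWS = [
    {"RegionId": "R1", "ProductId": "P1", "Price": 10},
    {"RegionId": "R1", "ProductId": "P2", "Price": 10},
    {"RegionId": "R2", "ProductId": "P1", "Price": 10},
    {"RegionId": "R2", "ProductId": "P2", "Price": 8},
]
KEYS = ("RegionId", "ProductId")  # the store keys, required by default


def value(location, levels=KEYS):
    """Price at `location` (a dict level -> member), or None.

    `levels` lists the levels that must be expressed in `location`.
    """
    if any(level not in location for level in levels):
        return None
    prices = {
        row["Price"]
        for row in ROWS
        if all(row[level] == member for level, member in location.items())
    }
    # The value only propagates upward when every matching row agrees.
    return prices.pop() if len(prices) == 1 else None


print(value({"RegionId": "R1"}))                       # None
print(value({"RegionId": "R1"}, levels=["RegionId"]))  # 10
print(value({"RegionId": "R2"}, levels=["RegionId"]))  # None
print(value({"ProductId": "P1"}, levels=[]))           # 10
print(value({}, levels=[]))                            # None
```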

atoti.where(condition, true_measure, false_measure=None)

Return a conditional measure.

This function is like an if-then-else statement:

  • Where the condition is True, the new measure will be equal to true_measure.

  • Where the condition is False, the new measure will be equal to false_measure.

If one of the values compared in the condition is None, the condition will be considered False.

Different types of conditions are supported:

  • Measures compared to anything measure-like:

    m["Test"] == 20
    
  • Levels compared to levels (if the level is not expressed, it is considered None):

    lvl["source"] == lvl["destination"]
    
  • Levels compared to literals of the same type:

    lvl["city"] == "Paris"
    lvl["date"] > datetime.date(2020, 1, 1)
    lvl["age"] <= 18
    
  • A conjunction or disjunction of conditions using the & operator or | operator:

    (m["Test"] == 20) & (lvl["city"] == "Paris")
    (lvl["Country"] == "USA") | (lvl["Currency"] == "USD")
    
Parameters

Example

>>> import atoti as tt
>>> import pandas as pd
>>> session = tt.create_session()
>>> df = pd.DataFrame(
...     columns=["Id", "City", "Value"],
...     data=[
...         (0, "Paris", 1.0),
...         (1, "Paris", 2.0),
...         (2, "London", 3.0),
...         (3, "London", 4.0),
...         (4, "Paris", 5.0),
...     ],
... )
>>> store = session.read_pandas(df, store_name="Cities", keys=["Id"])
>>> cube = session.create_cube(store)
>>> lvl, m = cube.levels, cube.measures
>>> m["Paris value"] = tt.where(lvl["City"] == "Paris", m["Value.SUM"], 0)
>>> cube.query(m["Paris value"], levels=lvl["City"])
        Paris value
City
London         0.00
Paris          8.00

Return type

Measure
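
The if-then-else behavior, including the rule that a None operand makes the condition False, can be sketched in plain Python. The compare helper and this standalone where are hypothetical and work on already evaluated values; they illustrate the rules stated above and are not atoti's implementation.

```python
import operator


def compare(op, left, right):
    """Evaluate `left op right`; a None operand makes the condition False."""
    if left is None or right is None:
        return False
    return op(left, right)


def where(condition, true_value, false_value=None):
    """If-then-else on already evaluated values."""
    return true_value if condition else false_value


value_sum = {"London": 7.0, "Paris": 8.0}  # Value.SUM per city in the example
paris_value = {
    city: where(compare(operator.eq, city, "Paris"), total, 0)
    for city, total in value_sum.items()
}
print(paris_value)  # {'London': 0, 'Paris': 8.0}

# City not expressed: the level is None, so the condition is False.
print(where(compare(operator.eq, None, "Paris"), 15.0, 0))  # 0
```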