atoti.session module

class atoti.session.Session(name, *, config, **kwargs)

Bases: atoti._local_session.LocalSession[atoti.cubes.Cubes]

Holds a connection to the Java gateway.

close()

Close this session and free all the associated resources.

Return type

None

property closed

Return whether the session is closed or not.

Return type

bool

create_cube(base_store, name=None, *, mode='auto')

Create a cube based on the passed store.

Parameters
  • base_store (Store) – The base store of the cube.

  • name (Optional[str]) – The name of the created cube. Defaults to the name of the base store.

  • mode (Literal['auto', 'manual', 'no_measures']) –

    The cube creation mode:

    • auto: Creates hierarchies for every non-numeric column, and measures for every numeric column.

    • manual: Does not create any hierarchy or measure (except the count).

    • no_measures: Creates the hierarchies like auto but does not create any measures.

    For stores with hierarchized_columns specified, these will be converted into hierarchies regardless of the cube creation mode.

See also

Hierarchies and measures created by a join().

Return type

Cube
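
Example

A minimal sketch, assuming a pandas DataFrame df has already been loaded into a store (hypothetical names):

>>> store = session.read_pandas(df, keys=["Product"], store_name="Products")
>>> cube = session.create_cube(store, "ProductCube", mode="no_measures")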

create_scenario(name, *, origin='Base')

Create a new source scenario in the datastore.

Parameters
  • name (str) – The name of the scenario.

  • origin (str) – The scenario to fork.
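
Example

A short sketch with a hypothetical scenario name:

>>> session.create_scenario("Winter simulation", origin="Base")
>>> "Winter simulation" in session.scenarios
True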

create_store(types, store_name, *, keys=None, partitioning=None, sampling_mode=None, hierarchized_columns=None)

Create a store from a schema.

Parameters
  • types (Mapping[str, DataType]) – Types for all columns of the store. This defines the columns which will be expected in any future data loaded into the store.

  • store_name (str) – The name of the store to create.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • sampling_mode (Optional[SamplingMode]) – The sampling mode. Defaults to this session’s one.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store
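
Example

A minimal sketch, assuming the scalar data types exposed in the atoti.type module:

>>> import atoti as tt
>>> store = session.create_store(
...     types={"Product": tt.type.STRING, "Price": tt.type.DOUBLE},
...     store_name="Products",
...     keys=["Product"],
... )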

property cubes

Cubes of the session.

Return type

Cubes

delete_scenario(scenario)

Delete the source scenario with the provided name if it exists.

Return type

None

endpoint(route, *, method='GET')

Create a custom endpoint at f"{session.url}/atoti/pyapi/{route}".

The decorated function must take three arguments of types HttpRequest, User, and Session, and return a response body as a Python data structure that can be converted to JSON. DELETE, POST, and PUT requests can have a body, but it must be JSON.

Path parameters can be configured by wrapping their name in curly braces in the route.

Example:

@session.endpoint("simple_get")
def callback(request: HttpRequest, user: User, session: Session):
    return "something that will be in response.data"


@session.endpoint(f"simple_post/{store_name}", method="POST")
def callback(request: HttpRequest, user: User, session: Session):
    return request.path_parameters.store_name
Parameters
  • route (str) – The path suffix after /atoti/pyapi/. For instance, if custom/search is passed, a request to /atoti/pyapi/custom/search?query=test#results will match. The route should not contain the query (?) or fragment (#).

  • method (Literal['POST', 'GET', 'PUT', 'DELETE']) – The HTTP method the request must be using to trigger this endpoint.

Return type

Any
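
Once registered, the endpoint can be called over HTTP. A minimal sketch using the requests library, assuming the session has no authentication configured:

>>> import requests
>>> response = requests.get(f"{session.url}/atoti/pyapi/simple_get")
>>> response.json()
'something that will be in response.data'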

property excel_url

URL of the Excel endpoint.

To connect to the session in Excel, create a new connection to an Analysis Services server. Use this URL for the server field and choose to connect with “User Name and Password”:

  • Without authentication, leave these fields blank.

  • With Basic authentication, fill them with your username and password.

  • Other authentication types (such as Auth0) are not supported by Excel.

Return type

str

explain_mdx_query(mdx, *, timeout=30)

Explain an MDX query.

Parameters
  • mdx (str) – The MDX SELECT query to execute.

  • timeout (int) – The query timeout in seconds.

Return type

QueryAnalysis
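
Example

A short sketch, assuming mdx holds an MDX SELECT query string (see query_mdx() below for a full query example):

>>> analysis = session.explain_mdx_query(mdx, timeout=60)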

export_translations_template(path)

Export a template containing all translatable values in the session’s cubes.

Parameters

path (Union[str, Path]) – The path at which to write the template.
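
Example

A short sketch with a hypothetical output path:

>>> session.export_translations_template("translations.csv")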

load_all_data()

Trigger the full loading of the data.

Calling this method will change the sampling mode to atoti.sampling.FULL, which triggers the loading of all the data. All subsequent loads, including those into new stores, will not be sampled.

When building a project, this method should be called as late as possible.
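
Example

A typical flow, sketched with a hypothetical file: build the model on (possibly sampled) data, then load everything at the end:

>>> store = session.read_csv("sales.csv", keys=["ID"])
>>> cube = session.create_cube(store)
>>> session.load_all_data()  # switches to atoti.sampling.FULL and loads all the data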

property logs_path

Path to the session logs file.

Return type

Path

logs_tail(n=20)

Return the last n lines of the logs, or all the lines if n <= 0.

Return type

Logs

property name

Name of the session.

Return type

str

property port

Port on which the session is exposed.

Can be set in SessionConfiguration.

Return type

int

query_mdx(mdx, *, keep_totals=False, timeout=30)

Execute an MDX query and return its result as a pandas DataFrame.

Parameters
  • mdx (str) – The MDX SELECT query to execute.

  • keep_totals (bool) – Whether the returned DataFrame should contain, if they are present in the query result, the grand total and subtotals. Totals can be useful, but they make the DataFrame harder to work with since its index will have some empty values.

  • timeout (int) – The query timeout in seconds.

Example

An MDX query that would be displayed as this pivot table:

Country         Total Price.SUM   2018-01-01   2019-01-01   2019-01-02   2019-01-05
                                  Price.SUM    Price.SUM    Price.SUM    Price.SUM
Total Country   4,280.00          840.00       1,860.00     810.00       770.00
China             760.00                                    410.00       350.00
France          1,800.00          480.00       500.00       400.00       420.00
India             760.00          360.00       400.00
UK                960.00                       960.00

will return this DataFrame:

Date        Country    Price.SUM
2019-01-02  China          410.0
2019-01-05  China          350.0
2018-01-01  France         480.0
2019-01-01  France         500.0
2019-01-02  France         400.0
2019-01-05  France         420.0
2018-01-01  India          360.0
2019-01-01  India          400.0
2019-01-01  UK             960.0
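
The MDX producing such a result could look like this sketch, with hypothetical cube, hierarchy, and measure names:

>>> mdx = (
...     "SELECT [Measures].[Price.SUM] ON COLUMNS, "
...     "NON EMPTY Crossjoin("
...     "[Date].[Date].[Date].Members, "
...     "[Country].[Country].[Country].Members"
...     ") ON ROWS FROM [Cube]"
... )
>>> result = session.query_mdx(mdx, keep_totals=False)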

Return type

QueryResult

read_csv(path, *, keys=None, store_name=None, in_all_scenarios=True, sep=None, encoding='utf-8', process_quotes=None, partitioning=None, types=None, watch=False, array_sep=None, sampling_mode=None, hierarchized_columns=None)

Read a CSV file into a store.

Parameters
  • path (Union[str, Path]) –

    The path to the CSV file or directory to load.

    If a path pointing to a directory is provided, all of the files with the .csv extension in the directory and subdirectories will be loaded into the same store and, as such, they are all expected to share the same schema.

    .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported.

    The path can contain glob parameters (e.g. path/to/directory/**.*.csv) and will be expanded accordingly. Be careful: when using glob expressions in paths, all files matching the expression will be loaded, regardless of their extension. When the provided path is a directory, the default glob parameter of **.csv is used.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (Optional[str]) – The name of the store to create. Defaults to the final component of the given path.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • sep (Optional[str]) – The delimiter to use. If None, the separator will automatically be detected.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (Optional[bool]) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      • Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double quotes within a field will be treated like any regular character, following Excel’s behavior. In this mode, it is expected that fields are not enclosed in double quotes. It is also not possible to have a line break inside a field.

    • None: The behavior will be inferred from the first lines of the CSV file.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • types (Optional[Mapping[str, DataType]]) – Types for some or all columns of the store. Types for unspecified columns will be inferred from the first 1,000 lines.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well. Any non-CSV files added to the directory will be ignored.

  • array_sep (Optional[str]) – The delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.

  • sampling_mode (Optional[SamplingMode]) – The sampling mode. Defaults to this session’s one.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the CSV file(s).
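
Example

A minimal sketch with a hypothetical file:

>>> store = session.read_csv(
...     "products.csv",
...     keys=["Product"],
...     store_name="Products",
... )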

read_numpy(array, columns, store_name, *, keys=None, in_all_scenarios=True, partitioning=None, hierarchized_columns=None, **kwargs)

Read a NumPy 2D array into a new store.

Parameters
  • array (ndarray) – The NumPy 2D ndarray to read the data from.

  • columns (Sequence[str]) – The names to use for the store’s columns. They must be in the same order as the values in the NumPy array.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (str) – The name of the store to create.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the array.
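
Example

A minimal sketch:

>>> import numpy as np
>>> data = np.array([["France", 200.0], ["UK", 240.0]], dtype=object)
>>> store = session.read_numpy(
...     data, columns=["Country", "Price"], store_name="NumPy example", keys=["Country"]
... )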

read_pandas(dataframe, store_name, *, keys=None, in_all_scenarios=True, partitioning=None, types=None, hierarchized_columns=None, **kwargs)

Read a pandas DataFrame into a store.

All the named indices of the DataFrame are included into the store. Multilevel columns are flattened into a single string name.

Parameters
  • dataframe (DataFrame) – The DataFrame to load.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (str) – The name of the store to create.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • types (Optional[Mapping[str, DataType]]) – Types for some or all columns of the store. Types for unspecified columns will be inferred.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the DataFrame.
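
Example

A minimal sketch:

>>> import pandas as pd
>>> df = pd.DataFrame({"City": ["Paris", "London"], "Price": [200.0, 240.0]})
>>> store = session.read_pandas(df, keys=["City"], store_name="Cities")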

read_parquet(path, *, keys=None, store_name=None, in_all_scenarios=True, partitioning=None, sampling_mode=None, watch=False, hierarchized_columns=None)

Read a Parquet file into a store.

Parameters
  • path (Union[str, Path]) – The path to the Parquet file or directory. If the path points to a directory, all the files in the directory and subdirectories will be loaded into the store and, as such, are expected to have the same schema as the store and to be Parquet files.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (Optional[str]) – The name of the store to create. Defaults to the final component of the given path.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • sampling_mode (Optional[SamplingMode]) – The sampling mode. Defaults to this session’s one.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the Parquet file(s).
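
Example

A minimal sketch with a hypothetical file:

>>> store = session.read_parquet("sales.parquet", keys=["ID"], store_name="Sales")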

read_spark(dataframe, store_name, *, keys=None, in_all_scenarios=True, partitioning=None, hierarchized_columns=None)

Read a Spark DataFrame into a store.

Parameters
  • dataframe – The DataFrame to load.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • store_name (str) – The name of the store to create.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Return type

Store

Returns

The created store holding the content of the DataFrame.
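
Example

A minimal sketch, assuming an active SparkSession named spark:

>>> spark_df = spark.createDataFrame([("Paris", 200.0)], ["City", "Price"])
>>> store = session.read_spark(spark_df, store_name="Spark example", keys=["City"])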

read_sql(url, query, *, username, password, driver=None, store_name, keys=None, partitioning=None, types=None, hierarchized_columns=None)

Create a store from the result of the passed SQL query.

Note

This method requires the atoti-sql plugin.

Parameters
  • url (Union[Path, str]) –

    The URL of the database. For instance:

    • mysql:localhost:7777/example

    • h2:/home/user/database/file/path

  • query (str) – The SQL query whose result is used to build the store.

  • username (str) – The username used to connect to the database.

  • password (str) – The password used to connect to the database.

  • driver (Optional[str]) – The JDBC driver used to load the data. If None, the driver is inferred from the URL. Drivers can be found in the atoti_sql.drivers module.

  • store_name (str) – The name of the store to create.

  • keys (Optional[Collection[str]]) – The columns that will become keys of the store.

  • partitioning (Optional[str]) –

    The description of how the data will be split across partitions of the store.

    Default rules:

    • Only non-referenced base stores are automatically partitioned.

    • Base stores are automatically partitioned by hashing their key fields. If there are no key fields, all the dictionarized fields are hashed.

    • Referenced stores can only use a sub-partitioning of the store referencing them.

    • Automatic partitioning is done modulo the number of available processors.

    For instance, hash4(country) splits the data across 4 partitions based on the country column’s hash value.

    Only key columns can be used in the partitioning description.

  • types (Optional[Mapping[str, DataType]]) – Types for some or all columns of the store. Types for unspecified columns will be inferred from the SQL types.

  • hierarchized_columns (Optional[Collection[str]]) –

    The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube.

    The different behaviors based on the passed value are:

    • None: all non-numeric columns are converted into hierarchies, depending on the cube’s creation mode.

    • Empty collection: no columns are converted into hierarchies.

    • Non-empty collection: only the columns in the collection will be converted into hierarchies.

    For partial joins, the un-mapped key columns of the target store are always converted into hierarchies, regardless of the value of this parameter.

Example

>>> store = session.read_sql(
...     f"h2:{RESOURCES}/h2-database",
...     "SELECT * FROM MYTABLE;",
...     username="root",
...     password="pass",
...     store_name="Cities",
...     keys=["ID"],
... )
Return type

Store

property scenarios

Collection of source scenarios of the session.

Return type

Collection[str]

start_transaction()

Start a transaction to batch several store operations.

  • It is more efficient than doing each store operation one after the other.

  • It avoids possibly incorrect intermediate states (e.g. if loading some new data first requires dropping some existing data).

Note

Some operations are not allowed during a transaction:

  • Long-running operations such as load_kafka() or load_csv() where watch=True is used.

  • Operations changing the structure of the session’s stores such as join() or read_parquet().

  • Operations not related to data loading or dropping such as defining a new measure.

Example

>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("Berlin", 150.0),
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> store = session.read_pandas(
...     df, keys=["City"], store_name="start_transaction example"
... )
>>> cube = session.create_cube(store)
>>> extra_df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("Singapore", 250.0),
...     ],
... )
>>> with session.start_transaction():
...     store += ("New York", 100.0)
...     store.drop({"City": "Paris"})
...     store.load_pandas(extra_df)
...
>>> store.head(10)
           Price
City
Berlin     150.0
London     240.0
New York   100.0
Singapore  250.0
Return type

Transaction

property stores

Stores of the session.

Return type

Stores

property url

Public URL of the session.

Can be set in SessionConfiguration.

Return type

str

visualize(name=None)

Display an atoti widget to explore the session interactively.

Note

This method requires the atoti-jupyterlab plugin.

The widget state will be stored in the cell metadata. This state should not have to be edited but, if desired, it can be found in JupyterLab by opening the “Notebook tools” sidebar and expanding the “Advanced Tools” section.

Parameters

name (Optional[str]) – The name to give to the widget.

wait()

Wait for the underlying server subprocess to terminate.

This will prevent the Python process from exiting.

Return type

None