atoti.session module¶
- class atoti.session.Session(name, *, config, detached_process)¶
Holds a connection to the Java gateway.
- create_cube(base_table, name=None, *, mode='auto')¶
Create a cube based on the passed table.
- Parameters
  - base_table (Table) – The base table of the cube.
  - name (Optional[str]) – The name of the created cube. Defaults to the name of the base table.
  - mode (Literal['auto', 'manual', 'no_measures']) – The cube creation mode:
    - auto: creates hierarchies for every key column or non-numeric column of the table, and measures for every numeric column.
    - manual: does not create any hierarchy or measure (except the count).
    - no_measures: creates the same hierarchies as auto but does not create any measures.
  For tables with hierarchized_columns specified, these columns will be converted into hierarchies regardless of the cube creation mode.
Example
>>> table = session.create_table(
...     "Table",
...     types={"id": tt.type.STRING, "value": tt.type.NULLABLE_DOUBLE},
... )
>>> cube_auto = session.create_cube(table)
>>> sorted(cube_auto.measures)
['contributors.COUNT', 'update.TIMESTAMP', 'value.MEAN', 'value.SUM']
>>> list(cube_auto.hierarchies)
[('Table', 'id')]
>>> cube_no_measures = session.create_cube(table, mode="no_measures")
>>> sorted(cube_no_measures.measures)
['contributors.COUNT', 'update.TIMESTAMP']
>>> list(cube_no_measures.hierarchies)
[('Table', 'id')]
>>> cube_manual = session.create_cube(table, mode="manual")
>>> sorted(cube_manual.measures)
['contributors.COUNT', 'update.TIMESTAMP']
>>> list(cube_manual.hierarchies)
[]
See also
Hierarchies and measures created by a join().
- Return type
Cube
- create_scenario(name, *, origin='Base')¶
Create a new source scenario.
- create_table(name, *, types, keys=(), partitioning=None, hierarchized_columns=None, **kwargs)¶
Create a table from a schema.
- Parameters
  - name (str) – The name of the table to create.
  - types (Mapping[str, DataType]) – Types for all columns of the table. This defines the columns which will be expected in any future data loaded into the table.
  - keys (Iterable[str]) – The columns that will become keys of the table.
  - partitioning (Optional[str]) – The description of how the data will be split across partitions of the table. Joined tables can only use a sub-partitioning of the table referencing them. For example, hash4(country) splits the data across 4 partitions based on the country column's hash value.
  - hierarchized_columns (Optional[Iterable[str]]) – The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube. The different behaviors based on the passed value are:
    - None: all non-numeric columns are converted into hierarchies, depending on the cube's creation mode.
    - Empty collection: no columns are converted into hierarchies.
    - Non-empty collection: only the columns in the collection will be converted into hierarchies.
  For partial joins, the un-mapped key columns of the target table are always converted into hierarchies, regardless of the value of this parameter.
Example
>>> from datetime import date
>>> table = session.create_table(
...     "Product",
...     types={
...         "Date": tt.type.LOCAL_DATE,
...         "Product": tt.type.STRING,
...         "Quantity": tt.type.NULLABLE_DOUBLE,
...     },
...     keys=["Date"],
... )
>>> table.head()
Empty DataFrame
Columns: [Product, Quantity]
Index: []
>>> table.append((date(2021, 5, 19), "TV", 15.0))
>>> table.head()
           Product  Quantity
Date
2021-05-19      TV      15.0
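The `hash4(country)` partitioning description above can be pictured with a small pure-Python sketch. This is only a conceptual illustration of hash partitioning, not atoti's actual hash function, and `partition_for` is a hypothetical helper:

```python
import zlib

def partition_for(value, partition_count=4):
    # Map a column value to one of N partitions, as hash4(country)
    # conceptually does (atoti's real hash function may differ).
    return zlib.crc32(str(value).encode("utf-8")) % partition_count

rows = [("France", 480.0), ("India", 360.0), ("UK", 960.0), ("France", 500.0)]
partitions = {}
for country, price in rows:
    partitions.setdefault(partition_for(country), []).append((country, price))
```

Because the partition is a pure function of the column value, all rows sharing a country always land in the same partition.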
- Return type
Table
- delete_scenario(scenario)¶
Delete the source scenario with the provided name if it exists.
- Return type
None
- endpoint(route, *, method='GET')¶
Create a custom endpoint at /atoti/pyapi/{route}.
This is useful to reuse atoti's built-in server instead of adding a FastAPI or Flask server to the project. This way, when deploying the project in a container or a VM, only one port (the one of the atoti server) needs to be exposed instead of two. Since custom endpoints are exposed by atoti's server, they automatically inherit the configured atoti.config.session_config.SessionConfig.authentication and atoti.config.session_config.SessionConfig.https parameters.
The decorated function must take three parameters with types User, HttpRequest, and Session and return a response body as a Python data structure that can be converted to JSON.
- Parameters
  - route (str) – The path suffix after /atoti/pyapi/. For instance, if custom/search is passed, a request to /atoti/pyapi/custom/search?query=test#results will match. The route should not contain the query (?) or fragment (#). Path parameters can be configured by wrapping their name in curly braces in the route.
  - method (Literal['POST', 'GET', 'PUT', 'DELETE']) – The HTTP method the request must be using to trigger this endpoint. DELETE, POST, and PUT requests can have a body but it must be JSON.
Example
>>> import requests
>>> df = pd.DataFrame(
...     columns=["Year", "Month", "Day", "Quantity"],
...     data=[
...         (2019, 7, 1, 15),
...         (2019, 7, 2, 20),
...     ],
... )
>>> table = session.read_pandas(df, table_name="Quantity")
>>> table.head()
   Year  Month  Day  Quantity
0  2019      7    1        15
1  2019      7    2        20
>>> endpoints_base_url = f"http://localhost:{session.port}/atoti/pyapi"
>>> @session.endpoint("tables/{table_name}/size", method="GET")
... def get_table_size(request, user, session):
...     table_name = request.path_parameters["table_name"]
...     return len(session.tables[table_name])
>>> requests.get(f"{endpoints_base_url}/tables/Quantity/size").json()
2
>>> @session.endpoint("tables/{table_name}/rows", method="POST")
... def append_rows_to_table(request, user, session):
...     rows = request.body
...     table_name = request.path_parameters["table_name"]
...     session.tables[table_name].append(*rows)
>>> requests.post(
...     f"{endpoints_base_url}/tables/Quantity/rows",
...     json=[
...         {"Year": 2021, "Month": 5, "Day": 19, "Quantity": 50},
...         {"Year": 2021, "Month": 5, "Day": 20, "Quantity": 6},
...     ],
... ).status_code
200
>>> requests.get(f"{endpoints_base_url}/tables/Quantity/size").json()
4
>>> table.head()
   Year  Month  Day  Quantity
0  2019      7    1        15
1  2019      7    2        20
2  2021      5   19        50
3  2021      5   20         6
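The `{table_name}` path-parameter syntax used above can be illustrated with a small regex sketch. This is a hypothetical re-implementation for illustration only; atoti's actual route matching happens inside its server:

```python
import re

def compile_route(route):
    # Turn a route like "tables/{table_name}/size" into a regex with one
    # named group per path parameter wrapped in curly braces.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>[^/]+)", route)
    return re.compile(f"^{pattern}$")

match = compile_route("tables/{table_name}/size").match("tables/Quantity/size")
path_parameters = match.groupdict()  # {'table_name': 'Quantity'}
```

A path parameter matches a single path segment, so `tables/x/y/size` would not match the route above.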
- explain_mdx_query(mdx, *, timeout=30)¶
Run the query but return an explanation of how the query was executed instead of its result.
See also
query_mdx() for the roles of the parameters.
- Return type
QueryAnalysis
- Returns
An explanation containing a summary, global timings, and the query plan with all the retrievals.
- export_translations_template(path)¶
Export a template containing all translatable values in the session’s cubes.
- link(*, path='')¶
Display a link to this session.
Clicking on the link will open it in a new browser tab.
Note
This method requires the atoti-jupyterlab plugin.
The extension will try to access the session through (in that order):
  - Jupyter Server Proxy if it is enabled.
  - f"{session_protocol}//{jupyter_server_hostname}:{session.port}" for Session and session.url for QuerySession.
- Parameters
path (str) – The path to append to the session base URL. Defaults to the session home page.
Example
Pointing directly to an existing dashboard:
dashboard_id = "92i"
session.link(path=f"#/dashboard/{dashboard_id}")
- Return type
- query_mdx(mdx, *, keep_totals=False, timeout=30, mode='pretty')¶
Execute an MDX query and return its result as a pandas DataFrame.
- Parameters
  - mdx (str) – The MDX SELECT query to execute. Regardless of the axes on which levels and measures appear in the MDX, the returned DataFrame will have all levels on rows and measures on columns.
  - keep_totals (bool) – Whether the resulting DataFrame should contain, if they are present in the query result, the grand total and subtotals. Totals can be useful, but they make the DataFrame harder to work with since its index will have some empty values.
  - timeout (int) – The query timeout in seconds.
  - mode (Literal['pretty', 'raw']) – The query mode:
    - "pretty" is best for queries returning small results:
      - A QueryResult will be returned and its rows will be sorted according to the level comparators.
    - "raw" is best for benchmarks or large exports:
      - A faster and more efficient endpoint reducing the data transfer from Java to Python will be used.
      - A classic pandas.DataFrame will be returned.
      - keep_totals=True will not be allowed.
      - The Convert to Widget Below action provided by the atoti-jupyterlab plugin will not be available.
Example
>>> from datetime import date
>>> df = pd.DataFrame(
...     columns=["Country", "Date", "Price"],
...     data=[
...         ("China", date(2020, 3, 3), 410.0),
...         ("China", date(2020, 4, 4), 350.0),
...         ("France", date(2020, 1, 1), 480.0),
...         ("France", date(2020, 2, 2), 500.0),
...         ("France", date(2020, 3, 3), 400.0),
...         ("France", date(2020, 4, 4), 420.0),
...         ("India", date(2020, 1, 1), 360.0),
...         ("India", date(2020, 2, 2), 400.0),
...         ("UK", date(2020, 2, 2), 960.0),
...     ],
... )
>>> table = session.read_pandas(
...     df, keys=["Country", "Date"], table_name="Prices"
... )
>>> cube = session.create_cube(table)
This MDX:
>>> mdx = (
...     "SELECT"
...     " NON EMPTY Hierarchize("
...     " DrilldownLevel("
...     " [Prices].[Country].[ALL].[AllMember]"
...     " )"
...     " ) ON ROWS,"
...     " NON EMPTY Crossjoin("
...     " [Measures].[Price.SUM],"
...     " Hierarchize("
...     " DrilldownLevel("
...     " [Prices].[Date].[ALL].[AllMember]"
...     " )"
...     " )"
...     " ) ON COLUMNS"
...     " FROM [Prices]"
... )
would display this pivot table:
Country  | Price.SUM
         | Total    | 2020-01-01 | 2020-02-02 | 2020-03-03 | 2020-04-04
---------|----------|------------|------------|------------|-----------
Total    | 4,280.00 | 840.00     | 1,860.00   | 810.00     | 770.00
China    | 760.00   |            |            | 410.00     | 350.00
France   | 1,800.00 | 480.00     | 500.00     | 400.00     | 420.00
India    | 760.00   | 360.00     | 400.00     |            |
UK       | 960.00   |            | 960.00     |            |
but will return this DataFrame:
>>> session.query_mdx(mdx).sort_index()
                    Price.SUM
Date       Country
2020-01-01 France       480.0
           India        360.0
2020-02-02 France       500.0
           India        400.0
           UK           960.0
2020-03-03 China        410.0
           France       400.0
2020-04-04 China        350.0
           France       420.0
- Return type
DataFrame
- read_csv(path, *, keys=(), table_name=None, separator=None, encoding='utf-8', process_quotes=None, partitioning=None, types={}, array_separator=None, hierarchized_columns=None, date_patterns={}, client_side_encryption=None, **kwargs)¶
Read a CSV file into a table.
- Parameters
  - path (Union[str, Path]) – The path to the CSV file to load. .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported. The path can also be a glob pattern (e.g. path/to/directory/**.*.csv).
  - keys (Iterable[str]) – The columns that will become keys of the table.
  - table_name (Optional[str]) – The name of the table to create. Required when path is a glob pattern. Otherwise, defaults to the final component of the path argument.
  - separator (Optional[str]) – The character separating the values of each line. If None, the separator will be detected automatically.
  - encoding (str) – The encoding to use to read the CSV.
  - process_quotes (Optional[bool]) – Whether double quotes should be processed to follow the official CSV specification:
    - True:
      - Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.
      - A double quote appearing inside a field must be escaped by preceding it with another double quote.
      - Fields containing line breaks, double quotes, and commas should be enclosed in double quotes.
    - False: all double quotes within a field will be treated as any regular character, following Excel's behavior. In this mode, it is expected that fields are not enclosed in double quotes. It is also not possible to have a line break inside a field.
    - None: the behavior will be inferred from the first lines of the CSV file.
  - partitioning (Optional[str]) – The description of how the data will be split across partitions of the table. Joined tables can only use a sub-partitioning of the table referencing them. For example, hash4(country) splits the data across 4 partitions based on the country column's hash value.
  - types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from the first 1,000 lines.
  - array_separator (Optional[str]) – The character separating array elements. Setting it to a non-None value will parse all the columns containing this separator as arrays.
  - hierarchized_columns (Optional[Iterable[str]]) – The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube. The different behaviors based on the passed value are:
    - None: all non-numeric columns are converted into hierarchies, depending on the cube's creation mode.
    - Empty collection: no columns are converted into hierarchies.
    - Non-empty collection: only the columns in the collection will be converted into hierarchies.
  For partial joins, the un-mapped key columns of the target table are always converted into hierarchies, regardless of the value of this parameter.
  - date_patterns (Mapping[str, str]) – A column name to date pattern mapping that can be used when the built-in date parsers fail to recognize the formatted dates in the passed files.
  - client_side_encryption (Optional[ClientSideEncryption]) – The client side encryption configuration to use when loading data.
- Return type
Table
- Returns
The created table holding the content of the CSV file(s).
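A date_patterns entry maps a column name to a format string. As a rough stdlib analogue (the pattern style and the `java_to_python` token table below are illustrative assumptions, mapping Java-style tokens such as `dd/MM/yyyy` to Python strptime directives):

```python
from datetime import datetime

# Hypothetical column-to-pattern mapping in the spirit of date_patterns.
date_patterns = {"Date": "dd/MM/yyyy"}
java_to_python = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}

def to_strptime(pattern):
    # Translate each date token into its strptime equivalent.
    for java_token, py_directive in java_to_python.items():
        pattern = pattern.replace(java_token, py_directive)
    return pattern

parsed = datetime.strptime("19/05/2021", to_strptime(date_patterns["Date"])).date()
```

A pattern like this is only needed when the built-in parsers fail to recognize the dates on their own.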
- read_numpy(array, *, columns, table_name, keys=(), partitioning=None, types={}, hierarchized_columns=None, **kwargs)¶
Read a NumPy 2D array into a new table.
- Parameters
  - array (ndarray) – The NumPy 2D ndarray to read the data from.
  - columns (Sequence[str]) – The names to use for the table's columns. They must be in the same order as the values in the NumPy array.
  - table_name (str) – The name of the table to create.
  - keys (Iterable[str]) – The columns that will become keys of the table.
  - partitioning (Optional[str]) – The description of how the data will be split across partitions of the table. Joined tables can only use a sub-partitioning of the table referencing them. For example, hash4(country) splits the data across 4 partitions based on the country column's hash value.
  - types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from NumPy data types.
  - hierarchized_columns (Optional[Iterable[str]]) – The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube. The different behaviors based on the passed value are:
    - None: all non-numeric columns are converted into hierarchies, depending on the cube's creation mode.
    - Empty collection: no columns are converted into hierarchies.
    - Non-empty collection: only the columns in the collection will be converted into hierarchies.
  For partial joins, the un-mapped key columns of the target table are always converted into hierarchies, regardless of the value of this parameter.
- Return type
Table
- Returns
The created table holding the content of the array.
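Conceptually, read_numpy pairs each row of the 2D array with the given column names, which a plain-Python sketch can show (nested lists stand in for an ndarray; this is an illustration of the pairing, not atoti's loading code):

```python
# Stand-in for a NumPy 2D array: one inner list per row.
array = [
    [2019, 7, 1, 15],
    [2019, 7, 2, 20],
]
columns = ["Year", "Month", "Day", "Quantity"]

# Each row becomes a record keyed by column name, in column order,
# which is why columns must match the order of values in the array.
records = [dict(zip(columns, row)) for row in array]
```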
- read_pandas(dataframe, *, table_name, keys=(), partitioning=None, types={}, hierarchized_columns=None, **kwargs)¶
Read a pandas DataFrame into a table.
All the named indices of the DataFrame are included into the table. Multilevel columns are flattened into a single string name.
- Parameters
  - dataframe (DataFrame) – The DataFrame to load.
  - table_name (str) – The name of the table to create.
  - keys (Iterable[str]) – The columns that will become keys of the table.
  - partitioning (Optional[str]) – The description of how the data will be split across partitions of the table. Joined tables can only use a sub-partitioning of the table referencing them. For example, hash4(country) splits the data across 4 partitions based on the country column's hash value.
  - types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from pandas dtypes.
  - hierarchized_columns (Optional[Iterable[str]]) – The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube. The different behaviors based on the passed value are:
    - None: all non-numeric columns are converted into hierarchies, depending on the cube's creation mode.
    - Empty collection: no columns are converted into hierarchies.
    - Non-empty collection: only the columns in the collection will be converted into hierarchies.
  For partial joins, the un-mapped key columns of the target table are always converted into hierarchies, regardless of the value of this parameter.
- Return type
Table
- Returns
The created table holding the content of the DataFrame.
- read_parquet(path, *, keys=(), table_name=None, partitioning=None, hierarchized_columns=None, client_side_encryption=None, **kwargs)¶
Read a Parquet file into a table.
- Parameters
  - path (Union[str, Path]) – The path to the Parquet file. If a path pointing to a directory is provided, all of the files with the .parquet extension in the directory will be loaded into the same table and, as such, they are all expected to share the same schema. The path can also be a glob pattern (e.g. path/to/directory/**.*.parquet).
  - keys (Iterable[str]) – The columns that will become keys of the table.
  - table_name (Optional[str]) – The name of the table to create. Required when path is a glob pattern. Otherwise, defaults to the final component of the path argument.
  - partitioning (Optional[str]) – The description of how the data will be split across partitions of the table. Joined tables can only use a sub-partitioning of the table referencing them. For example, hash4(country) splits the data across 4 partitions based on the country column's hash value.
  - hierarchized_columns (Optional[Iterable[str]]) – The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube. The different behaviors based on the passed value are:
    - None: all non-numeric columns are converted into hierarchies, depending on the cube's creation mode.
    - Empty collection: no columns are converted into hierarchies.
    - Non-empty collection: only the columns in the collection will be converted into hierarchies.
  For partial joins, the un-mapped key columns of the target table are always converted into hierarchies, regardless of the value of this parameter.
  - client_side_encryption (Optional[ClientSideEncryption]) – The client side encryption configuration to use when loading data.
- Return type
Table
- Returns
The created table holding the content of the Parquet file(s).
- read_spark(dataframe, *, table_name, keys=(), partitioning=None, hierarchized_columns=None, **kwargs)¶
Read a Spark DataFrame into a table.
- Parameters
  - dataframe – The DataFrame to load.
  - table_name (str) – The name of the table to create.
  - keys (Iterable[str]) – The columns that will become keys of the table.
  - partitioning (Optional[str]) – The description of how the data will be split across partitions of the table. Joined tables can only use a sub-partitioning of the table referencing them. For example, hash4(country) splits the data across 4 partitions based on the country column's hash value.
  - hierarchized_columns (Optional[Iterable[str]]) – The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube. The different behaviors based on the passed value are:
    - None: all non-numeric columns are converted into hierarchies, depending on the cube's creation mode.
    - Empty collection: no columns are converted into hierarchies.
    - Non-empty collection: only the columns in the collection will be converted into hierarchies.
  For partial joins, the un-mapped key columns of the target table are always converted into hierarchies, regardless of the value of this parameter.
- Return type
Table
- Returns
The created table holding the content of the DataFrame.
- read_sql(query, *, url, table_name, driver=None, keys=(), partitioning=None, types={}, hierarchized_columns=None)¶
Create a table from the result of the passed SQL query.
Note
This method requires the atoti-sql plugin.
- Parameters
  - query (str) – The result of this SQL query will be loaded into the table.
  - url (str) – The JDBC connection URL of the database. The jdbc: prefix is optional but the database-specific part (such as h2: or mysql:) is mandatory. For instance:
    - h2:file:/home/user/database/file/path;USER=username;PASSWORD=passwd
    - mysql://localhost:7777/example?user=username&password=passwd
    - postgresql://postgresql.db.server:5430/example?user=username&password=passwd
  - driver (Optional[str]) – The JDBC driver used to load the data. If None, the driver is inferred from the URL. Drivers can be found in the atoti_sql.drivers module.
  - table_name (str) – The name of the table to create.
  - keys (Iterable[str]) – The columns that will become keys of the table.
  - partitioning (Optional[str]) – The description of how the data will be split across partitions of the table. Joined tables can only use a sub-partitioning of the table referencing them. For example, hash4(country) splits the data across 4 partitions based on the country column's hash value.
  - types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from the SQL types.
  - hierarchized_columns (Optional[Iterable[str]]) – The list of columns which will automatically be converted into hierarchies no matter which creation mode is used for the cube. The different behaviors based on the passed value are:
    - None: all non-numeric columns are converted into hierarchies, depending on the cube's creation mode.
    - Empty collection: no columns are converted into hierarchies.
    - Non-empty collection: only the columns in the collection will be converted into hierarchies.
  For partial joins, the un-mapped key columns of the target table are always converted into hierarchies, regardless of the value of this parameter.
Example
>>> table = session.read_sql(
...     "SELECT * FROM MYTABLE;",
...     url=f"h2:file:{RESOURCES}/h2-database;USER=root;PASSWORD=pass",
...     table_name="Cities",
...     keys=["ID"],
... )
>>> len(table)
5
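The netloc-style URL examples above can be taken apart with Python's standard library, which helps when checking what user, host, or database a URL points at. This is illustration only; atoti and the JDBC driver do their own parsing:

```python
from urllib.parse import urlsplit, parse_qs

url = "mysql://localhost:7777/example?user=username&password=passwd"
parts = urlsplit(url)               # scheme, host, port, path, query
credentials = parse_qs(parts.query)  # {'user': ['username'], 'password': ['passwd']}
database = parts.path.lstrip("/")    # "example"
```

Note that h2:file:... URLs are not netloc-style (no //), so this sketch does not apply to them.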
- Return type
Table
- property scenarios: Sequence[str]¶
Collection of source scenarios of the session.
- start_transaction(scenario_name='Base')¶
Start a transaction to batch several table operations.
It is more efficient than doing each table operation one after the other.
It avoids possibly incorrect intermediate states (e.g. if loading some new data requires dropping existing rows first).
Note
Some operations are not allowed during a transaction:
  - Long-running operations such as load_kafka().
  - Operations changing the structure of the session's tables such as join() or read_parquet().
  - Operations not related to data loading or dropping such as defining a new measure.
  - Operations on parameter tables created from create_parameter_hierarchy_from_members() and create_parameter_simulation().
  - Operations on other source scenarios than the one the transaction is started on.
- Parameters
scenario_name (str) – The name of the source scenario impacted by all the table operations inside the transaction.
Example
>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("Berlin", 150.0),
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> table = session.read_pandas(
...     df, keys=["City"], table_name="start_transaction example"
... )
>>> cube = session.create_cube(table)
>>> extra_df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("Singapore", 250.0),
...     ],
... )
>>> with session.start_transaction():
...     table += ("New York", 100.0)
...     table.drop({"City": "Paris"})
...     table.load_pandas(extra_df)
...
>>> table.head().sort_index()
           Price
City
Berlin     150.0
London     240.0
New York   100.0
Singapore  250.0
- Return type
Transaction
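Why batching avoids incorrect intermediate states can be shown with a toy dict-based analogue (this is not how atoti implements transactions; it only mirrors the stage-then-apply idea):

```python
from contextlib import contextmanager

prices = {"Paris": 200.0, "New York": 270.0}

@contextmanager
def transaction(store):
    # Stage every change on a copy and apply them all at once on exit,
    # so readers never observe a half-applied batch.
    staged = dict(store)
    yield staged
    store.clear()
    store.update(staged)

with transaction(prices) as batch:
    del batch["Paris"]              # drop existing rows first...
    batch["Singapore"] = 250.0      # ...then load the new data
    assert "Paris" in prices        # readers still see the old state here

# After the with-block, all changes become visible together.
```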
- visualize(name=None)¶
Display an atoti widget to explore the session interactively.
Note
This method requires the atoti-jupyterlab plugin.
The widget state will be stored in the cell metadata. This state should not have to be edited but, if desired, it can be found in JupyterLab by opening the "Notebook tools" sidebar and expanding the "Advanced Tools" section.