atoti.session module#

class atoti.Session#

The primary entry point for Atoti applications.

A session is a process holding data in tables and aggregating it in cubes. It also serves a web app for data exploration, accessible with link().

__init__(*, name='Unnamed', app_extensions={}, authentication=None, branding=None, client_certificate=None, extra_jars=(), https=None, i18n=None, java_options=(), jwt=None, logging=None, port=0, same_site=None, user_content_storage=None, **kwargs)#

Create a session.

Parameters:
  • name (str | None) –

    The name of the session.

    For a better prototyping experience in notebooks, creating a session with the same name as an already running session will close the latter one. Pass None to opt out of this behavior.

  • app_extensions (Mapping[str, str | Path]) –

    Mapping from the name of an extension (i.e. the name property in its package.json) to the path of its dist directory.

    Note

    This feature is not part of the community edition: it needs to be unlocked.

    Extensions can enhance the app in many ways such as:

    • Adding new types of widgets.

    • Attaching custom menu items or titlebar buttons to a set of widgets.

    • Providing other React contexts to the components rendered by the app.

    The app extension template can be used as a starting point.

    See also

    Available extensions in atoti.app_extension.

  • authentication (BasicAuthenticationConfig | KerberosConfig | LdapConfig | OidcConfig | None) –

    The configuration to enable authentication on the session.

    Note

    This feature is not part of the community edition: it needs to be unlocked.

  • branding (BrandingConfig | None) – The config to customize some elements of the UI to change its appearance.

  • client_certificate (ClientCertificateConfig | None) – The config to enable client certificate based authentication on the session.

  • extra_jars (Iterable[str | Path]) – The paths to the JARs to add to the classpath of the Java process when starting the session.

  • https (HttpsConfig | None) – The config providing certificates to enable HTTPS on the session.

  • i18n (I18nConfig | None) – The config to internationalize the session.

  • java_options (Iterable[str]) –

    The additional options to pass when starting the Java process (e.g. for optimization or debugging purposes).

    In particular, the -Xmx option can be set to increase the amount of RAM that the session can use.

    If this option is not specified, the JVM default memory setting is used, which is 25% of the machine's memory.

  • jwt (JwtConfig | None) – The config to set the key pair used to validate JWTs when authenticating with the session.

  • logging (LoggingConfig | None) – The config describing how to handle session logs.

  • port (int) –

    The port on which the session will be exposed.

    Defaults to a random available port.

  • same_site (Literal['none', 'strict'] | None) –

    The value to use for the SameSite attribute of the HTTP cookie sent by the session when authentication is configured.

    Note

    This feature is not part of the community edition: it needs to be unlocked.

    See https://web.dev/samesite-cookies-explained for more information.

    Setting it to none requires the session to be served through HTTPS.

    Defaults to lax.

  • user_content_storage (Path | str | UserContentStorageConfig | None) – The location of the database where the user content will be stored. The user content is what is not part of the data sources, such as the dashboards, widgets, and filters saved in the application. If a path to a directory is given, it will be created if needed. When None, the user content is kept in memory and is thus lost when the session is closed.
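
Example

A minimal sketch of constructing a session with some of these parameters; the name, port, heap size, and user content path below are placeholders:

import atoti as tt

session = tt.Session(
    name="my-session",
    port=9090,
    java_options=["-Xmx8g"],  # give the Java process 8 GiB of heap
    user_content_storage="./content",  # persist dashboards across restarts
)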

add_external_table(external_table, /, table_name=None, *, columns={}, options=None)#

Add a table from an external database to the session.

Parameters:
  • external_table (ExternalTableT_co) – The external database table from which to build the session table. Instances of such tables are obtained through an external database connection.

  • table_name (str | None) – The name to give to the table in the session. If None, the name of the external table is used.

  • columns (Mapping[str, str]) – Mapping from external column names to local column names. If empty, the local columns will share the names of the external columns.

  • options (ExternalTableOptions[ExternalTableT_co] | None) – The database specific options to read the table. Each DirectQuery plugin has its own *TableOptions class.

Return type:

Table

close()#

Close this session and free all the associated resources.

property closed: bool#

Return whether the session is closed or not.
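
Example

A minimal sketch of the session lifecycle:

session = tt.Session(name="lifecycle example")
assert not session.closed
session.close()
assert session.closed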

connect_to_external_database(connection_info, /)#

Connect to an external database using DirectQuery.

Parameters:

connection_info (ExternalDatabaseConnectionInfo[ExternalDatabaseConnectionT, ExternalTableT_co]) – Information needed to connect to the external database. Each DirectQuery plugin has its own *ConnectionInfo class.

Return type:

ExternalDatabaseConnectionT
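
Example

A sketch assuming the atoti-directquery-snowflake plugin; the class name, connection string, and table coordinates below are illustrative, the exact API comes from that plugin's documentation:

# Hypothetical import; each DirectQuery plugin ships its own *ConnectionInfo class.
from atoti_directquery_snowflake import SnowflakeConnectionInfo

connection_info = SnowflakeConnectionInfo(
    "jdbc:snowflake://account.snowflakecomputing.com/?user=username",
    password="passwd",
)
external_database = session.connect_to_external_database(connection_info)
# Pick one of the tables exposed by the connection (coordinates are illustrative):
external_table = external_database.tables["MY_DB", "MY_SCHEMA", "SALES"]
table = session.add_external_table(external_table, table_name="Sales")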

create_cube(base_table, name=None, *, mode='auto')#

Create a cube based on the passed table.

Parameters:
  • base_table (Table) – The base table of the cube.

  • name (str | None) – The name of the created cube. Defaults to the name of the base table.

  • mode (Literal['auto', 'manual', 'no_measures']) –

    The cube creation mode:

    • auto: Creates hierarchies for every key column or non-numeric column of the table, and measures for every numeric column.

    • manual: Does not create any hierarchy or measure (except for the count).

    • no_measures: Creates the hierarchies like auto but does not create any measures.

Return type:

Cube

Example

>>> table = session.create_table(
...     "Table",
...     types={"id": tt.STRING, "value": tt.DOUBLE},
... )
>>> cube_auto = session.create_cube(table)
>>> sorted(cube_auto.measures)
['contributors.COUNT', 'update.TIMESTAMP', 'value.MEAN', 'value.SUM']
>>> list(cube_auto.hierarchies)
[('Table', 'id')]
>>> cube_no_measures = session.create_cube(table, mode="no_measures")
>>> sorted(cube_no_measures.measures)
['contributors.COUNT', 'update.TIMESTAMP']
>>> list(cube_no_measures.hierarchies)
[('Table', 'id')]
>>> cube_manual = session.create_cube(table, mode="manual")
>>> sorted(cube_manual.measures)
['contributors.COUNT', 'update.TIMESTAMP']
>>> list(cube_manual.hierarchies)
[]

create_scenario(name, *, origin='Base')#

Create a new source scenario.

Parameters:
  • name (str) – The name of the scenario.

  • origin (str) – The scenario to fork.
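
Example

A sketch of forking and feeding a source scenario, assuming an existing table and a hypothetical alternative_df DataFrame (table.scenarios is part of the Table API):

session.create_scenario("Price increase", origin="Base")
# Load alternative data into the scenario branch of the table:
table.scenarios["Price increase"].load_pandas(alternative_df)
# The new name now appears in session.scenarios, and the scenario can be
# removed with session.delete_scenario("Price increase").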

create_table(name, *, types, keys=(), partitioning=None, default_values={}, **kwargs)#

Create a table from a schema.

Parameters:
  • name (str) – The name of the table to create.

  • types (Mapping[str, DataType]) –

    Types for all columns of the table. This defines the columns which will be expected in any future data loaded into the table.

    See also

    atoti.type for data type constants.

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

Return type:

Table

Example

>>> from datetime import date
>>> table = session.create_table(
...     "Product",
...     types={"Date": tt.LOCAL_DATE, "Product": tt.STRING, "Quantity": tt.DOUBLE},
...     keys=["Date", "Product"],
... )
>>> table.head()
Empty DataFrame
Columns: [Quantity]
Index: []
>>> table.append(
...     (date(2021, 5, 19), "TV", 15.0),
...     (date(2022, 8, 17), "Car", 2.0),
... )
>>> table.head()
                    Quantity
Date       Product
2021-05-19 TV           15.0
2022-08-17 Car           2.0

Inserting a row with the same key values as an existing row replaces the latter:

>>> table += (date(2021, 5, 19), "TV", 8.0)
>>> table.head()
                    Quantity
Date       Product
2021-05-19 TV            8.0
2022-08-17 Car           2.0

property cubes: Cubes#

Cubes of the session.

delete_scenario(name)#

Delete the source scenario with the provided name if it exists.

endpoint(route, *, method='GET')#

Create a custom endpoint at /atoti/pyapi/{route}.

This is useful to reuse Atoti’s built-in server instead of adding a FastAPI or Flask server to the project. This way, when deploying the project in a container or a VM, only one port (the Atoti server’s) needs to be exposed instead of two. Since custom endpoints are exposed by Atoti’s server, they automatically inherit the session’s authentication and https configuration.

The decorated function must take three parameters with types atoti.pyapi.HttpRequest, atoti.pyapi.User, and atoti.Session, and return a response body as a Python data structure that can be converted to JSON.

Parameters:
  • route (str) –

    The path suffix after /atoti/pyapi/. For instance, if custom/search is passed, a request to /atoti/pyapi/custom/search?query=test#results will match. The route should not contain the query (?) or fragment (#).

    Path parameters can be configured by wrapping their name in curly braces in the route.

  • method (Literal['POST', 'GET', 'PUT', 'DELETE']) – The HTTP method the request must be using to trigger this endpoint. DELETE, POST, and PUT requests can have a body but it must be JSON.

Return type:

Callable[[Callable[[…], Any]], Callable[[…], Any]]

Example

>>> import requests
>>> df = pd.DataFrame(
...     columns=["Year", "Month", "Day", "Quantity"],
...     data=[
...         (2019, 7, 1, 15),
...         (2019, 7, 2, 20),
...     ],
... )
>>> table = session.read_pandas(df, table_name="Quantity")
>>> table.head()
   Year  Month  Day  Quantity
0  2019      7    1        15
1  2019      7    2        20
>>> endpoints_base_url = f"http://localhost:{session.port}/atoti/pyapi"
>>> @session.endpoint("tables/{table_name}/size", method="GET")
... def get_table_size(request, user, session):
...     table_name = request.path_parameters["table_name"]
...     return len(session.tables[table_name])
...
>>> requests.get(f"{endpoints_base_url}/tables/Quantity/size").json()
2
>>> @session.endpoint("tables/{table_name}/rows", method="POST")
... def append_rows_to_table(request, user, session):
...     rows = request.body
...     table_name = request.path_parameters["table_name"]
...     session.tables[table_name].append(*rows)
...
>>> requests.post(
...     f"{endpoints_base_url}/tables/Quantity/rows",
...     json=[
...         {"Year": 2021, "Month": 5, "Day": 19, "Quantity": 50},
...         {"Year": 2021, "Month": 5, "Day": 20, "Quantity": 6},
...     ],
... ).status_code
200
>>> requests.get(f"{endpoints_base_url}/tables/Quantity/size").json()
4
>>> table.head()
   Year  Month  Day  Quantity
0  2019      7    1        15
1  2019      7    2        20
2  2021      5   19        50
3  2021      5   20         6

explain_mdx_query(mdx, *, timeout=datetime.timedelta(seconds=30))#

Run the query but, instead of returning its result, return an explanation of how it was executed, containing a summary, global timings, and the query plan with all the retrievals.

See also

query_mdx() for the roles of the parameters.

Return type:

QueryAnalysis
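
Example

A sketch, assuming the Prices cube from the query_mdx() example below:

from datetime import timedelta

analysis = session.explain_mdx_query(
    "SELECT NON EMPTY [Measures].[Price.SUM] ON COLUMNS FROM [Prices]",
    timeout=timedelta(seconds=60),
)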

export_translations_template(path)#

Export a template containing all translatable values in the session’s cubes.

Parameters:

path (str | Path) – The path at which to write the template.

link(*, path='')#

Display a link to this session.

If the atoti-jupyterlab3 plugin is:

  • not enabled, the session’s local URL will be used, so the link may not be reachable if Atoti is running on another machine. In that situation, the session may be reached from f"{public_ip_or_domain_of_machine_hosting_atoti}:{session.port}" (see atoti.Session.port()).

  • enabled, the JupyterLab extension will try to access the session through (in this order):

    1. Jupyter Server Proxy if it is enabled.

    2. f"{session_protocol}//{jupyter_server_hostname}:{session.port}" for Session and session.url for QuerySession.

Parameters:

path (str) – The path to append to the session base URL. Defaults to the session home page.

Return type:

object

Example

Linking to an existing dashboard:

dashboard_id = "92i"
session.link(path=f"#/dashboard/{dashboard_id}")

property logs_path: Path#

Path to the session logs file.

property name: str | None#

Name of the session.

property port: int#

Port on which the session is exposed.

Can be configured with atoti.Session()’s port parameter.

See also

atoti.Session.link() to display a link to this session.

query_mdx(mdx, *, keep_totals=False, timeout=datetime.timedelta(seconds=30), mode='pretty', context={})#

Execute an MDX query and return its result as a pandas DataFrame.

Parameters:
  • mdx (str) –

    The MDX SELECT query to execute.

    Regardless of the axes on which levels and measures appear in the MDX, the returned DataFrame will have all levels on rows and measures on columns.

    Example

    >>> from datetime import date
    >>> df = pd.DataFrame(
    ...     columns=["Country", "Date", "Price"],
    ...     data=[
    ...         ("China", date(2020, 3, 3), 410.0),
    ...         ("France", date(2020, 1, 1), 480.0),
    ...         ("France", date(2020, 2, 2), 500.0),
    ...         ("France", date(2020, 3, 3), 400.0),
    ...         ("India", date(2020, 1, 1), 360.0),
    ...         ("India", date(2020, 2, 2), 400.0),
    ...         ("UK", date(2020, 2, 2), 960.0),
    ...     ],
    ... )
    >>> table = session.read_pandas(
    ...     df, keys=["Country", "Date"], table_name="Prices"
    ... )
    >>> cube = session.create_cube(table)
    

    This MDX:

    >>> mdx = (
    ...     "SELECT"
    ...     "  NON EMPTY Hierarchize("
    ...     "    DrilldownLevel("
    ...     "      [Prices].[Country].[ALL].[AllMember]"
    ...     "    )"
    ...     "  ) ON ROWS,"
    ...     "  NON EMPTY Crossjoin("
    ...     "    [Measures].[Price.SUM],"
    ...     "    Hierarchize("
    ...     "      DrilldownLevel("
    ...     "        [Prices].[Date].[ALL].[AllMember]"
    ...     "      )"
    ...     "    )"
    ...     "  ) ON COLUMNS"
    ...     "  FROM [Prices]"
    ... )
    

    Returns this DataFrame:

    >>> session.query_mdx(mdx, keep_totals=True)
                       Price.SUM
    Date       Country
    Total               3,510.00
    2020-01-01            840.00
    2020-02-02          1,860.00
    2020-03-03            810.00
               China      410.00
    2020-01-01 China
    2020-02-02 China
    2020-03-03 China      410.00
               France   1,380.00
    2020-01-01 France     480.00
    2020-02-02 France     500.00
    2020-03-03 France     400.00
               India      760.00
    2020-01-01 India      360.00
    2020-02-02 India      400.00
    2020-03-03 India
               UK         960.00
    2020-01-01 UK
    2020-02-02 UK         960.00
    2020-03-03 UK
    

    But, if displayed in a pivot table, it would look like this:

    Country     Price.SUM
                    Total  2020-01-01  2020-02-02  2020-03-03
    Total        3,510.00      840.00    1,860.00      810.00
    China          410.00                              410.00
    France       1,380.00      480.00      500.00      400.00
    India          760.00      360.00      400.00
    UK             960.00                  960.00

  • keep_totals (bool) – Whether the resulting DataFrame should contain, if they are present in the query result, the grand total and subtotals. Totals can be useful but they make the DataFrame harder to work with since its index will have some empty values.

  • timeout (timedelta) – The amount of time the query execution can take before aborting it.

  • mode (Literal['pretty', 'raw']) –

    The query mode.

    • "pretty" is best for queries returning small results:

      • A QueryResult will be returned and its rows will be sorted according to the level order.

    • "raw" is best for benchmarks or large exports:

      • A faster and more efficient endpoint reducing the data transfer from Java to Python will be used.

      • A classic pandas.DataFrame will be returned.

      • keep_totals=True will not be allowed.

      • The Convert to Widget Below action provided by the atoti-jupyterlab3 plugin will not be available.

  • context (Mapping[str, Any]) – Context values to use when executing the query.

Return type:

DataFrame

read_arrow(table, /, *, table_name, keys=(), partitioning=None, types={}, default_values={}, **kwargs)#

Read an Arrow Table into a table.

Parameters:
  • table (Table) – The Arrow Table to load.

  • table_name (str) – The name of the table to create.

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from the Arrow data types.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

Return type:

Table

Example

>>> import pyarrow as pa
>>> arrow_table = pa.Table.from_arrays(
...     [
...         pa.array(["phone", "headset", "watch"]),
...         pa.array([600.0, 80.0, 250.0]),
...     ],
...     names=["Product", "Price"],
... )
>>> arrow_table
pyarrow.Table
Product: string
Price: double
----
Product: [["phone","headset","watch"]]
Price: [[600,80,250]]
>>> table = session.read_arrow(
...     arrow_table, keys=["Product"], table_name="Arrow"
... )
>>> table.head().sort_index()
         Price
Product
headset   80.0
phone    600.0
watch    250.0

read_csv(path, /, *, keys=(), table_name=None, separator=',', encoding='utf-8', process_quotes=True, partitioning=None, types={}, columns={}, array_separator=None, date_patterns={}, default_values={}, client_side_encryption=None, **kwargs)#

Read a CSV file into a table.

Parameters:
  • path (str | Path) –

    The path to the CSV file to load.

    .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported.

    The path can also be a glob pattern (e.g. path/to/directory/**.*.csv).

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • table_name (str | None) – The name of the table to create. Required when path is a glob pattern. Otherwise, defaults to the capitalized final component of the path argument.

  • separator (str | None) –

    The character separating the values of each line.

    If None, the separator will be inferred in a preliminary partial read.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (bool | None) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double quotes within a field will be treated as regular characters, following Excel’s behavior. In this mode, fields are expected not to be enclosed in double quotes, and line breaks inside fields are not supported.

    • None: the behavior will be inferred in a preliminary partial read.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from the first 1,000 lines.

  • columns (Mapping[str, str]) – Mapping from file column names to table column names. When the mapping is not empty, columns of the file absent from the mapping keys will not be loaded. Other parameters accepting column names expect to be passed table column names (i.e. values of this mapping) and not file column names.

  • array_separator (str | None) –

    The character separating array elements.

    If not None, any field containing this separator will be parsed as an array.

  • date_patterns (Mapping[str, str]) – A column name to date pattern mapping that can be used when the built-in date parsers fail to recognize the formatted dates in the passed files.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

  • client_side_encryption (ClientSideEncryptionConfig | None) – The client side encryption configuration to use when loading data.

Return type:

Table
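
Example

A sketch, assuming a local file data/sales.csv with ID, Date, and Price columns:

table = session.read_csv(
    "data/sales.csv",
    keys=["ID"],
    table_name="Sales",
    types={"ID": tt.STRING},  # other columns are inferred from the first 1,000 lines
    date_patterns={"Date": "dd/MM/yyyy"},  # only needed if the built-in parsers fail
)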

read_numpy(array, /, *, columns, table_name, keys=(), partitioning=None, types={}, default_values={})#

Read a NumPy 2D array into a new table.

Parameters:
  • array (ndarray[Any, dtype[Any]]) – The NumPy 2D ndarray to read the data from.

  • columns (Sequence[str]) – The names to use for the table’s columns. They must be in the same order as the values in the NumPy array.

  • table_name (str) – The name of the table to create.

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from the NumPy data types.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

Return type:

Table
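
Example

A minimal sketch with a homogeneous numeric 2D array (the columns are named positionally):

import numpy as np

array = np.array([[1.0, 600.0], [2.0, 80.0], [3.0, 250.0]])
table = session.read_numpy(
    array, columns=["ID", "Price"], keys=["ID"], table_name="NumPy"
)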

read_pandas(dataframe, /, *, table_name, keys=(), partitioning=None, types={}, default_values={}, **kwargs)#

Read a pandas DataFrame into a table.

All the named indices of the DataFrame are included in the table. Multilevel columns are flattened into a single string name.

Parameters:
  • dataframe (DataFrame) – The DataFrame to load.

  • table_name (str) – The name of the table to create.

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from the pandas dtypes.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

Return type:

Table

Example

>>> dataframe = pd.DataFrame(
...     columns=["Product", "Price"],
...     data=[
...         ("phone", 600.0),
...         ("headset", 80.0),
...         ("watch", 250.0),
...     ],
... )
>>> table = session.read_pandas(
...     dataframe, keys=["Product"], table_name="Pandas"
... )
>>> table.head().sort_index()
         Price
Product
headset   80.0
phone    600.0
watch    250.0

read_parquet(path, /, *, keys=(), columns={}, table_name=None, partitioning=None, default_values={}, client_side_encryption=None)#

Read a Parquet file into a table.

Parameters:
  • path (str | Path) – The path to the Parquet file. If a path pointing to a directory is provided, all of the files with the .parquet extension in the directory will be loaded into the same table and, as such, they are all expected to share the same schema. The path can also be a glob pattern (e.g. path/to/directory/**.*.parquet).

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • columns (Mapping[str, str]) – Mapping from file column names to table column names. When the mapping is not empty, columns of the file absent from the mapping keys will not be loaded. Other parameters accepting column names expect to be passed table column names (i.e. values of this mapping) and not file column names.

  • table_name (str | None) – The name of the table to create. Required when path is a glob pattern. Otherwise, defaults to the capitalized final component of the path argument.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

  • client_side_encryption (ClientSideEncryptionConfig | None) – The client side encryption configuration to use when loading data.

Return type:

Table
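
Example

A sketch, assuming a local file data/products.parquet:

table = session.read_parquet(
    "data/products.parquet", keys=["Product"], table_name="Products"
)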

read_spark(dataframe, /, *, table_name, keys=(), partitioning=None, default_values={})#

Read a Spark DataFrame into a table.

Parameters:
  • dataframe (object) – The DataFrame to load.

  • table_name (str) – The name of the table to create.

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

Return type:

Table
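
Example

A sketch, assuming a local PySpark installation:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark_df = spark.createDataFrame(
    [("phone", 600.0), ("headset", 80.0), ("watch", 250.0)],
    schema=["Product", "Price"],
)
table = session.read_spark(spark_df, keys=["Product"], table_name="Spark")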

read_sql(sql, /, *, url, table_name, driver=None, keys=(), partitioning=None, types={}, default_values={})#

Create a table from the result of the passed SQL query.

Note

This method requires the atoti-sql plugin.

Parameters:
  • sql (str) – The result of this SQL query will be loaded into the table.

  • url (str) –

    The JDBC connection string of the database. The jdbc: prefix is optional but the database specific part (such as h2: or mysql:) is mandatory. For instance:

    • h2:file:/home/user/database/file/path;USER=username;PASSWORD=passwd

    • mysql://localhost:7777/example?user=username&password=passwd

    • postgresql://postgresql.db.server:5430/example?user=username&password=passwd

    More examples can be found here.

  • driver (str | None) – The JDBC driver used to load the data. If None, the driver is inferred from the URL. Drivers can be found in the atoti_sql.drivers module.

  • table_name (str) – The name of the table to create.

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • types (Mapping[str, DataType]) – Types for some or all columns of the table. Types for non-specified columns will be inferred from the SQL types.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

Return type:

Table

Example

>>> table = session.read_sql(
...     "SELECT * FROM MYTABLE;",
...     url=f"h2:file:{RESOURCES}/h2-database;USER=root;PASSWORD=pass",
...     table_name="Cities",
...     keys=["ID"],
... )
>>> len(table)
5

property scenarios: Sequence[str]#

Names of the source scenarios of the session.

property security: Security#

start_transaction(scenario_name='Base')#

Start a transaction to batch several table operations.

  • It is more efficient than doing each table operation one after the other.

  • It avoids possibly incorrect intermediate states (e.g. if loading some new data requires dropping existing rows first).

Note

Some operations are not allowed during a transaction.

Parameters:

scenario_name (str) – The name of the source scenario impacted by all the table operations inside the transaction.

Return type:

Transaction

Example

>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("Berlin", 150.0),
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> table = session.read_pandas(
...     df, keys=["City"], table_name="start_transaction example"
... )
>>> cube = session.create_cube(table)
>>> extra_df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("Singapore", 250.0),
...     ],
... )
>>> with session.start_transaction():
...     table += ("New York", 100.0)
...     table.drop(table["City"] == "Paris")
...     table.load_pandas(extra_df)
...
>>> table.head().sort_index()
           Price
City
Berlin     150.0
London     240.0
New York   100.0
Singapore  250.0

property tables: Tables#

Tables of the session.

visualize(name=None)#

Display an Atoti widget to explore the session interactively.

Note

This method requires the atoti-jupyterlab3 plugin.

The widget state will be stored in the cell metadata. This state should not have to be edited but, if desired, it can be found in JupyterLab by opening the “Notebook tools” sidebar and expanding the “Advanced Tools” section.

Parameters:

name (str | None) – The name to give to the widget.

Return type:

object

wait()#

Wait for the underlying server subprocess to terminate.

This prevents the Python process from exiting.
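
Example

In a deployment script, wait() is typically the last statement so that the server keeps serving the app after the data has been loaded (a sketch):

import atoti as tt

session = tt.Session(port=9090)
# ... create tables and cubes here ...
session.wait()  # block until the server subprocess terminates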