atoti.table module#

class atoti.table.TableScenarios#

Scenarios of a table.

class atoti.Table#

Represents a single table.

append(*rows)#

Add one or multiple rows to the table.

If a row with the same keys already exist in the table, it will be overridden by the passed one.

Parameters:

rows (tuple[Any, ...] | Mapping[str, Any]) –

The rows to add. Rows can either be:

  • Tuples of values in the correct order.

  • Column name to value mappings.

All rows must share the shame shape.

property columns: Sequence[str]#

Columns of the table.

drop(filter=None, /)#

Delete some of the table’s rows.

Parameters:

filter (Optional[Condition[ColumnIdentifier, ConditionComparisonOperatorBound, Optional[Constant], ConditionCombinationOperatorBound]]) – Rows where this condition evaluates to True will be deleted. If None, all the rows will be deleted.

Example

>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> table = session.read_pandas(df, keys=["City"], table_name="Cities")
>>> table.head().sort_index()
          Price
City
London    240.0
New York  270.0
Paris     200.0
>>> table.drop((table["City"] == "Paris") | (table["Price"] <= 250.0))
>>> table.head().sort_index()
          Price
City
New York  270.0
>>> table.drop()
>>> table.head()
Empty DataFrame
Columns: [Price]
Index: []
head(n=5)#

Return at most n random rows of the table as a pandas DataFrame.

Return type:

DataFrame

join(other, mapping=None, /)#

Define a join between this table and another.

There are two kinds of joins:

  • full join if all the key columns of the other table are mapped and the joined tables share the same locality (either both Table or both ExternalTable).

  • partial join otherwise.

Depending on the cube creation mode, the join will also generate different hierarchies and measures:

  • manual: No hierarchy is automatically created. For partial joins, creating a hierarchy for each mapped key column is necessary before creating hierarchies for the other columns. Once these required hierarchies exist, hierarchies for the un-mapped key columns of the other table will automatically be created.

  • no_measures: All the key columns and non-numeric columns of the other table will be converted into hierarchies. No measures will be created in this mode.

  • auto: The same hierarchies as in the no_measures mode will be created. Additionally, columns of the base table containing numeric values (including arrays), except for columns which are keys, will be converted into measures. Columns of the other table with these types will not be converted into measures.

Parameters:
  • other (Table) – The other table to join.

  • mapping (Optional[Condition[ColumnIdentifier, Literal['eq'], ColumnIdentifier, Optional[Literal['and']]]]) – An equality-based condition from columns of this table to columns of the other table. If None, the key columns of the other table with the same name as columns in this table will be used.

Example

>>> sales_table = session.create_table(
...     "Sales",
...     types={"ID": tt.STRING, "Product ID": tt.STRING, "Price": tt.INT},
... )
>>> products_table = session.create_table(
...     "Products",
...     types={"ID": tt.STRING, "Name": tt.STRING, "Category": tt.STRING},
... )
>>> sales_table.join(
...     products_table, sales_table["Product ID"] == products_table["ID"]
... )
property keys: Sequence[str]#

Names of the key columns of the table.

load_arrow(table, /)#

Load an Arrow Table into this scenario.

Parameters:

table (Table) – The Arrow Table to load.

load_csv(path, /, *, columns={}, separator=',', encoding='utf-8', process_quotes=True, array_separator=None, date_patterns={}, client_side_encryption=None, **kwargs)#

Load a CSV into this scenario.

Parameters:
  • path (Path | str) –

    The path to the CSV file to load.

    .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported.

    The path can also be a glob pattern (e.g. path/to/directory/**.*.csv).

  • columns (Mapping[str, str]) – Mapping from file column names to table column names. When the mapping is not empty, columns of the file absent from the mapping keys will not be loaded. Other parameters accepting column names expect to be passed table column names (i.e. values of this mapping) and not file column names.

  • separator (str | None) –

    The character separating the values of each line.

    If None, the separator will be inferred in a preliminary partial read.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (bool | None) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double-quotes within a field will be treated as any regular character, following Excel’s behavior. In this mode, it is expected that fields are not enclosed in double quotes. It is also not possible to have a line break inside a field.

    • None: the behavior will be inferred in a preliminary partial read.

  • array_separator (str | None) –

    The character separating array elements.

    If not None, any field containing this separator will be parsed as an array.

  • date_patterns (Mapping[str, str]) – A column name to date pattern mapping that can be used when the built-in date parsers fail to recognize the formatted dates in the passed files.

  • client_side_encryption (ClientSideEncryptionConfig | None) – The client side encryption configuration to use when loading data.

load_kafka(bootstrap_server, topic, *, group_id, batch_duration=1000, consumer_config={})#

Consume a Kafka topic and stream its records in the table.

Note

This method requires the atoti-kafka plugin.

The records’ key deserializer default to StringDeserializer.

The records’ message must be a JSON object with columns’ name as keys.

Parameters:
  • bootstrap_server (str) – host[:port] that the consumer should contact to bootstrap initial cluster metadata.

  • topic (str) – Topic to subscribe to.

  • group_id (str) – The name of the consumer group to join.

  • batch_duration (int) – Milliseconds spent batching received records before publishing them to the table. If 0, received records are immediately published to the table. Must not be negative.

  • consumer_config (Mapping[str, str]) – Mapping containing optional parameters to set up the KafkaConsumer. The list of available params can be found here.

load_numpy(array, /)#

Load a NumPy 2D array into this scenario.

Parameters:

array (ndarray[Any, dtype[Any]]) – The 2D array to load.

load_pandas(dataframe, /)#

Load a pandas DataFrame into this scenario.

Parameters:

dataframe (DataFrame) – The DataFrame to load.

load_parquet(path, /, *, columns={}, client_side_encryption=None)#

Load a Parquet file into this scenario.

Parameters:
  • path (Path | str) – The path to the Parquet file. If a path pointing to a directory is provided, all of the files with the .parquet extension in the directory will be loaded into the same table and, as such, they are all expected to share the same schema. The path can also be a glob pattern (e.g. path/to/directory/**.*.parquet).

  • columns (Mapping[str, str]) – Mapping from file column names to table column names. When the mapping is not empty, columns of the file absent from the mapping keys will not be loaded. Other parameters accepting column names expect to be passed table column names (i.e. values of this mapping) and not file column names.

  • client_side_encryption (ClientSideEncryptionConfig | None) – The client side encryption configuration to use when loading data.

load_spark(dataframe, /)#

Load a Spark DataFrame into this scenario.

Parameters:

dataframe – The dataframe to load.

load_sql(sql, *, url, driver=None)#

Load the result of the passed SQL query into the table.

Note

This method requires the atoti-sql plugin.

Parameters:
  • sql (str) – The result of this SQL query will be loaded into the table.

  • url (str) –

    The JDBC connection string of the database. The jdbc: prefix is optional but the database specific part (such as h2: or mysql:) is mandatory. For instance:

    • h2:file:/home/user/database/file/path;USER=username;PASSWORD=passwd

    • mysql://localhost:7777/example?user=username&password=passwd

    • postgresql://postgresql.db.server:5430/example?user=username&password=passwd

    More examples can be found here.

  • driver (str | None) – The JDBC driver used to load the data. If None, the driver is inferred from the URL. Drivers can be found in the atoti_sql.drivers module.

Example

>>> table = session.create_table("Cities", types={"ID": tt.INT, "CITY": tt.STRING, "MY_VALUE": tt.DOUBLE}, keys=["ID"])
>>> table.load_sql(
...     "SELECT * FROM MYTABLE;",
...     url=f"h2:file:{RESOURCES}/h2-database;USER=root;PASSWORD=pass",
... )
>>> len(table)
5
property name: str#

Name of the table.

property scenario: str#

Scenario on which the table is.

property scenarios: TableScenarios#

All the scenarios the table can be on.