atoti.store module

class atoti.store.Store(_name, _java_api, _scenario='Base', _columns=<factory>)

Bases: atoti._repr_utils.ReprJsonable

Represents a single store.
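A store is not instantiated directly; it is returned by the data loading methods of a session. A minimal sketch, assuming a CSV file sales.csv with an ID column (the file, column, and store names are hypothetical):

    import atoti as tt

    session = tt.create_session()
    # Load a CSV into a new store keyed on the "ID" column.
    store = session.read_csv("sales.csv", keys=["ID"], store_name="Sales")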

append(*rows, in_all_scenarios=False)

Add one or multiple rows to the store.

If a row with the same keys already exists in the store, it will be overwritten by the new one.

Parameters
  • rows (Union[Tuple[Any, …], Mapping[str, Any]]) –

    The rows to add. Rows can either be:

    • Tuples of values in the correct order.

    • Column name to value mappings.

    All rows must share the same shape.

  • in_all_scenarios (bool) – Whether the data should be loaded into all of the store’s scenarios or only the current one.
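Example (a minimal sketch, assuming a store with columns ID, Product, and Price, keyed on ID):

    # Tuples of values, in column order.
    store.append(
        (1, "TV", 300.0),
        (2, "Radio", 40.0),
    )
    # Column name to value mappings.
    store.append({"ID": 3, "Product": "Phone", "Price": 200.0})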

property columns

Columns of the store.

Return type

Sequence[str]

drop(*coordinates, in_all_scenarios=False)

Delete rows where the values for each column match those specified.

Each set of coordinates can only contain one value for each column. To specify multiple values for one column, multiple mappings must be passed.

Parameters
  • coordinates (Mapping[str, Any]) – Mappings between store columns and values. Rows which match the provided mappings will be deleted from the store.

  • in_all_scenarios (bool) – Whether the rows should be dropped in all of the store’s scenarios or only the current one.
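Example (same hypothetical store as in append() above):

    # Delete every row whose Product is "TV" or "Radio".
    store.drop({"Product": "TV"}, {"Product": "Radio"})
    # Delete only the rows matching both column values at once.
    store.drop({"ID": 3, "Product": "Phone"})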

head(n=5)

Return n rows of the store as a pandas DataFrame.

Return type

DataFrame
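For instance:

    df = store.head(10)  # pandas DataFrame containing 10 rows of the store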

join(other, *, mapping=None)

Define a reference between this store and another.

There are two possible situations when creating a reference:

  • All the key columns of the other store are mapped: this is a normal reference.

  • Only some of the key columns of the other store are mapped: this is a partial reference:

    • The columns from the base store used in the mapping must be attached to hierarchies.

    • The un-mapped key columns of the other store will be converted into hierarchies.

Depending on the cube creation mode, the join will also generate different hierarchies and measures:

  • manual: The un-mapped keys of the other store will become hierarchies.

  • no_measures: All of the non-numeric columns from the other store, as well as those containing integers, will be converted into hierarchies. No measures will be created in this mode.

  • auto: The same hierarchies will be created as in the no_measures mode. Additionally, columns of the base store containing numeric values or arrays, except those containing only integers, will be converted into measures. Columns of the other store with these types will not be converted into measures.

Parameters
  • other (Store) – The other store to reference.

  • mapping (Optional[Mapping[str, str]]) – The column mapping of the reference. Defaults to the columns with the same names in the two stores.
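Example (a sketch with two hypothetical stores, where the sales store’s Product column references the products store keyed on Name):

    sales = session.read_csv("sales.csv", keys=["Sale ID"])
    products = session.read_csv("products.csv", keys=["Name"])
    # Map the sales store's "Product" column onto the products store's key.
    sales.join(products, mapping={"Product": "Name"})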

property keys

Names of the key columns of the store.

Return type

Sequence[str]

load_csv(path, *, sep=None, encoding='utf-8', process_quotes=True, in_all_scenarios=False, truncate=False, watch=False, array_sep=None)

Load a CSV into this scenario.

Parameters
  • path (Union[Path, str]) –

    The path to the CSV file or directory to load.

    If a path pointing to a directory is provided, all of the files with the .csv extension in the directory and subdirectories will be loaded into the same store and, as such, they are all expected to share the same schema.

    .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported.

    The path can contain glob parameters (e.g. path/to/directory/**.*.csv), which will be expanded. Be careful: when using glob expressions in paths, all files matching the expression will be loaded, regardless of their extension. When the provided path is a directory, the default glob parameter of **.csv is used.

  • sep (Optional[str]) – The delimiter to use. If None, the separator will automatically be detected.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (bool) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      • Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed in double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double quotes within a field will be treated as regular characters, following Excel’s behavior. In this mode, fields are expected not to be enclosed in double quotes, and a field cannot contain a line break.

    • None: The behavior will be inferred from the first lines of the CSV file.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well. Any non-CSV files added to the directory will be ignored.

  • array_sep (Optional[str]) – The delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.
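Example (hypothetical paths):

    # Load a single file, auto-detecting the separator.
    store.load_csv("data/sales_2021.csv")
    # Reload a whole directory, clearing the store first and watching for new files.
    store.load_csv("data/sales/", truncate=True, watch=True)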

load_kafka(bootstrap_server, topic, *, group_id, batch_duration=1000, consumer_config=None, deserializer=KafkaDeserializer(name='io.atoti.loading.kafka.impl.serialization.JsonDeserializer'))

Consume a Kafka topic and stream its records into the store.

Note

This method requires the atoti-kafka plugin.

The records’ key deserializer defaults to StringDeserializer.

Parameters
  • bootstrap_server (str) – host[:port] that the consumer should contact to bootstrap initial cluster metadata.

  • topic (str) – Topic to subscribe to.

  • group_id (str) – The name of the consumer group to join.

  • batch_duration (int) – Milliseconds spent batching received records before publishing them to the store. If 0, received records are immediately published to the store. Must not be negative.

  • consumer_config (Optional[Mapping[str, str]]) – Mapping containing optional parameters to set up the KafkaConsumer. The list of available parameters can be found in the Kafka consumer configuration documentation.

  • deserializer (KafkaDeserializer) – Deserializer turning the Kafka records’ values into atoti store rows. Use atoti_kafka.create_deserializer() to create custom ones.
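Example (a sketch assuming a local broker and a hypothetical topic and group name):

    store.load_kafka(
        "localhost:9092",
        "sales",
        group_id="atoti-sales-consumers",
        batch_duration=500,  # publish received records to the store every 500 ms
    )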

load_pandas(dataframe, *, in_all_scenarios=False, truncate=False, **kwargs)

Load a pandas DataFrame into this scenario.

Parameters
  • dataframe (DataFrame) – The DataFrame to load.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.
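Example (assuming the DataFrame’s columns match the store’s schema):

    import pandas as pd

    dataframe = pd.DataFrame({"ID": [1, 2], "Price": [300.0, 40.0]})
    store.load_pandas(dataframe)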

load_parquet(path, *, in_all_scenarios=False, truncate=False, watch=False)

Load a Parquet file into this scenario.

Parameters
  • path (Union[Path, str]) – The path to the Parquet file or directory. If the path points to a directory, all the files in the directory and subdirectories will be loaded into the store and, as such, are expected to have the same schema as the store and to be Parquet files.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well.
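Example (a hypothetical path):

    # Clear the store, then load every Parquet file under data/sales/.
    store.load_parquet("data/sales/", truncate=True)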

load_spark(dataframe, *, in_all_scenarios=False, truncate=False)

Load a Spark DataFrame into this scenario.

Parameters
  • dataframe – The Spark DataFrame to load.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.
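Example (a sketch assuming an existing SparkSession named spark):

    spark_dataframe = spark.read.csv("data/sales.csv", header=True, inferSchema=True)
    store.load_spark(spark_dataframe)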

load_sql(url, query, *, username, password, driver=None, in_all_scenarios=False, truncate=False)

Load the result of the passed SQL query into the store.

Note

This method requires the atoti-sql plugin.

Parameters
  • url (Union[Path, str]) –

    The URL of the database. For instance:

    • mysql:localhost:7777/example

    • h2:/home/user/database/file/path

  • query (str) – The SQL query whose result is used to build the store.

  • username (str) – The username used to connect to the database.

  • password (str) – The password used to connect to the database.

  • driver (Optional[str]) – The JDBC driver used to load the data. If None, the driver is inferred from the URL. Drivers can be found in the atoti_sql.drivers module.

  • in_all_scenarios (bool) – Whether to load the data in all existing scenarios.

  • truncate (bool) – Whether to clear the store before loading the new data into it.
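Example (hypothetical database and credentials, using the MySQL URL format shown above):

    store.load_sql(
        "mysql:localhost:7777/example",
        "SELECT * FROM SALES;",
        username="username",
        password="passwd",
    )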

property loading_report

Store loading report.

Return type

StoreReport

property name

Name of the store.

Return type

str

property scenario

Scenario the store is on.

Return type

ScenarioName (a NewType of str)

property scenarios

All the scenarios the store can be on.

Return type

StoreScenarios

property shape

Shape of the store.

Return type

Mapping[str, int]

property source_simulation_enabled

Whether source simulations are enabled on the store.

Return type

bool

class atoti.store.StoreScenarios(_java_api, _store)

Bases: object

Scenarios of a store.

load_csv(scenario_directory_path, *, sep=None, encoding='utf-8', process_quotes=True, truncate=False, watch=False, array_sep=None, pattern=None, base_scenario_directory='Base')

Load multiple CSV files into the store while automatically generating scenarios.

Loads the data from a directory into multiple scenarios, creating them as necessary, based on the directory’s structure. The contents of each sub-directory of the provided path will be loaded into a scenario with the same name. Here is an example of a valid directory structure:

ScenarioStore
├── Base
│   └── base_data.csv
├── Scenario1
│   └── scenario1_data.csv
└── Scenario2
    └── scenario2_data.csv

With this structure:

  • The contents of the Base directory are loaded into the base scenario.

  • Two new scenarios, Scenario1 and Scenario2, are created, containing the data from scenario1_data.csv and scenario2_data.csv respectively.

Parameters
  • scenario_directory_path (Union[Path, str]) – The path pointing to the directory containing all of the scenarios.

  • sep (Optional[str]) – The delimiter to use. If None, the separator will automatically be detected.

  • encoding (str) – The encoding to use to read the CSV.

  • process_quotes (bool) –

    Whether double quotes should be processed to follow the official CSV specification:

    • True:

      • Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed in double quotes, then double quotes may not appear inside the fields.

      • A double quote appearing inside a field must be escaped by preceding it with another double quote.

      • Fields containing line breaks, double quotes, and commas should be enclosed in double-quotes.

    • False: all double quotes within a field will be treated as regular characters, following Excel’s behavior. In this mode, fields are expected not to be enclosed in double quotes, and a field cannot contain a line break.

    • None: The behavior will be inferred from the first lines of the CSV file.

  • truncate (bool) – Whether to clear the store before loading the new data into it.

  • watch (bool) – Whether the source file or directory at the given path should be watched for changes. When set to True, changes to the source will automatically be reflected in the store. If the source is a directory, new files will be loaded into the same store as the initial data and must therefore have the same schema as the initial data as well. Any non-CSV files added to the directory will be ignored.

  • array_sep (Optional[str]) – The delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.

  • pattern (Optional[str]) – A glob pattern used to specify which files to load in each scenario directory. If no pattern is provided, all files with the .csv extension will be loaded by default.

  • base_scenario_directory (str) – The data from a scenario directory with this name will be loaded into the base scenario rather than into a new scenario named after the directory.
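Example (loading the ScenarioStore directory structure shown above):

    store.scenarios.load_csv("ScenarioStore")

This loads the contents of the Base directory into the base scenario and creates the Scenario1 and Scenario2 scenarios from their respective directories.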