atoti.table module¶
- class atoti.table.TableScenarios¶
Scenarios of a table.
- class atoti.Table¶
Represents a single table.
- append(*rows)¶
Add one or multiple rows to the table.
If a row with the same keys already exists in the table, it will be overwritten by the passed one.
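A minimal usage sketch, assuming a table like the Cities table from the drop() example below (keyed on City, with a Price column); row values are passed in the table's column order:
>>> # Hypothetical table with columns "City" (key) and "Price".
>>> table.append(("Madrid", 210.0))
>>> table.append(
...     ("Berlin", 150.0),
...     ("Madrid", 215.0),  # same key as an existing row: that row is overwritten
... )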
- drop(*coordinates)¶
Delete rows where the values for each column match those specified.
Each set of coordinates can only contain one value for each column. To specify multiple values for one column, multiple mappings must be passed.
- Parameters
coordinates (Mapping[str, Any]) – Mappings between table columns and values. Rows matching the provided mappings will be deleted from the table. If None, all the rows of the table will be deleted.
Example
>>> df = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[
...         ("London", 240.0),
...         ("New York", 270.0),
...         ("Paris", 200.0),
...     ],
... )
>>> table = session.read_pandas(df, keys=["City"], table_name="Cities")
>>> table.head()
          Price
City
London    240.0
New York  270.0
Paris     200.0
>>> table.drop({"City": "Paris"})
>>> table.head()
          Price
City
London    240.0
New York  270.0
>>> table.drop()
>>> table.head()
Empty DataFrame
Columns: [Price]
Index: []
- Return type
- join(other, *, mapping=None)¶
Define a reference between this table and another.
There are two different possible situations when creating references:
All the key columns of the other table are mapped: this is a normal reference.
Only some of the key columns of the other table are mapped: this is a partial reference:
The columns from the base table used in the mapping must be attached to hierarchies.
The un-mapped key columns of the other table will be converted into hierarchies.
Depending on the cube creation mode, the join will also generate different hierarchies and measures:
manual: The un-mapped keys of the other table will become hierarchies.
no_measures: All of the key columns and non-numeric columns from the other table will be converted into hierarchies. No measures will be created in this mode.
auto: The same hierarchies will be created as in the no_measures mode. Additionally, columns of the base table containing numeric values (including arrays), except for columns which are keys, will be converted into measures. Columns of the other table with these types will not be converted into measures.
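As a hedged sketch (the table names and DataFrames below are hypothetical, not from the reference build), a full reference where the other table's only key column is mapped could look like:
>>> sales_table = session.read_pandas(sales_df, table_name="Sales")
>>> cities_table = session.read_pandas(cities_df, keys=["City"], table_name="Cities")
>>> # Map the "City" column of the base table to the "City" key of the other table.
>>> sales_table.join(cities_table, mapping={"City": "City"})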
- load_csv(path, *, columns={}, separator=None, encoding='utf-8', process_quotes=True, array_separator=None, date_patterns={}, client_side_encryption=None)¶
Load a CSV into this scenario.
- Parameters
  path (Union[Path, str]) – The path to the CSV file to load. .gz, .tar.gz and .zip files containing compressed CSV(s) are also supported. The path can also be a glob pattern (e.g. path/to/directory/**.*.csv).
  columns (Mapping[str, str]) – Mapping from file column names to table column names. When the mapping is not empty, columns of the file absent from the mapping keys will not be loaded. Other parameters accepting column names expect to be passed table column names (i.e. values of this mapping) and not file column names.
  separator (Optional[str]) – The character separating the values of each line. If None, the separator will be detected automatically.
  encoding (str) – The encoding to use to read the CSV.
  process_quotes (bool) – Whether double quotes should be processed to follow the official CSV specification:
    True:
      Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.
      A double quote appearing inside a field must be escaped by preceding it with another double quote.
      Fields containing line breaks, double quotes, and commas should be enclosed in double quotes.
    False: all double quotes within a field will be treated as any regular character, following Excel’s behavior. In this mode, it is expected that fields are not enclosed in double quotes. It is also not possible to have a line break inside a field.
    None: the behavior will be inferred from the first lines of the CSV file.
  array_separator (Optional[str]) – The character separating array elements. Setting it to a non-None value will parse all the columns containing this separator as arrays.
  date_patterns (Mapping[str, str]) – A column name to date pattern mapping that can be used when the built-in date parsers fail to recognize the formatted dates in the passed files.
  client_side_encryption (Optional[ClientSideEncryptionConfig]) – The client side encryption configuration to use when loading data.
- Return type
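A hedged usage sketch; the file path, separator and column name below are placeholders, not values from the reference build:
>>> table.load_csv(
...     "data/cities.csv",
...     separator=";",
...     date_patterns={"Date": "yyyy-MM-dd"},
... )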
- load_kafka(bootstrap_server, topic, *, group_id, batch_duration=1000, consumer_config={}, deserializer=KafkaDeserializer(name='io.atoti.loading.kafka.impl.serialization.JsonDeserializer'))¶
Consume a Kafka topic and stream its records in the table.
Note
This method requires the atoti-kafka plugin.
The records’ key deserializer defaults to StringDeserializer.
- Parameters
  bootstrap_server (str) – host[:port] that the consumer should contact to bootstrap initial cluster metadata.
  topic (str) – Topic to subscribe to.
  group_id (str) – The name of the consumer group to join.
  batch_duration (int) – Milliseconds spent batching received records before publishing them to the table. If 0, received records are immediately published to the table. Must not be negative.
  consumer_config (Mapping[str, str]) – Mapping containing optional parameters to set up the KafkaConsumer. The list of available params can be found here.
  deserializer (KafkaDeserializer) – Deserialize Kafka records’ value to atoti table rows.
- Return type
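A hedged usage sketch; the broker address, topic and consumer group names are placeholders:
>>> table.load_kafka(
...     "localhost:9092",
...     "cities-topic",
...     group_id="cities-consumers",
...     batch_duration=500,
... )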
- load_numpy(array)¶
Load a NumPy 2D array into this scenario.
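A hedged sketch, assuming a table whose columns all hold numeric values so that the 2D array's columns line up with the table's columns:
>>> import numpy as np
>>> table.load_numpy(np.array([[1, 240.0], [2, 200.0]]))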
- load_pandas(dataframe)¶
Load a pandas DataFrame into this scenario.
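A hedged sketch, assuming a table with City and Price columns matching the DataFrame below:
>>> import pandas as pd
>>> extra_rows = pd.DataFrame(
...     columns=["City", "Price"],
...     data=[("Madrid", 210.0), ("Berlin", 150.0)],
... )
>>> table.load_pandas(extra_rows)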
- load_parquet(path, *, columns={}, client_side_encryption=None)¶
Load a Parquet file into this scenario.
- Parameters
  path (Union[Path, str]) – The path to the Parquet file. If a path pointing to a directory is provided, all of the files with the .parquet extension in the directory will be loaded into the same table and, as such, they are all expected to share the same schema. The path can also be a glob pattern (e.g. path/to/directory/**.*.parquet).
  columns (Mapping[str, str]) – Mapping from file column names to table column names. When the mapping is not empty, columns of the file absent from the mapping keys will not be loaded. Other parameters accepting column names expect to be passed table column names (i.e. values of this mapping) and not file column names.
  client_side_encryption (Optional[ClientSideEncryptionConfig]) – The client side encryption configuration to use when loading data.
- Return type
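A hedged usage sketch; the paths and file column names below are placeholders for illustration:
>>> table.load_parquet("data/cities.parquet")
>>> # Rename file columns to table columns while loading a directory of files.
>>> table.load_parquet(
...     "data/exports/**.*.parquet",
...     columns={"city_name": "City", "price_eur": "Price"},
... )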
- load_spark(dataframe)¶
Load a Spark DataFrame into this scenario.
- Parameters
dataframe – The dataframe to load.
- Return type
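A hedged sketch, assuming a PySpark session and a table with City and Price columns:
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> spark_df = spark.createDataFrame([("Madrid", 210.0)], ["City", "Price"])
>>> table.load_spark(spark_df)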
- load_sql(query, *, url, driver=None)¶
Load the result of the passed SQL query into the table.
Note
This method requires the atoti-sql plugin.
- Parameters
  query (str) – The result of this SQL query will be loaded into the table.
  url (str) – The JDBC connection URL of the database. The jdbc: prefix is optional but the database specific part (such as h2: or mysql:) is mandatory. For instance:
    h2:file:/home/user/database/file/path;USER=username;PASSWORD=passwd
    mysql://localhost:7777/example?user=username&password=passwd
    postgresql://postgresql.db.server:5430/example?user=username&password=passwd
  More examples can be found here.
  driver (Optional[str]) – The JDBC driver used to load the data. If None, the driver is inferred from the URL. Drivers can be found in the atoti_sql.drivers module.
Example
>>> table = session.create_table(
...     "Cities",
...     types={"ID": tt.type.INT, "CITY": tt.type.STRING, "MY_VALUE": tt.type.NULLABLE_DOUBLE},
...     keys=["ID"],
... )
>>> table.load_sql(
...     "SELECT * FROM MYTABLE;",
...     url=f"h2:file:{RESOURCES}/h2-database;USER=root;PASSWORD=pass",
... )
>>> len(table)
5
- Return type
- property loading_report: atoti.report.TableReport¶
Table loading report.
- Return type
- property scenarios: atoti.table.TableScenarios¶
All the scenarios the table can be on.
- Return type
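A hedged sketch of loading data into a named scenario; the scenario name and DataFrame below are placeholders:
>>> table.scenarios["Price increase"].load_pandas(increased_prices_df)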