atoti.Session.read_parquet()#

Session.read_parquet(path, /, *, keys=(), columns={}, table_name=None, partitioning=None, default_values={}, client_side_encryption=None)#

Read a Parquet file into a table.

Parameters:
  • path (str | Path) – The path to the Parquet file. If a path pointing to a directory is provided, all of the files with the .parquet extension in the directory will be loaded into the same table and, as such, they are all expected to share the same schema. The path can also be a glob pattern (e.g. path/to/directory/**.*.parquet).

  • keys (Iterable[str]) –

    The columns that will become keys of the table.

    Inserting a row containing key values equal to the ones of an existing row will replace the existing row with the new one.

    Key columns cannot have None as their default_value.

  • columns (Mapping[str, str]) – Mapping from file column names to table column names. When the mapping is not empty, columns of the file absent from the mapping keys will not be loaded. Other parameters accepting column names expect to be passed table column names (i.e. values of this mapping) and not file column names.

  • table_name (str | None) – The name of the table to create. Required when path is a glob pattern. Otherwise, defaults to the capitalized final component of the path argument.

  • partitioning (str | None) –

    The description of how the data will be split across partitions of the table.

    Default rules:

    • Only non-joined tables are automatically partitioned.

    • Tables are automatically partitioned by hashing their key columns. If there are no key columns, all the dictionarized columns are hashed.

    • Joined tables can only use a sub-partitioning of the table referencing them.

    • Automatic partitioning is done modulo the number of available cores.

    Example

    hash4(country) splits the data across 4 partitions based on the country column’s hash value.

  • default_values (Mapping[str, ConstantValue | None]) – Mapping from column name to column default_value.

  • client_side_encryption (ClientSideEncryptionConfig | None) – The client side encryption configuration to use when loading data.

Return type:

Table