atoti.Table.load()
Table.load(data, /)
Load data into the table.
This is a blocking operation: the method will not return until all the data is loaded.
Example
>>> from datetime import date
>>> table = session.create_table(
...     "Sales",
...     data_types={
...         "ID": "String",
...         "Product": "String",
...         "Price": "int",
...         "Quantity": "int",
...         "Date": "LocalDate",
...     },
...     keys={"ID"},
... )
Loading an Arrow table:
>>> import pyarrow as pa
>>> arrow_table = pa.Table.from_pydict(
...     {
...         "ID": pa.array(["ab", "cd"]),
...         "Product": pa.array(["phone", "watch"]),
...         "Price": pa.array([699, 349]),
...         "Quantity": pa.array([1, 2]),
...         "Date": pa.array([date(2024, 3, 5), date(2024, 12, 12)]),
...     }
... )
>>> table.load(arrow_table)
>>> table.head().sort_index()
   Product  Price  Quantity        Date
ID
ab   phone    699         1  2024-03-05
cd   watch    349         2  2024-12-12
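Because load() accepts Arrow tables, a Parquet file can be loaded by reading it with pyarrow first. A minimal sketch, assuming a hypothetical sales.parquet file whose columns match the table's schema:

>>> import pyarrow.parquet as pq
>>> # "sales.parquet" is a placeholder path; its columns must match the table.
>>> table.load(pq.read_table("sales.parquet"))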
Loading a pandas DataFrame:
>>> import pandas as pd
>>> pandas_dataframe = pd.DataFrame(
...     {
...         "ID": ["ef", "gh"],
...         "Product": ["laptop", "book"],
...         "Price": [2599, 19],
...         "Quantity": [3, 5],
...         "Date": [date(2023, 8, 10), date(2024, 1, 13)],
...     }
... )
>>> table.load(pandas_dataframe)
>>> table.head().sort_index()
   Product  Price  Quantity        Date
ID
ab   phone    699         1  2024-03-05
cd   watch    349         2  2024-12-12
ef  laptop   2599         3  2023-08-10
gh    book     19         5  2024-01-13
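The same conversion pattern covers any format pandas can read, such as CSV. A minimal sketch, assuming a hypothetical sales.csv file with the same five columns:

>>> # "sales.csv" is a placeholder path.
>>> csv_dataframe = pd.read_csv("sales.csv", parse_dates=["Date"])
>>> # Convert pandas timestamps to plain dates for the "LocalDate" column.
>>> csv_dataframe["Date"] = csv_dataframe["Date"].dt.date
>>> table.load(csv_dataframe)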
Loading a NumPy array by converting it to a pandas DataFrame:
>>> import numpy as np
>>> numpy_array = np.asarray(
...     [
...         ["ij", "watch", 299, 1, date(2022, 7, 20)],
...         ["kl", "keyboard", 69, 1, date(2023, 5, 8)],
...     ],
...     dtype=object,
... )
>>> table.load(pd.DataFrame(numpy_array, columns=list(table)))
>>> table.head(10).sort_index()
     Product  Price  Quantity        Date
ID
ab     phone    699         1  2024-03-05
cd     watch    349         2  2024-12-12
ef    laptop   2599         3  2023-08-10
gh      book     19         5  2024-01-13
ij     watch    299         1  2022-07-20
kl  keyboard     69         1  2023-05-08
Loading a Spark DataFrame by converting it to a pandas DataFrame:
>>> from pyspark.sql import Row, SparkSession
>>> spark = SparkSession.builder.getOrCreate()
>>> spark_dataframe = spark.createDataFrame(
...     [
...         Row(
...             ID="mn",
...             Product="glasses",
...             Price=129,
...             Quantity=2,
...             Date=date(2021, 3, 3),
...         ),
...         Row(
...             ID="op",
...             Product="battery",
...             Price=49,
...             Quantity=2,
...             Date=date(2024, 11, 7),
...         ),
...     ]
... )
>>> table.load(spark_dataframe.toPandas())
>>> spark.stop()
>>> table.head(10).sort_index()
     Product  Price  Quantity        Date
ID
ab     phone    699         1  2024-03-05
cd     watch    349         2  2024-12-12
ef    laptop   2599         3  2023-08-10
gh      book     19         5  2024-01-13
ij     watch    299         1  2022-07-20
kl  keyboard     69         1  2023-05-08
mn   glasses    129         2  2021-03-03
op   battery     49         2  2024-11-07
The += operator is available as syntax sugar to load a single row expressed either as a tuple or a Mapping:

>>> table += ("qr", "mouse", 29, 3, date(2024, 11, 7))
>>> table.head(10).sort_index()
     Product  Price  Quantity        Date
ID
ab     phone    699         1  2024-03-05
cd     watch    349         2  2024-12-12
ef    laptop   2599         3  2023-08-10
gh      book     19         5  2024-01-13
ij     watch    299         1  2022-07-20
kl  keyboard     69         1  2023-05-08
mn   glasses    129         2  2021-03-03
op   battery     49         2  2024-11-07
qr     mouse     29         3  2024-11-07
>>> table += {  # The order of the keys does not matter.
...     "Product": "screen",
...     "Quantity": 1,
...     "Price": 599,
...     "Date": date(2023, 5, 8),
...     "ID": "st",
... }
>>> table.head(10).sort_index()
     Product  Price  Quantity        Date
ID
ab     phone    699         1  2024-03-05
cd     watch    349         2  2024-12-12
ef    laptop   2599         3  2023-08-10
gh      book     19         5  2024-01-13
ij     watch    299         1  2022-07-20
kl  keyboard     69         1  2023-05-08
mn   glasses    129         2  2021-03-03
op   battery     49         2  2024-11-07
qr     mouse     29         3  2024-11-07
st    screen    599         1  2023-05-08
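When several rows arrive together, loading them in one call is simpler than repeating +=. A minimal sketch reusing the columns-from-table pattern of the NumPy example above; the rows themselves are made up for illustration:

>>> new_rows = [
...     ("uv", "cable", 9, 4, date(2024, 2, 1)),
...     ("wx", "charger", 39, 2, date(2024, 6, 15)),
... ]
>>> # list(table) yields the column names in table order.
>>> table.load(pd.DataFrame(new_rows, columns=list(table)))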
See also
DataLoad, load_async(), data_transaction(), and infer_data_types().
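To make several loads atomic, they can be grouped in a data transaction. A minimal sketch, assuming the transaction is opened from session.tables; see data_transaction() above for the exact API:

>>> # Assumption: data_transaction() is reachable via session.tables.
>>> with session.tables.data_transaction():
...     table.load(pandas_dataframe)
...     table.load(arrow_table)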