Watch local files
This guide shows how to watch a local directory and load files into a table as they are added to that directory.
atoti is designed to efficiently handle data updates in its tables. When tables are updated, the new data will automatically be taken into account when querying the session.
Let’s take the example of an application analysing sales of a company. Sales data is stored in one CSV file for each day. Each day, a new CSV containing the sales of the day will be created and we want to load it into the application.
Creating the session
Let’s load the CSV files of the data/current folder into an atoti session:

    import atoti as tt

    session = tt.Session()
    sales_table = session.read_csv("data/current/*.csv", table_name="Sales", keys=["ID"])
    cube = session.create_cube(sales_table)
    l, m = cube.levels, cube.measures
At the beginning, there are only 7 rows in the table:
    assert len(sales_table) == 7
and just 2 dates:
Starting the file watcher
We’ll use the popular watchdog library to watch our directory:
    from watchdog.observers.polling import PollingObserver
    from watchdog.events import FileSystemEventHandler, FileCreatedEvent


    class AtotiWatcher(FileSystemEventHandler):
        def on_created(self, event: FileCreatedEvent):
            # Load each newly created file into the Sales table.
            try:
                sales_table.load_csv(event.src_path)
            except Exception as error:
                # Print the error rather than raising it from the handler.
                print(error)


    # Watch the data/current directory in a background thread.
    observer = PollingObserver()
    observer.schedule(AtotiWatcher(), "data/current")
    observer.start()
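To give an intuition of what a polling watcher does, here is a minimal standard-library sketch: it takes two snapshots of a directory and compares them to find the files that were created in between. This is only an illustration of the polling idea, not watchdog's actual implementation; the helper names `snapshot` and `new_files` and the placeholder file content are made up for the example.

```python
import os
import tempfile


def snapshot(directory: str) -> set[str]:
    """Return the paths of the regular files currently in `directory`."""
    return {entry.path for entry in os.scandir(directory) if entry.is_file()}


def new_files(before: set[str], after: set[str]) -> list[str]:
    """Return the paths present in `after` but not in `before`, sorted."""
    return sorted(after - before)


# Usage: take a snapshot, add a file, take another snapshot, and compare.
with tempfile.TemporaryDirectory() as directory:
    before = snapshot(directory)
    path = os.path.join(directory, "sales_2021_05_03.csv")
    with open(path, "w") as file:
        file.write("ID\n")  # placeholder content, not real sales data
    created = new_files(before, snapshot(directory))
    # `created` now holds the path of the new file. A polling watcher repeats
    # this comparison on a timer and notifies a handler for each new path.
```

In practice the watchdog observer above handles the timer, the background thread, and the event dispatching, so there is no need to write this loop by hand.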
Simulating the arrival of a new file

    from shutil import copy

    # Copy the new file ...
    copy(
        "data/next/sales_2021_05_03.csv",
        "data/current",
    )

    # ... and briefly wait until we see that new lines have been added to the table
    while len(sales_table) <= 7:
        ...
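The wait loop above spins without pausing and never exits if the file fails to load (the watcher only prints the error). A possible alternative, sketched here with a hypothetical `wait_until` helper that is not part of atoti or watchdog, is to poll with a short sleep and give up after a timeout:

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.1,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    # One last check in case the condition became true during the final sleep.
    return condition()
```

With the table from this guide, the call would look like `wait_until(lambda: len(sales_table) > 7)`, and a `False` result would indicate that the new rows never arrived.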
That’s it! The new file has been loaded and queries on the cube will reflect that as the third date now shows up:
Seeing widgets change in real time
We can repeat the same operation with the app open on one side, showing a pivot table that runs a real-time query. The widget rerenders automatically to display the new date too.
If you need to preprocess the watched files before loading them into the atoti session, you can do it inside the on_created() method of the event handler by first reading the file with pd.read_csv(), editing the resulting dataframe, and then passing it to Table.load_pandas().
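As a rough sketch of that preprocessing step, the two helpers below read a file with pandas, clean it, and pass the result to the table. The names `preprocess` and `load_preprocessed` are hypothetical, and the cleanup itself (dropping rows without an ID) is only an assumed example of what "editing the dataframe" could mean.

```python
import pandas as pd


def preprocess(path: str) -> pd.DataFrame:
    """Read a CSV file and clean it up before it is loaded into the table."""
    dataframe = pd.read_csv(path)
    # Hypothetical cleanup: drop rows without an ID (the key of the Sales table).
    return dataframe.dropna(subset=["ID"])


def load_preprocessed(table, path: str) -> None:
    """Preprocess the file at `path`, then load it into `table`.

    `table` is expected to be an atoti table such as `sales_table`.
    """
    table.load_pandas(preprocess(path))
```

Inside `on_created()`, the call to `sales_table.load_csv(event.src_path)` would then become `load_preprocessed(sales_table, event.src_path)`.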
If you want to do multiple actions (e.g. dropping rows before loading new ones) when a file is updated, you can use Session.start_transaction().
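A minimal sketch of such a transaction could look like the following. The helper name `replace_rows` is made up for the example, and it assumes that `Session.start_transaction()` is used as a context manager and that calling `Table.drop()` without arguments deletes all the rows; check both against the atoti version you are using.

```python
def replace_rows(session, table, path: str) -> None:
    """Swap the content of `table` for the rows of the CSV file at `path`.

    `session` and `table` are expected to be an atoti session and one of
    its tables, such as `session` and `sales_table` in this guide.
    """
    # The operations inside the `with` block are batched into one transaction.
    with session.start_transaction():
        # Assumption: calling `drop()` without arguments deletes all the rows.
        table.drop()
        table.load_csv(path)
```

Such a helper could be called from an `on_modified()` method of the event handler, which watchdog invokes when a watched file changes.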