atoti.sampling module

Sampling modes describe how data is loaded into stores.

atoti can handle very large volumes of data while still providing fast answers to queries. However, loading a large amount of data during the modeling phase of the application is rarely a good idea, because creating stores, joins, cubes, hierarchies and measures all take longer as the amount of data grows.

atoti speeds up the modeling process by incorporating an automated sampling mechanism.

For instance, datasets can be automatically sampled on their first lines while working on the model and then switched to the full dataset when the project is ready to be shared with other users.

By reducing the amount of data, sampling gives immediate feedback for each cell run in a notebook and keeps the modeling phase as snappy as possible.

As a rule of thumb:

  • sampling is always recommended while building a project.

  • load_all_data() should be called as late as possible.
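The idea behind this rule of thumb can be sketched without atoti at all. The snippet below is only an illustration using the Python standard library: `read_rows` and the in-memory dataset are hypothetical names introduced here, not part of the atoti API. It shows a loader that stops after a fixed number of lines during modeling and reads everything once the limit is dropped.

```python
import csv
import io
from itertools import islice
from typing import Dict, List, Optional, TextIO


def read_rows(source: TextIO, limit: Optional[int] = None) -> List[Dict[str, str]]:
    """Read CSV rows, stopping after `limit` rows when a limit is given."""
    reader = csv.DictReader(source)
    # islice with a stop of None consumes the whole reader (the "full" case).
    return list(islice(reader, limit))


# Hypothetical in-memory dataset standing in for a large CSV file.
raw = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(10_000))

# Modeling phase: work on the first 100 lines only, for fast feedback.
sample = read_rows(io.StringIO(raw), limit=100)

# Sharing phase: switch to the full dataset as late as possible.
full = read_rows(io.StringIO(raw))
```

Every step that runs on `sample` touches 100 rows instead of 10,000, which is where the faster feedback during modeling comes from.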

atoti.sampling.FULL = SamplingMode(name='full', parameters=[])

Load all the data in all the stores.

class atoti.sampling.SamplingMode(name, parameters)

Bases: atoti.config._utils.Configuration

Mode of source loading.

name: str

Name of the sampling mode.

parameters: List[Any]

Sampling parameters (number of lines, number of files, …).
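As a rough picture of the shape described above, a sampling mode is a name plus a list of parameters. The dataclass below is a hypothetical stand-in that mirrors the two documented fields; it is not atoti's implementation, and the `"first_lines"` name string is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class SamplingModeSketch:
    """Hypothetical stand-in mirroring the documented fields of SamplingMode."""

    name: str  # name of the sampling mode
    parameters: List[Any]  # e.g. a number of lines or a number of files


# Mirrors the documented FULL constant: no parameters are needed.
FULL_SKETCH = SamplingModeSketch(name="full", parameters=[])

# Assumed shape of a line-limited mode; the real name string may differ.
limited = SamplingModeSketch(name="first_lines", parameters=[10_000])
```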

atoti.sampling.first_files(limit)

Mode to load only the first files of the source.

Parameters

limit (int) – The maximum number of files to read.

Return type

SamplingMode
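To make the effect of a file limit concrete, here is a minimal standard-library sketch of keeping only the first files of a multi-file source. `pick_first_files` is a hypothetical helper, and sorting by file name is an assumption: this page does not say in which order atoti considers the files.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import List


def pick_first_files(directory: Path, pattern: str, limit: int) -> List[Path]:
    """Return at most `limit` files matching `pattern`, in sorted name order."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Assumption: "first" means first in alphabetical order of the file names.
    return sorted(directory.glob(pattern))[:limit]


with TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Create five small hypothetical CSV parts.
    for index in range(5):
        (root / f"part_{index}.csv").write_text("id,value\n1,2\n")
    selected_names = [p.name for p in pick_first_files(root, "*.csv", limit=2)]
```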

atoti.sampling.first_lines(limit)

Mode to load only the first lines of the source.

Parameters

limit (int) – The maximum number of lines to read.

Return type

SamplingMode