Advanced Store Manipulations

[1]:
import atoti as tt

session = tt.create_session()
store = session.read_csv("data/example.csv", keys=["ID"], store_name="MyStore")
cube = session.create_cube(store, "FirstCube")
cube.query()
Welcome to atoti 0.4.0!

By using this community edition, you agree with the license available at https://www.atoti.io/eula.
Browse the official documentation at https://docs.atoti.io.
Join the community at https://www.atoti.io/register.

You can hide this message by setting the ATOTI_HIDE_EULA_MESSAGE environment variable to True.
[1]:
Price.MEAN Price.SUM Quantity.MEAN Quantity.SUM contributors.COUNT
0 428.0 4280.0 2270.0 22700.0 10

Store description

The lengh of a store is the number of rows in the store

[2]:
len(store)
[2]:
10

You can get the shape of the store, which is the number of columns and rows of the store

[3]:
store.shape
[3]:
{'rows': 10, 'columns': 8}

The description of the columns and types of the store

[4]:
store
[4]:
  • MyStore
    • ID
      • key: True
      • nullable: True
      • type: int
    • Date
      • key: False
      • nullable: True
      • type: LocalDate[yyyy-MM-dd]
    • Continent
      • key: False
      • nullable: True
      • type: string
    • Country
      • key: False
      • nullable: True
      • type: string
    • City
      • key: False
      • nullable: True
      • type: string
    • Color
      • key: False
      • nullable: True
      • type: string
    • Quantity
      • key: False
      • nullable: True
      • type: double
    • Price
      • key: False
      • nullable: True
      • type: double

Sampling mode

When loading large datasets into a store it is possible to sample the data. There are currently 3 mode supported for sampling:

  • first_lines : keeps only the given amount of lines.

  • first_files : keeps only the given amount of files.

  • FULL : Loads everything

Sampling the data helps having a very responsive project during the design phase, when this design is over it is possible to load everything by calling session.load_all_data

Sampling mode can be defined in the session for all the stores or store by store.

[5]:
from atoti.sampling import first_lines

sample_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="sampled", sampling_mode=first_lines(5)
)
len(sample_store)
The store has been sampled because there are more than 5 lines in the files to load. Call Session.load_all_data() to trigger the full load of the data.
[5]:
5
[6]:
# Do some design such as adding a cube, some measures...
cube = session.create_cube(sample_store, "Sampling")
cube.query()
[6]:
Price.MEAN Price.SUM Quantity.MEAN Quantity.SUM contributors.COUNT
0 452.0 2260.0 2100.0 10500.0 5
[7]:
session.load_all_data()
len(sample_store)
[7]:
10
[8]:
len(sample_store)
cube.query()
[8]:
Price.MEAN Price.SUM Quantity.MEAN Quantity.SUM contributors.COUNT
0 428.0 4280.0 2270.0 22700.0 10

Specify other types for the columns

atoti automatically detects the type of the columns. It is possible to bypass this behaviour and specify the types of some columns manually.

[9]:
types = {
    "ID": tt.types.DOUBLE,
    "City": tt.types.STRING,
    "Quantity": tt.types.STRING,
    "Price": tt.types.FLOAT,
}
[10]:
custom_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="Custom", types=types
)
custom_store
[10]:
  • Custom
    • ID
      • key: True
      • nullable: False
      • type: double
    • Date
      • key: False
      • nullable: True
      • type: LocalDate[yyyy-MM-dd]
    • Continent
      • key: False
      • nullable: True
      • type: string
    • Country
      • key: False
      • nullable: True
      • type: string
    • City
      • key: False
      • nullable: True
      • type: string
    • Color
      • key: False
      • nullable: True
      • type: string
    • Quantity
      • key: False
      • nullable: True
      • type: string
    • Price
      • key: False
      • nullable: False
      • type: float

Insert new rows

New records can be inserted into the store. If a record has the same key columns as an existing record, the previous record will be overriden.

[11]:
# New key
store.append((11, "2019-03-01", "Europe", "Germany", "Berlin", "yellow", 1000, 400))

# Existing key
store += (1, "2019-03-01", "Europe", "France", "Paris", "red", 2000, 600)

store.head()
[11]:
Date Continent Country City Color Quantity Price
ID
1 2019-03-01 Europe France Paris red 2000.0 600.0
2 2019-01-02 Europe France Lyon red 2000.0 400.0
3 2019-01-05 Europe France Paris blue 3000.0 420.0
4 2018-01-01 Europe France Bordeaux blue 1500.0 480.0
5 2019-01-01 Europe UK London green 3000.0 460.0

Append data to a store

A CSV file with the same structure as the initial source can be appended into an existing store:

[12]:
store.load_csv("data/additional_example.csv")
store.head()
[12]:
Date Continent Country City Color Quantity Price
ID
21 2017-12-25 Europe UK London red 1000.0 500.0
1 2019-03-01 Europe France Paris red 2000.0 600.0
22 2017-12-25 Europe France Lyon blue 2000.0 400.0
2 2019-01-02 Europe France Lyon red 2000.0 400.0
23 2017-12-25 Europe France Paris green 1600.0 500.0
[13]:
len(store)
[13]:
14

Join stores

Stores can be joined together using a mapping between columns. The mapping columns of the target store must be included in its key columns. If no mapping is specified, the columns with the same names are used.

[14]:
capital_store = session.read_csv(
    "data/capitals.csv", keys=["Country name"], store_name="Capital"
)
capital_store.head()
[14]:
Capital
Country name
France Paris
UK London
China Beijing
India Dehli
[15]:
store.join(capital_store, mapping={"Country": "Country name"})
[16]:
cube = session.create_cube(store, "Cube")

See the store schema

It is possible to display the schema of the stores used by a cube, starting from its base store:

[17]:
cube.schema
[17]:
../_images/tutorial_04-Advanced-Store-Manipulations_26_0.svg

and the schema of the all the stores of the session:

[18]:
session.stores.schema
[18]:
../_images/tutorial_04-Advanced-Store-Manipulations_28_0.svg

Automatic scenario creation

If the data you use to feed a store respects a specific directory structure, we are able to automatically split it into different source scenarios. You can control this behavior by using the load_csv method on the store’s scenarios attribute.

[19]:
# Start by creating an empty store
scenario_store = session.create_store(
    {
        "ID": tt.types.INT,
        "Date": tt.types.LOCAL_DATE,
        "Continent": tt.types.STRING,
        "Country": tt.types.STRING,
        "Color": tt.types.STRING,
        "Quantity": tt.types.DOUBLE,
        "Price": tt.types.DOUBLE,
    },
    store_name="ScenarioStore",
)

scenario_store.scenarios.load_csv(
    "data/scenario_directory/", base_scenario_directory="base",
)

As you can see, because the provided directory had two subdiretories, Base and Scenario1, the Base scenario was fed with the data in the base sub directory and a new scenario was created with the data from the Scenario1 sub directory.

[20]:
session.scenarios
[20]:
['scenario1', 'Base']