Advanced Store Manipulations¶

[1]:

import atoti as tt

session = tt.create_session()
store = session.read_csv("data/example.csv", keys=["ID"], store_name="MyStore")
cube = session.create_cube(store, "FirstCube")
cube.query()

[1]:

	Price.AVG	Price.SUM	Quantity.AVG	Quantity.SUM	contributors.COUNT
0	428.0	4280.0	2270.0	22700.0	10

Store description¶

The lengh of a store is the number of rows in the store

[2]:

len(store)

[2]:

You can get the shape of the store, which is the number of columns and rows of the store

[3]:

store.shape

[3]:

{'rows': 10, 'columns': 8}

The description of the columns and types of the store

[4]:

store

[4]:

MyStore
- ID
  - key: True
  - nullable: True
  - type: int
- Date
  - key: False
  - nullable: True
  - type: LocalDate[yyyy-MM-dd]
- Continent
  - key: False
  - nullable: True
  - type: string
- Country
  - key: False
  - nullable: True
  - type: string
- City
  - key: False
  - nullable: True
  - type: string
- Color
  - key: False
  - nullable: True
  - type: string
- Quantity
  - key: False
  - nullable: True
  - type: double
- Price
  - key: False
  - nullable: True
  - type: double

Sampling mode¶

When loading large datasets into a store it is possible to sample the data. There are currently 3 mode supported for sampling:

first_lines : keeps only the given amount of lines.
first_files : keeps only the given amount of files.
FULL : Loads everything

Sampling the data helps having a very responsive project during the design phase, when this design is over it is possible to load everything by calling session.load_all_data

Sampling mode can be defined in the session for all the stores or store by store.

[5]:

import atoti as tt

from atoti.sampling import first_lines

sample_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="sampled", sampling_mode=first_lines(5)
)
len(sample_store)

The store has been sampled because there is more than 5 lines in the files to load. To trigger the full load of your data call `load_all_data` on your session.

[5]:

[6]:

# Do some design such as adding a cube, some measures...
cube = session.create_cube(sample_store, "Sampling")
cube.query()

[6]:

	Price.AVG	Price.SUM	Quantity.AVG	Quantity.SUM	contributors.COUNT
0	452.0	2260.0	2100.0	10500.0	5

[7]:

session.load_all_data()
len(sample_store)

[7]:

[8]:

len(sample_store)
cube.query()

[8]:

	Price.AVG	Price.SUM	Quantity.AVG	Quantity.SUM	contributors.COUNT
0	428.0	4280.0	2270.0	22700.0	10

Specify other types for the columns¶

Atoti automatically detects the type of the columns. It is possible to bypass this behaviour and specify the types of some columns manually.

[9]:

types = {
    "ID": tt.types.DOUBLE,
    "City": tt.types.STRING,
    "Quantity": tt.types.STRING,
    "Price": tt.types.FLOAT,
}

[10]:

custom_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="Custom", types=types
)
custom_store

[10]:

Custom
- ID
  - key: True
  - nullable: False
  - type: double
- Date
  - key: False
  - nullable: True
  - type: LocalDate[yyyy-MM-dd]
- Continent
  - key: False
  - nullable: True
  - type: string
- Country
  - key: False
  - nullable: True
  - type: string
- City
  - key: False
  - nullable: True
  - type: string
- Color
  - key: False
  - nullable: True
  - type: string
- Quantity
  - key: False
  - nullable: True
  - type: string
- Price
  - key: False
  - nullable: False
  - type: float

Insert new rows¶

New records can be inserted into the store. If a record has the same key columns as an existing record, the previous record will be overriden.

[11]:

# New key
store.insert_rows(
    (11, "2019-03-01", "Europe", "Germany", "Berlin", "yellow", 1000, 400)
)

# Existing key
store += (1, "2019-03-01", "Europe", "France", "Paris", "red", 2000, 600)

store.head()

[11]:

	Date	Continent	Country	City	Color	Quantity	Price
ID
1	2019-03-01	Europe	France	Paris	red	2000.0	600.0
2	2019-01-02	Europe	France	Lyon	red	2000.0	400.0
3	2019-01-05	Europe	France	Paris	blue	3000.0	420.0
4	2018-01-01	Europe	France	Bordeaux	blue	1500.0	480.0
5	2019-01-01	Europe	UK	London	green	3000.0	460.0

Append data to a store¶

A CSV file with the same structure as the initial source can be appended into an existing store:

[12]:

store.load_csv("data/additional_example.csv")
store.head()

[12]:

	Date	Continent	Country	City	Color	Quantity	Price
ID
1	2019-03-01	Europe	France	Paris	red	2000.0	600.0
2	2019-01-02	Europe	France	Lyon	red	2000.0	400.0
3	2019-01-05	Europe	France	Paris	blue	3000.0	420.0
4	2018-01-01	Europe	France	Bordeaux	blue	1500.0	480.0
5	2019-01-01	Europe	UK	London	green	3000.0	460.0

[13]:

len(store)

[13]:

Join stores¶

Stores can be joined together using a mapping between columns. The mapping columns of the target store must be included in its key columns. If no mapping is specified, the columns with the same names are used.

[14]:

capital_store = session.read_csv(
    "data/capitals.csv", keys=["Country name"], store_name="Capital"
)
capital_store.head()

[14]:

	Capital
Country name
France	Paris
UK	London
China	Beijing
India	Dehli

[15]:

store.join(capital_store, mapping={"Country": "Country name"})

[16]:

cube = session.create_cube(store, "Cube")