Advanced Store Manipulations

[1]:
import atoti as tt

session = tt.create_session()
store = session.read_csv("data/example.csv", keys=["ID"], store_name="MyStore")
cube = session.create_cube(store, "FirstCube")
cube.query()
[1]:
Price.AVG Price.SUM Quantity.AVG Quantity.SUM contributors.COUNT
0 428.0 4280.0 2270.0 22700.0 10

Store description

The lengh of a store is the number of rows in the store

[2]:
len(store)
[2]:
10

You can get the shape of the store, which is the number of columns and rows of the store

[3]:
store.shape
[3]:
{'rows': 10, 'columns': 8}

The description of the columns and types of the store

[4]:
store
[4]:
  • MyStore
    • ID
      • key: True
      • nullable: True
      • type: int
    • Date
      • key: False
      • nullable: True
      • type: LocalDate[yyyy-MM-dd]
    • Continent
      • key: False
      • nullable: True
      • type: string
    • Country
      • key: False
      • nullable: True
      • type: string
    • City
      • key: False
      • nullable: True
      • type: string
    • Color
      • key: False
      • nullable: True
      • type: string
    • Quantity
      • key: False
      • nullable: True
      • type: double
    • Price
      • key: False
      • nullable: True
      • type: double

Sampling mode

When loading large datasets into a store it is possible to sample the data. There are currently 3 mode supported for sampling:

  • first_lines : keeps only the given amount of lines.

  • first_files : keeps only the given amount of files.

  • FULL : Loads everything

Sampling the data helps having a very responsive project during the design phase, when this design is over it is possible to load everything by calling session.load_all_data

Sampling mode can be defined in the session for all the stores or store by store.

[5]:
import atoti as tt

from atoti.sampling import first_lines

sample_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="sampled", sampling_mode=first_lines(5)
)
len(sample_store)
The store has been sampled because there is more than 5 lines in the files to load. To trigger the full load of your data call `load_all_data` on your session.
[5]:
5
[6]:
# Do some design such as adding a cube, some measures...
cube = session.create_cube(sample_store, "Sampling")
cube.query()
[6]:
Price.AVG Price.SUM Quantity.AVG Quantity.SUM contributors.COUNT
0 452.0 2260.0 2100.0 10500.0 5
[7]:
session.load_all_data()
len(sample_store)
[7]:
10
[8]:
len(sample_store)
cube.query()
[8]:
Price.AVG Price.SUM Quantity.AVG Quantity.SUM contributors.COUNT
0 428.0 4280.0 2270.0 22700.0 10

Specify other types for the columns

Atoti automatically detects the type of the columns. It is possible to bypass this behaviour and specify the types of some columns manually.

[9]:
types = {
    "ID": tt.types.DOUBLE,
    "City": tt.types.STRING,
    "Quantity": tt.types.STRING,
    "Price": tt.types.FLOAT,
}
[10]:
custom_store = session.read_csv(
    "data/example.csv", keys=["ID"], store_name="Custom", types=types
)
custom_store
[10]:
  • Custom
    • ID
      • key: True
      • nullable: False
      • type: double
    • Date
      • key: False
      • nullable: True
      • type: LocalDate[yyyy-MM-dd]
    • Continent
      • key: False
      • nullable: True
      • type: string
    • Country
      • key: False
      • nullable: True
      • type: string
    • City
      • key: False
      • nullable: True
      • type: string
    • Color
      • key: False
      • nullable: True
      • type: string
    • Quantity
      • key: False
      • nullable: True
      • type: string
    • Price
      • key: False
      • nullable: False
      • type: float

Insert new rows

New records can be inserted into the store. If a record has the same key columns as an existing record, the previous record will be overriden.

[11]:
# New key
store.insert_rows(
    (11, "2019-03-01", "Europe", "Germany", "Berlin", "yellow", 1000, 400)
)

# Existing key
store += (1, "2019-03-01", "Europe", "France", "Paris", "red", 2000, 600)

store.head()
[11]:
Date Continent Country City Color Quantity Price
ID
1 2019-03-01 Europe France Paris red 2000.0 600.0
2 2019-01-02 Europe France Lyon red 2000.0 400.0
3 2019-01-05 Europe France Paris blue 3000.0 420.0
4 2018-01-01 Europe France Bordeaux blue 1500.0 480.0
5 2019-01-01 Europe UK London green 3000.0 460.0

Append data to a store

A CSV file with the same structure as the initial source can be appended into an existing store:

[12]:
store.load_csv("data/additional_example.csv")
store.head()
[12]:
Date Continent Country City Color Quantity Price
ID
1 2019-03-01 Europe France Paris red 2000.0 600.0
2 2019-01-02 Europe France Lyon red 2000.0 400.0
3 2019-01-05 Europe France Paris blue 3000.0 420.0
4 2018-01-01 Europe France Bordeaux blue 1500.0 480.0
5 2019-01-01 Europe UK London green 3000.0 460.0
[13]:
len(store)
[13]:
14

Join stores

Stores can be joined together using a mapping between columns. The mapping columns of the target store must be included in its key columns. If no mapping is specified, the columns with the same names are used.

[14]:
capital_store = session.read_csv(
    "data/capitals.csv", keys=["Country name"], store_name="Capital"
)
capital_store.head()
[14]:
Capital
Country name
France Paris
UK London
China Beijing
India Dehli
[15]:
store.join(capital_store, mapping={"Country": "Country name"})
[16]:
cube = session.create_cube(store, "Cube")