Advanced Store Manipulations¶
[1]:
import atoti as tt
session = tt.create_session()
store = session.read_csv("data/example.csv", keys=["ID"], store_name="MyStore")
cube = session.create_cube(store, "FirstCube")
cube.query()
[1]:
| | Price.AVG | Price.SUM | Quantity.AVG | Quantity.SUM | contributors.COUNT |
| --- | --- | --- | --- | --- | --- |
| 0 | 428.0 | 4280.0 | 2270.0 | 22700.0 | 10 |
Store description¶
The length of a store is the number of rows in the store:
[2]:
len(store)
[2]:
10
You can get the shape of the store, which is its number of rows and columns:
[3]:
store.shape
[3]:
{'rows': 10, 'columns': 8}
Displaying the store gives a description of its columns and their types:
[4]:
store
[4]:
- MyStore
  - ID
    - key: True
    - nullable: True
    - type: int
  - Date
    - key: False
    - nullable: True
    - type: LocalDate[yyyy-MM-dd]
  - Continent
    - key: False
    - nullable: True
    - type: string
  - Country
    - key: False
    - nullable: True
    - type: string
  - City
    - key: False
    - nullable: True
    - type: string
  - Color
    - key: False
    - nullable: True
    - type: string
  - Quantity
    - key: False
    - nullable: True
    - type: double
  - Price
    - key: False
    - nullable: True
    - type: double
Sampling mode¶
When loading large datasets into a store, it is possible to sample the data. There are currently three supported sampling modes:

- `first_lines`: keeps only the given number of lines.
- `first_files`: keeps only the given number of files.
- `FULL`: loads everything.

Sampling the data keeps the project very responsive during the design phase. Once the design is done, the full dataset can be loaded by calling `session.load_all_data`.

The sampling mode can be defined on the session for all the stores, or store by store.
[5]:
import atoti as tt
from atoti.sampling import first_lines
sample_store = session.read_csv(
"data/example.csv", keys=["ID"], store_name="sampled", sampling_mode=first_lines(5)
)
len(sample_store)
The store has been sampled because there is more than 5 lines in the files to load. To trigger the full load of your data call `load_all_data` on your session.
[5]:
5
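The `first_files` mode works the same way, keeping only a given number of files instead of lines. A minimal sketch, assuming the example data is a single CSV file (so sampling by one file would keep all of it); the store name below is made up for illustration:

from atoti.sampling import first_files

files_store = session.read_csv(
    "data/example.csv",
    keys=["ID"],
    store_name="sampled_by_files",  # hypothetical store name
    sampling_mode=first_files(1),
)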
[6]:
# Do some design such as adding a cube, some measures...
cube = session.create_cube(sample_store, "Sampling")
cube.query()
[6]:
| | Price.AVG | Price.SUM | Quantity.AVG | Quantity.SUM | contributors.COUNT |
| --- | --- | --- | --- | --- | --- |
| 0 | 452.0 | 2260.0 | 2100.0 | 10500.0 | 5 |
[7]:
session.load_all_data()
len(sample_store)
[7]:
10
[8]:
len(sample_store)
cube.query()
[8]:
| | Price.AVG | Price.SUM | Quantity.AVG | Quantity.SUM | contributors.COUNT |
| --- | --- | --- | --- | --- | --- |
| 0 | 428.0 | 4280.0 | 2270.0 | 22700.0 | 10 |
Specify other types for the columns¶
Atoti automatically detects the type of the columns. It is possible to bypass this behaviour and specify the types of some columns manually.
[9]:
types = {
"ID": tt.types.DOUBLE,
"City": tt.types.STRING,
"Quantity": tt.types.STRING,
"Price": tt.types.FLOAT,
}
[10]:
custom_store = session.read_csv(
"data/example.csv", keys=["ID"], store_name="Custom", types=types
)
custom_store
[10]:
- Custom
  - ID
    - key: True
    - nullable: False
    - type: double
  - Date
    - key: False
    - nullable: True
    - type: LocalDate[yyyy-MM-dd]
  - Continent
    - key: False
    - nullable: True
    - type: string
  - Country
    - key: False
    - nullable: True
    - type: string
  - City
    - key: False
    - nullable: True
    - type: string
  - Color
    - key: False
    - nullable: True
    - type: string
  - Quantity
    - key: False
    - nullable: True
    - type: string
  - Price
    - key: False
    - nullable: False
    - type: float
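Note that `Quantity` is now stored as a string, so a cube created on this store would not get the automatic `Quantity.SUM` and `Quantity.AVG` measures that numeric columns produce, since default aggregations are only created for numeric non-key columns. A hedged check, with a made-up cube name:

custom_cube = session.create_cube(custom_store, "CustomCube")  # hypothetical cube
# Quantity should no longer appear among the default measures,
# while Price (a float) still should.
list(custom_cube.measures)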
Insert new rows¶
New records can be inserted into the store. If a record has the same values in the key columns as an existing record, the previous record will be overridden.
[11]:
# New key
store.insert_rows(
(11, "2019-03-01", "Europe", "Germany", "Berlin", "yellow", 1000, 400)
)
# Existing key
store += (1, "2019-03-01", "Europe", "France", "Paris", "red", 2000, 600)
store.head()
[11]:
| ID | Date | Continent | Country | City | Color | Quantity | Price |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2019-03-01 | Europe | France | Paris | red | 2000.0 | 600.0 |
| 2 | 2019-01-02 | Europe | France | Lyon | red | 2000.0 | 400.0 |
| 3 | 2019-01-05 | Europe | France | Paris | blue | 3000.0 | 420.0 |
| 4 | 2018-01-01 | Europe | France | Bordeaux | blue | 1500.0 | 480.0 |
| 5 | 2019-01-01 | Europe | UK | London | green | 3000.0 | 460.0 |
Append data to a store¶
A CSV file with the same structure as the initial source can be appended to an existing store:
[12]:
store.load_csv("data/additional_example.csv")
store.head()
[12]:
| ID | Date | Continent | Country | City | Color | Quantity | Price |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2019-03-01 | Europe | France | Paris | red | 2000.0 | 600.0 |
| 2 | 2019-01-02 | Europe | France | Lyon | red | 2000.0 | 400.0 |
| 3 | 2019-01-05 | Europe | France | Paris | blue | 3000.0 | 420.0 |
| 4 | 2018-01-01 | Europe | France | Bordeaux | blue | 1500.0 | 480.0 |
| 5 | 2019-01-01 | Europe | UK | London | green | 3000.0 | 460.0 |
[13]:
len(store)
[13]:
14
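Data held in a pandas DataFrame can be appended in the same way, assuming this atoti version exposes `load_pandas` on the store; the rows below are made up for the sketch:

import pandas as pd

extra_rows = pd.DataFrame(
    [(15, "2019-04-01", "Europe", "Spain", "Madrid", "red", 1200, 350)],
    columns=["ID", "Date", "Continent", "Country", "City", "Color", "Quantity", "Price"],
)
# Parse the dates so they match the store's LocalDate column
extra_rows["Date"] = pd.to_datetime(extra_rows["Date"])
store.load_pandas(extra_rows)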
Join stores¶
Stores can be joined together using a mapping between columns. The mapping columns of the target store must be included in its key columns. If no mapping is specified, the columns with the same names are used.
[14]:
capital_store = session.read_csv(
"data/capitals.csv", keys=["Country name"], store_name="Capital"
)
capital_store.head()
[14]:
| Country name | Capital |
| --- | --- |
| France | Paris |
| UK | London |
| China | Beijing |
| India | Dehli |
[15]:
store.join(capital_store, mapping={"Country": "Country name"})
[16]:
cube = session.create_cube(store, "Cube")
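After the join, the columns of the Capital store become available as levels of the new cube. A sketch of querying at that level, assuming `cube.query` accepts measures and a `levels` argument in this atoti version:

m = cube.measures
lvl = cube.levels
# Quantities aggregated per capital city brought in by the join
cube.query(m["Quantity.SUM"], levels=[lvl["Capital"]])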