Bucketing
[1]:
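import atoti as tt
# Start a session and load the CSV file into a store keyed on ID.
session = tt.create_session()
store = session.read_csv("data/example.csv", keys=["ID"], store_name="First store")
# Build a cube on top of the store.
cube = session.create_cube(store, "FirstCube")
# Display the store schema.
store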
[1]:
- First store
  - ID
    - key: True
    - nullable: True
    - type: int
  - Date
    - key: False
    - nullable: True
    - type: LocalDate[yyyy-MM-dd]
  - Continent
    - key: False
    - nullable: True
    - type: string
  - Country
    - key: False
    - nullable: True
    - type: string
  - City
    - key: False
    - nullable: True
    - type: string
  - Color
    - key: False
    - nullable: True
    - type: string
  - Quantity
    - key: False
    - nullable: True
    - type: double
  - Price
    - key: False
    - nullable: True
    - type: double
- ID
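With Atoti, the values of a column can be grouped into weighted buckets. Here, each Color is assigned to a "hot" or a "cold" bucket, and green is split between the two with a weight of 0.5 in each: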
[2]:
# Each row is [Color value, bucket name, weight].
buckets = [
["red", "hot", 1.0],
["blue", "cold", 1.0],
["green", "hot", 0.5],
["green", "cold", 0.5],
]
cube.create_bucketing("First Bucket", ["Color"], buckets)
[2]:
- First Bucket
  - Color
    - key: True
    - nullable: True
    - type: string
  - First Bucket
    - key: True
    - nullable: True
    - type: string
  - First Bucket_weight
    - key: False
    - nullable: True
    - type: double
- Color
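The bucketing store above is keyed on both Color and First Bucket, so one Color value can map to several buckets. The following is a minimal pure-Python sketch of that lookup, not Atoti's implementation; the helper `buckets_for` is a hypothetical name introduced only for illustration.

```python
# Same rows as the notebook cell: [Color value, bucket name, weight].
buckets = [
    ["red", "hot", 1.0],
    ["blue", "cold", 1.0],
    ["green", "hot", 0.5],
    ["green", "cold", 0.5],
]


def buckets_for(color, rows):
    """Return the (bucket, weight) pairs whose Color matches `color`.

    Hypothetical helper: it only mimics a lookup on the Color key.
    """
    return [(bucket, weight) for c, bucket, weight in rows if c == color]


green_buckets = buckets_for("green", buckets)
red_buckets = buckets_for("red", buckets)

# The weights of a Color that is split across buckets add up to 1 here.
total_green_weight = sum(weight for _, weight in green_buckets)

print(green_buckets)
print(red_buckets)
print(total_green_weight)
```

In this example, green appears in both buckets with weights that sum to 1, while red and blue each belong to a single bucket with weight 1.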
[3]:
cube.query()
[3]:
| | First Bucket_weight | Price.AVG | Price.SUM | Quantity.AVG | Quantity.SUM | contributors.COUNT |
|---|---|---|---|---|---|---|
| 0 | None | None | None | None | None | 10 |
[4]:
cube.query(levels=cube.levels["First Bucket"])
[4]:
| First Bucket | First Bucket_weight | Price.AVG | Price.SUM | Quantity.AVG | Quantity.SUM | contributors.COUNT |
|---|---|---|---|---|---|---|
| cold | None | 420.000000 | 2520.0 | 2500.0 | 15000.0 | 6 |
| hot | None | 428.333333 | 2570.0 | 2450.0 | 14700.0 | 6 |
[5]:
cube.query(levels=[cube.levels["First Bucket"], cube.levels["Color"]])
[5]:
| First Bucket | Color | First Bucket_weight | Price.AVG | Price.SUM | Quantity.AVG | Quantity.SUM | contributors.COUNT |
|---|---|---|---|---|---|---|---|
| cold | blue | 1.0 | 427.5 | 1710.0 | 2000.0 | 8000.0 | 4 |
| cold | green | 0.5 | 405.0 | 810.0 | 3500.0 | 7000.0 | 2 |
| hot | green | 0.5 | 405.0 | 810.0 | 3500.0 | 7000.0 | 2 |
| hot | red | 1.0 | 440.0 | 1760.0 | 1925.0 | 7700.0 | 4 |
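As a sanity check on the numbers, the per-color rows of the last query can be rolled up by bucket in plain Python. The sketch below only re-adds values copied from that output; it does not call Atoti. The `weighted` totals are an assumption about how First Bucket_weight might be applied and are not computed anywhere in the notebook.

```python
from collections import defaultdict

# Rows copied from the query on First Bucket and Color:
# (bucket, color, weight, Price.SUM, contributors.COUNT)
rows = [
    ("cold", "blue", 1.0, 1710.0, 4),
    ("cold", "green", 0.5, 810.0, 2),
    ("hot", "green", 0.5, 810.0, 2),
    ("hot", "red", 1.0, 1760.0, 4),
]

plain = defaultdict(float)     # unweighted Price.SUM per bucket
weighted = defaultdict(float)  # assumption: Price.SUM scaled by the weight
counts = defaultdict(int)      # contributors.COUNT per bucket

for bucket, _color, weight, price_sum, count in rows:
    plain[bucket] += price_sum
    weighted[bucket] += weight * price_sum
    counts[bucket] += count

# Average price per bucket, derived from the unweighted sum and the count.
price_avg = {bucket: plain[bucket] / counts[bucket] for bucket in plain}

for bucket in sorted(plain):
    print(bucket, plain[bucket], weighted[bucket], counts[bucket], price_avg[bucket])
```

The unweighted totals (2520.0 for cold, 2570.0 for hot) and the counts (6 each) match the bucket-level query, which suggests that the measures shown there are not scaled by the weight. The two bucket counts add up to 12 rather than 10 because the green rows contribute to both buckets.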