Array measures¶

atoti is designed to handle array data efficiently.

⚠️ This is an advanced tutorial, make sure to learn the basics first.

Loading arrays from CSV¶

atoti can load array from CSV files. The separator for array elements must be provided to the read_csv method, and the CSV columns must use another separator. All the arrays in a column must have the same length.

[1]:

import atoti as tt

session = tt.create_session()

[2]:

store = session.read_csv(
    "data/arrays.csv", keys=["Trade ID"], store_name="Store with arrays", array_sep=";"
)
store.head()

[2]:

	Continent	Country	City	PnL	PnL array	Int array
Trade ID
0	Europe	France	Paris	3.469516	doubleVector[10]{-0.46511920664962714, ...}	intVector[10]{1, ...}
1	Europe	France	Paris	92.851919	doubleVector[10]{10.587927935842618, ...}	intVector[10]{11, ...}
2	Europe	France	Lyon	425.866214	doubleVector[10]{68.43655816283167, ...}	intVector[10]{19, ...}
3	Europe	France	Bordeaux	454.127060	doubleVector[10]{-149.61978026139195, ...}	intVector[10]{14, ...}
4	Europe	UK	London	187.015613	doubleVector[10]{11.449089651224922, ...}	intVector[10]{13, ...}

[3]:

cube = session.create_cube(store, "Cube")

Arrays default aggregations¶

As for scalar measures, atoti provides the default SUM and MEAN aggregations on array measures. They are applied element by element on the array.

[4]:

lvl = cube.levels
m = cube.measures
cube.query(m["PnL array.SUM"], levels=lvl["Continent"])

[4]:

	PnL array.SUM
Continent
Asia	[-14.696414629727794, -35.38205147348333, -62....
Europe	[6.711474315042935, -1.4346671713308226, -72.3...

[5]:

cube.query(m["PnL array.MEAN"], levels=lvl["Continent"])

[5]:

	PnL array.MEAN
Continent
Asia	[-2.939282925945559, -7.076410294696666, -12.5...
Europe	[1.1185790525071557, -0.23911119522180374, -12...

Additional array functions¶

Arithmetic operations¶

[6]:

m["PnL +10 array"] = m["PnL array.SUM"] + 10.0
m["PnL -10 array"] = m["PnL array.SUM"] - 10.0
m["PnL x10 array"] = m["PnL array.SUM"] * 10.0
m["PnL /10 array"] = m["PnL array.SUM"] / 10.0
cube.query(
    m["PnL +10 array"], m["PnL -10 array"], m["PnL x10 array"], m["PnL /10 array"]
)

[6]:

	PnL +10 array	PnL -10 array	PnL x10 array	PnL /10 array
0	[2.0150596853151406, -26.816718644814152, -125...	[-17.98494031468486, -46.81671864481415, -145....	[-79.8494031468486, -368.16718644814154, -1352...	[-0.798494031468486, -3.681671864481415, -13.5...

Sum, mean, min or max of all the array elements¶

[7]:

m["PnL sum"] = tt.array.sum(m["PnL array.SUM"])
m["PnL mean"] = tt.array.mean(m["PnL array.SUM"])
m["min PnL"] = tt.array.min(m["PnL array.SUM"])
m["max PnL"] = tt.array.max(m["PnL array.SUM"])
cube.query(
    m["PnL sum"], m["PnL mean"], m["min PnL"], m["max PnL"], levels=lvl["Continent"]
)

[7]:

	PnL sum	PnL mean	min PnL	max PnL
Continent
Asia	357.870873	35.787087	-222.821477	232.607605
Europe	49.101546	4.910155	-86.079550	195.824737

Length¶

[8]:

m["PnL array length"] = tt.array.len(m["PnL array.SUM"])
cube.query(m["PnL array length"])

[8]:

	PnL array length
0	10

Variance and Standard Deviation¶

[9]:

m["PnL array variance"] = tt.array.var(m["PnL array.SUM"])
m["PnL array standard deviation"] = tt.array.std(m["PnL array.SUM"])
cube.query(
    m["PnL array variance"], m["PnL array standard deviation"], levels=lvl["Continent"]
)

[9]:

	PnL array variance	PnL array standard deviation
Continent
Asia	16149.335147	127.080034
Europe	6952.275020	83.380304

Sort¶

[10]:

m["Sorted PnL array"] = tt.array.sort(m["PnL array.SUM"])
cube.query(m["Sorted PnL array"], levels=lvl["Continent"])

[10]:

	Sorted PnL array
Continent
Asia	[-222.82147745688468, -62.89214785725346, -35....
Europe	[-86.07955036608166, -72.30864853533133, -71.1...

Quantile¶

[11]:

m["95 quantile"] = tt.array.quantile(m["PnL array.SUM"], 0.95, mode="simple")
m["95 exc quantile"] = tt.array.quantile(m["PnL array.SUM"], 0.95, mode="exc")
m["95 inc quantile"] = tt.array.quantile(m["PnL array.SUM"], 0.95, mode="inc")
m["95 centered quantile"] = tt.array.quantile(m["PnL array.SUM"], 0.95, mode="centered")
cube.query(
    m["95 quantile"],
    m["95 exc quantile"],
    m["95 inc quantile"],
    m["95 centered quantile"],
    levels=[lvl["Continent"], lvl["Country"]],
)

[11]:

		95 quantile	95 exc quantile	95 inc quantile	95 centered quantile
Continent	Country
Asia	China	195.220094	256.014079	201.299492	256.014079
Asia	India	22.352927	25.270438	22.644678	25.270438
Europe	France	128.173379	163.731662	131.729208	163.731662
Europe	UK	112.243790	146.715692	115.690980	146.715692

[12]:

m["95 linear quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="linear"
)
m["95 lower quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="lower"
)
m["95 higher quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="higher"
)
m["95 nearest quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="nearest"
)
m["95 midpoint quantile"] = tt.array.quantile(
    m["PnL array.SUM"], 0.95, mode="inc", interpolation="midpoint"
)
cube.query(
    m["95 linear quantile"],
    m["95 lower quantile"],
    m["95 higher quantile"],
    m["95 nearest quantile"],
    m["95 midpoint quantile"],
)

[12]:

	95 linear quantile	95 lower quantile	95 higher quantile	95 nearest quantile	95 midpoint quantile
0	299.288892	290.080802	306.822784	306.822784	298.451793

n greatest / n lowest¶

Returns an array with the n greatest/lowest values of a another array.

[13]:

m["Top 3 PnL array"] = tt.array.n_greatest(m["PnL array.SUM"], 3)
m["Bottom 2 PnL array"] = tt.array.n_lowest(m["PnL array.SUM"], 2)
cube.query(m["Top 3 PnL array"], m["Bottom 2 PnL array"])

[13]:

	Top 3 PnL array	Bottom 2 PnL array
0	[290.0808017776343, 306.8227843309065, 195.349...	[-135.20079639258478, -308.90102782296634]

nth greatest value / nth lowest value¶

Returns nth greatest or lowest value of a vector

[14]:

m["3rd greatest PnL"] = tt.array.nth_greatest(m["PnL array.SUM"], 3)
m["2nd lowest PnL"] = tt.array.nth_lowest(m["PnL array.SUM"], 2)
cube.query(m["3rd greatest PnL"], m["2nd lowest PnL"])

[14]:

	3rd greatest PnL	2nd lowest PnL
0	195.349055	-135.200796

Element at index¶

Extract the element at a given index:

[15]:

m["First element"] = m["PnL array.SUM"][0]
cube.query(m["First element"], m["PnL array.SUM"])

[15]:

	First element	PnL array.SUM
0	-7.98494	[-7.984940314684859, -36.81671864481415, -135....

With the create_parameter_hierarchy function, it is possible to create a hierarchy corresponding to the indices of the array. This hierarchy can then be used to “slice” this array and create a measure which depends on the selected index.

[16]:

cube.create_parameter_hierarchy("Index", list(range(0, 10)))
m["PnL at index"] = m["PnL array.SUM"][lvl["Index"]]
cube.query(m["PnL at index"], levels=lvl["Index"])

[16]:

	PnL at index
Index
0	-7.984940
1	-36.816719
2	-135.200796
3	30.371913
4	195.349055
5	306.822784
6	96.072594
7	-308.901028
8	290.080802
9	-22.821246

You can also build non-integer hierarchies and map each member to its index in the hierarchy using the index_measure argument:

[17]:

from datetime import date, timedelta

cube.create_parameter_hierarchy(
    "Dates",
    [date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
    index_measure="Date index",
)
m["PnL at date"] = m["PnL array.SUM"][m["Date index"]]
cube.query(m["Date index"], m["PnL at date"], levels=lvl["Dates"])

[17]:

	Date index	PnL at date
Dates
2020-01-01	0	-7.984940
2020-01-02	1	-36.816719
2020-01-03	2	-135.200796
2020-01-04	3	30.371913
2020-01-05	4	195.349055
2020-01-06	5	306.822784
2020-01-07	6	96.072594
2020-01-08	7	-308.901028
2020-01-09	8	290.080802
2020-01-10	9	-22.821246

In cases the indices need to be of arbitrary order or range, it is also possible to manually provide them as a list.

[18]:

cube.create_parameter_hierarchy(
    "Custom dates",
    [date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
    indices=[9, 8, 7, 6, 5, 0, 1, 2, 3, 4],
    index_measure="Custom date index",
)
m["PnL at custom date"] = m["PnL array.SUM"][m["Custom date index"]]
cube.query(m["Custom date index"], m["PnL at custom date"], levels=lvl["Custom dates"])

[18]:

	Custom date index	PnL at custom date
Custom dates
2020-01-01	9	-22.821246
2020-01-02	8	290.080802
2020-01-03	7	-308.901028
2020-01-04	6	96.072594
2020-01-05	5	306.822784
2020-01-06	0	-7.984940
2020-01-07	1	-36.816719
2020-01-08	2	-135.200796
2020-01-09	3	30.371913
2020-01-10	4	195.349055

Array slices¶

Extract a slice of the array:

[19]:

m["First 2 elements"] = m["PnL array.SUM"][0:2]
cube.query(m["First 2 elements"], m["PnL array.SUM"])

[19]:

	First 2 elements	PnL array.SUM
0	[-7.984940314684859, -36.81671864481415]	[-7.984940314684859, -36.81671864481415, -135....

Load DataFrame with lists¶

atoti can load a pandas DataFrame containing NumPy arrays and Python lists

[20]:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "Index": [0, 1, 2],
        "NumPy array": [
            np.array([3.2, 1.0, 8, 9, 4.5, 7, 6, 18]),
            np.array([4.2, 4.0, 4, 9, 4.5, 8, 7, 8]),
            np.array([12, 1.0, 8, 9, 4.5, 7, 6, 18]),
        ],
        "Python list": [
            [3.2, 1.0, 8, 9, 4.5, 7, 6, 18],
            [4.2, 4.0, 4, 9, 4.5, 8, 7, 8],
            [12, 1.0, 8, 9, 4.5, 7, 6, 18],
        ],
    }
)
df

[20]:

	Index	NumPy array	Python list
0	0	[3.2, 1.0, 8.0, 9.0, 4.5, 7.0, 6.0, 18.0]	[3.2, 1.0, 8, 9, 4.5, 7, 6, 18]
1	1	[4.2, 4.0, 4.0, 9.0, 4.5, 8.0, 7.0, 8.0]	[4.2, 4.0, 4, 9, 4.5, 8, 7, 8]
2	2	[12.0, 1.0, 8.0, 9.0, 4.5, 7.0, 6.0, 18.0]	[12, 1.0, 8, 9, 4.5, 7, 6, 18]

[21]:

pd_store = session.read_pandas(df, "pandas")
pd_store

[21]:

pandas
- Index
  - key: False
  - nullable: False
  - type: long
- NumPy array
  - key: False
  - nullable: True
  - type: atoti_numpy_double[][ ]
- Python list
  - key: False
  - nullable: True
  - type: atoti_list_double[][,]

[22]:

pd_store.head()

[22]:

	Index	NumPy array	Python list
0	0	doubleVector[8]{3.2, ...}	doubleVector[8]{3.2, ...}
1	1	doubleVector[8]{4.2, ...}	doubleVector[8]{4.2, ...}
2	2	doubleVector[8]{12.0, ...}	doubleVector[8]{12.0, ...}

[ ]: