Array measures

Atoti is optimized to handle array data.

Loading arrays from CSV

Atoti can load arrays from CSV files. The separator used for array elements must be passed to the read_csv method, and it must differ from the separator used for the CSV columns. All the arrays must have the same length.
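
For reference, here is a hypothetical excerpt of such a file, using a comma as the column separator and ";" as the array separator (this is not the actual content of data/arrays.csv, whose arrays have 10 elements):

TradeId,Continent,Country,City,PnL,PnlArray,IntArray
0,Europe,France,Paris,3.47,-0.47;1.2;3.5,1;2;3
1,Europe,UK,London,187.02,11.45;-2.3;0.8,13;4;7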

[1]:
import atoti as tt

session = tt.create_session()
[2]:
store = session.read_csv(
    "data/arrays.csv", keys=["TradeId"], store_name="Store With Arrays", array_sep=";"
)
store.head()
[2]:
Continent Country City PnL PnlArray IntArray
TradeId
0 Europe France Paris 3.469516 doubleVector[10]{-0.46511920664962714, ...} intVector[10]{1, ...}
1 Europe France Paris 92.851919 doubleVector[10]{10.587927935842618, ...} intVector[10]{11, ...}
2 Europe France Lyon 425.866214 doubleVector[10]{68.43655816283167, ...} intVector[10]{19, ...}
3 Europe France Bordeaux 454.127060 doubleVector[10]{-149.61978026139195, ...} intVector[10]{14, ...}
4 Europe UK London 187.015613 doubleVector[10]{11.449089651224922, ...} intVector[10]{13, ...}
[3]:
cube = session.create_cube(store, "Cube")

Default aggregations on arrays

Atoti provides default aggregation functions on arrays: SUM and AVG. They are applied element-wise on the arrays.
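
For instance, the element-wise sum of [1, 2, 3] and [10, 20, 30] is [11, 22, 33].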

[4]:
lvl = cube.levels
m = cube.measures
cube.query(m["PnlArray.SUM"], levels=lvl["Continent"])
[4]:
PnlArray.SUM
Continent
Asia [-14.696414629727794, -35.38205147348333, -62....
Europe [6.711474315042935, -1.4346671713308226, -72.3...
[5]:
cube.query(m["PnlArray.AVG"], levels=lvl["Continent"])
[5]:
PnlArray.AVG
Continent
Asia [-2.939282925945559, -7.076410294696666, -12.5...
Europe [1.1185790525071557, -0.23911119522180374, -12...

Additional array functions

Sum, Average, Min, or Max of all the array elements

[6]:
m["sum vect"] = tt.array.sum(m["PnlArray.SUM"])
cube.query(m["sum vect"], levels=lvl["Continent"])
[6]:
sum vect
Continent
Asia 357.870873
Europe 49.101546
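
Note the difference between the two measures: the default PnlArray.SUM aggregation sums the arrays element-wise across facts, while tt.array.sum then reduces each resulting array to a single number by summing its elements.
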
[7]:
m["avg vect"] = tt.array.avg(m["PnlArray.SUM"])
cube.query(m["avg vect"], levels=lvl["Continent"])
[7]:
avg vect
Continent
Asia 35.787087
Europe 4.910155
[8]:
m["min vect"] = tt.array.min(m["PnlArray.SUM"])
cube.query(m["min vect"], levels=lvl["Continent"])
[8]:
min vect
Continent
Asia -222.821477
Europe -86.079550
[9]:
m["max vect"] = tt.array.max(m["PnlArray.SUM"])
cube.query(m["max vect"], levels=lvl["Continent"])
[9]:
max vect
Continent
Asia 232.607605
Europe 195.824737

Length

[10]:
m["length"] = tt.array.len(m["PnlArray.SUM"])
cube.query(m["length"])
[10]:
length
0 10

Variance and Standard Deviation

[11]:
m["variance"] = tt.array.variance(m["PnlArray.SUM"])
m["standard deviation"] = tt.array.std(m["PnlArray.SUM"])
cube.query(m["variance"], m["standard deviation"], levels=lvl["Continent"])
[11]:
variance standard deviation
Continent
Asia 16149.335147 127.080034
Europe 6952.275020 83.380304
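
As expected, the standard deviation is the square root of the variance: for Europe, √6952.275020 ≈ 83.380304.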

Sort

[12]:
m["sort"] = tt.array.sort(m["PnlArray.SUM"])
cube.query(m["sort"], levels=lvl["Continent"])
[12]:
sort
Continent
Asia [-222.82147745688468, -62.89214785725346, -35....
Europe [-86.07955036608166, -72.30864853533133, -71.1...

Percentile
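
tt.array.percentile computes a percentile of the array elements. The mode argument selects the index convention: "inc" and "exc" broadly correspond to the inclusive and exclusive conventions of Excel's PERCENTILE.INC and PERCENTILE.EXC functions; refer to the atoti documentation for the exact definitions of the "simple" and "centered" modes.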

[13]:
m["95 percentile"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "simple")
m["95 exc percentile"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "exc")
m["95 inc percentile"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc")
m["95 centered percentile"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "centered")
cube.query(
    m["95 percentile"],
    m["95 exc percentile"],
    m["95 inc percentile"],
    m["95 centered percentile"],
    levels=[lvl["Continent"], lvl["Country"]],
)
[13]:
95 percentile 95 exc percentile 95 inc percentile 95 centered percentile
Continent Country
Asia China 195.220094 256.014079 201.299492 256.014079
India 22.352927 25.270438 22.644678 25.270438
Europe France 128.173379 163.731662 131.729208 163.731662
UK 112.243790 146.715692 115.690980 146.715692
[14]:
m["95 linear"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "linear")
m["95 lower"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "lower")
m["95 higher"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "higher")
m["95 nearest"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "nearest")
m["95 midpoint"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "midpoint")
cube.query(
    m["95 linear"], m["95 lower"], m["95 higher"], m["95 nearest"], m["95 midpoint"]
)
[14]:
95 linear 95 lower 95 higher 95 nearest 95 midpoint
0 299.288892 290.080802 306.822784 306.822784 298.451793
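
The interpolation argument controls what happens when the requested percentile falls between two elements: "lower" and "higher" return the two bounding elements, "nearest" returns the closer of the two, "midpoint" returns their average, and "linear" interpolates between them. The output above is consistent with this: the "midpoint" value is the average of the "lower" and "higher" values, (290.080802 + 306.822784) / 2 ≈ 298.451793.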

n greatest / n lowest

Returns an array with the n greatest or lowest values of another array.

[15]:
m["Top 3"] = tt.array.n_greatest(m["PnlArray.SUM"], 3)
cube.query(m["Top 3"])
[15]:
Top 3
0 [290.0808017776343, 306.8227843309065, 195.349...
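
Note that, as the output above shows, the returned array is not necessarily sorted.
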
[16]:
m["Bottom 2"] = tt.array.n_lowest(m["PnlArray.SUM"], 2)
cube.query(m["Bottom 2"])
[16]:
Bottom 2
0 [-135.20079639258478, -308.90102782296634]

nth greatest value / nth lowest value

Returns the n-th greatest or lowest value of an array.

[17]:
m["Third largest value"] = tt.array.nth_greatest(m["PnlArray.SUM"], 3)
cube.query(m["Third largest value"])
[17]:
Third largest value
0 195.349055
[18]:
m["Second smallest value"] = tt.array.nth_lowest(m["PnlArray.SUM"], 2)
cube.query(m["Second smallest value"])
[18]:
Second smallest value
0 -135.200796

Scale

[19]:
m["scale x10"] = m["PnlArray.SUM"] * 10.0
cube.query(m["scale x10"])
[19]:
scale x10
0 [-79.8494031468486, -368.16718644814154, -1352...
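
The other arithmetic operators can be used in the same way. For instance (a minimal sketch, assuming the usual measure arithmetic; "half PnlArray" is a hypothetical measure name):

m["half PnlArray"] = m["PnlArray.SUM"] / 2.0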

Element at index

Extract the element at a given index.

[20]:
m["first element"] = m["PnlArray.SUM"][0]
cube.query(m["first element"], m["PnlArray.SUM"])
[20]:
first element PnlArray.SUM
0 -7.98494 [-7.984940314684859, -36.81671864481415, -135....

With the create_parameter_hierarchy function, it is possible to create a hierarchy corresponding to the indices of the array. This hierarchy can then be used to “slice” the array and create a measure that depends on the selected index.

[21]:
cube.create_parameter_hierarchy("index", list(range(0, 10)))
m["PnL at index"] = m["PnlArray.SUM"][lvl["index"]]
cube.query(m["PnL at index"], levels=lvl["index"])
[21]:
PnL at index
index
0 -7.984940
1 -36.816719
2 -135.200796
3 30.371913
4 195.349055
5 306.822784
6 96.072594
7 -308.901028
8 290.080802
9 -22.821246

You can also build non-integer hierarchies and map each member to its index in the hierarchy using the index_measure argument:

[22]:
from datetime import date, timedelta

cube.create_parameter_hierarchy(
    "Vector Dates",
    [date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
    index_measure="Date Index",
)
m["PnL at date"] = m["PnlArray.SUM"][m["Date Index"]]
cube.query(m["Date Index"], m["PnL at date"], levels=lvl["Vector Dates"])
[22]:
Date Index PnL at date
Vector Dates
2020-01-01 0 -7.984940
2020-01-02 1 -36.816719
2020-01-03 2 -135.200796
2020-01-04 3 30.371913
2020-01-05 4 195.349055
2020-01-06 5 306.822784
2020-01-07 6 96.072594
2020-01-08 7 -308.901028
2020-01-09 8 290.080802
2020-01-10 9 -22.821246

If the indices need to follow an arbitrary order or range, they can also be provided manually as a list.

[23]:
cube.create_parameter_hierarchy(
    "Custom Vector Dates",
    [date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
    indices=[9, 8, 7, 6, 5, 0, 1, 2, 3, 4],
    index_measure="Custom Date Index",
)
m["PnL at custom date"] = m["PnlArray.SUM"][m["Custom Date Index"]]
cube.query(
    m["Custom Date Index"], m["PnL at custom date"], levels=lvl["Custom Vector Dates"]
)
[23]:
Custom Date Index PnL at custom date
Custom Vector Dates
2020-01-01 9 -22.821246
2020-01-02 8 290.080802
2020-01-03 7 -308.901028
2020-01-04 6 96.072594
2020-01-05 5 306.822784
2020-01-06 0 -7.984940
2020-01-07 1 -36.816719
2020-01-08 2 -135.200796
2020-01-09 3 30.371913
2020-01-10 4 195.349055

Sub-arrays

Extract a slice of the array.

[24]:
m["first 2 elements"] = m["PnlArray.SUM"][0:2]
cube.query(m["first 2 elements"], m["PnlArray.SUM"])
[24]:
first 2 elements PnlArray.SUM
0 [-7.984940314684859, -36.81671864481415] [-7.984940314684859, -36.81671864481415, -135....
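
Slicing follows Python's start:stop convention: [0:2] returns the elements at indices 0 and 1, excluding the element at index 2.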