Array measures¶
Atoti is optimized to handle array data.
Loading arrays from CSV¶
Atoti can load arrays from CSV files. The separator used for array elements must be passed to the read_csv method (here, the array_sep argument), and it must be different from the separator used between the CSV columns. All the arrays in a given column should have the same length.
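For reference, a file in this format could look like the following hypothetical excerpt, with , separating the columns and ; separating the array elements (the values are illustrative, not the actual contents of data/arrays.csv):

TradeId,Continent,Country,City,PnL,PnlArray,IntArray
0,Europe,France,Paris,3.47,-0.47;10.59;68.44,1;11;19
1,Europe,France,Lyon,92.85,10.59;-2.31;0.75,11;4;7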
[1]:
import atoti as tt
session = tt.create_session()
[2]:
store = session.read_csv(
"data/arrays.csv", keys=["TradeId"], store_name="Store With Arrays", array_sep=";"
)
store.head()
[2]:
| TradeId | Continent | Country | City | PnL | PnlArray | IntArray |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Europe | France | Paris | 3.469516 | doubleVector[10]{-0.46511920664962714, ...} | intVector[10]{1, ...} |
| 1 | Europe | France | Paris | 92.851919 | doubleVector[10]{10.587927935842618, ...} | intVector[10]{11, ...} |
| 2 | Europe | France | Lyon | 425.866214 | doubleVector[10]{68.43655816283167, ...} | intVector[10]{19, ...} |
| 3 | Europe | France | Bordeaux | 454.127060 | doubleVector[10]{-149.61978026139195, ...} | intVector[10]{14, ...} |
| 4 | Europe | UK | London | 187.015613 | doubleVector[10]{11.449089651224922, ...} | intVector[10]{13, ...} |
[3]:
cube = session.create_cube(store, "Cube")
Default aggregations on arrays¶
Atoti provides default aggregation functions on arrays: SUM and AVG. They are applied element by element on the arrays.
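The element-wise semantics can be sketched with plain Python lists (an illustration of the behavior, not Atoti code):

# Two fact-level arrays aggregated position by position:
pnl_a = [1.0, 2.0, 3.0]
pnl_b = [4.0, 5.0, 6.0]
elementwise_sum = [a + b for a, b in zip(pnl_a, pnl_b)]  # [5.0, 7.0, 9.0]
elementwise_avg = [s / 2 for s in elementwise_sum]  # 2 facts -> [2.5, 3.5, 4.5]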
[4]:
lvl = cube.levels
m = cube.measures
cube.query(m["PnlArray.SUM"], levels=lvl["Continent"])
[4]:
| Continent | PnlArray.SUM |
| --- | --- |
| Asia | [-14.696414629727794, -35.38205147348333, -62.... |
| Europe | [6.711474315042935, -1.4346671713308226, -72.3... |
[5]:
cube.query(m["PnlArray.AVG"], levels=lvl["Continent"])
[5]:
| Continent | PnlArray.AVG |
| --- | --- |
| Asia | [-2.939282925945559, -7.076410294696666, -12.5... |
| Europe | [1.1185790525071557, -0.23911119522180374, -12... |
Additional array functions¶
Sum, Average, Min or Max of all the array elements¶
[6]:
m["sum vect"] = tt.array.sum(m["PnlArray.SUM"])
cube.query(m["sum vect"], levels=lvl["Continent"])
[6]:
| Continent | sum vect |
| --- | --- |
| Asia | 357.870873 |
| Europe | 49.101546 |
[7]:
m["avg vect"] = tt.array.avg(m["PnlArray.SUM"])
cube.query(m["avg vect"], levels=lvl["Continent"])
[7]:
| Continent | avg vect |
| --- | --- |
| Asia | 35.787087 |
| Europe | 4.910155 |
[8]:
m["min vect"] = tt.array.min(m["PnlArray.SUM"])
cube.query(m["min vect"], levels=lvl["Continent"])
[8]:
| Continent | min vect |
| --- | --- |
| Asia | -222.821477 |
| Europe | -86.079550 |
[9]:
m["max vect"] = tt.array.max(m["PnlArray.SUM"])
cube.query(m["max vect"], levels=lvl["Continent"])
[9]:
| Continent | max vect |
| --- | --- |
| Asia | 232.607605 |
| Europe | 195.824737 |
Length¶
[10]:
m["length"] = tt.array.len(m["PnlArray.SUM"])
cube.query(m["length"])
[10]:
| | length |
| --- | --- |
| 0 | 10 |
Variance and Standard Deviation¶
[11]:
m["variance"] = tt.array.variance(m["PnlArray.SUM"])
m["standard deviation"] = tt.array.std(m["PnlArray.SUM"])
cube.query(m["variance"], m["standard deviation"], levels=lvl["Continent"])
[11]:
| Continent | variance | standard deviation |
| --- | --- | --- |
| Asia | 16149.335147 | 127.080034 |
| Europe | 6952.275020 | 83.380304 |
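The standard deviation is the square root of the variance, which can be double-checked on the values above:

import math
math.sqrt(16149.335147)  # ≈ 127.080034, the standard deviation for Asia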
Sort¶
[12]:
m["sort"] = tt.array.sort(m["PnlArray.SUM"])
cube.query(m["sort"], levels=lvl["Continent"])
[12]:
| Continent | sort |
| --- | --- |
| Asia | [-222.82147745688468, -62.89214785725346, -35.... |
| Europe | [-86.07955036608166, -72.30864853533133, -71.1... |
Percentile¶
The percentile function takes the quantile to compute, an estimation mode ("simple", "exc", "inc", or "centered"), and optionally an interpolation method ("linear", "lower", "higher", "nearest", or "midpoint"):
[13]:
m["95 percentile"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "simple")
m["95 exc percentile"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "exc")
m["95 inc percentile"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc")
m["95 centered percentile"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "centered")
cube.query(
m["95 percentile"],
m["95 exc percentile"],
m["95 inc percentile"],
m["95 centered percentile"],
levels=[lvl["Continent"], lvl["Country"]],
)
[13]:
| Continent | Country | 95 percentile | 95 exc percentile | 95 inc percentile | 95 centered percentile |
| --- | --- | --- | --- | --- | --- |
| Asia | China | 195.220094 | 256.014079 | 201.299492 | 256.014079 |
| Asia | India | 22.352927 | 25.270438 | 22.644678 | 25.270438 |
| Europe | France | 128.173379 | 163.731662 | 131.729208 | 163.731662 |
| Europe | UK | 112.243790 | 146.715692 | 115.690980 | 146.715692 |
[14]:
m["95 linear"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "linear")
m["95 lower"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "lower")
m["95 higher"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "higher")
m["95 nearest"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "nearest")
m["95 midpoint"] = tt.array.percentile(m["PnlArray.SUM"], 0.95, "inc", "midpoint")
cube.query(
m["95 linear"], m["95 lower"], m["95 higher"], m["95 nearest"], m["95 midpoint"]
)
[14]:
| | 95 linear | 95 lower | 95 higher | 95 nearest | 95 midpoint |
| --- | --- | --- | --- | --- | --- |
| 0 | 299.288892 | 290.080802 | 306.822784 | 306.822784 | 298.451793 |
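For intuition, the "inc" mode with "linear" interpolation appears to match the classic inclusive percentile estimator from the Python standard library. The sketch below assumes that equivalence (suggested by the matching result, not taken from Atoti's documentation) and reuses the ten total-level PnL values queried with the "PnL at index" measure further down:

from statistics import quantiles

pnl = [-7.984940, -36.816719, -135.200796, 30.371913, 195.349055,
       306.822784, 96.072594, -308.901028, 290.080802, -22.821246]
# n=100 yields 99 cut points; the one at index 94 is the 95th percentile.
quantiles(pnl, n=100, method="inclusive")[94]  # ≈ 299.2889, matching "95 linear"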
n greatest / n lowest¶
Returns an array with the n greatest or lowest values of another array.
[15]:
m["Top 3"] = tt.array.n_greatest(m["PnlArray.SUM"], 3)
cube.query(m["Top 3"])
[15]:
| | Top 3 |
| --- | --- |
| 0 | [290.0808017776343, 306.8227843309065, 195.349... |
[16]:
m["Bottom 2"] = tt.array.n_lowest(m["PnlArray.SUM"], 2)
cube.query(m["Bottom 2"])
[16]:
| | Bottom 2 |
| --- | --- |
| 0 | [-135.20079639258478, -308.90102782296634] |
nth greatest value / nth lowest value¶
Returns the nth greatest or lowest value of an array.
[17]:
m["Third largest value"] = tt.array.nth_greatest(m["PnlArray.SUM"], 3)
cube.query(m["Third largest value"])
[17]:
| | Third largest value |
| --- | --- |
| 0 | 195.349055 |
[18]:
m["Second smallest value"] = tt.array.nth_lowest(m["PnlArray.SUM"], 2)
cube.query(m["Second smallest value"])
[18]:
| | Second smallest value |
| --- | --- |
| 0 | -135.200796 |
Scale¶
[19]:
m["scale x10"] = m["PnlArray.SUM"] * 10.0
cube.query(m["scale x10"])
[19]:
| | scale x10 |
| --- | --- |
| 0 | [-79.8494031468486, -368.16718644814154, -1352... |
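The other arithmetic operators combine with array measures in the same way; for instance, a sketch dividing instead of multiplying (assuming division is also applied element-wise, which the notebook does not show):

m["scale /100"] = m["PnlArray.SUM"] / 100.0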
Element at index¶
Extract the element at a given index.
[20]:
m["first element"] = m["PnlArray.SUM"][0]
cube.query(m["first element"], m["PnlArray.SUM"])
[20]:
| | first element | PnlArray.SUM |
| --- | --- | --- |
| 0 | -7.98494 | [-7.984940314684859, -36.81671864481415, -135.... |
With the create_parameter_hierarchy function, it is possible to create a hierarchy corresponding to the indices of the array. This hierarchy can then be used to "slice" the array, creating a measure that depends on the selected index.
[21]:
cube.create_parameter_hierarchy("index", list(range(0, 10)))
m["PnL at index"] = m["PnlArray.SUM"][lvl["index"]]
cube.query(m["PnL at index"], levels=lvl["index"])
[21]:
| index | PnL at index |
| --- | --- |
| 0 | -7.984940 |
| 1 | -36.816719 |
| 2 | -135.200796 |
| 3 | 30.371913 |
| 4 | 195.349055 |
| 5 | 306.822784 |
| 6 | 96.072594 |
| 7 | -308.901028 |
| 8 | 290.080802 |
| 9 | -22.821246 |
You can also build non-integer hierarchies and map each member to its index in the hierarchy using the index_measure argument:
[22]:
from datetime import date, timedelta
cube.create_parameter_hierarchy(
"Vector Dates",
[date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
index_measure="Date Index",
)
m["PnL at date"] = m["PnlArray.SUM"][m["Date Index"]]
cube.query(m["Date Index"], m["PnL at date"], levels=lvl["Vector Dates"])
[22]:
| Vector Dates | Date Index | PnL at date |
| --- | --- | --- |
| 2020-01-01 | 0 | -7.984940 |
| 2020-01-02 | 1 | -36.816719 |
| 2020-01-03 | 2 | -135.200796 |
| 2020-01-04 | 3 | 30.371913 |
| 2020-01-05 | 4 | 195.349055 |
| 2020-01-06 | 5 | 306.822784 |
| 2020-01-07 | 6 | 96.072594 |
| 2020-01-08 | 7 | -308.901028 |
| 2020-01-09 | 8 | 290.080802 |
| 2020-01-10 | 9 | -22.821246 |
When the indices need to follow an arbitrary order or range, they can also be provided manually as a list through the indices argument.
[23]:
cube.create_parameter_hierarchy(
"Custom Vector Dates",
[date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
indices=[9, 8, 7, 6, 5, 0, 1, 2, 3, 4],
index_measure="Custom Date Index",
)
m["PnL at custom date"] = m["PnlArray.SUM"][m["Custom Date Index"]]
cube.query(
m["Custom Date Index"], m["PnL at custom date"], levels=lvl["Custom Vector Dates"]
)
[23]:
| Custom Vector Dates | Custom Date Index | PnL at custom date |
| --- | --- | --- |
| 2020-01-01 | 9 | -22.821246 |
| 2020-01-02 | 8 | 290.080802 |
| 2020-01-03 | 7 | -308.901028 |
| 2020-01-04 | 6 | 96.072594 |
| 2020-01-05 | 5 | 306.822784 |
| 2020-01-06 | 0 | -7.984940 |
| 2020-01-07 | 1 | -36.816719 |
| 2020-01-08 | 2 | -135.200796 |
| 2020-01-09 | 3 | 30.371913 |
| 2020-01-10 | 4 | 195.349055 |
Sub-arrays¶
Extract a slice of the array.
[24]:
m["first 2 elements"] = m["PnlArray.SUM"][0:2]
cube.query(m["first 2 elements"], m["PnlArray.SUM"])
[24]:
| | first 2 elements | PnlArray.SUM |
| --- | --- | --- |
| 0 | [-7.984940314684859, -36.81671864481415] | [-7.984940314684859, -36.81671864481415, -135.... |