Array measures¶
atoti is optimized to handle array data.
Loading arrays from CSV¶
atoti can load arrays from CSV files. The separator for array elements must be provided to the read_csv method via the array_sep argument, and it must differ from the separator used for the CSV columns. All arrays in a column should have the same length.
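For illustration, here is a minimal plain-Python sketch (with made-up values) of how the two separators coexist in one row: commas delimit the columns, while array_sep (here ";") delimits the elements inside an array cell.

```python
# Hypothetical row of such a CSV file: columns are comma-separated,
# while the elements of the array column use ";" (the array_sep).
row = "0,Europe,France,Paris,3.47,-0.46;10.58;68.43"

# Splitting on the column separator first, then on the array separator,
# recovers the scalar columns and the array column.
columns = row.split(",")
pnl_array = [float(x) for x in columns[-1].split(";")]
print(pnl_array)  # [-0.46, 10.58, 68.43]
```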
[1]:
import atoti as tt
session = tt.create_session()
Welcome to atoti 0.4.0!
By using this community edition, you agree with the license available at https://www.atoti.io/eula.
Browse the official documentation at https://docs.atoti.io.
Join the community at https://www.atoti.io/register.
You can hide this message by setting the ATOTI_HIDE_EULA_MESSAGE environment variable to True.
[2]:
store = session.read_csv(
"data/arrays.csv", keys=["TradeId"], store_name="Store With Arrays", array_sep=";"
)
store.head()
[2]:
| TradeId | Continent | Country | City | PnL | PnlArray | IntArray |
|---|---|---|---|---|---|---|
| 0 | Europe | France | Paris | 3.469516 | doubleVector[10]{-0.46511920664962714, ...} | intVector[10]{1, ...} |
| 1 | Europe | France | Paris | 92.851919 | doubleVector[10]{10.587927935842618, ...} | intVector[10]{11, ...} |
| 2 | Europe | France | Lyon | 425.866214 | doubleVector[10]{68.43655816283167, ...} | intVector[10]{19, ...} |
| 3 | Europe | France | Bordeaux | 454.127060 | doubleVector[10]{-149.61978026139195, ...} | intVector[10]{14, ...} |
| 4 | Europe | UK | London | 187.015613 | doubleVector[10]{11.449089651224922, ...} | intVector[10]{13, ...} |
[3]:
cube = session.create_cube(store, "Cube")
Default aggregations on arrays¶
atoti provides default aggregation functions on arrays: SUM and MEAN. They are applied element by element across the aggregated arrays.
[4]:
lvl = cube.levels
m = cube.measures
cube.query(m["PnlArray.SUM"], levels=lvl["Continent"])
[4]:
| Continent | PnlArray.SUM |
|---|---|
| Asia | [-14.696414629727794, -35.38205147348333, -62.... |
| Europe | [6.711474315042935, -1.4346671713308226, -72.3... |
[5]:
cube.query(m["PnlArray.MEAN"], levels=lvl["Continent"])
[5]:
| Continent | PnlArray.MEAN |
|---|---|
| Asia | [-2.939282925945559, -7.076410294696666, -12.5... |
| Europe | [1.1185790525071557, -0.23911119522180374, -12... |
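The element-wise behavior of these default aggregations can be illustrated with a standalone NumPy sketch (independent of the atoti API): aggregating arrays across rows combines them position by position.

```python
import numpy as np

# Two rows of data, each carrying an array value.
rows = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
])

# Element-wise aggregation across rows, position by position,
# mirroring what an element-wise SUM and MEAN produce.
array_sum = rows.sum(axis=0)    # [5.0, 7.0, 9.0]
array_mean = rows.mean(axis=0)  # [2.5, 3.5, 4.5]
print(array_sum, array_mean)
```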
Additional array functions¶
Sum, Average, Min or Max of all the array elements¶
[6]:
m["sum vect"] = tt.array.sum(m["PnlArray.SUM"])
cube.query(m["sum vect"], levels=lvl["Continent"])
[6]:
| Continent | sum vect |
|---|---|
| Asia | 357.870873 |
| Europe | 49.101546 |
[7]:
m["mean vect"] = tt.array.mean(m["PnlArray.SUM"])
cube.query(m["mean vect"], levels=lvl["Continent"])
[7]:
| Continent | mean vect |
|---|---|
| Asia | 35.787087 |
| Europe | 4.910155 |
[8]:
m["min vect"] = tt.array.min(m["PnlArray.SUM"])
cube.query(m["min vect"], levels=lvl["Continent"])
[8]:
| Continent | min vect |
|---|---|
| Asia | -222.821477 |
| Europe | -86.079550 |
[9]:
m["max vect"] = tt.array.max(m["PnlArray.SUM"])
cube.query(m["max vect"], levels=lvl["Continent"])
[9]:
| Continent | max vect |
|---|---|
| Asia | 232.607605 |
| Europe | 195.824737 |
Length¶
[10]:
m["length"] = tt.array.len(m["PnlArray.SUM"])
cube.query(m["length"])
[10]:
| | length |
|---|---|
| 0 | 10 |
Variance and Standard Deviation¶
[11]:
m["variance"] = tt.array.var(m["PnlArray.SUM"])
m["standard deviation"] = tt.array.std(m["PnlArray.SUM"])
cube.query(m["variance"], m["standard deviation"], levels=lvl["Continent"])
[11]:
| Continent | variance | standard deviation |
|---|---|---|
| Asia | 16149.335147 | 127.080034 |
| Europe | 6952.275020 | 83.380304 |
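As a standalone illustration of the reduction itself (not of atoti's exact convention, which may use a different variance estimator), here is the same computation with NumPy, whose default is the population variance (ddof=0):

```python
import numpy as np

values = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Mean is 5.0; squared deviations sum to 32 over 8 elements.
variance = values.var()  # population variance (ddof=0) -> 4.0
std = values.std()       # its square root -> 2.0
print(variance, std)
```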
Sort¶
[12]:
m["sort"] = tt.array.sort(m["PnlArray.SUM"])
cube.query(m["sort"], levels=lvl["Continent"])
[12]:
| Continent | sort |
|---|---|
| Asia | [-222.82147745688468, -62.89214785725346, -35.... |
| Europe | [-86.07955036608166, -72.30864853533133, -71.1... |
Quantile¶
[13]:
m["95 quantile"] = tt.array.quantile(m["PnlArray.SUM"], 0.95, mode="simple")
m["95 exc quantile"] = tt.array.quantile(m["PnlArray.SUM"], 0.95, mode="exc")
m["95 inc quantile"] = tt.array.quantile(m["PnlArray.SUM"], 0.95, mode="inc")
m["95 centered quantile"] = tt.array.quantile(m["PnlArray.SUM"], 0.95, mode="centered")
cube.query(
m["95 quantile"],
m["95 exc quantile"],
m["95 inc quantile"],
m["95 centered quantile"],
levels=[lvl["Continent"], lvl["Country"]],
)
[13]:
| Continent | Country | 95 quantile | 95 exc quantile | 95 inc quantile | 95 centered quantile |
|---|---|---|---|---|---|
| Asia | China | 195.220094 | 256.014079 | 201.299492 | 256.014079 |
| Asia | India | 22.352927 | 25.270438 | 22.644678 | 25.270438 |
| Europe | France | 128.173379 | 163.731662 | 131.729208 | 163.731662 |
| Europe | UK | 112.243790 | 146.715692 | 115.690980 | 146.715692 |
[14]:
m["95 linear"] = tt.array.quantile(
m["PnlArray.SUM"], 0.95, mode="inc", interpolation="linear"
)
m["95 lower"] = tt.array.quantile(
m["PnlArray.SUM"], 0.95, mode="inc", interpolation="lower"
)
m["95 higher"] = tt.array.quantile(
m["PnlArray.SUM"], 0.95, mode="inc", interpolation="higher"
)
m["95 nearest"] = tt.array.quantile(
m["PnlArray.SUM"], 0.95, mode="inc", interpolation="nearest"
)
m["95 midpoint"] = tt.array.quantile(
m["PnlArray.SUM"], 0.95, mode="inc", interpolation="midpoint"
)
cube.query(
m["95 linear"], m["95 lower"], m["95 higher"], m["95 nearest"], m["95 midpoint"]
)
[14]:
| | 95 linear | 95 lower | 95 higher | 95 nearest | 95 midpoint |
|---|---|---|---|---|---|
| 0 | 299.288892 | 290.080802 | 306.822784 | 306.822784 | 298.451793 |
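NumPy's quantile function accepts the same five interpolation names (as its method argument), so their effect can be previewed on a plain array. This only illustrates the interpolation concept; atoti's exact quantile definition in a given mode may differ.

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0, 40.0])

# The 0.95 quantile of 4 sorted values falls between the 3rd and 4th
# elements; each method resolves that in-between position differently.
results = {
    method: float(np.quantile(values, 0.95, method=method))
    for method in ["linear", "lower", "higher", "nearest", "midpoint"]
}
print(results)  # lower -> 30.0, higher -> 40.0, midpoint -> 35.0, linear ~ 38.5
```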
n greatest / n lowest¶
Returns an array containing the n greatest or lowest values of another array.
[15]:
m["Top 3"] = tt.array.n_greatest(m["PnlArray.SUM"], 3)
cube.query(m["Top 3"])
[15]:
| | Top 3 |
|---|---|
| 0 | [290.0808017776343, 306.8227843309065, 195.349... |
[16]:
m["Bottom 2"] = tt.array.n_lowest(m["PnlArray.SUM"], 2)
cube.query(m["Bottom 2"])
[16]:
| | Bottom 2 |
|---|---|
| 0 | [-135.20079639258478, -308.90102782296634] |
nth greatest value / nth lowest value¶
Returns the nth greatest or lowest value of an array.
[17]:
m["Third largest value"] = tt.array.nth_greatest(m["PnlArray.SUM"], 3)
cube.query(m["Third largest value"])
[17]:
| | Third largest value |
|---|---|
| 0 | 195.349055 |
[18]:
m["Second smallest value"] = tt.array.nth_lowest(m["PnlArray.SUM"], 2)
cube.query(m["Second smallest value"])
[18]:
| | Second smallest value |
|---|---|
| 0 | -135.200796 |
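The semantics of these selectors can be sketched in plain Python (the helper names below are illustrative, not part of the atoti API):

```python
values = [-7.98, -36.82, -135.20, 30.37, 195.35, 306.82]

# nth greatest: sort descending, take the (n-1)-th element.
def nth_greatest(xs, n):
    return sorted(xs, reverse=True)[n - 1]

# nth lowest: sort ascending, take the (n-1)-th element.
def nth_lowest(xs, n):
    return sorted(xs)[n - 1]

print(nth_greatest(values, 3))  # 30.37
print(nth_lowest(values, 2))    # -36.82
```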
Scale¶
[19]:
m["scale x10"] = m["PnlArray.SUM"] * 10.0
cube.query(m["scale x10"])
[19]:
| | scale x10 |
|---|---|
| 0 | [-79.8494031468486, -368.16718644814154, -1352... |
Element at index¶
Extract the element at a given index.
[20]:
m["first element"] = m["PnlArray.SUM"][0]
cube.query(m["first element"], m["PnlArray.SUM"])
[20]:
| | first element | PnlArray.SUM |
|---|---|---|
| 0 | -7.98494 | [-7.984940314684859, -36.81671864481415, -135.... |
With the create_parameter_hierarchy function, it is possible to create a hierarchy corresponding to the indices of the array. This hierarchy can then be used to "slice" the array and create a measure that depends on the selected index.
[21]:
cube.create_parameter_hierarchy("index", list(range(0, 10)))
m["PnL at index"] = m["PnlArray.SUM"][lvl["index"]]
cube.query(m["PnL at index"], levels=lvl["index"])
[21]:
| index | PnL at index |
|---|---|
| 0 | -7.984940 |
| 1 | -36.816719 |
| 2 | -135.200796 |
| 3 | 30.371913 |
| 4 | 195.349055 |
| 5 | 306.822784 |
| 6 | 96.072594 |
| 7 | -308.901028 |
| 8 | 290.080802 |
| 9 | -22.821246 |
You can also build non-integer hierarchies and map each member to its index in the hierarchy using the index_measure argument:
[22]:
from datetime import date, timedelta
cube.create_parameter_hierarchy(
"Vector Dates",
[date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
index_measure="Date Index",
)
m["PnL at date"] = m["PnlArray.SUM"][m["Date Index"]]
cube.query(m["Date Index"], m["PnL at date"], levels=lvl["Vector Dates"])
[22]:
| Vector Dates | Date Index | PnL at date |
|---|---|---|
| 2020-01-01 | 0 | -7.984940 |
| 2020-01-02 | 1 | -36.816719 |
| 2020-01-03 | 2 | -135.200796 |
| 2020-01-04 | 3 | 30.371913 |
| 2020-01-05 | 4 | 195.349055 |
| 2020-01-06 | 5 | 306.822784 |
| 2020-01-07 | 6 | 96.072594 |
| 2020-01-08 | 7 | -308.901028 |
| 2020-01-09 | 8 | 290.080802 |
| 2020-01-10 | 9 | -22.821246 |
If the indices need to follow an arbitrary order or range, they can also be provided manually as a list.
[23]:
cube.create_parameter_hierarchy(
"Custom Vector Dates",
[date(2020, 1, 1) + timedelta(days=x) for x in range(0, 10)],
indices=[9, 8, 7, 6, 5, 0, 1, 2, 3, 4],
index_measure="Custom Date Index",
)
m["PnL at custom date"] = m["PnlArray.SUM"][m["Custom Date Index"]]
cube.query(
m["Custom Date Index"], m["PnL at custom date"], levels=lvl["Custom Vector Dates"]
)
[23]:
| Custom Vector Dates | Custom Date Index | PnL at custom date |
|---|---|---|
| 2020-01-01 | 9 | -22.821246 |
| 2020-01-02 | 8 | 290.080802 |
| 2020-01-03 | 7 | -308.901028 |
| 2020-01-04 | 6 | 96.072594 |
| 2020-01-05 | 5 | 306.822784 |
| 2020-01-06 | 0 | -7.984940 |
| 2020-01-07 | 1 | -36.816719 |
| 2020-01-08 | 2 | -135.200796 |
| 2020-01-09 | 3 | 30.371913 |
| 2020-01-10 | 4 | 195.349055 |
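Conceptually, such a hierarchy acts as a mapping from each member to an array index. Here is a plain-Python sketch of the custom ordering (values copied from the output above):

```python
from datetime import date, timedelta

pnl = [-7.984940, -36.816719, -135.200796, 30.371913, 195.349055,
       306.822784, 96.072594, -308.901028, 290.080802, -22.821246]

# Map each date member to a manually chosen index, mirroring the
# `indices` argument above.
members = [date(2020, 1, 1) + timedelta(days=x) for x in range(10)]
custom_indices = [9, 8, 7, 6, 5, 0, 1, 2, 3, 4]
index_of = dict(zip(members, custom_indices))

# Indexing the array through the mapping reproduces "PnL at custom date".
print(pnl[index_of[date(2020, 1, 1)]])  # -22.821246
```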
Sub-arrays¶
Extract a slice of the array.
[24]:
m["first 2 elements"] = m["PnlArray.SUM"][0:2]
cube.query(m["first 2 elements"], m["PnlArray.SUM"])
[24]:
| | first 2 elements | PnlArray.SUM |
|---|---|---|
| 0 | [-7.984940314684859, -36.81671864481415] | [-7.984940314684859, -36.81671864481415, -135.... |
Loading DataFrames with lists¶
atoti can load DataFrames containing NumPy arrays and Python lists.
[25]:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{
"index": [0, 1, 2],
"python list": [
[3.2, 1.0, 8, 9, 4.5, 7, 6, 18],
[4.2, 4.0, 4, 9, 4.5, 8, 7, 8],
[12, 1.0, 8, 9, 4.5, 7, 6, 18],
],
"numpy array": [
np.array([3.2, 1.0, 8, 9, 4.5, 7, 6, 18]),
np.array([4.2, 4.0, 4, 9, 4.5, 8, 7, 8]),
np.array([12, 1.0, 8, 9, 4.5, 7, 6, 18]),
],
}
)
[26]:
pd_store = session.read_pandas(df, "Pandas")
pd_store
[26]:
- Pandas
  - index
    - key: False
    - nullable: False
    - type: long
  - python list
    - key: False
    - nullable: True
    - type: atoti_list_double[]
  - numpy array
    - key: False
    - nullable: True
    - type: atoti_numpy_double[]
[27]:
pd_store.head()
[27]:
| | index | python list | numpy array |
|---|---|---|---|
| 0 | 0 | doubleVector[8]{3.2, ...} | doubleVector[8]{3.2, ...} |
| 1 | 1 | doubleVector[8]{4.2, ...} | doubleVector[8]{4.2, ...} |
| 2 | 2 | doubleVector[8]{12.0, ...} | doubleVector[8]{12.0, ...} |