atoti.array module¶
atoti is designed to handle array data efficiently.
There are multiple ways to load arrays into atoti tables. For instance:
With
atoti.session.Session.read_pandas()
:>>> import numpy as np >>> df = pd.DataFrame( ... columns=["NumPy array", "Python list"], ... data=[ ... (np.array([1.0, 2.0, 3.0]), [1, 2, 3]), ... (np.array([4.0, 5.0, 6.0]), [4, 5, 6]), ... (np.array([7.0, 8.0, 9.0]), [7, 8, 9]), ... ], ... ) >>> df NumPy array Python list 0 [1.0, 2.0, 3.0] [1, 2, 3] 1 [4.0, 5.0, 6.0] [4, 5, 6] 2 [7.0, 8.0, 9.0] [7, 8, 9] >>> pandas_table = session.read_pandas( ... df, table_name="DataFrame with arrays" ... ) >>> pandas_table.head() NumPy array Python list 0 [1.0, 2.0, 3.0] [1, 2, 3] 1 [4.0, 5.0, 6.0] [4, 5, 6] 2 [7.0, 8.0, 9.0] [7, 8, 9]With
atoti.session.Session.read_csv()
:>>> pnl_table = session.read_csv( ... f"{RESOURCES}/pnl.csv", ... array_separator=";", ... keys=["Continent", "Country"], ... table_name="PnL", ... ) >>> pnl_table.head() PnL Continent Country Europe France [-0.465, -0.025, 0.601, 0.423, -0.815, 0.024, ... UK [11.449, -35.5464, -66.641, -48.498, -6.3126, ... America Mexico [-10.716, 9.593, 1.356, -21.185, 5.989, 9.686,... Asia China [-1.715, 2.425, -4.059, 4.102, -2.331, -2.662,... India [-18.716, 8.583, -41.356, -11.138, 3.949, 5.66...
As for scalar measures, atoti provides the default SUM
and MEAN
aggregations on array measures.
They are applied element by element:
>>> cube = session.create_cube(pnl_table)
>>> l, m = cube.levels, cube.measures
>>> cube.query(m["PnL.SUM"], m["PnL.MEAN"], levels=[l["Continent"]])
PnL.SUM PnL.MEAN
Continent
America doubleVector[10]{-10.716, ...} doubleVector[10]{-10.716, ...}
Asia doubleVector[10]{-20.431, ...} doubleVector[10]{-10.2155, ...}
Europe doubleVector[10]{10.984, ...} doubleVector[10]{5.492, ...}
Besides the functions below, arrays support the following operations:
Arithmetic operators:
>>> m["PnL +10"] = m["PnL.SUM"] + 10.0 >>> cube.query(m["PnL +10"]) PnL +10 0 doubleVector[10]{-10.163, ...} >>> m["PnL -10"] = m["PnL.SUM"] - 10.0 >>> cube.query(m["PnL -10"]) PnL -10 0 doubleVector[10]{-30.163, ...} >>> m["PnL x10"] = m["PnL.SUM"] * 10.0 >>> cube.query(m["PnL x10"]) PnL x10 0 doubleVector[10]{-201.63, ...} >>> m["PnL /10"] = m["PnL.SUM"] / 10.0 >>> cube.query(m["PnL /10"]) PnL /10 0 doubleVector[10]{-2.0163, ...}Indexing:
>>> m["First element"] = m["PnL.SUM"][0] >>> cube.query(m["First element"], m["PnL.SUM"]) First element PnL.SUM 0 -20.16 doubleVector[10]{-20.163, ...}This can be used with
atoti.cube.Cube.create_parameter_hierarchy_from_members()
to “slice” the array:>>> cube.create_parameter_hierarchy_from_members( ... "Index", list(range(0, 10)) ... ) >>> m["PnL at index"] = m["PnL.SUM"][l["Index"]] >>> cube.query(m["PnL at index"], levels=[l["Index"]]) PnL at index Index 0 -20.16 1 -14.97 2 -110.10 3 -76.30 4 .48 5 -57.51 6 -.53 7 -15.49 8 -22.97 9 9.26Non-integer hierarchies can also be created:
>>> from datetime import date, timedelta >>> dates = [ ... date(2020, 1, 1) + timedelta(days=offset) for offset in range(0, 10) ... ] >>> cube.create_parameter_hierarchy_from_members( ... "Dates", dates, index_measure_name="Date index" ... ) >>> m["PnL at date"] = m["PnL.SUM"][m["Date index"]] >>> cube.query(m["Date index"], m["PnL at date"], levels=[l["Dates"]]) Date index PnL at date Dates 2020-01-01 0 -20.16 2020-01-02 1 -14.97 2020-01-03 2 -110.10 2020-01-04 3 -76.30 2020-01-05 4 .48 2020-01-06 5 -57.51 2020-01-07 6 -.53 2020-01-08 7 -15.49 2020-01-09 8 -22.97 2020-01-10 9 9.26Slicing:
>>> m["First 2 elements"] = m["PnL.SUM"][0:2] >>> cube.query(m["First 2 elements"], m["PnL.SUM"]) First 2 elements PnL.SUM 0 doubleVector[2]{-20.163, ...} doubleVector[10]{-20.163, ...}
- atoti.array.len(measure)¶
Return a measure equal to the number of elements of the passed array measure.
Example
>>> m["Length"] = tt.array.len(m["PnL.SUM"]) >>> cube.query(m["Length"]) Length 0 10
- Return type
- atoti.array.max(measure)¶
Return a measure equal to the maximum element of the passed array measure.
Example
>>> m["Max"] = tt.array.max(m["PnL.SUM"]) >>> m["Empty max"] = tt.array.max(m["PnL.SUM"][0:0]) >>> cube.query(m["Max"], m["Empty max"]) Max Empty max 0 9.26
- Return type
- atoti.array.mean(value: ArrayType) JavaFunctionOperation ¶
- atoti.array.mean(value: MeasureDescription) MeasureDescription
Return a measure equal to the mean of all the elements of the passed array measure.
Example
>>> m["Mean"] = tt.array.mean(m["PnL.SUM"]) >>> m["Empty mean"] = tt.array.mean(m["PnL.SUM"][0:0]) >>> cube.query(m["Mean"], m["Empty mean"]) Mean Empty mean 0 -30.83 .00
- Return type
Union
[MeasureDescription
,JavaFunctionOperation
]
- atoti.array.min(measure)¶
Return a measure equal to the minimum element of the passed array measure.
Example
>>> m["Min"] = tt.array.min(m["PnL.SUM"]) >>> m["Empty min"] = tt.array.min(m["PnL.SUM"][0:0]) >>> cube.query(m["Min"], m["Empty min"]) Min Empty min 0 -110.10
- Return type
- atoti.array.n_greatest(measure, n)¶
Return an array measure containing the n greatest elements of the passed array measure.
Example
>>> m["Top 3"] = tt.array.n_greatest(m["PnL.SUM"], n=3) >>> cube.query(m["Top 3"]) Top 3 0 doubleVector[3]{9.259999999999998, ...}
- Return type
- atoti.array.n_greatest_indices(measure, n)¶
Return an array measure containing the indices of the n greatest elements of the passed array measure.
Example
>>> m["Top 3 indices"] = tt.array.n_greatest_indices(m["PnL.SUM"], n=3) >>> cube.query(m["Top 3 indices"]) Top 3 indices 0 intVector[3]{6, ...}
- Return type
- atoti.array.n_lowest(measure, n)¶
Return an array measure containing the n lowest elements of the passed array measure.
Example
>>> m["Bottom 3"] = tt.array.n_lowest(m["PnL.SUM"], n=3) >>> cube.query(m["Bottom 3"]) Bottom 3 0 doubleVector[3]{-57.51499999999999, ...}
- Return type
- atoti.array.n_lowest_indices(measure, n)¶
Return an array measure containing the indices of the n lowest elements of the passed array measure.
Example
>>> m["Bottom 3 indices"] = tt.array.n_lowest_indices(m["PnL.SUM"], n=3) >>> cube.query(m["Bottom 3 indices"]) Bottom 3 indices 0 intVector[3]{2, ...}
- Return type
- atoti.array.negative_values(measure)¶
Return a measure where all the elements > 0 of the passed array measure are replaced by 0.
- Return type
- atoti.array.nth_greatest(measure, n)¶
Return a measure equal to the n-th greatest element of the passed array measure.
Example
>>> m["3rd greatest"] = tt.array.nth_greatest(m["PnL.SUM"], n=3) >>> cube.query(m["3rd greatest"]) 3rd greatest 0 -.53
- Return type
- atoti.array.nth_lowest(measure, n)¶
Return a measure equal to the n-th lowest element of the passed array measure.
Example
>>> m["3rd lowest"] = tt.array.nth_lowest(m["PnL.SUM"], n=3) >>> cube.query(m["3rd lowest"]) 3rd lowest 0 -57.51
- Return type
- atoti.array.positive_values(measure)¶
Return a measure where all the elements < 0 of the passed array measure are replaced by 0.
- Return type
- atoti.array.prefix_sum(measure)¶
Return a measure equal to the sum of the previous elements in the passed array measure.
Example
If an array has the following values:
[2.0, 1.0, 0.0, 3.0]
, the returned array will be:[2.0, 3.0, 3.0, 6.0]
.- Return type
- atoti.array.prod(measure)¶
Return a measure equal to the product of all the elements of the passed array measure.
Example
>>> m["Product"] = tt.array.prod(m["PnL.SUM"]) >>> m["Empty product"] = tt.array.prod(m["PnL.SUM"][0:0]) >>> cube.query(m["Product"], m["Empty product"]) Product Empty product 0 122,513,372,194.94 1.00
- Return type
- atoti.array.quantile(measure, q, *, mode='inc', interpolation='linear')¶
Return a measure equal to the requested quantile of the elements of the passed array measure.
Here is how to obtain the same behavior as these standard quantile calculation methods:
R-1:
mode="centered"
andinterpolation="lower"
R-2:
mode="centered"
andinterpolation="midpoint"
R-3:
mode="simple"
andinterpolation="nearest"
R-4:
mode="simple"
andinterpolation="linear"
R-5:
mode="centered"
andinterpolation="linear"
R-6 (similar to Excel’s
PERCENTILE.EXC
):mode="exc"
andinterpolation="linear"
R-7 (similar to Excel’s
PERCENTILE.INC
):mode="inc"
andinterpolation="linear"
R-8 and R-9 are not supported
The formulae given for the calculation of the quantile index assume a 1-based indexing system.
- Parameters
measure (
MeasureDescription
) – The measure to get the quantile of.q (
Union
[float
,MeasureDescription
]) – The quantile to take. Must be between0
and1
. For instance,0.95
is the 95th percentile and0.5
is the median.mode (
Literal
[‘simple’, ‘centered’, ‘inc’, ‘exc’]) –The method used to calculate the index of the quantile. Available options are, when searching for the q quantile of a vector
X
:simple
:len(X) * q
centered
:len(X) * q + 0.5
exc
:(len(X) + 1) * q
inc
:(len(X) - 1) * q + 1
interpolation (
Literal
[‘linear’, ‘higher’, ‘lower’, ‘nearest’, ‘midpoint’]) –If the quantile index is not an integer, the interpolation decides what value is returned. The different options are, considering a quantile index
k
withi < k < j
for a sorted vectorX
:linear
:v = X[i] + (X[j] - X[i]) * (k - i)
lowest
:v = X[i]
highest
:v = X[j]
nearest
:v = X[i]
orv = X[j]
depending on which ofi
orj
is closest tok
midpoint
:v = (X[i] + X[j]) / 2
- Return type
- atoti.array.quantile_index(measure, q, *, mode='inc', interpolation='lower')¶
Return a measure equal to the index of requested quantile of the elements of the passed array measure.
- Parameters
measure (
MeasureDescription
) – The measure to get the quantile of.q (
Union
[float
,MeasureDescription
]) – The quantile to take. Must be between0
and1
. For instance,0.95
is the 95th percentile and0.5
is the median.mode (
Literal
[‘simple’, ‘centered’, ‘inc’, ‘exc’]) –The method used to calculate the index of the quantile. Available options are, when searching for the q quantile of a vector
X
:simple
:len(X) * q
centered
:len(X) * q + 0.5
exc
:(len(X) + 1) * q
inc
:(len(X) - 1) * q + 1
interpolation (
Literal
[‘higher’, ‘lower’, ‘nearest’]) –If the quantile index is not an integer, the interpolation decides what value is returned. The different options are, considering a quantile index
k
withi < k < j
for the original vectorX
and the sorted vectorY
:lowest
: the index inX
ofY[i]
highest
: the index inX
ofY[j]
nearest
: the index inX
ofY[i]
orY[j]
depending on which ofi
orj
is closest tok
- Return type
- atoti.array.replace(measure, replacements)¶
Return a measure where elements equal to a key of the replacements mapping are replaced with the corresponding value.
- Parameters
Example
>>> df = pd.DataFrame( ... columns=["Store ID", "New quantity", "Old quantity"], ... data=[ ... ( ... "Store 1", ... [12, 6, 2, 20], ... [6, 3, 0, 10], ... ), ... ("Store 2", [16, 8, 12, 15], [4, 4, 6, 3]), ... ("Store 3", [8, -10, 0, 33], [8, 0, 2, 11]), ... ], ... ) >>> table = session.read_pandas(df, table_name="Prices", keys=["Store ID"]) >>> cube = session.create_cube(table) >>> l, m = cube.levels, cube.measures >>> m["Old quantity"] = tt.value(table["Old quantity"]) >>> m["New quantity"] = tt.value(table["New quantity"]) >>> m["Quantity ratio"] = m["New quantity"] / m["Old quantity"] >>> m["Quantity ratio"].formatter = "ARRAY[',']" >>> cube.query(m["Quantity ratio"], levels=[l["Store ID"]]) Quantity ratio Store ID Store 1 2.0,2.0,Infinity,2.0 Store 2 4.0,2.0,2.0,5.0 Store 3 1.0,-Infinity,0.0,3.0
The function can be used to replace infinity with another value more suited to follow up computations:
>>> import math >>> m["Quantity ratio without infinity"] = tt.array.replace( ... m["Quantity ratio"], {math.inf: 1, -math.inf: -1} ... ) >>> m["Quantity ratio without infinity"].formatter = "ARRAY[',']" >>> cube.query(m["Quantity ratio without infinity"], levels=[l["Store ID"]]) Quantity ratio without infinity Store ID Store 1 2.0,2.0,1.0,2.0 Store 2 4.0,2.0,2.0,5.0 Store 3 1.0,-1.0,0.0,3.0
- Return type
- atoti.array.sort(measure, *, ascending=True)¶
Return an array measure with the elements of the passed array measure sorted.
Example
>>> m["Sorted ascendingly"] = tt.array.sort(m["PnL.SUM"]) >>> m["Sorted descendingly"] = tt.array.sort(m["PnL.SUM"], ascending=False) >>> cube.query(m["Sorted ascendingly"], m["Sorted descendingly"]) Sorted ascendingly Sorted descendingly 0 doubleVector[10]{-110.09900000000002, ...} doubleVector[10]{9.259999999999998, ...}
- Return type
- atoti.array.std(measure, *, mode='sample')¶
Return a measure equal to the standard deviation of the elements of the passed array measure.
- Parameters
measure (
MeasureDescription
) – The measure to get the standard deviation of.mode (
Literal
[‘sample’, ‘population’]) –One of the supported modes:
The
sample
standard deviation, similar to Excel’sSTDEV.S
, is \(\sqrt{\frac{\sum_{i=0}^{n} (X_i - m)^{2}}{n - 1}}\) wherem
is the sample mean andn
the size of the sample. Use this mode if the data represents a sample of the population.The
population
standard deviation, similar to Excel’sSTDEV.P
is \(\sqrt{\frac{\sum_{i=0}^{n}(X_i - m)^{2}}{n}}\) wherem
is the mean of theXi
elements andn
the size of the population. Use this mode if the data represents the entire population.
- Return type
- atoti.array.sum(value: ArrayType) JavaFunctionOperation ¶
- atoti.array.sum(value: MeasureDescription) MeasureDescription
Return a measure equal to the sum of all the elements of the passed array measure.
Example
>>> m["Sum"] = tt.array.sum(m["PnL.SUM"]) >>> m["Empty sum"] = tt.array.sum(m["PnL.SUM"][0:0]) >>> cube.query(m["Sum"], m["Empty sum"]) Sum Empty sum 0 -308.29 .00
- Return type
Union
[JavaFunctionOperation
,MeasureDescription
]
- atoti.array.var(measure, *, mode='sample')¶
Return a measure equal to the variance of the elements of the passed array measure.
- Parameters
measure (
MeasureDescription
) – The measure to get the variance of.mode (
Literal
[‘sample’, ‘population’]) –One of the supported modes:
The
sample
variance, similar to Excel’sVAR.S
, is \(\frac{\sum_{i=0}^{n} (X_i - m)^{2}}{n - 1}\) wherem
is the sample mean andn
the size of the sample. Use this mode if the data represents a sample of the population.The
population
variance, similar to Excel’sVAR.P
is \(\frac{\sum_{i=0}^{n}(X_i - m)^{2}}{n}\) wherem
is the mean of theXi
elements andn
the size of the population. Use this mode if the data represents the entire population.
- Return type