atoti.array module

atoti is designed to handle array data efficiently.

There are multiple ways to load arrays into atoti tables. For instance:

  • With atoti.session.Session.read_pandas():

    >>> import numpy as np
    >>> df = pd.DataFrame(
    ...     columns=["NumPy array", "Python list"],
    ...     data=[
    ...         (np.array([1.0, 2.0, 3.0]), [1, 2, 3]),
    ...         (np.array([4.0, 5.0, 6.0]), [4, 5, 6]),
    ...         (np.array([7.0, 8.0, 9.0]), [7, 8, 9]),
    ...     ],
    ... )
    >>> df
           NumPy array Python list
    0  [1.0, 2.0, 3.0]   [1, 2, 3]
    1  [4.0, 5.0, 6.0]   [4, 5, 6]
    2  [7.0, 8.0, 9.0]   [7, 8, 9]
    
    >>> pandas_table = session.read_pandas(
    ...     df, table_name="DataFrame with arrays"
    ... )
    >>> pandas_table.head()
           NumPy array Python list
    0  [1.0, 2.0, 3.0]   [1, 2, 3]
    1  [4.0, 5.0, 6.0]   [4, 5, 6]
    2  [7.0, 8.0, 9.0]   [7, 8, 9]
    
  • With atoti.session.Session.read_csv():

    >>> pnl_table = session.read_csv(
    ...     f"{RESOURCES}/pnl.csv",
    ...     array_separator=";",
    ...     keys=["Continent", "Country"],
    ...     table_name="PnL",
    ... )
    >>> pnl_table.head()
                                                                     PnL
    Continent Country
    Europe    France   [-0.465, -0.025, 0.601, 0.423, -0.815, 0.024, ...
              UK       [11.449, -35.5464, -66.641, -48.498, -6.3126, ...
    America   Mexico   [-10.716, 9.593, 1.356, -21.185, 5.989, 9.686,...
    Asia      China    [-1.715, 2.425, -4.059, 4.102, -2.331, -2.662,...
              India    [-18.716, 8.583, -41.356, -11.138, 3.949, 5.66...
    

As for scalar measures, atoti provides the default SUM and MEAN aggregations on array measures. They are applied element by element:

>>> cube = session.create_cube(pnl_table)
>>> l, m = cube.levels, cube.measures
>>> cube.query(m["PnL.SUM"], m["PnL.MEAN"], levels=[l["Continent"]])
                                  PnL.SUM                         PnL.MEAN
Continent
America    doubleVector[10]{-10.716, ...}   doubleVector[10]{-10.716, ...}
Asia       doubleVector[10]{-20.431, ...}  doubleVector[10]{-10.2155, ...}
Europe      doubleVector[10]{10.984, ...}     doubleVector[10]{5.492, ...}

Besides the functions below, arrays support the following operations:

  • Arithmetic operators:

    >>> m["PnL +10"] = m["PnL.SUM"] + 10.0
    >>> cube.query(m["PnL +10"])
                              PnL +10
    0  doubleVector[10]{-10.163, ...}
    
    >>> m["PnL -10"] = m["PnL.SUM"] - 10.0
    >>> cube.query(m["PnL -10"])
                              PnL -10
    0  doubleVector[10]{-30.163, ...}
    
    >>> m["PnL x10"] = m["PnL.SUM"] * 10.0
    >>> cube.query(m["PnL x10"])
                              PnL x10
    0  doubleVector[10]{-201.63, ...}
    
    >>> m["PnL /10"] = m["PnL.SUM"] / 10.0
    >>> cube.query(m["PnL /10"])
                              PnL /10
    0  doubleVector[10]{-2.0163, ...}
    
  • Indexing:

    >>> m["First element"] = m["PnL.SUM"][0]
    >>> cube.query(m["First element"], m["PnL.SUM"])
      First element                         PnL.SUM
    0        -20.16  doubleVector[10]{-20.163, ...}
    

    This can be used with atoti.cube.Cube.create_parameter_hierarchy_from_members() to “slice” the array:

    >>> cube.create_parameter_hierarchy_from_members(
    ...     "Index", list(range(0, 10))
    ... )
    >>> m["PnL at index"] = m["PnL.SUM"][l["Index"]]
    >>> cube.query(m["PnL at index"], levels=[l["Index"]])
          PnL at index
    Index
    0           -20.16
    1           -14.97
    2          -110.10
    3           -76.30
    4              .48
    5           -57.51
    6             -.53
    7           -15.49
    8           -22.97
    9             9.26
    

    Non-integer hierarchies can also be created:

    >>> from datetime import date, timedelta
    >>> dates = [
    ...     date(2020, 1, 1) + timedelta(days=offset) for offset in range(0, 10)
    ... ]
    
    >>> cube.create_parameter_hierarchy_from_members(
    ...     "Dates", dates, index_measure_name="Date index"
    ... )
    >>> m["PnL at date"] = m["PnL.SUM"][m["Date index"]]
    >>> cube.query(m["Date index"], m["PnL at date"], levels=[l["Dates"]])
               Date index PnL at date
    Dates
    2020-01-01          0      -20.16
    2020-01-02          1      -14.97
    2020-01-03          2     -110.10
    2020-01-04          3      -76.30
    2020-01-05          4         .48
    2020-01-06          5      -57.51
    2020-01-07          6        -.53
    2020-01-08          7      -15.49
    2020-01-09          8      -22.97
    2020-01-10          9        9.26
    
  • Slicing:

    >>> m["First 2 elements"] = m["PnL.SUM"][0:2]
    >>> cube.query(m["First 2 elements"], m["PnL.SUM"])
                    First 2 elements                         PnL.SUM
    0  doubleVector[2]{-20.163, ...}  doubleVector[10]{-20.163, ...}
    
atoti.array.len(measure)

Return a measure equal to the number of elements of the passed array measure.

Example

>>> m["Length"] = tt.array.len(m["PnL.SUM"])
>>> cube.query(m["Length"])
  Length
0     10
Return type

MeasureDescription

atoti.array.max(measure)

Return a measure equal to the maximum element of the passed array measure.

Example

>>> m["Max"] = tt.array.max(m["PnL.SUM"])
>>> m["Empty max"] = tt.array.max(m["PnL.SUM"][0:0])
>>> cube.query(m["Max"], m["Empty max"])
    Max Empty max
0  9.26
Return type

MeasureDescription

atoti.array.mean(value: ArrayType) JavaFunctionOperation
atoti.array.mean(value: MeasureDescription) MeasureDescription

Return a measure equal to the mean of all the elements of the passed array measure.

Example

>>> m["Mean"] = tt.array.mean(m["PnL.SUM"])
>>> m["Empty mean"] = tt.array.mean(m["PnL.SUM"][0:0])
>>> cube.query(m["Mean"], m["Empty mean"])
     Mean Empty mean
0  -30.83        .00
Return type

Union[MeasureDescription, JavaFunctionOperation]

atoti.array.min(measure)

Return a measure equal to the minimum element of the passed array measure.

Example

>>> m["Min"] = tt.array.min(m["PnL.SUM"])
>>> m["Empty min"] = tt.array.min(m["PnL.SUM"][0:0])
>>> cube.query(m["Min"], m["Empty min"])
       Min Empty min
0  -110.10
Return type

MeasureDescription

atoti.array.n_greatest(measure, n)

Return an array measure containing the n greatest elements of the passed array measure.

Example

>>> m["Top 3"] = tt.array.n_greatest(m["PnL.SUM"], n=3)
>>> cube.query(m["Top 3"])
                                     Top 3
0  doubleVector[3]{9.259999999999998, ...}
Return type

MeasureDescription

atoti.array.n_greatest_indices(measure, n)

Return an array measure containing the indices of the n greatest elements of the passed array measure.

Example

>>> m["Top 3 indices"] = tt.array.n_greatest_indices(m["PnL.SUM"], n=3)
>>> cube.query(m["Top 3 indices"])
          Top 3 indices
0  intVector[3]{6, ...}
Return type

MeasureDescription

atoti.array.n_lowest(measure, n)

Return an array measure containing the n lowest elements of the passed array measure.

Example

>>> m["Bottom 3"] = tt.array.n_lowest(m["PnL.SUM"], n=3)
>>> cube.query(m["Bottom 3"])
                                   Bottom 3
0  doubleVector[3]{-57.51499999999999, ...}
Return type

MeasureDescription

atoti.array.n_lowest_indices(measure, n)

Return an array measure containing the indices of the n lowest elements of the passed array measure.

Example

>>> m["Bottom 3 indices"] = tt.array.n_lowest_indices(m["PnL.SUM"], n=3)
>>> cube.query(m["Bottom 3 indices"])
       Bottom 3 indices
0  intVector[3]{2, ...}
Return type

MeasureDescription

atoti.array.negative_values(measure)

Return a measure where all the elements > 0 of the passed array measure are replaced by 0.

Return type

MeasureDescription

atoti.array.nth_greatest(measure, n)

Return a measure equal to the n-th greatest element of the passed array measure.

Example

>>> m["3rd greatest"] = tt.array.nth_greatest(m["PnL.SUM"], n=3)
>>> cube.query(m["3rd greatest"])
  3rd greatest
0         -.53
Return type

MeasureDescription

atoti.array.nth_lowest(measure, n)

Return a measure equal to the n-th lowest element of the passed array measure.

Example

>>> m["3rd lowest"] = tt.array.nth_lowest(m["PnL.SUM"], n=3)
>>> cube.query(m["3rd lowest"])
  3rd lowest
0     -57.51
Return type

MeasureDescription

atoti.array.positive_values(measure)

Return a measure where all the elements < 0 of the passed array measure are replaced by 0.

Return type

MeasureDescription

atoti.array.prefix_sum(measure)

Return a measure equal to the sum of the previous elements in the passed array measure.

Example

If an array has the following values: [2.0, 1.0, 0.0, 3.0], the returned array will be: [2.0, 3.0, 3.0, 6.0].

Return type

MeasureDescription

atoti.array.prod(measure)

Return a measure equal to the product of all the elements of the passed array measure.

Example

>>> m["Product"] = tt.array.prod(m["PnL.SUM"])
>>> m["Empty product"] = tt.array.prod(m["PnL.SUM"][0:0])
>>> cube.query(m["Product"], m["Empty product"])
              Product Empty product
0  122,513,372,194.94          1.00
Return type

MeasureDescription

atoti.array.quantile(measure, q, *, mode='inc', interpolation='linear')

Return a measure equal to the requested quantile of the elements of the passed array measure.

Here is how to obtain the same behavior as these standard quantile calculation methods:

  • R-1: mode="centered" and interpolation="lower"

  • R-2: mode="centered" and interpolation="midpoint"

  • R-3: mode="simple" and interpolation="nearest"

  • R-4: mode="simple" and interpolation="linear"

  • R-5: mode="centered" and interpolation="linear"

  • R-6 (similar to Excel’s PERCENTILE.EXC): mode="exc" and interpolation="linear"

  • R-7 (similar to Excel’s PERCENTILE.INC): mode="inc" and interpolation="linear"

  • R-8 and R-9 are not supported

The formulae given for the calculation of the quantile index assume a 1-based indexing system.

Parameters
  • measure (MeasureDescription) – The measure to get the quantile of.

  • q (Union[float, MeasureDescription]) – The quantile to take. Must be between 0 and 1. For instance, 0.95 is the 95th percentile and 0.5 is the median.

  • mode (Literal[‘simple’, ‘centered’, ‘inc’, ‘exc’]) –

    The method used to calculate the index of the quantile. Available options are, when searching for the q quantile of a vector X:

    • simple: len(X) * q

    • centered: len(X) * q + 0.5

    • exc: (len(X) + 1) * q

    • inc: (len(X) - 1) * q + 1

  • interpolation (Literal[‘linear’, ‘higher’, ‘lower’, ‘nearest’, ‘midpoint’]) –

    If the quantile index is not an integer, the interpolation decides what value is returned. The different options are, considering a quantile index k with i < k < j for a sorted vector X:

    • linear: v = X[i] + (X[j] - X[i]) * (k - i)

    • lowest: v = X[i]

    • highest: v = X[j]

    • nearest: v = X[i] or v = X[j] depending on which of i or j is closest to k

    • midpoint: v = (X[i] + X[j]) / 2

Return type

MeasureDescription

atoti.array.quantile_index(measure, q, *, mode='inc', interpolation='lower')

Return a measure equal to the index of requested quantile of the elements of the passed array measure.

Parameters
  • measure (MeasureDescription) – The measure to get the quantile of.

  • q (Union[float, MeasureDescription]) – The quantile to take. Must be between 0 and 1. For instance, 0.95 is the 95th percentile and 0.5 is the median.

  • mode (Literal[‘simple’, ‘centered’, ‘inc’, ‘exc’]) –

    The method used to calculate the index of the quantile. Available options are, when searching for the q quantile of a vector X:

    • simple: len(X) * q

    • centered: len(X) * q + 0.5

    • exc: (len(X) + 1) * q

    • inc: (len(X) - 1) * q + 1

  • interpolation (Literal[‘higher’, ‘lower’, ‘nearest’]) –

    If the quantile index is not an integer, the interpolation decides what value is returned. The different options are, considering a quantile index k with i < k < j for the original vector X and the sorted vector Y:

    • lowest: the index in X of Y[i]

    • highest: the index in X of Y[j]

    • nearest: the index in X of Y[i] or Y[j] depending on which of i or j is closest to k

Return type

MeasureDescription

atoti.array.replace(measure, replacements)

Return a measure where elements equal to a key of the replacements mapping are replaced with the corresponding value.

Parameters

Example

>>> df = pd.DataFrame(
...     columns=["Store ID", "New quantity", "Old quantity"],
...     data=[
...         (
...             "Store 1",
...             [12, 6, 2, 20],
...             [6, 3, 0, 10],
...         ),
...         ("Store 2", [16, 8, 12, 15], [4, 4, 6, 3]),
...         ("Store 3", [8, -10, 0, 33], [8, 0, 2, 11]),
...     ],
... )
>>> table = session.read_pandas(df, table_name="Prices", keys=["Store ID"])
>>> cube = session.create_cube(table)
>>> l, m = cube.levels, cube.measures
>>> m["Old quantity"] = tt.value(table["Old quantity"])
>>> m["New quantity"] = tt.value(table["New quantity"])
>>> m["Quantity ratio"] = m["New quantity"] / m["Old quantity"]
>>> m["Quantity ratio"].formatter = "ARRAY[',']"
>>> cube.query(m["Quantity ratio"], levels=[l["Store ID"]])
                 Quantity ratio
Store ID
Store 1    2.0,2.0,Infinity,2.0
Store 2         4.0,2.0,2.0,5.0
Store 3   1.0,-Infinity,0.0,3.0

The function can be used to replace infinity with another value more suited to follow up computations:

>>> import math
>>> m["Quantity ratio without infinity"] = tt.array.replace(
...     m["Quantity ratio"], {math.inf: 1, -math.inf: -1}
... )
>>> m["Quantity ratio without infinity"].formatter = "ARRAY[',']"
>>> cube.query(m["Quantity ratio without infinity"], levels=[l["Store ID"]])
         Quantity ratio without infinity
Store ID
Store 1                  2.0,2.0,1.0,2.0
Store 2                  4.0,2.0,2.0,5.0
Store 3                 1.0,-1.0,0.0,3.0
Return type

MeasureDescription

atoti.array.sort(measure, *, ascending=True)

Return an array measure with the elements of the passed array measure sorted.

Example

>>> m["Sorted ascendingly"] = tt.array.sort(m["PnL.SUM"])
>>> m["Sorted descendingly"] = tt.array.sort(m["PnL.SUM"], ascending=False)
>>> cube.query(m["Sorted ascendingly"], m["Sorted descendingly"])
                           Sorted ascendingly                       Sorted descendingly
0  doubleVector[10]{-110.09900000000002, ...}  doubleVector[10]{9.259999999999998, ...}
Return type

MeasureDescription

atoti.array.std(measure, *, mode='sample')

Return a measure equal to the standard deviation of the elements of the passed array measure.

Parameters
  • measure (MeasureDescription) – The measure to get the standard deviation of.

  • mode (Literal[‘sample’, ‘population’]) –

    One of the supported modes:

    • The sample standard deviation, similar to Excel’s STDEV.S, is \(\sqrt{\frac{\sum_{i=0}^{n} (X_i - m)^{2}}{n - 1}}\) where m is the sample mean and n the size of the sample. Use this mode if the data represents a sample of the population.

    • The population standard deviation, similar to Excel’s STDEV.P is \(\sqrt{\frac{\sum_{i=0}^{n}(X_i - m)^{2}}{n}}\) where m is the mean of the Xi elements and n the size of the population. Use this mode if the data represents the entire population.

Return type

MeasureDescription

atoti.array.sum(value: ArrayType) JavaFunctionOperation
atoti.array.sum(value: MeasureDescription) MeasureDescription

Return a measure equal to the sum of all the elements of the passed array measure.

Example

>>> m["Sum"] = tt.array.sum(m["PnL.SUM"])
>>> m["Empty sum"] = tt.array.sum(m["PnL.SUM"][0:0])
>>> cube.query(m["Sum"], m["Empty sum"])
       Sum Empty sum
0  -308.29       .00
Return type

Union[JavaFunctionOperation, MeasureDescription]

atoti.array.var(measure, *, mode='sample')

Return a measure equal to the variance of the elements of the passed array measure.

Parameters
  • measure (MeasureDescription) – The measure to get the variance of.

  • mode (Literal[‘sample’, ‘population’]) –

    One of the supported modes:

    • The sample variance, similar to Excel’s VAR.S, is \(\frac{\sum_{i=0}^{n} (X_i - m)^{2}}{n - 1}\) where m is the sample mean and n the size of the sample. Use this mode if the data represents a sample of the population.

    • The population variance, similar to Excel’s VAR.P is \(\frac{\sum_{i=0}^{n}(X_i - m)^{2}}{n}\) where m is the mean of the Xi elements and n the size of the population. Use this mode if the data represents the entire population.

Return type

MeasureDescription