Array functions#

Atoti is designed to handle array data efficiently.

There are multiple ways to load arrays into Atoti tables. For instance:

  • With atoti.Session.read_pandas():

    >>> import numpy as np
    >>> df = pd.DataFrame(
    ...     columns=["NumPy array", "Python list"],
    ...     data=[
    ...         (np.array([1.0, 2.0, 3.0]), [1, 2, 3]),
    ...         (np.array([4.0, 5.0, 6.0]), [4, 5, 6]),
    ...         (np.array([7.0, 8.0, 9.0]), [7, 8, 9]),
    ...     ],
    ... )
    >>> pandas_table = session.read_pandas(df, table_name="DataFrame with arrays")
    >>> pandas_table.head()
           NumPy array Python list
    0  [1.0, 2.0, 3.0]   [1, 2, 3]
    1  [4.0, 5.0, 6.0]   [4, 5, 6]
    2  [7.0, 8.0, 9.0]   [7, 8, 9]
    
  • With atoti.Session.read_csv():

    >>> pnl_table = session.read_csv(
    ...     TEST_RESOURCES_PATH / "csv" / "pnl.csv",
    ...     array_separator=";",
    ...     keys={"Continent", "Country"},
    ...     table_name="PnL",
    ... )
    >>> pnl_table.head().sort_index()
                                                                     PnL
    Continent Country
    America   Mexico   [-10.716, 9.593, 1.356, -21.185, 5.989, 9.686,...
    Asia      China    [-1.715, 2.425, -4.059, 4.102, -2.331, -2.662,...
              India    [-18.716, 8.583, -41.356, -11.138, 3.949, 5.66...
    Europe    France   [-0.465, -0.025, 0.601, 0.423, -0.815, 0.024, ...
              UK       [11.449, -35.5464, -66.641, -48.498, -6.3126, ...
    

As for scalar measures, Atoti provides the default SUM and MEAN aggregations on array measures. They are applied element by element:

>>> cube = session.create_cube(pnl_table)
>>> l, m = cube.levels, cube.measures
>>> cube.query(m["PnL.SUM"], m["PnL.MEAN"], levels=[l["Continent"]])
                                  PnL.SUM                         PnL.MEAN
Continent
America    doubleVector[10]{-10.716, ...}   doubleVector[10]{-10.716, ...}
Asia       doubleVector[10]{-20.431, ...}  doubleVector[10]{-10.2155, ...}
Europe      doubleVector[10]{10.984, ...}     doubleVector[10]{5.492, ...}

Besides the functions below, arrays support the following operations:

  • Arithmetic operators:

    >>> m["PnL +10"] = m["PnL.SUM"] + 10.0
    >>> cube.query(m["PnL +10"])
                              PnL +10
    0  doubleVector[10]{-10.163, ...}
    
    >>> m["PnL -10"] = m["PnL.SUM"] - 10.0
    >>> cube.query(m["PnL -10"])
                              PnL -10
    0  doubleVector[10]{-30.163, ...}
    
    >>> m["PnL x10"] = m["PnL.SUM"] * 10.0
    >>> cube.query(m["PnL x10"])
                              PnL x10
    0  doubleVector[10]{-201.63, ...}
    
    >>> m["PnL /10"] = m["PnL.SUM"] / 10.0
    >>> cube.query(m["PnL /10"])
                              PnL /10
    0  doubleVector[10]{-2.0163, ...}
    
  • Indexing:

    >>> m["First element"] = m["PnL.SUM"][0]
    >>> cube.query(m["First element"], m["PnL.SUM"])
      First element                         PnL.SUM
    0        -20.16  doubleVector[10]{-20.163, ...}
    

    This can be used with atoti.Cube.create_parameter_hierarchy_from_members() to “slice” the array:

    >>> cube.create_parameter_hierarchy_from_members("Index", list(range(0, 10)))
    >>> m["PnL at index"] = m["PnL.SUM"][l["Index"]]
    >>> cube.query(m["PnL at index"], levels=[l["Index"]])
          PnL at index
    Index
    0           -20.16
    1           -14.97
    2          -110.10
    3           -76.30
    4              .48
    5           -57.51
    6             -.53
    7           -15.49
    8           -22.97
    9             9.26
    

    Non-integer hierarchies can also be created:

    >>> from datetime import date, timedelta
    >>> dates = [date(2020, 1, 1) + timedelta(days=offset) for offset in range(0, 10)]
    >>> cube.create_parameter_hierarchy_from_members(
    ...     "Dates", dates, index_measure_name="Date index"
    ... )
    >>> m["PnL at date"] = m["PnL.SUM"][m["Date index"]]
    >>> cube.query(m["Date index"], m["PnL at date"], levels=[l["Dates"]])
               Date index PnL at date
    Dates
    2020-01-01          0      -20.16
    2020-01-02          1      -14.97
    2020-01-03          2     -110.10
    2020-01-04          3      -76.30
    2020-01-05          4         .48
    2020-01-06          5      -57.51
    2020-01-07          6        -.53
    2020-01-08          7      -15.49
    2020-01-09          8      -22.97
    2020-01-10          9        9.26
    
  • Slicing:

    >>> m["First 2 elements"] = m["PnL.SUM"][0:2]
    >>> cube.query(m["First 2 elements"], m["PnL.SUM"])
                    First 2 elements                         PnL.SUM
    0  doubleVector[2]{-20.163, ...}  doubleVector[10]{-20.163, ...}
    
  • Selecting elements at given indices:

    A Tuple[int, ...] or a measure of type INT_ARRAY or LONG_ARRAY can be provided to create another array measure containing the values at the passed indices:

    >>> m["First and last"] = m["PnL.SUM"][0, -1]
    >>> cube.query(m["First and last"])
                      First and last
    0  doubleVector[2]{-20.163, ...}
    

len

Return a measure equal to the number of elements of the passed array measure.

max

Return a measure equal to the maximum element of the passed array measure.

mean

Return a measure equal to the mean of all the elements of the passed array measure.

min

Return a measure equal to the minimum element of the passed array measure.

n_greatest

Return an array measure containing the n greatest elements of the passed array measure.

n_greatest_indices

Return an array measure containing the indices of the n greatest elements of the passed array measure.

n_lowest

Return an array measure containing the n lowest elements of the passed array measure.

n_lowest_indices

Return an array measure containing the indices of the n lowest elements of the passed array measure.

negative_values

Return a measure where all the elements > 0 of the passed array measure are replaced by 0.

nth_greatest

Return a measure equal to the n-th greatest element of the passed array measure.

nth_lowest

Return a measure equal to the n-th lowest element of the passed array measure.

positive_values

Return a measure where all the elements < 0 of the passed array measure are replaced by 0.

prefix_sum

Return a measure equal to the sum of the previous elements in the passed array measure.

prod

Return a measure equal to the product of all the elements of the passed array measure.

quantile

Return a measure equal to the requested quantile of the elements of the passed array measure.

quantile_index

Return a measure equal to the index of requested quantile of the elements of the passed array measure.

replace

Return a measure where elements equal to a key of the replacements mapping are replaced with the corresponding value.

sort

Return an array measure with the elements of the passed array measure sorted.

std

Return a measure equal to the standard deviation of the elements of the passed array measure.

sum

Return a measure equal to the sum of all the elements of the passed array measure.

var

Return a measure equal to the variance of the elements of the passed array measure.