atoti package¶
Subpackages¶
- atoti.config package
- atoti.query package
- Submodules
- atoti.query.auth module
- atoti.query.basic_auth module
- atoti.query.cube module
- atoti.query.cubes module
- atoti.query.hierarchies module
- atoti.query.hierarchy module
- atoti.query.level module
- atoti.query.levels module
- atoti.query.measure module
- atoti.query.measures module
- atoti.query.session module
- Module contents
- atoti.scope package
Submodules¶
atoti.agg module¶
Aggregation functions.
-
atoti.agg.
avg
(measure, scope=None)¶ Perform an average aggregation of the input measure.
-
atoti.agg.
count_distinct
(column, scope=None)¶ Perform a count distinct aggregation of the input store column.
-
atoti.agg.
long
(measure, scope=None)¶ Perform a long aggregation of the input measure, which is the sum of the positive values.
-
atoti.agg.
max
(measure, scope=None)¶ Perform a max aggregation of the input measure.
-
atoti.agg.
median
(measure, scope=None)¶ Perform a median aggregation of the input measure.
-
atoti.agg.
min
(measure, scope=None)¶ Perform a min aggregation of the input measure.
-
atoti.agg.
percentile
(measure, percentile_value, mode='inc', interpolation='linear', scope=None)¶ Perform a percentile aggregation of the given scalar measure.
Here is how to obtain the same behaviour as standard quantile calculation methods, described here: https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample.
R1: centered method using lower interpolation
R2: centered method using midpoint interpolation
R3: simple method using nearest interpolation
R4: simple method using linear interpolation
R5: centered method using linear interpolation
R6: exc method using linear interpolation. This also corresponds to Excel’s PERCENTILE.EXC
R7: inc method using linear interpolation. This also corresponds to Excel’s PERCENTILE.INC
R8 & R9 are not currently supported by our API
The formulae given for the calculation of the quantile index assume a 1-based indexing system.
- Parameters
measure (Union[Measure, MeasureConvertible]) – The scalar measure.
percentile_value (Union[float, Measure]) – The percentile to take. Must be strictly between 0 and 1. For instance 0.95 is the 95th percentile, 0.5 is the median…
mode (str) – The method used to calculate the index of the percentile. The available options, when searching for the q-th percentile of a vector X, are:
exc: The calculated position of the percentile is (len(X) + 1) * q
inc: The calculated position of the percentile is (len(X) - 1) * q + 1
centered: The calculated position of the percentile is len(X) * q + 0.5
simple: The calculated position of the percentile is len(X) * q
interpolation (str) – If the percentile index is not an integer, the interpolation decides what value is returned. The options are, for a percentile index k between i and j (i < k < j) in a sorted vector X:
linear: v = X[i] + (X[j] - X[i]) * (k - i)
lowest: v = X[i]
highest: v = X[j]
nearest: v = X[i] or v = X[j], depending on which of i or j is closest to k
midpoint: v = (X[i] + X[j]) / 2
scope (Optional[Scope]) – The scope of the aggregation.
- Return type
Measure
- Returns
The aggregated percentile measure.
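The index modes and interpolation rules above can be sketched in plain Python. This is only an illustration of the documented formulas, not atoti's implementation: the helper names `percentile_position` and `percentile` are hypothetical, and two behaviours are assumptions, namely clamping the position to [1, len(X)] and resolving an exact `nearest` tie toward the lower neighbour.

```python
import math


def percentile_position(n, q, mode="inc"):
    """Return the 1-based position of the q-th percentile in a vector of size n."""
    if mode == "exc":
        return (n + 1) * q
    if mode == "inc":
        return (n - 1) * q + 1
    if mode == "centered":
        return n * q + 0.5
    if mode == "simple":
        return n * q
    raise ValueError(f"unknown mode: {mode}")


def percentile(values, q, mode="inc", interpolation="linear"):
    """Plain-Python percentile following the documented position and interpolation formulas."""
    x = sorted(values)
    n = len(x)
    # Assumption: positions falling outside the vector are clamped to [1, n].
    k = min(max(percentile_position(n, q, mode), 1.0), float(n))
    i, j = math.floor(k), math.ceil(k)
    # Convert the 1-based positions i and j to 0-based list indices.
    lower, upper = x[i - 1], x[j - 1]
    if i == j:
        return lower
    if interpolation == "linear":
        return lower + (upper - lower) * (k - i)
    if interpolation == "lowest":
        return lower
    if interpolation == "highest":
        return upper
    if interpolation == "nearest":
        # Assumption: an exact tie resolves to the lower neighbour.
        return lower if (k - i) <= (j - k) else upper
    if interpolation == "midpoint":
        return (lower + upper) / 2
    raise ValueError(f"unknown interpolation: {interpolation}")


data = [50, 10, 40, 20, 30]
print(percentile(data, 0.25))              # 20 (inc position is exactly 2)
print(percentile(data, 0.25, mode="exc"))  # 15.0 (position 1.5, halfway between 10 and 20)
```

With five values and q = 0.25, the four modes give the positions 1.5 (exc), 2 (inc), 1.75 (centered) and 1.25 (simple), which is why the same data can produce four different results.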
-
atoti.agg.
prod
(measure, scope=None)¶ Perform a product aggregation of the input measure.
-
atoti.agg.
short
(measure, scope=None)¶ Perform a short aggregation of the input measure, which is the sum of the negative values.
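As a minimal illustration of the long and short aggregations, here is the same arithmetic on a plain Python list rather than on a cube measure. The helper names `long_sum` and `short_sum` are hypothetical and do not belong to atoti.

```python
def long_sum(values):
    """Sum of the strictly positive values (the 'long' side)."""
    return sum(v for v in values if v > 0)


def short_sum(values):
    """Sum of the strictly negative values (the 'short' side)."""
    return sum(v for v in values if v < 0)


positions = [100, -40, 25, -10]
print(long_sum(positions))   # 125
print(short_sum(positions))  # -50
```

The two results always add up to the plain sum of the values, here 75.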
-
atoti.agg.
single_value
(measure, scope=None)¶ Perform a single value aggregation of the input measure.
-
atoti.agg.
square_sum
(measure, scope=None)¶ Perform the sum of the square of the input measure.
-
atoti.agg.
std
(measure, mode='sample', scope=None)¶ Get the standard deviation of the input data.
The standard deviation is the square root of the variance.
As for variance, there are two types of standard deviation:
The sample standard deviation, which is similar to STDEV.S in Excel and based on the sample variance. It uses the following formula: sqrt( sum((Xi - m)²) / (n - 1) ) where m is the sample mean and n the size of the sample. Use this mode if your data represents a sample of the population.
The population standard deviation, which is similar to STDEV.P in Excel and based on the population variance. It uses the following formula: sqrt( sum((Xi - m)²) / n ) where m is the average of the Xi elements and n the size of the population. Use this mode if your data represents the entire population.
- Parameters
- Return type
Measure
- Returns
The measure representing the standard deviation of the input measure or store column.
-
atoti.agg.
stop
(measure, at)¶ Stop aggregating the measure’s values above the provided levels.
- Parameters
measure (Measure) – The measure to restrict.
at (Collection[LevelOrName]) – List of levels to stop at.
- Return type
RestrictedMeasure
- Returns
The restricted measure.
-
atoti.agg.
sum
(measure, scope=None)¶ Perform a sum aggregation of the input measure.
-
atoti.agg.
variance
(measure, mode='sample', scope=None)¶ Get the variance of the input data.
There are two types of variance:
The sample variance, which is similar to VAR.S in Excel. It uses the following formula: sum((Xi - m)²) / (n - 1) where m is the sample mean and n the size of the sample. Use this mode if your data represents a sample of the population.
The population variance, which is similar to VAR.P in Excel. It uses the following formula: sum((Xi - m)²) / n where m is the average of the Xi elements and n the size of the population. Use this mode if your data represents the entire population.
- Parameters
- Return type
Measure
- Returns
The measure representing the variance of the input measure or store column.
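The two variance formulas, and the standard deviation as their square root, can be checked with a short plain-Python sketch. The `variance` and `std` functions below are hypothetical stand-ins that operate on a list of numbers, not on atoti measures; the mode names "sample" and "population" are the ones used in this documentation.

```python
import math
import statistics


def variance(values, mode="sample"):
    """Variance of a list of numbers, following the two documented formulas."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    if mode == "sample":
        return squared_deviations / (n - 1)
    if mode == "population":
        return squared_deviations / n
    raise ValueError(f"unknown mode: {mode}")


def std(values, mode="sample"):
    """Standard deviation: the square root of the variance in the same mode."""
    return math.sqrt(variance(values, mode))


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance(data, mode="population"))  # 4.0
print(std(data, mode="population"))       # 2.0
# The sample mode agrees with the standard library.
print(math.isclose(variance(data), statistics.variance(data)))  # True
```

For this data set the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 while the sample variance is 32 / 7.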
atoti.aggregates_cache module¶
Aggregates cache.
-
class
atoti.aggregates_cache.
AggregatesCache
(_java_api, _cube)¶ Bases:
object
The aggregates cache associated with a cube.
-
property
capacity
¶ Capacity of the cache.
It’s the number of (location, measure) pairs of all the aggregates that can be stored.
A strictly negative value will disable caching.
A zero value will enable sharing but no caching. This means that queries will share their computations if they are executed at the same time, but the aggregated values will not be stored to be retrieved later. The size of the cache will then dictate the number of (location, measure) pairs that can be stored in the cache with their values.
- Return type
-
property
atoti.array module¶
Measure functions.
-
atoti.array.
avg
(measure)¶ Return the average of all the values of an array.
- Parameters
measure (Measure) – the array measure to average
- Return type
Measure
- Returns
A new measure equal to the average of all the values of the array.
-
atoti.array.
len
(measure)¶ Get the length of an array measure.
- Parameters
measure (Measure) – An array measure to get the size of
- Returns
A measure representing the size of the array.
- Return type
Measure
-
atoti.array.
max
(measure)¶ Return the biggest value of the array.
- Parameters
measure (Measure) – the array measure whose max is taken
- Return type
Measure
- Returns
A new measure equal to the max of the array.
-
atoti.array.
min
(measure)¶ Return the smallest value of the array.
- Parameters
measure (Measure) – the array measure whose min is taken
- Return type
Measure
- Returns
A new measure equal to the min of the array.
-
atoti.array.
n_greatest
(measure, n)¶ Take the top n values of an array measure.
-
atoti.array.
n_lowest
(measure, n)¶ Take the bottom n values of an array measure.
-
atoti.array.
negative_values
(measure)¶ Replace all the strictly positive values in an array measure by 0.
- Parameters
measure (Measure) – the array measure
- Return type
Measure
- Returns
A new array measure that contains only negative values.
-
atoti.array.
nth_greatest
(measure, n)¶ Return the nth greatest element of an array measure.
-
atoti.array.
nth_lowest
(measure, n)¶ Return the nth lowest element of an array measure.
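The four selection functions above can be pictured on a plain Python list with `heapq`. This is a sketch under assumptions that this page does not state: n is taken to be 1-based (n = 1 is the greatest or lowest element), n is between 1 and the array length, and the order of the returned arrays (sorted from the extreme inward) is a choice made here. The helper names mirror the atoti ones but are local stand-ins.

```python
import heapq


def n_greatest(values, n):
    """The n largest values, largest first (ordering is an assumption)."""
    return heapq.nlargest(n, values)


def n_lowest(values, n):
    """The n smallest values, smallest first (ordering is an assumption)."""
    return heapq.nsmallest(n, values)


def nth_greatest(values, n):
    """The n-th largest value, with n assumed to be 1-based."""
    return heapq.nlargest(n, values)[-1]


def nth_lowest(values, n):
    """The n-th smallest value, with n assumed to be 1-based."""
    return heapq.nsmallest(n, values)[-1]


array = [3, 9, 1, 7, 5]
print(n_greatest(array, 2))    # [9, 7]
print(n_lowest(array, 2))      # [1, 3]
print(nth_greatest(array, 2))  # 7
print(nth_lowest(array, 2))    # 3
```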
-
atoti.array.
percentile
(measure, percentile_value, mode='inc', interpolation='linear')¶ Percentile calculation.
Take the percentile of an array. The n-th percentile is the smallest value for which n% of the elements are smaller.
Here is how to obtain the same behaviour as standard quantile calculation methods, described here: https://en.wikipedia.org/wiki/Quantile#Estimating_quantiles_from_a_sample.
R1: centered method using lower interpolation
R2: centered method using midpoint interpolation
R3: simple method using nearest interpolation
R4: simple method using linear interpolation
R5: centered method using linear interpolation
R6: exc method using linear interpolation. This also corresponds to Excel’s PERCENTILE.EXC
R7: inc method using linear interpolation. This also corresponds to Excel’s PERCENTILE.INC
R8 & R9 are not currently supported by our API
The formulae given for the calculation of the quantile index assume a 1-based indexing system.
- Parameters
measure (Measure) – The array measure.
percentile_value (Union[float, Measure]) – The percentile to take. Must be between 0 and 1. For instance 0.95 is the 95th percentile, 0.5 is the median…
mode (str) – The method used to calculate the index of the percentile. The available options, when searching for the q-th percentile of a vector X, are:
exc: The calculated position of the percentile is (len(X) + 1) * q
inc: The calculated position of the percentile is (len(X) - 1) * q + 1
centered: The calculated position of the percentile is len(X) * q + 0.5
simple: The calculated position of the percentile is len(X) * q
interpolation (str) – If the percentile index is not an integer, the interpolation decides what value is returned. The options are, for a percentile index k between i and j (i < k < j) in a sorted vector X:
linear: v = X[i] + (X[j] - X[i]) * (k - i)
lowest: v = X[i]
highest: v = X[j]
nearest: v = X[i] or v = X[j], depending on which of i or j is closest to k
midpoint: v = (X[i] + X[j]) / 2
- Return type
Measure
- Returns
The percentile measure.
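As a cross-check of the inc and exc modes with linear interpolation, Python's standard library uses the same two position formulas in `statistics.quantiles`: its "inclusive" method places the cut points at (len(X) - 1) * q + 1 and its "exclusive" method at (len(X) + 1) * q. The snippet below only uses the standard library and does not call atoti, so it is a point of comparison rather than a description of atoti's implementation.

```python
import statistics

values = [10, 20, 30, 40, 50]

# n=4 asks for the quartiles, i.e. q = 0.25, 0.5 and 0.75.
inclusive = statistics.quantiles(values, n=4, method="inclusive")
exclusive = statistics.quantiles(values, n=4, method="exclusive")

print(inclusive)  # [20.0, 30.0, 40.0]
print(exclusive)  # [15.0, 30.0, 45.0]
```

For q = 0.25 the inc position is (5 - 1) * 0.25 + 1 = 2, which is exactly the second value, while the exc position is (5 + 1) * 0.25 = 1.5, halfway between the first two values.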
-
atoti.array.
positive_values
(measure)¶ Replace all the strictly negative values in an array measure by 0.
- Parameters
measure (Measure) – the array measure
- Return type
Measure
- Returns
A new array measure that contains only positive values.
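The element-wise effect of positive_values and negative_values can be shown on a plain Python list. The functions below are local stand-ins that share the names of the atoti functions but operate on lists, not on array measures.

```python
def positive_values(array):
    """Replace every strictly negative value by 0."""
    return [max(v, 0) for v in array]


def negative_values(array):
    """Replace every strictly positive value by 0."""
    return [min(v, 0) for v in array]


array = [3, -2, 0, 5, -7]
print(positive_values(array))  # [3, 0, 0, 5, 0]
print(negative_values(array))  # [0, -2, 0, 0, -7]
```

Both results keep the length of the input, and adding them element by element gives back the original array.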
-
atoti.array.
sort
(measure)¶ Return the array of a measure sorted in ascending order.
- Parameters
measure (Measure) – the array measure to apply the sort on.
- Return type
Measure
- Returns
A new measure holding the sorted array.
-
atoti.array.
std
(measure, mode='sample')¶ Get the standard deviation of the array elements.
The standard deviation is the square root of the variance.
As for variance, there are two types of standard deviation:
The sample standard deviation, which is similar to STDEV.S in Excel and based on the sample variance. It uses the following formula: sqrt( sum((Xi - m)²) / (n - 1) ) where m is the sample mean and n the size of the sample. Use this mode if your data represents a sample of the population.
The population standard deviation, which is similar to STDEV.P in Excel and based on the population variance. It uses the following formula: sqrt( sum((Xi - m)²) / n ) where m is the average of the Xi elements and n the size of the population. Use this mode if your data represents the entire population.
- Parameters
measure (Measure) – the array measure to apply the standard deviation on.
mode (str) – the standard deviation mode, either “sample” or “population”.
- Return type
Measure
- Returns
The measure representing the standard deviation of the input measure or field.
-
atoti.array.
sum
(measure)¶ Return the sum of all the values of an array.
- Parameters
measure (Measure) – the array measure to sum
- Return type
Measure
- Returns
A new measure equal to the sum of all the values of the array.
-
atoti.array.
variance
(measure, mode='sample')¶ Return the variance of the array elements.
There are two types of variance:
The sample variance, which is similar to VAR.S in Excel. It uses the following formula: sum((Xi - m)²) / (n - 1) where m is the sample mean and n the size of the sample. Use this mode if your data represents a sample of the population.
The population variance, which is similar to VAR.P in Excel. It uses the following formula: sum((Xi - m)²) / n where m is the average of the Xi elements and n the size of the population. Use this mode if your data represents the entire population.
- Parameters
measure (Measure) – the array measure to apply the variance on.
mode (str) – the variance mode, either “sample” or “population”
- Return type
Measure
- Returns
A new measure equal to the variance of the array.
atoti.column module¶
Column of a Store.
atoti.comparator module¶
Level comparators.
-
atoti.comparator.
first_members
(members)¶ Create a level comparator with the given first members.
- Return type
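One way to picture a comparator with given first members is a sort key that ranks the listed members first, in the order given. The sketch below is plain Python, not the atoti comparator: `first_members_key` is a hypothetical helper, and ordering the remaining members alphabetically is an assumption this page does not confirm.

```python
def first_members_key(first):
    """Build a sort key that places the `first` members ahead of all others."""
    rank = {member: index for index, member in enumerate(first)}

    def key(member):
        if member in rank:
            # Listed members come first, in the order they were given.
            return (0, rank[member], "")
        # Assumption: the other members follow in their natural order.
        return (1, 0, member)

    return key


continents = ["Asia", "Europe", "Africa", "America"]
print(sorted(continents, key=first_members_key(["Europe", "America"])))
# ['Europe', 'America', 'Africa', 'Asia']
```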
atoti.cube module¶
Cube of a Session.
-
class
atoti.cube.
Cube
(java_api, name, base_store, session)¶ Bases:
object
Cube of a Session.
-
property
aggregates_cache
¶ Aggregates cache of the cube.
- Return type
-
create_bucketing
(name, columns, rows=None, bucket_dimension='Buckets', weight_name=None, weighted_measures=None)¶ Create a bucket.
The bucketing is done by mapping one or several columns to buckets with weights. This mapping is done in a store with all the columns of the mapping, a column with the bucket and a column for the weight:
+---------+---------+---------+-----------+------------------+
| Column1 | Column2 | Column3 | My Bucket | My Bucket_weight |
+---------+---------+---------+-----------+------------------+
| a       | b       | c       | BucketA   | 0.25             |
| a       | b       | c       | BucketB   | 0.75             |
| d       | e       | f       | BucketA   | 1.0              |
| g       | h       | i       | BucketB   | 1.0              |
+---------+---------+---------+-----------+------------------+
There are multiple ways to feed this store:
with a pandas DataFrame corresponding to the store
with a list of the rows:
[
    ["a", "b", "c", "BucketA", 0.25],
    ["a", "b", "c", "BucketB", 0.75],
    ...
]
with a dict:
{
    ("a", "b", "c"): {"BucketA": 0.25, "BucketB": 0.75},
    ("d", "e", "f"): {"BucketA": 1.0},
    ...
}
Some measures can be overridden automatically to be scaled with the weights.
- Parameters
name (str) – The name of the bucket. It will be used as the name of the column in the bucket store and as the name of the bucket hierarchy.
columns (Sequence[LevelOrName]) – the columns to bucket on.
weighted_measures (Optional[Sequence[MeasureOrName]]) – Measures that will be scaled with the weight
rows (BucketRows) – The mapping between the columns and the bucket. It can either be a list of rows, or a pandas DataFrame.
bucket_dimension (str) – The name of the dimension to put the bucket hierarchy in.
weight_name (Optional[str]) – the name of the measure for the weights.
- Return type
- Returns
The store of the bucketing. This store can be modified to change the bucket dynamically.
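The dict form and the list-of-rows form of the bucketing mapping carry the same information. A small conversion sketch makes the correspondence explicit; `bucket_mapping_to_rows` is a hypothetical helper written for this illustration and is not part of atoti.

```python
def bucket_mapping_to_rows(mapping):
    """Flatten {column values: {bucket: weight}} into [*column values, bucket, weight] rows."""
    rows = []
    for column_values, buckets in mapping.items():
        for bucket, weight in buckets.items():
            rows.append([*column_values, bucket, weight])
    return rows


mapping = {
    ("a", "b", "c"): {"BucketA": 0.25, "BucketB": 0.75},
    ("d", "e", "f"): {"BucketA": 1.0},
}
for row in bucket_mapping_to_rows(mapping):
    print(row)
# ['a', 'b', 'c', 'BucketA', 0.25]
# ['a', 'b', 'c', 'BucketB', 0.75]
# ['d', 'e', 'f', 'BucketA', 1.0]
```

Each output row matches one line of the store layout shown above: the bucketed columns first, then the bucket, then its weight.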
-
create_parameter_hierarchy
(level_and_hierarchy_name, members, indices=None, slicing=True, index_measure='', level_type=None)¶ Create an arbitrary single-level static analysis hierarchy with the given members.
- Parameters
level_and_hierarchy_name (str) – the name of the single level in the new hierarchy
indices (Optional[List[int]]) – the list of indices for each member in the new hierarchy
slicing (bool) – whether the hierarchy is slicing
index_measure (str) – the name of the indexing measure for this hierarchy, if any
level_type (Optional[AtotiType]) – the type with which the members will be stored. Automatically inferred by default.
-
explain_query
(*measures, levels=None, condition=None, scenario='Base', timeout=30)¶ Run the query but return an explanation of the query instead of the result.
The explanation contains a summary, global timings and the query plan with all the retrievals.
- Parameters
measures (NamedMeasure) – the measures to query.
levels (Union[Level, Sequence[Level], None]) – the levels to split on.
condition (Union[LevelCondition, MultiCondition, None]) – the filtering condition. Only conditions on level equality with a string are supported. For instance:
lvl["Country"] == "France"
(lvl["Country"] == "USA") & (lvl["Currency"] == "USD")
scenario (str) – the scenario to query.
timeout (int) – the query timeout in seconds.
- Return type
QueryAnalysis
- Returns
the query explanation
-
property
hierarchies
¶ Hierarchies of the cube.
- Return type
-
query
(*measures, levels=None, condition=None, scenario='Base', timeout=30)¶ Query the cube to get the value of some measures.
The value of the measures is given on all the members of the given levels. If no measure is specified then all the measures are returned. If no level is specified then the value at the top level is returned.
- Parameters
measures (NamedMeasure) – the measures to query.
levels (Union[Level, Sequence[Level], None]) – the levels to split on.
condition (Union[LevelCondition, MultiCondition, None]) – the filtering condition. Only conditions on level equality with a string are supported. For instance:
lvl["Country"] == "France"
(lvl["Country"] == "USA") & (lvl["Currency"] == "USD")
scenario (str) – the scenario to query.
timeout (int) – the query timeout in seconds.
- Return type
DataFrame
- Returns
the resulting DataFrame.
-
setup_simulation
(name, multiply=None, replace=None, add=None, per=None, base_scenario_name='Base')¶ Create a simulation for the given measures.
This creates a store to configure the simulation. You cannot use the same measure in several methods.
You can create as many scenarios as you want for each simulation you create.
- Parameters
name (str) – The name of the simulation
multiply (Optional[Collection[Measure]]) – Collection of measures whose values will be multiplied
replace (Optional[Collection[Measure]]) – Collection of measures whose values will be replaced
add (Optional[Collection[Measure]]) – Collection of measures whose values will be added (incremented)
per (Optional[Sequence[Level]]) – Sequence of levels to simulate on
base_scenario_name (str) – The name of the base scenario
- Return type
- Returns
The simulation on which scenarios can be made
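The rule that a measure cannot be used in several methods amounts to requiring that the multiply, replace and add collections be disjoint. The check below is a hypothetical sketch of that rule, using plain strings in place of Measure objects; how atoti itself reports a conflict is not described here.

```python
def check_simulation_measures(multiply=None, replace=None, add=None):
    """Map each measure to its single simulation method, rejecting any reuse."""
    methods = {"multiply": multiply or [], "replace": replace or [], "add": add or []}
    method_by_measure = {}
    for method, measures in methods.items():
        for measure in measures:
            if measure in method_by_measure:
                raise ValueError(
                    f"{measure!r} is used in both {method_by_measure[measure]!r} and {method!r}"
                )
            method_by_measure[measure] = method
    return method_by_measure


print(check_simulation_measures(multiply=["price"], add=["quantity"]))
# {'price': 'multiply', 'quantity': 'add'}
```

Calling it with the same name in two collections, for example `multiply=["price"]` and `replace=["price"]`, raises a `ValueError`.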
-
property
simulations
¶ Simulations of the cube.
- Return type
-
visualize
(name=None)¶ Display an Atoti widget to explore the cube interactively.
This is only supported in JupyterLab and requires the Atoti extension to be installed and enabled.
The widget state will be stored in the cell metadata. You should not have to edit this state but, if you want to, you can find it in JupyterLab by opening the “Notebook tools” sidebar and expanding the “Advanced Tools” section.
-
property
atoti.cubes module¶
Cubes.
-
class
atoti.cubes.
Cubes
(_java_api, _cubes=<factory>)¶ Bases:
collections.abc.MutableMapping
,typing.Generic
Manage the cubes of the session.
atoti.exceptions module¶
Custom Atoti exceptions.
The custom exceptions are here to disguise the “ugly” stack traces which occur when Py4J raises a Java error. If any other exception is raised by the code inside the custom hook, it is processed normally.
This module is public so the exceptions classes are public and can be documented.
-
exception
atoti.exceptions.
AtotiException
¶ Bases:
Exception
The generic Atoti exception class.
All exceptions which inherit from this class will be treated differently when raised. However, this exception is still handled by the default excepthook.
-
exception
atoti.exceptions.
AtotiJavaException
(message, java_traceback, java_exception)¶ Bases:
atoti.exceptions.AtotiException
Exception thrown when Py4J throws a Java exception.
-
exception
atoti.exceptions.
AtotiPy4JException
¶ Bases:
atoti.exceptions.AtotiException
Exception thrown when Py4J throws a Py4JError.
-
exception
atoti.exceptions.
AtotimNetworkException
¶ Bases:
atoti.exceptions.AtotiException
Exception thrown when Py4J throws a network exception.
atoti.hierarchies module¶
Hierarchies.
-
class
atoti.hierarchies.
Hierarchies
(_java_api, _cube)¶ Bases:
atoti._mappings.DelegateMutableMapping
Manage the hierarchies.
atoti.hierarchy module¶
Hierarchy of a Cube.
atoti.level module¶
Level of a Hierarchy.
atoti.levels module¶
Levels.
-
class
atoti.levels.
Levels
(_hierarchies)¶ Bases:
atoti._base_levels.BaseLevels
Flat representation of all the levels in the cube.
atoti.logs module¶
Logs.
atoti.measures module¶
Measures.
-
class
atoti.measures.
Measures
(_java_api, _cube)¶ Bases:
atoti._mappings.DelegateMutableMapping
Manage the measures.
atoti.sampling module¶
Different sampling modes for sources.
-
class
atoti.sampling.
SamplingMode
(name, parameters)¶ Bases:
object
Mode of source loading.
-
name
: str = None¶
-
parameters
: List[Any] = None¶
-
atoti.session module¶
Session.
-
class
atoti.session.
Session
(sampling_mode, port=None, max_memory=None, java_args=None, name='Unnamed', config=None, **kwargs)¶ Bases:
object
Holds a connection to the Java gateway.
-
close
()¶ Close this session and free all the associated resources.
-
create_cube
(base_store, name=None, mode='auto')¶ Create a cube using the provided store as the base store.
Create a cube using all the reachable columns as single level dimension.
- Parameters
base_store (Store) – the store to use as the base store.
name (Optional[str]) – The name of the created cube. It should be alphanumeric without spaces. If no name is provided, it will default to the name of the base store, stripped of all non alphanumeric characters.
mode (str) – The cube creation configuration. “manual” doesn’t create any hierarchy or measure (except the count); “auto” creates hierarchies for every non-numeric column, and measures for every numeric column; “no_measures” creates the hierarchies like “auto” but does not create any measures.
- Return type
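The default cube name and the three creation modes can be sketched in plain Python. Everything below is illustrative: `default_cube_name` and `plan_cube` are hypothetical helpers, the type names in `NUMERIC_TYPES` are placeholders rather than atoti types, and the count measure that always exists is left out of the result.

```python
import re

# Placeholder type names for this sketch, not atoti's actual type objects.
NUMERIC_TYPES = {"int", "long", "float", "double"}


def default_cube_name(store_name):
    """Strip every non-alphanumeric character from the base store name."""
    return re.sub(r"[^0-9A-Za-z]", "", store_name)


def plan_cube(column_types, mode="auto"):
    """Return (hierarchy columns, measure columns) for the given creation mode."""
    if mode == "manual":
        return [], []
    if mode not in ("auto", "no_measures"):
        raise ValueError(f"unknown mode: {mode}")
    hierarchies = [c for c, t in column_types.items() if t not in NUMERIC_TYPES]
    measures = [c for c, t in column_types.items() if t in NUMERIC_TYPES]
    return hierarchies, (measures if mode == "auto" else [])


columns = {"Country": "string", "Currency": "string", "Price": "double"}
print(default_cube_name("sales 2020.csv"))     # sales2020csv
print(plan_cube(columns))                      # (['Country', 'Currency'], ['Price'])
print(plan_cube(columns, mode="no_measures"))  # (['Country', 'Currency'], [])
print(plan_cube(columns, mode="manual"))       # ([], [])
```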
-
create_scenario
(name, origin='Base')¶ Create a new scenario in the datastore.
-
delete_scenario
(scenario)¶ Delete the source scenario with the provided name if it exists.
- Return type
None
-
property
excel_url
¶ URL of the Excel endpoint.
To connect to the cubes in Excel, create a new connection to Analysis Services. Use this URL for the “server” field and choose to connect with “User Name and Password”:
without authentication, leave these fields blank.
with Basic authentication, fill them with your username and password.
other authentication types (such as Auth0) are not supported by Excel.
- Return type
-
explain_mdx_query
(mdx, timeout=30)¶ Explain an MDX SELECT query.
-
load_all_data
()¶ Trigger the full loading of the data.
-
logs_tail
(n=20)¶ Get the n last lines of the logs.
-
query_mdx
(mdx, timeout=30)¶ Execute an MDX SELECT query and return its result as a pandas DataFrame.
-
read_csv
(file_path, keys=None, store_name=None, in_all_scenarios=True, sep=None, encoding='utf-8', process_quotes=None, partitioning=None, types=None, watch=False, array_sep=None, sampling_mode=None)¶ Read a CSV file into a store.
The column data types are automatically inferred based on the first 1,000 lines of the CSV file. The types parameter can be specified to explicitly set some column data types.
- Parameters
file_path (Union[pathlib.Path, str]) – The path to the CSV file or directory to load. If a path pointing to a directory is provided, all of the files with the ‘.csv’ extension will be loaded into the same store and, as such, they are all expected to share the same schema.
keys (optional) – The list of key columns in the file.
store_name (optional) – The name of the store to create. Defaults to the name of the file.
in_all_scenarios (bool) – whether to load the CSV in all existing scenarios. True by default.
sep (Optional[str]) – Delimiter to use. If sep is None, the separator will automatically be detected.
encoding (str) – Encoding to use for UTF when reading. Defaults to ‘utf-8’.
process_quotes (Optional[bool]) –
Whether double quotes should be processed to follow the official CSV specification:
Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields
A double-quote appearing inside a field must be escaped by preceding it with another double quote
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes
When setting this parameter to false, all double-quotes within a field will be treated as any regular character, to follow Excel behavior. CAREFUL: in this mode, it is expected that fields are NOT enclosed in double quotes. It is also not possible to have a line break inside a field. If set to None, the behaviour will be inferred from the first lines of the CSV.
partitioning (optional) – The store partitioning description that describes how the data will be split across partitions of the store. For instance, use ‘hash4(country)’ to split the data across 4 partitions based on the country column’s hash value. Only key columns can be used in the partitioning description.
types (Optional[Dict[str, AtotiType]]) – Types for some columns of the store. Types are automatically inferred but they can also be specified.
watch (bool) – Whether or not the source file or directory should be watched for changes. If this option is set to true, whenever you change the source, the changes will be reflected in the store. If the source is a directory, any new CSV files added will be loaded into the same store as the initial data, the new files must therefore have the same schema as the initial data as well. Any files added to the directory which aren’t CSV files will be ignored.
array_sep (Optional[str]) – Delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays. Defaults to None.
sampling_mode (Optional[SamplingMode]) – the sampling mode. Defaults to this session’s one.
- Return type
- Returns
The created store that holds the content of the file.
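The effect of process_quotes can be pictured with Python's standard `csv` module, used here only as a stand-in: atoti's own CSV parser may differ in details. Default quoting follows the CSV rules listed above, while `csv.QUOTE_NONE` treats double quotes as regular characters.

```python
import csv

line = 'id,"say ""hi"", then leave",42'

# Quotes processed: the quoted field keeps its comma, and "" becomes a single ".
processed = next(csv.reader([line]))

# Quotes treated as regular characters: every comma splits, quotes are kept as-is.
raw = next(csv.reader([line], quoting=csv.QUOTE_NONE))

print(processed)  # ['id', 'say "hi", then leave', '42']
print(raw)        # ['id', '"say ""hi""', ' then leave"', '42']
```

The second result has four fields instead of three, which illustrates why fields are expected not to be enclosed in double quotes when quote processing is turned off.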
-
read_numpy
(data, columns, store_name, keys, in_all_scenarios=True, partitioning=None, sep='|')¶ Read a numpy array into a new store.
- Parameters
data (np.ndarray) – The numpy array to read the data from, must be a 2D array.
columns (Sequence[str]) – The names to use for the store’s columns, they must be in the same order as the values in the numpy array.
store_name (str) – The name of the store to create.
keys (Sequence[str]) – Key columns for the store.
in_all_scenarios (bool) – whether to load the data in all existing scenarios of the datastore. True by default.
partitioning (optional) – The store partitioning description that describes how the data will be split across partitions of the store. For instance, use ‘hash4(country)’ to split the data across 4 partitions based on the country column’s hash value. Only key columns can be used in the partitioning description.
sep (str) – Specify a separator to use if you have special characters in your column values.
- Return type
- Returns
The created store that holds the content of the array.
-
read_pandas
(dataframe, keys=None, store_name=None, partitioning=None, types=None, **kwargs)¶ Read a pandas DataFrame into a store.
All the named indices of the dataframe are included into the store. Multilevel columns are flattened into a single string name.
- Parameters
dataframe (DataFrame) – The DataFrame to load.
keys (optional) – The list of key columns in the DataFrame.
store_name (optional) – The name of the store to create. Defaults to a random string.
partitioning (optional) – The store partitioning description that describes how the data will be split across partitions of the store. For instance, use ‘hash4(country)’ to split the data across 4 partitions based on the country column’s hash value. Only key columns can be used in the partitioning description.
types (optional) – Types for some columns of the store. Types are automatically inferred but can also be specified.
- Return type
- Returns
The created store that holds the content of the DataFrame.
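A partitioning description such as ‘hash4(country)’ sends each row to one of 4 partitions according to a hash of its country value. The sketch below illustrates the idea in plain Python; the helper names are hypothetical, and `zlib.crc32` is only a deterministic stand-in, since the hash function atoti actually uses is not documented here.

```python
import zlib
from collections import defaultdict


def partition_index(value, partitions=4):
    """Map a column value to a partition index in [0, partitions)."""
    # crc32 is a stand-in hash chosen because it is stable across runs.
    return zlib.crc32(str(value).encode("utf-8")) % partitions


def split_rows(rows, column, partitions=4):
    """Group rows by the partition index of the given column."""
    result = defaultdict(list)
    for row in rows:
        result[partition_index(row[column], partitions)].append(row)
    return dict(result)


countries = ["France", "USA", "France", "Japan", "USA"]
rows = [{"country": c, "amount": i} for i, c in enumerate(countries)]
for index, partition in sorted(split_rows(rows, "country").items()):
    print(index, [row["country"] for row in partition])
```

Whatever the hash, rows sharing the same country always land in the same partition, and every row lands in exactly one partition.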
-
read_parquet
(file_path, keys=None, store_name=None, in_all_scenarios=True, partitioning=None, sampling_mode=None)¶ Read a parquet file into a store.
- Parameters
file_path (Union[pathlib.Path, str]) – The path to the Parquet file to load.
keys (optional) – The list of key columns in the file.
store_name (optional) – The name of the store to create. Defaults to the name of the file.
in_all_scenarios (optional) – whether to load the Parquet file in all existing scenarios. True by default.
partitioning (optional) – The store partitioning description that describes how the data will be split across partitions of the store. For instance, use ‘hash4(country)’ to split the data across 4 partitions based on the country column’s hash value. Only key columns can be used in the partitioning description.
sampling_mode (Optional[SamplingMode]) – The sampling mode. Defaults to this session’s one.
- Return type
- Returns
The created store that holds the content of the file.
-
read_spark
(dataframe, keys=None, store_name=None, partitioning=None)¶ Read a spark dataframe into a store.
- Parameters
dataframe (SparkDataFrame) – The Spark dataframe to load
keys (optional) – The list of key columns in the dataframe.
store_name (optional) – The name of the store to create. Defaults to a random string.
partitioning (optional) – The store partitioning description that describes how the data will be split across partitions of the store. For instance, use ‘hash4(country)’ to split the data across 4 partitions based on the country column’s hash value. Only key columns can be used in the partitioning description.
- Return type
- Returns
The created store that holds the content of the dataframe.
-
property
scenarios
¶ List of scenarios of the session.
- Return type
-
property
url
¶ Public URL of the session.
If the ATOTI_URL_PATTERN environment variable is set then it is used to build the URL. The following placeholders are replaced in the pattern:
{port} will be replaced by the actual port number.
{host} will be replaced by the host address.
{env.XXX} will be replaced by the value of the XXX environment variable.
If it is not set, it defaults to “http://localhost:{port}”.
- Return type
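The placeholder substitution described for ATOTI_URL_PATTERN can be sketched as follows. `build_session_url` is a hypothetical helper, not atoti's code; how atoti determines the host address is not stated on this page, so the host is passed in explicitly, and the `env` argument exists only to make the example testable without touching the real environment.

```python
import os
import re

DEFAULT_PATTERN = "http://localhost:{port}"


def build_session_url(port, host="localhost", env=None):
    """Expand {port}, {host} and {env.XXX} in ATOTI_URL_PATTERN, or use the default."""
    env = os.environ if env is None else env
    pattern = env.get("ATOTI_URL_PATTERN", DEFAULT_PATTERN)
    url = pattern.replace("{port}", str(port)).replace("{host}", host)
    # {env.XXX} is replaced by the value of the XXX environment variable.
    return re.sub(r"\{env\.([^}]+)\}", lambda match: env[match.group(1)], url)


print(build_session_url(9090, env={}))
# http://localhost:9090
print(build_session_url(
    9090,
    host="10.0.0.5",
    env={"ATOTI_URL_PATTERN": "https://{host}:{port}/{env.TEAM}", "TEAM": "risk"},
))
# https://10.0.0.5:9090/risk
```

In this sketch a pattern that references an unset variable raises a `KeyError`; what atoti does in that case is not specified here.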
-
wait
()¶ Wait for the underlying server subprocess to terminate.
This will prevent the Python process from exiting.
- Return type
None
-
atoti.simulation module¶
Simulation and related classes.
-
class
atoti.simulation.
Priority
¶ Bases:
enum.Enum
Predefined priority levels for simulations.
-
CRITICAL
= 3¶
-
IMPORTANT
= 2¶
-
LOW
= 0¶
-
NORMAL
= 1¶
-
-
class
atoti.simulation.
Scenario
(name, _simulation, _java_api)¶ Bases:
object
A scenario for a simulation.
-
property
columns
¶ Get the columns of the scenario.
They can be used as headers of a DataFrame to load into the scenario.
-
property
columns_without_priority
¶ Get the columns of the scenario (Priority column excluded).
They can be used as headers of a DataFrame to load into the scenario.
-
head
(n=5)¶ Return the first n rows of this scenario as a pandas DataFrame.
- Parameters
n (int) – the number of rows to display.
- Return type
DataFrame
- Returns
The first n rows of the scenario as a pandas DataFrame.
-
insert
(row)¶ Insert a row into the scenario.
- Parameters
row (Row) – the row to be inserted. Can either be a list of values in the correct order or a dict whose keys are the column names.
-
load_csv
(file, delimiter=', ')¶ Load a CSV into this scenario.
The expected format for the CSV’s columns is: column_1, column_2, … , column_n, simulationName_measure1_value
The name of the scenario is automatically added before the row is added to the simulation store.
-
load_pandas
(dataframe, **kwargs)¶ Load a DataFrame into this scenario.
The expected format for the DataFrame’s headers is: column_1, column_2, … , column_n, simulationName_value
The scenario’s name is automatically added to the DataFrame.
- Parameters
dataframe (DataFrame) – The DataFrame to load
-
name
: str = None¶
-
property
-
class
atoti.simulation.
Simulation
(_name, _levels, _multiply, _replace, _add, _base_scenario, _cube, _java_api)¶ Bases:
object
Represents a simulation.
-
property
columns
¶ Columns of the simulation.
They can be used as headers of a DataFrame to load into the simulation.
-
head
(n=5)¶ Return the first n rows of the simulation as a pandas DataFrame.
- Parameters
n (int) – the number of rows to display.
- Return type
DataFrame
- Returns
The first n rows of the simulation
-
load_csv
(file, sep=None, encoding='utf-8', process_quotes=True, watch=False, array_sep=None)¶ Load a CSV into this simulation.
The expected format for the CSV’s headers is: column_1, column_2, … , column_n, simulationName, simulationName_measure1_value, …
The value provided in the simulationName column is the name of the scenario you want to apply the values to.
- Parameters
file (Union[pathlib.Path, str]) – the path to the CSV file.
sep (Optional[str]) – Delimiter to use. If sep is None, the separator will automatically be detected.
encoding (str) – Encoding to use for UTF when reading. Defaults to ‘utf-8’.
process_quotes (bool) –
Whether double quotes should be processed to follow the official CSV specification:
Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields
A double-quote appearing inside a field must be escaped by preceding it with another double quote
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes
When setting this parameter to false, all double-quotes within a field will be treated as any regular character, to follow Excel behavior. CAREFUL: in this mode, it is expected that fields are NOT enclosed in double quotes. It is also not possible to have a line break inside a field.
watch (bool) – Whether or not the source file or directory should be watched for changes. If this option is set to true, whenever you change the source, the changes will be reflected in the store.
array_sep (Optional[str]) – Delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.
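The quoting rules described for process_quotes can be observed with Python's standard csv module. The sketch below is not atoti's parser; it only shows the two behaviours on hypothetical sample data: standard quote processing, and quotes treated as regular characters.

```python
import csv
import io

# Fields follow the CSV specification: embedded quotes are doubled, and fields
# containing commas or line breaks are enclosed in double quotes.
raw = (
    'id,comment\r\n'
    '1,"said ""hi"", then left"\r\n'
    '2,"line one\r\nline two"\r\n'
)
quoted_rows = list(csv.reader(io.StringIO(raw, newline="")))

# Quote processing disabled: a double quote is an ordinary character, so fields
# must not be enclosed in quotes and cannot contain line breaks.
plain = 'id,comment\r\n3,5" nail\r\n'
plain_rows = list(csv.reader(io.StringIO(plain, newline=""), quoting=csv.QUOTE_NONE))

print(quoted_rows[1])
print(plain_rows[1])
```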
-
load_pandas
(dataframe, **kwargs)¶ Load a pandas DataFrame into this simulation.
The expected format for the DataFrame’s headers is: column_1, column_2, … , column_n, simulationName, simulationName_measure1_value, …
The value provided in the simulationName column is the name of the scenario you want to apply the value to.
- Parameters
dataframe (
DataFrame
) – The DataFrame to use.
-
property
scenarios
¶ Scenarios of the simulation.
- Return type
-
property
-
class
atoti.simulation.
SimulationScenarios
(_simulation)¶ Bases:
collections.abc.MutableMapping
,typing.Generic
Manage the scenarios of a simulation.
atoti.simulations module¶
Simulations.
-
class
atoti.simulations.
Simulations
(_java_api, _simulations=<factory>)¶ Bases:
collections.abc.MutableMapping
,typing.Generic
Manage the simulations.
atoti.store module¶
Store and related classes.
-
class
atoti.store.
Store
(_name, _java_api, _scenario='Base', _columns=<factory>)¶ Bases:
object
Represents a single store.
-
head
(n=5)¶ Return the first n rows of the store as a pandas DataFrame.
- Parameters
n (int) – the number of rows to return.
- Return type
DataFrame
- Returns
A pandas DataFrame containing the first n rows of the store.
-
insert_rows
(rows)¶ Insert rows into the store.
-
join
(other, mapping=None)¶ Define a reference between this store and another.
There are two possible situations when creating references:
- All the key columns of the destination store are mapped: this is a normal reference.
- Only some of the key columns of the destination store are mapped: this is a partial reference.
In the first case, there are no requirements for the reference to be created.
In the second case, several requirements must be met for the reference to work correctly:
- The columns from the source store used in the mapping must be attached to hierarchies.
- The un-mapped key columns of the destination store will be converted into hierarchies in the cube.
Depending on the creation mode you have chosen for your cube, creating the reference will generate different hierarchies and measures:
- MANUAL: the un-mapped keys of the destination store will become hierarchies.
- NO_MEASURES: all of the non-numeric columns from the destination store, as well as those containing integers, will be converted into hierarchies. No measures will be created in this mode.
- AUTO: the same hierarchies will be created as in the NO_MEASURES mode. Additionally, columns containing numeric values or arrays, except for columns which contain only integers, will be converted into measures.
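The distinction between a normal and a partial reference can be sketched as a check on which destination key columns the mapping covers. `classify_reference` and the column names are hypothetical, and the sketch assumes the mapping is a dict from source columns to destination columns, which this page does not state.

```python
from typing import Dict, List, Tuple


def classify_reference(
    destination_keys: List[str], mapping: Dict[str, str]
) -> Tuple[str, List[str]]:
    """Return the kind of reference and the un-mapped destination keys."""
    # Assumption: mapping values are column names of the destination store.
    mapped = set(mapping.values())
    unmapped = [key for key in destination_keys if key not in mapped]
    kind = "normal" if not unmapped else "partial"
    return kind, unmapped


destination_keys = ["product_id", "date"]  # hypothetical key columns
full = classify_reference(destination_keys, {"product": "product_id", "day": "date"})
partial = classify_reference(destination_keys, {"product": "product_id"})
print(full)
print(partial)
```

In the partial case, the un-mapped keys returned by the helper are the columns that the text above says will become hierarchies.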
-
load_csv
(file_path, sep=None, encoding='utf-8', process_quotes=True, all_scenarios=False, truncate=False, watch=False, array_sep=None)¶ Load a CSV into this scenario.
- Parameters
file_path (Union[pathlib.Path, str]) – the path to the CSV file or directory. If the path points to a directory, all of the CSV files in the directory will be loaded into the store and, as such, are expected to have the same schema as the store. Files without the ‘.csv’ extension will be ignored.
sep (Optional[str]) – Delimiter to use. If sep is None, the separator will automatically be detected.
encoding (str) – Encoding to use for UTF when reading. Defaults to ‘utf-8’.
process_quotes (bool) –
Whether double quotes should be processed to follow the official CSV specification:
Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields
A double-quote appearing inside a field must be escaped by preceding it with another double quote
Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes
When setting this parameter to false, all double-quotes within a field will be treated as any regular character, to follow Excel behavior. CAREFUL: in this mode, it is expected that fields are NOT enclosed in double quotes. It is also not possible to have a line break inside a field.
all_scenarios (bool) – indicates if the data should be loaded into all the scenarios or not.
truncate (bool) – clear the store before loading the content of this CSV into it.
watch (bool) – Whether or not the source file or directory should be watched for changes. If this option is set to true, whenever you change the source, the changes will be reflected in the store. If the source is a directory, then any new files added with the ‘.csv’ extension will be loaded into the store. Files without the ‘.csv’ extension will be ignored.
array_sep (Optional[str]) – Delimiter to use for arrays. Setting it to a non-None value will parse all the columns containing this separator as arrays.
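The directory behaviour described for file_path can be pictured as a filter on the file extension. The sketch below uses only the standard library. `csv_files_in` is a hypothetical helper, and treating the extension as case-sensitive is an assumption.

```python
import tempfile
from pathlib import Path
from typing import List


def csv_files_in(path) -> List[Path]:
    """Return the files that would be considered for loading (hypothetical helper)."""
    path = Path(path)
    if path.is_file():
        return [path]
    # Directory: keep only regular files with the '.csv' extension.
    return sorted(p for p in path.iterdir() if p.is_file() and p.suffix == ".csv")


with tempfile.TemporaryDirectory() as tmp:
    for name in ("2019.csv", "2020.csv", "notes.txt"):
        (Path(tmp) / name).write_text("id,value\n1,10\n")
    selected = [p.name for p in csv_files_in(tmp)]

print(selected)
```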
-
load_pandas
(dataframe, all_scenarios=False, truncate=False, **kwargs)¶ Load a pandas DataFrame into this scenario.
-
load_parquet
(file_path, all_scenarios=False, truncate=False, watch=False)¶ Load a Parquet file into this scenario.
- Parameters
file_path (Union[pathlib.Path, str]) – The path to the Parquet file.
all_scenarios (bool) – indicates if the data should be loaded into all the scenarios or not.
truncate (bool) – clear the store before loading the content of this Parquet file into it.
watch (bool) – Watch the path to dynamically load any new files into the store
-
load_spark
(dataframe, all_scenarios=False, truncate=False)¶ Load a Spark DataFrame into this scenario.
- Parameters
-
property
scenarios
¶ All the scenarios the store can be on.
- Return type
-
atoti.stores module¶
Stores.
-
class
atoti.stores.
Stores
(_data)¶ Bases:
atoti._mappings.ImmutableMapping
Manage the stores.
atoti.types module¶
Atoti Types.
-
class
atoti.types.
AtotiType
(java_type, nullable)¶ Bases:
object
Atoti Type.
-
java_type
¶ The name of the associated Java literal type.
-
nullable
¶ Whether the Objects of this type can be null. Please note that elements within array types cannot be null and that this attribute therefore applies to an entire array object.
-
java_type
: str = None
-
nullable
: bool = None
-
-
atoti.types.
local_date
(format)¶ Create a date type with the given date format.
-
atoti.types.
local_date_time
(format)¶ Create a datetime type with the given Java datetime format.
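Java date patterns use letters such as yyyy, MM and dd rather than Python's % directives. As a rough guide to what such a pattern describes, the sketch below parses the same kind of strings with Python's datetime module. The patterns and sample values are illustrative assumptions, not formats required by atoti.

```python
from datetime import date, datetime

# The Java pattern "yyyy-MM-dd" describes the same layout as "%Y-%m-%d" in Python.
parsed_date = datetime.strptime("2020-02-29", "%Y-%m-%d").date()

# The Java pattern "yyyy-MM-dd HH:mm:ss" describes the same layout as
# "%Y-%m-%d %H:%M:%S" in Python.
parsed_datetime = datetime.strptime("2020-02-29 13:45:00", "%Y-%m-%d %H:%M:%S")

print(parsed_date, parsed_datetime)
```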
Module contents¶
Atoti’s entrypoint.
-
atoti.
copy_tutorial
(path)¶ Copy the tutorial to a given path.
-
atoti.
create_session
(name='Unnamed', sampling_mode=SamplingMode(name='limit_lines', parameters=[10000]), port=None, max_memory=None, java_args=None, config=None, **kwargs)¶ Create a session.
- Parameters
name (str) – The name of the session.
sampling_mode (SamplingMode) – How files are loaded into the stores. It’s faster to build the data model when only part of the data is loaded. Other modes are available in the atoti.sampling module. If you didn’t use atoti.sampling.FULL, call Session.load_all_data() to load everything once you’re done defining your model.
port (Optional[int]) – The port on which the session will be exposed. Defaults to a random available port.
max_memory (Optional[str]) – The maximum amount of memory that can be used by the underlying session. It should be a user-readable string like ‘512M’ or ‘64G’.
java_args (Optional[List[str]]) – Additional arguments to pass to the Java process.
config (Union[SessionConfiguration, Path, str, None]) – The configuration of the session or the path to a configuration file.
- Return type
- Returns
The created session.
-
atoti.
open_query_session
(url, auth=None, name=None)¶ Join an existing session to query it.
- Parameters
- Return type
- Returns
The query session.
-
atoti.
abs
(measure)¶ Return a new measure equal to the absolute value of the input measure.
- Parameters
measure (Measure) – A measure.
- Return type
Measure
- Returns
A new measure equal to the absolute value of the input measure.
-
atoti.
at
(measure, coordinates)¶ Take the value of the measure at some other coordinates in the cube.
Examples
This measure will return the value of the quantity for “France” on each member of the “Country” level:
atoti.at(m["Quantity"], {lvl["Country"]: "France"})
This measure will return the value of the quantity for the current value of the “Target Country” and “Target City” levels:
atoti.at(m["Quantity"], { lvl["Country"]: lvl["Target Country"], lvl["City"]: lvl["Target City"], })
- Parameters
- Returns
The measure at the given position.
-
atoti.
ceil
(measure)¶ Return the smallest value that is greater than or equal to the measure.
- Parameters
measure (Measure) – The measure to round.
- Return type
Measure
- Returns
The rounded measure
-
atoti.
cos
(measure)¶ Return the cosine of a measure.
- Parameters
measure (Measure) – A measure.
- Return type
Measure
- Returns
A new measure to which the cosine function has been applied.
-
atoti.
date_diff
(from_date, to_date, unit='days')¶ Return a measure that computes the difference between two date measures.
- Parameters
from_date (Union[Measure, date, datetime]) – The first measure or date object.
to_date (Union[Measure, date, datetime]) – The second measure or date object.
unit (str) – The difference unit. Allowed units are seconds, minutes, hours, days, weeks, months and years. Seconds, minutes and hours are only allowed if the dates contain time information.
- Returns
The date difference as a measure.
- Return type
CalculatedMeasure
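For plain date objects, the unit rules above can be sketched with the standard datetime module. `toy_date_diff` is a hypothetical scalar analogue, not the atoti function: it assumes the difference is to_date minus from_date, returns fractional values, and leaves out months and years, whose exact semantics are not given here.

```python
from datetime import date, datetime
from typing import Union

_SECONDS_PER_UNIT = {
    "seconds": 1,
    "minutes": 60,
    "hours": 3600,
    "days": 86400,
    "weeks": 604800,
}


def toy_date_diff(
    from_date: Union[date, datetime],
    to_date: Union[date, datetime],
    unit: str = "days",
) -> float:
    """Difference between two dates of the same type, expressed in `unit`."""
    if unit not in _SECONDS_PER_UNIT:
        raise ValueError(f"unsupported unit in this sketch: {unit}")
    has_time = isinstance(from_date, datetime) and isinstance(to_date, datetime)
    # Sub-day units only make sense when both values carry time information.
    if unit in ("seconds", "minutes", "hours") and not has_time:
        raise ValueError(f"unit {unit!r} requires datetime values")
    delta = to_date - from_date  # assumed sign convention: to_date - from_date
    return delta.total_seconds() / _SECONDS_PER_UNIT[unit]


in_days = toy_date_diff(date(2021, 1, 1), date(2021, 1, 15))
in_weeks = toy_date_diff(date(2021, 1, 1), date(2021, 1, 15), unit="weeks")
in_hours = toy_date_diff(datetime(2021, 1, 1, 0, 0), datetime(2021, 1, 1, 6, 30), unit="hours")
print(in_days, in_weeks, in_hours)
```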
-
atoti.
date_shift
(measure, on, time_offset, method='exact')¶ Create a shifted measure.
The shifted measure at a given date uses the value of that same measure at another date, specified by the offset string, of the form “xxDxxWxxMxxQxxY”. A shift method can be specified for custom behavior when no record corresponds to the target date: either fall back on the previous/following member, or interpolate the value between those members.
Example with m2 = atoti.date_shift("m1", on=h["date"], time_offset="1M", method="interpolate"):
+------------+----+-------+
| date       | m1 | m2    |
+------------+----+-------+
| 2000/01/05 | 15 | 10.79 |  <-- linear interpolation of {2000/02/03, 10} and
|            |    |       |      {2000/03/03, 21} for 2000/02/05
| 2000/02/03 | 10 | 21    |  <-- exact match at 2000/03/03, no need to interpolate
| 2000/03/03 | 21 | 9.73  |  <-- linear interpolation of {2000/03/03, 21} and
|            |    |       |      {2000/04/05, 9} for 2000/04/03
| 2000/04/05 | 9  | ∅     |  <-- no record after 2000/04/05, cannot interpolate
+------------+----+-------+
Currently supported aliases in shift strings: D, W, M, Q, Y. See https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases.
- Parameters
- Return type
DateShift
- Returns
The shifted measure.
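The "interpolate" method can be pictured as a linear interpolation between the two records surrounding the target date. The sketch below recomputes the 9.73 shown for 2000/03/03 in the example table. `interpolate_linear` is a hypothetical helper, and weighting by plain calendar days is an assumption, since the library's exact day-count convention is not stated here.

```python
from datetime import date
from typing import Tuple


def interpolate_linear(
    target: date,
    before: Tuple[date, float],
    after: Tuple[date, float],
) -> float:
    """Linearly interpolate a value at `target` between two dated records."""
    (date_before, value_before), (date_after, value_after) = before, after
    # Fraction of the interval elapsed at the target date, in calendar days.
    fraction = (target - date_before).days / (date_after - date_before).days
    return value_before + (value_after - value_before) * fraction


# Row 2000/03/03 shifted by one month targets 2000/04/03, which lies between
# the records {2000/03/03, 21} and {2000/04/05, 9}.
value = interpolate_linear(
    date(2000, 4, 3),
    before=(date(2000, 3, 3), 21),
    after=(date(2000, 4, 5), 9),
)
print(round(value, 2))
```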
-
atoti.
exp
(measure)¶ Return a new measure equal to the exponential value of the input measure.
The exponential is Euler’s number e raised to the power of the measure’s value.
- Parameters
measure (Measure) – A measure.
- Return type
Measure
- Returns
A new measure equal to the exponential value of the input measure.
-
atoti.
filter
(measure, condition)¶ Return a filtered measure Object.
You can apply and combine several different types of conditions.
You can compare levels to levels:
lvl["source"] == lvl["destination"]
You can compare levels to literals:
lvl["city"] == "Paris"
You can also combine these conditions together using the & operator:
(lvl["source"] == lvl["destination"]) & (lvl["city"] == "Paris")
Only the & operator is currently supported.
- Parameters
measure (Measure) – The measure to filter.
condition (Union[LevelCondition, MultiCondition]) – The filtering expression.
- Return type
Measure
- Returns
A filtered measure Object
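The condition syntax above relies on Python operator overloading: `==` builds a condition object instead of a boolean, and `&` combines conditions. Because `&` binds more tightly than `==` in Python, each comparison needs its own parentheses. The toy classes below are hypothetical stand-ins written for illustration; they are not atoti's LevelCondition or MultiCondition implementations.

```python
class ToyLevel:
    """Stand-in for a level: comparing it builds a condition object."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return ToyCondition(self, other)


class ToyCondition:
    """A single comparison between a level and a level or a literal."""

    def __init__(self, level, operand):
        self.level = level
        self.operand = operand

    def __and__(self, other):
        return ToyMultiCondition([self, other])

    def evaluate(self, row):
        # The right-hand side is either another level or a literal value.
        if isinstance(self.operand, ToyLevel):
            right = row[self.operand.name]
        else:
            right = self.operand
        return row[self.level.name] == right


class ToyMultiCondition:
    """Several conditions that must all hold."""

    def __init__(self, conditions):
        self.conditions = list(conditions)

    def __and__(self, other):
        return ToyMultiCondition(self.conditions + [other])

    def evaluate(self, row):
        return all(condition.evaluate(row) for condition in self.conditions)


lvl = {name: ToyLevel(name) for name in ("source", "destination", "city")}
condition = (lvl["source"] == lvl["destination"]) & (lvl["city"] == "Paris")

rows = [
    {"source": "Paris", "destination": "Paris", "city": "Paris"},
    {"source": "Paris", "destination": "Lyon", "city": "Paris"},
    {"source": "Lyon", "destination": "Lyon", "city": "Lyon"},
]
matching = [row for row in rows if condition.evaluate(row)]
print(matching)
```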
-
atoti.
floor
(measure)¶ Return the largest value that is less than or equal to the measure.
- Parameters
measure (Measure) – The measure to round.
- Return type
Measure
- Returns
The rounded measure
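On scalar values, the definitions of floor and ceil match Python's math.floor and math.ceil. The sketch below is only a scalar analogue on sample numbers; the atoti functions operate on measures.

```python
import math

values = [2.1, 2.9, -2.1]

# ceil: smallest integer greater than or equal to the value.
ceilings = [math.ceil(value) for value in values]
# floor: largest integer less than or equal to the value.
floors = [math.floor(value) for value in values]

print(ceilings, floors)
```

Negative inputs are the case worth checking: the floor of -2.1 is -3, not -2.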
-
atoti.
log
(measure)¶ Return a new measure equal to the natural logarithm (base e) of the input measure.
- Parameters
measure (Measure) – A measure.
- Return type
Measure
- Returns
A new measure equal to the natural logarithm (base e) of the input measure.
-
atoti.
log10
(measure)¶ Return a new measure equal to the base 10 logarithm of the input measure.
- Parameters
measure (Measure) – A measure.
- Return type
Measure
- Returns
A new measure equal to the base 10 logarithm of the input measure.
-
atoti.
max
(*measures)¶ Return a new measure equal to the maximum of the input arguments.
- Parameters
measures (Any) – List of measures or scalar values.
- Return type
Measure
- Returns
A new measure equal to the maximum of the input arguments.
-
atoti.
min
(*measures)¶ Return a new measure equal to the minimum of the input arguments.
- Parameters
measures (Any) – List of measures or scalar values.
- Return type
Measure
- Returns
A new measure equal to the minimum of the input arguments.
-
atoti.
parent_value
(measure, on_hierarchies=None, top_value=None)¶ Create a parent value measure.
- Parameters
- Returns
The parent value measure.
-
atoti.
pow
(measure, exponent)¶ Return a new measure equal to the input measure raised to the given exponent.
-
atoti.
round
(measure)¶ Return the closest number to the measure.
- Parameters
measure (Measure) – The measure to round.
- Return type
Measure
- Returns
The rounded measure
-
atoti.
shift
(measure, on, period=1)¶ Create a shifted measure.
-
atoti.
sin
(measure)¶ Return the sine of a measure.
- Parameters
measure (Measure) – A measure.
- Return type
Measure
- Returns
A new measure to which the sine function has been applied.
-
atoti.
sqrt
(measure)¶ Return a new measure equal to the square root of the input measure.
- Parameters
measure (Measure) – The measure to take the square root of.
- Return type
Measure
- Returns
The square root of the measure.
-
atoti.
tan
(measure)¶ Return the tangent of a measure.
- Parameters
measure (Measure) – A measure.
- Return type
Measure
- Returns
A new measure to which the tangent function has been applied.
-
atoti.
where
(condition, true_measure, false_measure=None)¶ Return a new measure with a conditional value.
This function is equivalent to an “if-then-else” statement. The new measure’s value depends on whether the condition is true or false when the measure is evaluated:
if the condition is true, the new measure will be equal to the “true measure”.
if the condition is false, the new measure will be equal to the “false measure”. If the “false measure” is None, the new measure’s value will be None wherever the condition is false.
- Several types of conditions can be applied and combined.
measures can be compared to anything convertible into a measure:
m["Test"] == 20
levels can be compared to levels:
lvl["source"] == lvl["destination"]
levels can be compared to literals:
lvl["city"] == "Paris"
These conditions can also be combined together using the & operator:
(m["Test"] == 20) & (lvl["city"] == "Paris")
Only the & operator is currently supported.
- Parameters
condition (Union[BooleanMeasure, LevelCondition, MultiCondition]) – The condition to evaluate.
true_measure (Union[date, datetime, int, float, str, Measure, MeasureConvertible]) – The measure to return when the condition is true.
false_measure (Union[date, datetime, int, float, str, Measure, MeasureConvertible, None]) – The measure to return when the condition is false. Defaults to None.
- Return type
Measure
- Returns
A new measure Object
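The if-then-else behaviour described above, including the None default for the false branch, can be sketched on plain rows. `toy_where` is a hypothetical row-level analogue using callables; the real function builds a measure evaluated by the cube, and the column names are illustrative.

```python
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]


def toy_where(
    condition: Callable[[Row], bool],
    true_value: Callable[[Row], Any],
    false_value: Optional[Callable[[Row], Any]] = None,
) -> Callable[[Row], Any]:
    """Return a function giving the 'true' or 'false' value for each row."""

    def evaluate(row: Row) -> Any:
        if condition(row):
            return true_value(row)
        # No false branch supplied: the result is None where the condition fails.
        return None if false_value is None else false_value(row)

    return evaluate


rows = [
    {"Test": 20, "city": "Paris", "Quantity": 5},
    {"Test": 20, "city": "Lyon", "Quantity": 7},
]
is_match = lambda row: row["Test"] == 20 and row["city"] == "Paris"

without_else = [toy_where(is_match, lambda row: row["Quantity"])(row) for row in rows]
with_else = [toy_where(is_match, lambda row: row["Quantity"], lambda row: 0)(row) for row in rows]
print(without_else, with_else)
```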