# 0.6.0 (2021-07-20)
Highlights:
The main theme of this release is simplification.
From session configuration to data loading and simulations, many aspects of the API have been revised. The goal was to make the API more intuitive and consistent, in turn making atoti easier to learn and use.
This release also comes with breaking changes, most of them resulting from:

- Naming standardization (e.g. Table instead of Store).
- Removal of non-atoti-specific features that can be replicated just as well with standard Python functions or popular libraries (e.g. data sampling and file watching).

A lot of performance improvement work also happened behind the scenes. In particular, creating and joining tables and defining new measures is quicker than before.
## Added
- `session.Session.link()` and `query.session.QuerySession.link()` to display a link to the session in JupyterLab.
- `UserContentStorageConfig` to back the user content storage with a remote database.
- Connect with Excel and Watch local files how-tos.
### User interface
- Filters can be saved. Like dashboards, filters are saved in the user content storage.
- Saved dashboards listed in the app's home page show a thumbnail instead of a blank card.
- Ability to duplicate a dashboard page by right-clicking on its tab.
## Changed
All the changes are BREAKING.
### Config
- `create_session()`'s `config` parameter expects a plain Python object following the structure of `SessionConfig` rather than an object created with `config.create_config()` or a path to a config file.

  ```diff
  session = tt.create_session(
  -     config=tt.config.create_config(port=9090)
  +     config={"port": 9090}
  )
  ```

- The `metadata_db` config parameter has been renamed `user_content_storage`. Existing `metadata.mv.db` files must be renamed `content.mv.db`.
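For on-disk user content storage, this migration is a plain file rename. A minimal sketch with `pathlib`; the helper name and directory layout are assumptions, not part of atoti:

```python
from pathlib import Path

def migrate_user_content(directory: Path) -> None:
    """Rename a pre-0.6.0 metadata.mv.db file to content.mv.db, if present."""
    old = directory / "metadata.mv.db"
    if old.exists():
        # Only the file name changes; the stored content is untouched.
        old.rename(directory / "content.mv.db")
```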
### Data loading
- The `store.Store` class has been renamed `table.Table`. The same goes for the `session.Session.stores` property, which has become `session.Session.tables`, the `store_name` parameter, which has become `table_name`, and `session.Session.create_store()`, which has become `session.Session.create_table()`.
- `int` and `long` table columns, unless they are keys, automatically become measures instead of levels. With this change, all the numeric columns behave the same.
- `type.INT_ARRAY` and the other array types have become non-nullable. Use `type.NULLABLE_INT_ARRAY` and the other nullable array types instead.
- `table.Table.load_pandas()` and `table.Table.append()` no longer automatically infer date types. `date` or `datetime` objects must be used.

  ```diff
  - table.append("2020-02-01", "France", "Paris", "id-113", 111.0)
  + table.append(datetime.date(2020, 2, 1), "France", "Paris", "id-113", 111.0)
  ```
- `session.Session.read_csv()` and `table.Table.load_csv()`'s `sep` and `array_sep` parameters have been renamed `separator` and `array_separator`.
- `session.Session.read_csv()` and `session.Session.read_parquet()`'s `table_name` parameter is required when the `path` argument is a glob pattern.
- `session.Session.read_csv()` and `table.Table.load_csv()`'s `path` parameter no longer accepts a directory. Use a glob pattern instead.

  ```diff
  session.read_csv(
  -     path="path/to/sales/",
  +     path="path/to/sales/*.csv",
      table_name="Sales"
  )
  ```

- When passing a directory to `session.Session.read_parquet()` and `table.Table.load_parquet()`'s `path` parameter, only the Parquet files in this directory will be loaded, not the ones in possible subdirectories. Use a glob pattern to load Parquet files in subdirectories.

  ```diff
  session.read_parquet(
  -     path="path/to/sales/",
  +     path="path/to/sales/**/*.parquet",
      table_name="Sales"
  )
  ```
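The difference between the two patterns can be previewed with `pathlib`, whose `glob()` follows the same `*` vs `**/*` semantics. A small sketch on a throwaway directory tree (file names are illustrative):

```python
from pathlib import Path
import tempfile

# Build a small directory tree to compare non-recursive and recursive globs.
root = Path(tempfile.mkdtemp())
(root / "2020").mkdir()
(root / "january.parquet").touch()
(root / "2020" / "february.parquet").touch()

# "*.parquet" only matches files directly inside `root`.
top_level = sorted(p.name for p in root.glob("*.parquet"))
# "**/*.parquet" also descends into subdirectories.
recursive = sorted(p.name for p in root.glob("**/*.parquet"))

assert top_level == ["january.parquet"]
assert recursive == ["february.parquet", "january.parquet"]
```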
- `session.Session.read_sql()` and `table.Table.load_sql()` expect the username and password to be in the connection string passed to the `url` parameter instead of being passed as dedicated parameters. The `url` parameter has also been made keyword-only.

  ```diff
  table.load_sql(
  -     "h2:file:/file/path",
      "SELECT * FROM MYTABLE;",
  +     url="h2:file:/file/path;USER=username;PASSWORD=passwd",
  -     username="username",
  -     password="passwd",
  )
  ```
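Moving the credentials into the URL amounts to appending `KEY=VALUE` settings, as shown in the diff above. A hypothetical helper for the migration; the function name is an assumption, and the `;USER=…;PASSWORD=…` syntax is the H2 form used in this changelog (other databases use different connection-string formats):

```python
def with_credentials(url: str, username: str, password: str) -> str:
    """Append H2-style USER/PASSWORD settings to a connection string."""
    return f"{url};USER={username};PASSWORD={password}"

assert (
    with_credentials("h2:file:/file/path", "username", "passwd")
    == "h2:file:/file/path;USER=username;PASSWORD=passwd"
)
```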
- `session.Session.read_numpy()`'s signature has changed from `(array, columns, store_name, *, keys=None, in_all_scenarios=True, partitioning=None, hierarchized_columns=None, **kwargs)` to `(array, *, columns, table_name, keys=None, partitioning=None, hierarchized_columns=None, **kwargs)`.

  ```diff
  session.read_numpy(
  -     np_array, ["Id", "Country", "City", "Price"], "Prices",
  +     np_array, columns=["Id", "Country", "City", "Price"], table_name="Prices",
  )
  ```
- `session.Session.start_transaction()`'s signature has changed from `()` to `(scenario_name="Base")`. Transactions only accept loading operations impacting the scenario they were started on.
### Querying
- The `levels` parameter of `cube.Cube.query()` and `query.cube.QueryCube.query()` no longer accepts a single level. If a value is passed, it must be a sequence of levels. No more choice overload.

  ```diff
  cube.query(
      m["contributors.COUNT"],
  -     levels=l["Product"]
  +     levels=[l["Product"]]
  )
  ```

  Searching for the regular expression `levels=([^[][^,)]+)` and replacing it with `levels=[$1]` can help adapting most occurrences of the single-level calls to the new syntax.

- Passing no measures to `cube.Cube.query()` and `query.cube.QueryCube.query()` queries no measures instead of querying all the visible measures (#220).

  ```diff
  # Query all the visible measures on all the products:
  cube.query(
  +     *[measure for measure in cube.measures.values() if measure.visible],
      levels=[l["Product"]]
  )
  ```
### Simulations
- `cube.Cube.setup_simulation()` has been replaced with `cube.Cube.create_parameter_simulation()`. Instead of intercepting the regular aggregation flow of an existing measure like `setup_simulation()`, `create_parameter_simulation()` creates a new parameter measure that can be used to define new measures or redefine existing ones. The levels created by `setup_simulation()` were in the Measure simulations dimension but the levels created by `create_parameter_simulation()` have the same name, hierarchy name, and dimension name as the simulation.

  ```diff
  turnover = tt.agg.sum(table["Unit price"] * table["Quantity"])
  - m["Turnover"] = turnover
  - simulation = cube.setup_simulation(
  -     "Country Simulation",
  -     levels=[l["Country"]],
  -     multiply=[m["Turnover"]]
  - )
  + simulation = cube.create_parameter_simulation(
  +     "Country Simulation",
  +     measure_name="Country parameter",
  +     default_value=1,
  +     levels=[l["Country"]]
  + )
  + m["Turnover"] = tt.agg.sum(
  +     turnover * m["Country parameter"],
  +     scope=tt.scope.origin(l["Country"])
  + )
  - simulation.scenarios["France boost"] += ("France", 1.15)
  + simulation += ("France boost", "France", 1.15)
  cube.query(
      m["Turnover"],
      levels=[l["Country Simulation"], l["Country"]]
  )
  ```
### Other
- `column.Column` objects are no longer automatically converted into measures. Columns and measures cannot be used together in calculations without first converting the column into a measure. `value()` can be used to convert a table column to a measure.

  ```diff
  - m["Quantity.VALUE"] = table["Quantity"]
  + m["Quantity.VALUE"] = tt.value(table["Quantity"])
  - m["Final Price"] = m["Unit Price"] * table["Rate"]
  + m["Final Price"] = m["Unit Price"] * tt.value(table["Rate"])
  ```
- `experimental.create_date_hierarchy()`'s signature has changed from `(name, cube, column, *, levels={'Day': 'd', 'Month': 'M', 'Year': 'Y'})` to `(name, *, cube, column, levels={'Day': 'd', 'Month': 'M', 'Year': 'y'})`.
- `cube.Cube.create_store_column_parameter_hierarchy()` has been renamed `cube.Cube.create_parameter_hierarchy_from_column()`. The created hierarchy is slicing by default.
- `cube.Cube.create_static_parameter_hierarchy(name, members, *, indices=None, data_type=None, index_measure=None)` has been changed to `cube.Cube.create_parameter_hierarchy_from_members()` with the signature `(name, members, *, data_type=None, index_measure_name=None)`. The `sorted()` function can be used as a replacement for the `indices` parameter to change the order of the members.

  ```diff
  - cube.create_static_parameter_hierarchy(
  + cube.create_parameter_hierarchy_from_members(
      "Date",
  -     ["2020/01/30", "2020/02/27", "2020/03/30", "2020/04/30"],
  -     store_name="Date",
  -     indices=[3, 2, 1, 0],
  +     sorted(["2020/01/30", "2020/02/27", "2020/03/30", "2020/04/30"], reverse=True),
  -     index_measure="Date Index",
  +     index_measure_name="Date Index",
  )
  ```
- `parent_value()`'s `on` parameter has been removed in favor of `degrees`, which has become required.

  ```diff
  tt.parent_value(
      m["Price.SUM"],
  -     on=h["Date"],
  +     degrees={h["Date"]: 1}
  )
  ```
- `date_shift()`'s `following` method has been renamed `next`.

  ```diff
  tt.date_shift(
      m["Price.SUM"],
      on=h["Date"],
      offset="1D",
  -     method="following"
  +     method="next"
  )
  ```
- `date_shift()`'s signature has changed from `(measure, on, offset, *, method='exact')` to `(measure, on, *, offset, method='exact')`.
- `rank()`'s signature has changed from `(measure, hierarchy, ascending=True, apply_filters=True)` to `(measure, hierarchy, *, ascending=True, apply_filters=True)`.
- `measure.Measure` has been renamed `measure_description.MeasureDescription` and `named_measure.NamedMeasure` has been renamed `measure.Measure`.
- The REST API for tables only grants write access to users with the ROLE_ADMIN role, and the configured restrictions are used to limit which lines each user can read (#270). The REST API does not allow modification of the tables created with `cube.Cube.create_parameter_simulation()` and `cube.Cube.create_parameter_hierarchy_from_members()`.
- `comparator.ASC` has become `comparator.ASCENDING` and `comparator.DESC` has become `comparator.DESCENDING`.
## Removed
All the removals are BREAKING.
### Config
- The global configuration mechanism. Only the object passed to `create_session()`'s `config` parameter is used.
- The `cache_cloud_files` config parameter.
- The `accent_color` and `frame_color` branding parameters.
### Data loading
- Watching CSV and Parquet files. This removes the `watch` parameter of the `load_*()` and `read_*()` methods. The Watch local files how-to shows an alternative.
- Sampling of data loading operations. This removes the `session.Session.load_all_data()` method and the `sampling_mode` parameter of `config.create_config()` as well as of methods such as `session.Session.read_csv()` and `session.Session.read_parquet()`.

  The recommended alternative for projects loading a large amount of data into their session is to:

  - Extract a smaller, meaningful dataset and work with it when iterating on the declaration of the data model.
  - Perform the large data loading operations once the tables and cubes of the session have reached their final structure.
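One minimal way to extract such a smaller dataset from a large CSV, using only the standard library (the helper name and file names are illustrative, not part of atoti):

```python
import csv
import itertools
from pathlib import Path
import tempfile

def extract_head(source: Path, destination: Path, rows: int) -> None:
    """Copy the header and the first `rows` data rows of a CSV file."""
    with source.open(newline="") as src, destination.open("w", newline="") as dst:
        writer = csv.writer(dst)
        # islice stops reading after header + `rows` lines, so the
        # large file is never fully loaded into memory.
        for row in itertools.islice(csv.reader(src), rows + 1):
            writer.writerow(row)

# Demo on a throwaway file:
directory = Path(tempfile.mkdtemp())
big = directory / "sales.csv"
big.write_text("id,price\n" + "".join(f"{i},{i * 10}\n" for i in range(1000)))
small = directory / "sales_sample.csv"
extract_head(big, small, rows=5)

assert small.read_text().splitlines() == [
    "id,price", "0,0", "1,10", "2,20", "3,30", "4,40"
]
```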
- The `truncate` parameter of `table.Table`'s loading methods. The same behavior can be replicated by calling `table.Table.drop()` first.

  ```diff
  - store.load_pandas(df, truncate=True)
  + with session.start_transaction():
  +     table.drop()
  +     table.load_pandas(df)
  ```
- The `in_all_scenarios` parameter from all the methods of the API:

  - `session.Session.read_csv()` and all the other `session.Session.read_*()` methods only load data into the Base scenario.
  - `table.Table.load_csv()` and all the other `table.Table.load_*()` methods only load data into the current scenario of the table.
  - Data loaded into the parameter tables created by `cube.Cube.create_parameter_simulation()` or `cube.Cube.create_parameter_hierarchy_from_members()` is always loaded into all the existing scenarios.

- `store.StoreScenarios.load_csv()`.

  ```diff
  base_directory = Path("path/to/base/directory")
  - store.scenarios.load_csv(base_directory)
  + for scenario_path in base_directory.iterdir():
  +     if scenario_path.is_dir():
  +         table.scenarios[scenario_path.name].load_csv(f"{scenario_path.resolve()}/**.csv")
  ```

  This can be combined with the Watch local files how-to to generate new scenarios in real time.
### Simulations
- The Measure simulations and Source simulation editors in the app.
- `table.Table`'s `source_simulation_enabled` property, since it was only used by the Source simulation editor.
### Other
- `session.Session.url` and `session.Session.excel_url`, dropping the need for `config.create_config()`'s `url_pattern` parameter (#260).

  There is no reliable way for the session to know whether it is exposed through a reverse proxy, a load balancer, a Docker container, or anything else that would make the public URL used to access the session differ from the hostname/IP of the machine hosting it. In these situations, `url_pattern` had to be defined to prevent `session.Session.url` and `session.Session.excel_url` from being unusable. However, it is simpler to build the session URL directly from the known network setup and `session.Session.port` than to use `url_pattern`.

  - In JupyterLab, `session.Session.link()` can be used instead, even when Jupyter Server is not running locally.
  - For a session running locally, `session.Session.url` can be replaced with `f"http://localhost:{session.port}"`.
  - The Connect with Excel guide shows how to connect Excel without `session.Session.excel_url`.
- `hierarchy.Hierarchy.name` can no longer be changed to rename the hierarchy. See Hierarchies for information on renaming hierarchies.
- `table.Table.shape`.

  ```diff
  - table.shape
  + {"columns": len(table.columns), "rows": len(table)}
  ```

- Support for Python 3.7.0. Python versions >= 3.7.1 are supported.
- The following deprecated functions and methods:

  - `agg._single_value()`.
  - `agg._stop()`.
  - `config.create_role()`, `config.create_basic_user()`, and `config.create_kerberos_user()`. Use `session.Session.security` instead.
  - `Session.logs_tail()`.

    ```diff
    - lines = session.logs_tail(n=10)
    + with open(session.logs_path) as logs:
    +     lines = logs.readlines()[-10:]
    ```
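`readlines()[-10:]` loads the whole log file into memory; for large logs, a memory-friendlier variant keeps only the last lines with `collections.deque`. A pure-Python sketch (only the path you pass, e.g. `session.logs_path`, is atoti-specific):

```python
from collections import deque
from pathlib import Path
import tempfile

def tail(path, n):
    """Return the last `n` lines of a text file without loading it all at once."""
    with open(path) as file:
        # A deque with maxlen=n discards older lines as it iterates,
        # so memory use stays bounded by `n` lines.
        return list(deque(file, maxlen=n))

# Demo on a throwaway file:
log = Path(tempfile.mkdtemp()) / "server.log"
log.write_text("".join(f"line {i}\n" for i in range(100)))

assert tail(log, 3) == ["line 97\n", "line 98\n", "line 99\n"]
```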
## Fixed
- Measure names including `,` are rejected instead of being silently renamed (#271).
- Measures combining `scope.cumulative()` and `shift()` (#295).
- `value()` when no name was specified for the created measure (#281 and #287).
- Handling of glob patterns containing parentheses (#285).
- `hierarchized_columns` of joined tables not being taken into account (#303).
- Handling of null arrays when aggregating table columns.
- Errors occurring inside transactions leaving the session in an unstable state.