0.6.0 (2021-07-20)

Highlights:

The main theme of this release is simplification.

From session configuration to data loading and simulations, many aspects of the API have been revised. The goal was to make the API more intuitive and consistent, in turn making atoti easier to learn and use.

This release also comes with breaking changes, most of them being the result of:

  • Naming standardization (e.g. Table instead of Store).

  • Removal of features that are not specific to atoti and can be replicated just as well with standard Python functions or popular libraries (e.g. data sampling and file watching).

A lot of performance improvement work also happened behind the scenes. In particular, creating and joining tables and defining new measures will be quicker than before.

Added

User interface

  • Filters can be saved.

    Like dashboards, filters are saved in the user content storage.

  • Saved dashboards listed on the app’s home page show a thumbnail instead of a blank card.

  • Ability to duplicate a dashboard page by right-clicking on its tab.

Changed

All the changes are BREAKING.

Config

  • create_session()’s config parameter expects a plain Python object following the structure of SessionConfig rather than an object created with config.create_config() or a path to a config file.

      session = tt.create_session(
    -   config=tt.config.create_config(port=9090)
    +   config={"port": 9090}
      )
    
  • The metadata_db config parameter has been renamed user_content_storage. Existing metadata.mv.db files must be renamed content.mv.db.
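
    For example, a local user content storage might now be configured like this (the ./content path is illustrative; this sketch assumes user_content_storage accepts a local path, as metadata_db did):

      session = tt.create_session(
        config={"user_content_storage": "./content"}
      )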

  • Other options have been changed:

    • default_locale and i18n_directory have been regrouped under i18n.

    • java_args has been renamed java_options.

    • jwt_key_pair has been renamed jwt.

    • max_memory has been removed. Pass an -Xmx option to java_options instead, as shown below.
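
    For example, replacing max_memory with the equivalent JVM option (this sketch assumes java_options accepts a list of JVM arguments, as java_args did):

      session = tt.create_session(
        config={"java_options": ["-Xmx8G"]}
      )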

Data loading

  • The store.Store class has been renamed table.Table. Likewise, the session.Session.stores property has become session.Session.tables, the store_name parameter has become table_name, and session.Session.create_store() has become session.Session.create_table().
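
    For example:

    - sales_store = session.stores["Sales"]
    + sales_table = session.tables["Sales"]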

  • int and long table columns, unless they are keys, now automatically become measures instead of levels. With this change, all numeric columns behave the same way.
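
    To keep an integer column as a level, it can be listed in the hierarchized_columns parameter mentioned elsewhere in these notes. A minimal sketch, with illustrative file, table, and column names:

      session.read_csv(
        "sales.csv",
        keys=["Id"],
        table_name="Sales",
        hierarchized_columns=["Year"],
      )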

  • type.INT_ARRAY and the other array types have become non-nullable; use type.NULLABLE_INT_ARRAY and the other nullable array types instead.
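
    For example, when declaring column types (this sketch assumes a types mapping as in previous releases; the file, table, and Vector column names are illustrative):

      session.read_csv(
        "sales.csv",
        table_name="Sales",
    -   types={"Vector": tt.type.INT_ARRAY},
    +   types={"Vector": tt.type.NULLABLE_INT_ARRAY},
      )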

  • table.Table.load_pandas() and table.Table.append() do not automatically infer date types. date or datetime objects must be used.

    - table.append("2020-02-01", "France", "Paris", "id-113", 111.)
    + table.append(datetime.date(2020, 2, 1), "France", "Paris", "id-113", 111.)
    
  • session.Session.read_csv() and table.Table.load_csv()’s sep and array_sep parameters have been renamed separator and array_separator.
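
    For example (file and table names are illustrative):

      session.read_csv(
        "sales.csv",
        table_name="Sales",
    -   sep=";",
    -   array_sep="|",
    +   separator=";",
    +   array_separator="|",
      )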

  • session.Session.read_csv() and session.Session.read_parquet()’s table_name parameter is required when the path argument is a glob pattern.

  • session.Session.read_csv() and table.Table.load_csv()’s path parameter no longer accepts a directory. Use a glob instead.

      session.read_csv(
    -   path="path/to/sales/",
    +   path="path/to/sales/*.csv",
        table_name="Sales"
      )
    
  • When passing a directory to session.Session.read_parquet() and table.Table.load_parquet()’s path parameter, only the Parquet files directly in this directory will be loaded, not those in subdirectories. Use a glob to also load Parquet files from subdirectories.

      session.read_parquet(
    -   path="path/to/sales/",
    +   path="path/to/sales/**/*.parquet",
        table_name="Sales"
      )
    
  • session.Session.read_sql() and table.Table.load_sql() expect the username and password to be in the connection string passed to the url parameter instead of being passed as dedicated parameters. The url parameter has also been made keyword-only.

      table.load_sql(
    -   "h2:file:/file/path",
        "SELECT * FROM MYTABLE;",
    +   url="h2:file:/file/path;USER=username;PASSWORD=passwd",
    -   username="username",
    -   password="passwd"
      )
    
  • session.Session.read_numpy() (array, columns, store_name, *, keys=None, in_all_scenarios=True, partitioning=None, hierarchized_columns=None, **kwargs) -> (array, *, columns, table_name, keys=None, partitioning=None, hierarchized_columns=None, **kwargs).

      session.read_numpy(
    -   np_array, ["Id", "Country", "City", "Price"], "Prices",
    +   np_array, columns=["Id", "Country", "City", "Price"], table_name="Prices",
      )
    
  • session.Session.start_transaction() () -> (scenario_name="Base"). Transactions only accept loading operations which impact the scenario they are started on.
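
    A minimal sketch of the new signature, assuming a table and a CSV file of updates:

      with session.start_transaction("My scenario"):
          table.scenarios["My scenario"].load_csv("updates.csv")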

Querying

  • The levels parameter of cube.Cube.query() and query.cube.QueryCube.query() no longer accepts a single level; if a value is passed, it must be a sequence of levels.

      cube.query(
        m["contributors.COUNT"],
    -   levels=l["Product"]
    +   levels=[l["Product"]]
      )
    

    Searching for the regular expression levels=([^[][^,)]+) and replacing it with levels=[$1] can help adapt most occurrences of single-level calls to the new syntax.

  • Passing no measures to cube.Cube.query() and query.cube.QueryCube.query() queries no measures instead of querying all visible measures (#220).

      # Query all visible measures on all products:
      cube.query(
    +   *[measure for measure in cube.measures.values() if measure.visible],
        levels=[l["Product"]]
      )
    

Simulations

  • cube.Cube.setup_simulation() has been replaced with cube.Cube.create_parameter_simulation(). Instead of intercepting the regular aggregation flow of an existing measure as setup_simulation() did, create_parameter_simulation() creates a new parameter measure that can be used to define new measures or redefine existing ones. Levels created by setup_simulation() were in the Measure simulations dimension, whereas levels created by create_parameter_simulation() have the same name, hierarchy name, and dimension name as the simulation.

      turnover = tt.agg.sum(table["Unit price"] * table["Quantity"])
    - m["Turnover"] = turnover
    - simulation = cube.setup_simulation(
    -   "Country Simulation",
    -   levels=[l["Country"]],
    -   multiply=[m["Turnover"]]
    - )
    + simulation = cube.create_parameter_simulation(
    +   "Country Simulation",
    +   measure_name="Country parameter",
    +   default_value=1,
    +   levels=[l["Country"]]
    + )
    + m["Turnover"] = tt.agg.sum(
    +   turnover * m["Country parameter"],
    +   scope=tt.scope.origin(l["Country"])
    + )
    - simulation.scenarios["France boost"] += ("France", 1.15)
    + simulation += ("France boost", "France", 1.15)
      cube.query(
        m["Turnover"],
        levels=[l["Country Simulation"], l["Country"]]
      )
    

Other

  • column.Column objects are no longer automatically converted into measures. Columns and measures cannot be used together in calculations without first converting the column into a measure; value() can be used for that.

    - m["Quantity.VALUE"] = table["Quantity"]
    + m["Quantity.VALUE"] = tt.value(table["Quantity"])
    - m["Final Price"] = m["Unit Price"] * table["Rate"]
    + m["Final Price"] = m["Unit Price"] * tt.value(table["Rate"])
    
  • experimental.create_date_hierarchy() (name, cube, column, *, levels={'Day': 'd', 'Month': 'M', 'Year': 'Y'}) -> (name, *, cube, column, levels={'Day': 'd', 'Month': 'M', 'Year': 'y'}).
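
    For example, since cube and column are now keyword-only (the hierarchy name is illustrative):

      tt.experimental.create_date_hierarchy(
        "Date parts",
    -   cube,
    -   table["Date"],
    +   cube=cube,
    +   column=table["Date"],
      )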

  • cube.Cube.create_store_column_parameter_hierarchy() has been renamed cube.Cube.create_parameter_hierarchy_from_column(). The created hierarchy is slicing by default.

  • cube.Cube.create_static_parameter_hierarchy() has been renamed cube.Cube.create_parameter_hierarchy_from_members() and its signature has changed from (name, members, *, indices=None, data_type=None, index_measure=None) to (name, members, *, data_type=None, index_measure_name=None). The sorted() function can be used as a replacement for the removed indices parameter to change the order of the members.

    - cube.create_static_parameter_hierarchy(
    + cube.create_parameter_hierarchy_from_members(
        "Date",
    -   ["2020/01/30", "2020/02/27", "2020/03/30", "2020/04/30"],
    -   store_name="Date",
    -   indices=[3, 2, 1, 0],
    +   sorted(["2020/01/30", "2020/02/27", "2020/03/30", "2020/04/30"], reverse=True),
    -   index_measure="Date Index",
    +   index_measure_name="Date Index",
      )
    
  • parent_value()’s on parameter has been removed in favor of degrees which has become required.

      tt.parent_value(
        m["Price.SUM"],
    -   on=h["Date"],
    +   degrees={h["Date"]: 1}
      )
    
  • date_shift()’s following method has been renamed next.

      tt.date_shift(
        m["Price.SUM"],
        on=h["Date"],
        offset="1D",
    -   method="following"
    +   method="next"
      )
    
  • date_shift() (measure, on, offset, *, method='exact') -> (measure, on, *, offset, method='exact').
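
    For example, since offset is now keyword-only:

      tt.date_shift(
        m["Price.SUM"],
        h["Date"],
    -   "1D",
    +   offset="1D",
      )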

  • rank() (measure, hierarchy, ascending=True, apply_filters=True) -> (measure, hierarchy, *, ascending=True, apply_filters=True).
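
    For example:

    - tt.rank(m["Price.SUM"], h["Date"], False)
    + tt.rank(m["Price.SUM"], h["Date"], ascending=False)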

  • measure.Measure has been renamed measure_description.MeasureDescription and named_measure.NamedMeasure has been renamed measure.Measure.

  • The REST API for tables only grants write access to users with ROLE_ADMIN, and the configured role restrictions are used to limit which lines each user can read (#270). The REST API does not allow modification of tables created with cube.Cube.create_parameter_simulation() and cube.Cube.create_parameter_hierarchy_from_members().

  • comparator.ASC has become comparator.ASCENDING and comparator.DESC has become comparator.DESCENDING.
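
    For example, when ordering a level in descending order (this sketch assumes comparators are still assigned to levels as before):

    - l["Date"].comparator = tt.comparator.DESC
    + l["Date"].comparator = tt.comparator.DESCENDING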

Deprecated

  • Support for remote content servers. Configure the user content storage with a JDBC url instead.

Removed

All the removals are BREAKING.

Config

  • Global configuration mechanism. Only the object passed to create_session()’s config parameter will be used.

  • cache_cloud_files config parameter.

  • accent_color and frame_color branding parameters.

Data loading

  • Watching CSV and Parquet files. This removes the watch parameter of load_*() and read_*() methods. The Watch local files how-to shows an alternative.
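
    For instance, a minimal stand-in using only the standard library could poll a directory and load files as they appear (a sketch, not the how-to’s exact code; table and the directory path are assumptions):

      import time
      from pathlib import Path

      already_loaded = set()
      while True:
          # Load any CSV file that appeared since the last check.
          for csv_path in Path("path/to/sales").glob("*.csv"):
              if csv_path not in already_loaded:
                  table.load_csv(str(csv_path))
                  already_loaded.add(csv_path)
          time.sleep(5)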

  • Sampling of data loading operations. This removes the session.Session.load_all_data() method and the sampling_mode parameter of config.create_config() as well as of methods such as session.Session.read_csv() and session.Session.read_parquet().

    The recommended alternative for projects loading large amounts of data into their session is to:

    1. Extract a smaller and meaningful dataset and work with it when iterating on the declaration of the data model.

    2. Perform the large data loading operations after the tables and cubes of the session have reached their final structure, as sketched below.
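
    A sketch of that workflow (file name, keys, and sample size are illustrative):

      import pandas as pd

      # 1. Iterate on the data model with a small extract:
      sample_df = pd.read_csv("sales.csv", nrows=100_000)
      table = session.read_pandas(sample_df, keys=["Id"], table_name="Sales")
      cube = session.create_cube(table)
      # ... define and refine measures, hierarchies, etc. ...

      # 2. Load the full dataset once the structure is final:
      table.drop()
      table.load_csv("sales.csv")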

  • truncate parameter of table.Table’s loading methods. The same behavior can be replicated by calling table.Table.drop() first.

    - table.load_pandas(df, truncate=True)
    + with session.start_transaction():
    +     table.drop()
    +     table.load_pandas(df)
    
  • in_all_scenarios parameter from all methods of the API.

  • store.StoreScenarios.load_csv().

      base_directory = Path("path/to/base/directory")
    - store.scenarios.load_csv(base_directory)
    + for scenario_path in base_directory.iterdir():
    +     if scenario_path.is_dir():
    +         table.scenarios[scenario_path.name].load_csv(f"{scenario_path}/*.csv")
    

    This can be combined with the Watch local files how-to to generate new scenarios in real-time.

Simulations

  • Measure simulations and Source simulation editors in the app.

  • table.Table’s source_simulation_enabled property since it was only used by the Source simulation editor.

Other

  • session.Session.url and session.Session.excel_url, dropping the need for config.create_config()’s url_pattern parameter (#260).

    There is no reliable way for the session to know whether it is exposed through a reverse proxy, a load balancer, a Docker container, or anything else that would make the public URL used to access the session differ from the hostname/IP of the machine hosting it. In these situations, url_pattern had to be defined to keep session.Session.url and session.Session.excel_url usable. However, it is simpler to build the session URL directly from the known network setup and session.Session.port than to use url_pattern.

    • In JupyterLab, session.Session.link() can be used instead, even when Jupyter Server is not running locally.

    • For a session running locally, session.Session.url can be replaced with f"http://localhost:{session.port}".

    • The Connect with Excel guide shows how to connect Excel without session.Session.excel_url.

  • hierarchy.Hierarchy.name can no longer be changed to rename the hierarchy, see Hierarchies for information on renaming hierarchies.

  • table.Table.shape.

    - table.shape
    + {"columns": len(table.columns), "rows": len(table)}
    
  • Support for Python 3.7.0. Python versions >= 3.7.1 are supported.

  • The following deprecated functions and methods:

    • agg._single_value().

    • agg._stop().

    • config.create_role(), config.create_basic_user(), and config.create_kerberos_user(). Use session.Session.security instead.

    • Session.logs_tail().

      - lines = session.logs_tail(n=10)
      + with open(session.logs_path) as logs:
      +   lines = logs.readlines()[-10:]
      

Fixed

  • Measure names containing , are now rejected instead of being silently renamed (#271).

  • Measures combining scope.cumulative() and shift() (#295).

  • value() when no name was specified for the created measure (#281 and #287).

  • Handling of glob patterns containing parentheses (#285).

  • hierarchized_columns of joined tables not taken into account (#303).

  • Handling of multiline strings in dataframes passed to session.Session.read_pandas() and table.Table.load_pandas() (#186).

  • Handling of null arrays when aggregating table columns.

  • Errors occurring inside transactions leaving the session in an unstable state.