atoti_directquery_databricks.ConnectionConfig#

final class atoti_directquery_databricks.ConnectionConfig#

Config to connect to a Databricks database.

To aggregate native Databricks arrays, the UDAFs (User-Defined Aggregation Functions) provided by ActiveViam must be registered on the cluster.

Native array aggregation is not supported on SQL warehouses.

Example

>>> import os
>>> from atoti_directquery_databricks import ConnectionConfig
>>> connection_config = ConnectionConfig(
...     url="jdbc:databricks://"
...     + os.environ["DATABRICKS_SERVER_HOSTNAME"]
...     + "/default;"
...     + "transportMode=http;"
...     + "ssl=1;"
...     + "httpPath="
...     + os.environ["DATABRICKS_HTTP_PATH"]
...     + ";"
...     + "AuthMech=3;"
...     + "UID=token;"
...     + "EnableArrow=0;",
...     password=os.environ["DATABRICKS_AUTH_TOKEN"],
... )
>>> external_database = session.connect_to_external_database(connection_config)
array_long_agg_function_name: str | None = None#

The name (if different from the default) of the UDAF performing atoti.agg.long() on native arrays.

Note

This function must be defined in Databricks and accessible to the role running the queries.

array_short_agg_function_name: str | None = None#

The name (if different from the default) of the UDAF performing atoti.agg.short() on native arrays.

Note

This function must be defined in Databricks and accessible to the role running the queries.

array_sum_agg_function_name: str | None = None#

The name (if different from the default) of the UDAF performing atoti.agg.sum() on native arrays.

Note

This function must be defined in Databricks and accessible to the role running the queries.

array_sum_product_agg_function_name: str | None = None#

The name (if different from the default) of the UDAF performing atoti.agg.sum_product() on native arrays.

Note

This function must be defined in Databricks and accessible to the role running the queries.
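
For example, to use a UDAF registered under a non-default name (a sketch reusing the connection details from the example above; the function names shown are hypothetical and must match the UDAFs actually registered on the cluster):

>>> custom_udaf_config = ConnectionConfig(
...     url=connection_config.url,
...     password=os.environ["DATABRICKS_AUTH_TOKEN"],
...     array_sum_agg_function_name="my_catalog.my_schema.atoti_array_sum",
...     array_long_agg_function_name="my_catalog.my_schema.atoti_array_long",
... )

The same pattern applies to the other array_*_agg_function_name attributes.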

auto_multi_column_array_conversion: AutoMultiColumnArrayConversion | None = None#

When not None, multi-column array conversion will be performed automatically.

column_clustered_queries: 'all' | 'feeding' = 'feeding'#

Controls which queries will use clustering columns: only the feeding queries ("feeding", the default) or all queries ("all").
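
For example, to use clustering columns for all queries instead of only the feeding queries (a sketch reusing the connection details from the example above):

>>> clustered_config = ConnectionConfig(
...     url=connection_config.url,
...     password=os.environ["DATABRICKS_AUTH_TOKEN"],
...     column_clustered_queries="all",
... )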

feeding_query_timeout: Duration = datetime.timedelta(seconds=3600)#

Timeout for queries performed on the external database during feeding phases.

The feeding phases are:

  • the initial load;

  • the refresh operations.
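
For example, to allow slower feeding queries (a sketch reusing the connection details from the example above):

>>> from datetime import timedelta
>>> patient_feeding_config = ConnectionConfig(
...     url=connection_config.url,
...     password=os.environ["DATABRICKS_AUTH_TOKEN"],
...     feeding_query_timeout=timedelta(hours=2),
... )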

feeding_url: str | None = None#

When not None, this JDBC connection string will be used instead of url for the feeding phases.
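
For example, to run the feeding queries against a different endpoint than regular queries (a sketch; the DATABRICKS_FEEDING_URL environment variable is hypothetical and would hold a full JDBC connection string like the one in the example above):

>>> dual_endpoint_config = ConnectionConfig(
...     url=connection_config.url,
...     feeding_url=os.environ["DATABRICKS_FEEDING_URL"],
...     password=os.environ["DATABRICKS_AUTH_TOKEN"],
... )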

lookup_mode: 'allow' | 'warn' | 'deny' = 'warn'#

Whether lookup queries on the external database are allowed.

Lookup queries can be very slow and expensive since the database may not enforce primary keys.
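
For example, to fail fast instead of running potentially expensive lookup queries (a sketch reusing the connection details from the example above):

>>> no_lookup_config = ConnectionConfig(
...     url=connection_config.url,
...     password=os.environ["DATABRICKS_AUTH_TOKEN"],
...     lookup_mode="deny",
... )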

max_sub_queries: Annotated[int, Field(gt=0)] = 500#

Maximum number of sub-queries performed when splitting a query into multi-step queries.

password: str | None = None#

The password to connect to the database.

Passing it in this separate attribute prevents it from being logged alongside the connection string.

If None, a password is expected to be present in url.

query_timeout: Duration = datetime.timedelta(seconds=300)#

Timeout for queries performed on the external database outside feeding phases.
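
For example, to shorten the timeout for regular (non-feeding) queries (a sketch reusing the connection details from the example above):

>>> from datetime import timedelta
>>> short_query_config = ConnectionConfig(
...     url=connection_config.url,
...     password=os.environ["DATABRICKS_AUTH_TOKEN"],
...     query_timeout=timedelta(minutes=1),
... )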

time_travel: Literal[False, 'lax', 'strict'] = 'strict'#

How to use Databricks’ time travel feature.

Databricks does not support time travel with views, so the options are:

  • False: tables and views are queried on the latest state of the database.

  • "lax": tables are queried with time travel but views are queried without it.

  • "strict": tables are queried with time travel and querying a view raises an error.

url: str#

The JDBC connection string.