Working with other libraries

Working with Spark

When working with Spark, you need to configure Java carefully: Spark only supports Java 8, while atoti requires Java 11.

Set up the Java version

To work with both atoti and Spark, you need both Java 8 and Java 11 installed on your system, and you must make sure that each library uses the correct version.

As Java 8 will soon be deprecated, we recommend using Java 11 as your default Java installation. Below are two ways to provide the required Java version to Spark.

Set up JAVA_HOME directly inside Python

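This is not the most elegant approach, but it is the easiest: point JAVA_HOME at Java 8 just before starting the Spark session, then restore it. Spark launches its JVM when the session is created, so the variable only needs to be changed for that call: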

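import os

from pyspark.sql import SparkSession

# First modify the env to point to Java 8
# (.get avoids a KeyError when JAVA_HOME is not set at all)
previous_java_home = os.environ.get("JAVA_HOME")
os.environ["JAVA_HOME"] = "path/to/java8"

# Start the Spark session
spark = SparkSession.builder.appName("Demo").getOrCreate()

# Set the env variable back to its initial value
if previous_java_home is None:
    del os.environ["JAVA_HOME"]
else:
    os.environ["JAVA_HOME"] = previous_java_home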
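The set-and-restore pattern above can also be wrapped in a context manager, so that JAVA_HOME is restored even if starting the session fails. This is only a minimal sketch: `temporary_env` is a hypothetical helper name, not part of atoti or PySpark.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(name, value):
    """Temporarily set the environment variable `name` to `value`."""
    previous = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the original state, even if the body raised an exception
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous


# Possible usage (requires pyspark, "path/to/java8" is a placeholder):
# with temporary_env("JAVA_HOME", "path/to/java8"):
#     spark = SparkSession.builder.appName("Demo").getOrCreate()
```

Outside the `with` block, JAVA_HOME is back to its previous value (or unset if it was not defined before).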

Using standalone Spark

PySpark’s main purpose is to connect to another Spark instance. One solution is therefore to install a standalone Spark, configure it to run on Java 8, and then use it from PySpark:

  • Install standalone Spark and PySpark (same version for both)

  • Set your SPARK_HOME environment variable to the standalone Spark installation directory (PySpark will now use it)

  • In your $SPARK_HOME/conf/ directory, set JAVA_HOME=/path/to/java8 (typically in the spark-env.sh file)
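A quick sanity check of the steps above could look like the following sketch. `find_setup_problems` is a hypothetical helper, and it assumes that JAVA_HOME is set in a file named spark-env.sh inside $SPARK_HOME/conf/, which is where Spark's launch scripts usually read it from; adapt the file name if your setup differs.

```python
import os
from pathlib import Path


def find_setup_problems(environ=os.environ):
    """Return a list of human-readable problems with the standalone setup."""
    spark_home = environ.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]

    problems = []
    # Assumption: JAVA_HOME is configured in conf/spark-env.sh
    env_file = Path(spark_home) / "conf" / "spark-env.sh"
    if not env_file.is_file():
        problems.append(f"{env_file} does not exist")
    elif "JAVA_HOME" not in env_file.read_text():
        problems.append(f"{env_file} does not mention JAVA_HOME")
    return problems


if __name__ == "__main__":
    for problem in find_setup_problems():
        print("Problem:", problem)
```

The check only looks for the text JAVA_HOME in the file; it does not verify that the path actually points to a Java 8 installation.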