Working with other libraries¶
Working with Spark¶
When working with Spark, you need to configure Java carefully: Spark only supports Java 8, while atoti requires Java 11.
Set up the Java version¶
To work with both atoti and Spark, you will need both Java 8 and Java 11 installed on your system, and you must make sure that each library uses the correct version.
As Java 8 will soon be deprecated, we recommend using Java 11 as your default Java installation. Below are two ways to provide the required Java version to Spark.
Set JAVA_HOME directly inside Python¶
This is not the most elegant approach, but it is the easiest: temporarily point JAVA_HOME at Java 8 in the environment while starting the Spark session:
import os

from pyspark.sql import SparkSession

# First modify the env to point to Java 8
previous_java_home = os.environ["JAVA_HOME"]
os.environ["JAVA_HOME"] = "path/to/java8"

# Start the Spark session
spark = SparkSession.builder.appName("Demo").getOrCreate()

# Set the env variable back to its initial value
os.environ["JAVA_HOME"] = previous_java_home
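If you start Spark sessions in several places, a small helper can make the restore automatic. Below is a minimal sketch (the java_home helper is hypothetical, not part of atoti or PySpark) that wraps the environment change in a context manager so JAVA_HOME is restored even if session creation fails:

import os
from contextlib import contextmanager

from pyspark.sql import SparkSession


@contextmanager
def java_home(path):
    # Temporarily point JAVA_HOME at *path*, restoring the previous value on exit.
    previous = os.environ.get("JAVA_HOME")
    os.environ["JAVA_HOME"] = path
    try:
        yield
    finally:
        if previous is None:
            del os.environ["JAVA_HOME"]
        else:
            os.environ["JAVA_HOME"] = previous


# The JVM backing the session is launched while JAVA_HOME points to Java 8.
with java_home("path/to/java8"):
    spark = SparkSession.builder.appName("Demo").getOrCreate()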
Using standalone Spark¶
PySpark’s main purpose is to connect to another Spark instance. Another solution is therefore to install a standalone Spark distribution, configure it to use Java 8, and use it from PySpark (see the sketch after these steps):
1. Install standalone Spark and PySpark (same version).
2. Set your SPARK_HOME environment variable to your standalone Spark installation (PySpark will now use it).
3. In $SPARK_HOME/conf/spark-env.sh, set JAVA_HOME=/path/to/java8.
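Once SPARK_HOME is set, PySpark picks up the standalone installation and its spark-env.sh automatically. As a quick sanity check, here is a minimal sketch (the paths are placeholders) that sets SPARK_HOME from Python before importing PySpark and prints the Java version the driver JVM is actually running on; _jvm is a PySpark internal, so treat this as a debugging aid only:

import os

# Point PySpark at the standalone installation (placeholder path).
os.environ["SPARK_HOME"] = "/path/to/spark-standalone"

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Demo").getOrCreate()

# _jvm is a PySpark internal; we only use it here to confirm
# which Java version the driver JVM ended up running on.
print(spark.sparkContext._jvm.System.getProperty("java.version"))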