Working with other libraries¶
Working with Spark¶
When working with Spark, you need to configure Java carefully: Spark supports only Java 8, while atoti requires Java 11.
Set up the Java versions¶
To work with both atoti and Spark, you need both Java 11 and Java 8 installed on your system, and you must make sure that each library uses the correct version.
As Java 8 will soon be deprecated, we recommend using Java 11 as your default Java installation. Below are two ways to provide the required Java version to Spark.
Set JAVA_HOME directly inside Python¶
This is not the most elegant approach, but it is the easiest: temporarily change the Java version in the environment while starting the Spark session:

```python
import os

from pyspark.sql import SparkSession

# First modify the env to point to Java 8
previous_java_home = os.environ["JAVA_HOME"]
os.environ["JAVA_HOME"] = "path/to/java8"

# Start the Spark session
spark = SparkSession.builder.appName("Demo").getOrCreate()

# Set the env variable back to its initial value
os.environ["JAVA_HOME"] = previous_java_home
```
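If an exception is raised while the session starts, the snippet above never restores `JAVA_HOME`. A small context manager avoids this; the sketch below is an illustration (the `java_home` helper and the Java path are placeholders, not part of atoti or PySpark):

```python
import os
from contextlib import contextmanager


@contextmanager
def java_home(path):
    """Temporarily point JAVA_HOME at another JDK, restoring it afterwards."""
    previous = os.environ.get("JAVA_HOME")
    os.environ["JAVA_HOME"] = path
    try:
        yield
    finally:
        # Restore the previous value even if the body raised.
        if previous is None:
            del os.environ["JAVA_HOME"]
        else:
            os.environ["JAVA_HOME"] = previous


# Usage (the path is a placeholder for your Java 8 installation):
# with java_home("path/to/java8"):
#     spark = SparkSession.builder.appName("Demo").getOrCreate()
```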
Using standalone Spark¶
PySpark’s main purpose is to connect to another Spark instance. One solution is to install a standalone Spark, configure it, and then use it from PySpark:
Install standalone Spark and pyspark (same version)
Set the SPARK_HOME environment variable to point to your standalone Spark installation (pyspark will now use it)
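The second step can also be done from Python, as long as the variable is set before pyspark is imported. A minimal sketch, where the path is a placeholder for your actual installation directory:

```python
import os

# Point pyspark at the standalone installation.
# The path below is a placeholder for your real Spark directory.
os.environ["SPARK_HOME"] = "/path/to/spark-standalone"

# pyspark reads SPARK_HOME when it starts the JVM, so set it first:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("Demo").getOrCreate()
```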