Running a script file from the Spark shell is a convenient way to develop and test Spark applications interactively. This guide walks through the process step by step with sample code snippets. We focus on PySpark and also include a Scala example; the process is similar for the other languages Spark supports.
Step-by-Step Guide to Running a Spark File from Spark Shell
1. Installation Prerequisites
Before starting, ensure you have the following installed:
- Java Development Kit (JDK)
- Apache Spark
- PySpark (for Python users)
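If you are unsure whether these are in place, a quick check from a terminal helps. The commands below assume `java` is on your PATH, that the Spark command is run from the Spark installation directory, and that pip is available (installing PySpark via pip is just one common option):
java -version
./bin/spark-shell --version
pip install pyspark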
2. Start the Spark Shell
To start the Spark Shell, navigate to your Spark installation directory and execute the following command:
./bin/spark-shell
If you are using PySpark, use the following command:
./bin/pyspark
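If you installed PySpark with pip, the `pyspark` launcher is normally added to your PATH, so you can also start the shell from any directory:
pyspark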
3. Create a Spark Script File
Create a script file with the desired Spark operations. For example, create a file named `example.py` with the following content:
# example.py
from pyspark.sql import SparkSession
# Create a Spark Session
spark = SparkSession.builder.appName("ExampleApp").getOrCreate()
# Create a DataFrame
data = [("Alice", 28), ("Bob", 35), ("Cathy", 30)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
# Show DataFrame
df.show()
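If you want the script to do a bit more than display the data, you can append a simple transformation to the end of `example.py`. The filter below is an illustrative addition, not part of the original example:
# Optional: show only the rows where Age is greater than 29
df.filter(df.Age > 29).show()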
4. Run the Script File from the Spark Shell
Use Python's built-in `exec()` on the file's contents to run the script inside the shell. If the shell is not already running, start it with:
./bin/pyspark
Once you’re in the PySpark shell, execute the script file with:
exec(open('path/to/your/example.py').read())
Replace `path/to/your/example.py` with the actual path to your script file. You should see output similar to the following:
+-----+---+
| Name|Age|
+-----+---+
|Alice| 28|
| Bob| 35|
|Cathy| 30|
+-----+---+
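Note that the PySpark shell already provides a SparkSession bound to the name `spark`, so the `getOrCreate()` call in the script simply reuses that existing session. If you later want to run the same file non-interactively rather than from the shell, `spark-submit` works as well:
./bin/spark-submit path/to/your/example.py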
5. Running a Scala Script in Spark Shell
For Scala users, create a script file named `example.scala` with the following content:
// example.scala
import org.apache.spark.sql.SparkSession
// Create a Spark Session
val spark = SparkSession.builder.appName("ExampleApp").getOrCreate()
// Create a DataFrame
val data = Seq(("Alice", 28), ("Bob", 35), ("Cathy", 30))
val columns = Seq("Name", "Age")
val df = spark.createDataFrame(data).toDF(columns: _*)
// Show DataFrame
df.show()
Run the script when starting the Spark shell by passing it with the `-i` option; the shell executes the file and then stays open for interactive use:
./bin/spark-shell -i path/to/your/example.scala
Again, replace `path/to/your/example.scala` with the actual path to your script file. The output will look like this:
+-----+---+
| Name|Age|
+-----+---+
|Alice| 28|
| Bob| 35|
|Cathy| 30|
+-----+---+
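If a spark-shell session is already running, you can also load the script with the REPL's `:load` command instead of restarting the shell:
:load path/to/your/example.scala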
This guide has walked through running a script file from the Spark shell step by step, using both PySpark and Scala examples. By following these steps, you can develop and test Spark applications interactively.