How to Run a Spark File from Spark Shell: A Step-by-Step Guide

Running a Spark file from the Spark Shell is a convenient way to develop and test Spark applications interactively. This guide walks through the process step by step with sample code snippets. The focus is on PySpark, but the process is similar for Scala and the other shells Spark provides.

Step-by-Step Guide to Running a Spark File from Spark Shell

1. Installation Prerequisites

Before starting, make sure the following are installed (quick version checks are shown after the list):

  • Java Development Kit (JDK)
  • Apache Spark
  • PySpark (for Python users)
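
If you want to verify the setup before continuing, a few quick checks from the command line look like this (assuming Spark is unpacked in your working directory and PySpark was installed with pip; adjust the paths to your environment):


java -version
./bin/spark-shell --version
python -m pip show pyspark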

2. Start the Spark Shell

To start the Spark Shell, navigate to your Spark installation directory and execute the following command:


./bin/spark-shell

If you are using PySpark, use the following command:


./bin/pyspark
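
Both shells launch with a SparkSession already created for you. In the PySpark shell, for example, the `spark` and `sc` variables are predefined, so you can inspect them right away (the exact version string depends on your installation):


# Objects the PySpark shell pre-creates at startup
spark          # the SparkSession for this shell session
sc             # the underlying SparkContext
spark.version  # e.g. '3.5.0', depending on your installation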

3. Create a Spark Script File

Create a script file with the desired Spark operations. For example, create a file named `example.py` with the following content:


# example.py
from pyspark.sql import SparkSession

# Create a Spark Session
spark = SparkSession.builder.appName("ExampleApp").getOrCreate()

# Create a DataFrame
data = [("Alice", 28), ("Bob", 35), ("Cathy", 30)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)

# Show DataFrame
df.show()

4. Run the Script File from the Spark Shell

Use Python's built-in `exec()` and `open()` functions to run the script file from within the shell. Start the PySpark shell with:


./bin/pyspark

Once you’re in the PySpark shell, execute the script file with:


exec(open('path/to/your/example.py').read())

Be sure to replace `path/to/your/example.py` with the actual path to your script file. Because the script runs inside an already running shell, `getOrCreate()` simply returns the shell's existing SparkSession rather than creating a new one. You should see output similar to the following:


+-----+---+
| Name|Age|
+-----+---+
|Alice| 28|
|  Bob| 35|
|Cathy| 30|
+-----+---+
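
Note that `exec(open(...).read())` evaluates the file in the shell's own namespace, so anything the script defines (such as `df`) stays available for further interactive work. If you would rather run the file in an isolated namespace, the standard-library `runpy` module is a possible alternative; a minimal sketch (the placeholder path is used the same way as above):


# Run the script in a fresh module namespace instead of the shell's globals
import runpy
module_globals = runpy.run_path('path/to/your/example.py')  # returns the script's globals as a dict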

5. Running a Scala Script in Spark Shell

For Scala users, create a script file named `example.scala` with the following content:


// example.scala
import org.apache.spark.sql.SparkSession

// Create a Spark Session
val spark = SparkSession.builder.appName("ExampleApp").getOrCreate()

// Create a DataFrame
val data = Seq(("Alice", 28), ("Bob", 35), ("Cathy", 30))
val columns = Seq("Name", "Age")
val df = spark.createDataFrame(data).toDF(columns: _*)

// Show DataFrame
df.show()

Run the script in the Spark Shell using:


./bin/spark-shell -i path/to/your/example.scala

Again, replace `path/to/your/example.scala` with the actual path to your script file. The `-i` flag preloads the file and then leaves you in the interactive shell, so `df` remains available for further exploration. The output will look like this:


+-----+---+
| Name|Age|
+-----+---+
|Alice| 28|
|  Bob| 35|
|Cathy| 30|
+-----+---+
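
If you already have a spark-shell session open, the Scala REPL's `:load path/to/your/example.scala` command evaluates the file in the current session, so you do not have to restart the shell with `-i`.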

This guide has walked through running a Spark file from the Spark Shell step by step, using both PySpark and Scala examples. By following these steps, you can test and develop Spark applications interactively.
