How to Load a CSV File as a DataFrame in Spark?

Loading CSV files as DataFrames in Spark is a common operation. Depending on the language you are using with Spark, the syntax will vary slightly. Below are examples using PySpark, Scala, and Java to demonstrate how to accomplish this.

Loading a CSV file in PySpark

In PySpark, you can use `spark.read.csv` to read a CSV file.


from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.appName("CSVExample").getOrCreate()

# Load CSV file as DataFrame
df = spark.read.csv("path/to/csvfile.csv", header=True, inferSchema=True)

# Show DataFrame
df.show()



+----+-------+------+
| ID |  Name |  Age |
+----+-------+------+
|  1 | Alice |   23 |
|  2 |   Bob |   34 |
|  3 | Carol |   50 |
+----+-------+------+
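
If you already know the column types, you can skip `inferSchema` (which requires an extra pass over the data) and supply an explicit schema instead. The following is a minimal PySpark sketch that assumes the `ID`, `Name`, and `Age` columns shown in the sample output above; adjust the field names and types to match your own file.


from pyspark.sql.types import StructType, StructField, IntegerType, StringType

# Define the schema explicitly instead of inferring it
# (assumes the ID, Name, Age columns from the sample output)
schema = StructType([
    StructField("ID", IntegerType(), True),
    StructField("Name", StringType(), True),
    StructField("Age", IntegerType(), True),
])

# Load the CSV using the explicit schema
df = spark.read.csv("path/to/csvfile.csv", header=True, schema=schema)

# Verify the column names and types
df.printSchema()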

Loading a CSV file in Scala

In Scala, you can use `spark.read.format("csv")` to read a CSV file.


import org.apache.spark.sql.SparkSession

// Initialize SparkSession
val spark = SparkSession.builder.appName("CSVExample").getOrCreate()

// Load CSV file as DataFrame
val df = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("path/to/csvfile.csv")

// Show DataFrame
df.show()

+----+-------+------+
| ID |  Name |  Age |
+----+-------+------+
|  1 | Alice |   23 |
|  2 |   Bob |   34 |
|  3 | Carol |   50 |
+----+-------+------+

Loading a CSV file in Java

In Java, the process is similar using `spark.read().format("csv")`.


import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CSVExample {
    public static void main(String[] args) {
        // Initialize SparkSession
        SparkSession spark = SparkSession.builder()
            .appName("CSVExample")
            .getOrCreate();

        // Load CSV file as DataFrame
        Dataset<Row> df = spark.read().format("csv")
            .option("header", "true")
            .option("inferSchema", "true")
            .load("path/to/csvfile.csv");

        // Show DataFrame
        df.show();
    }
}

+----+-------+------+
| ID |  Name |  Age |
+----+-------+------+
|  1 | Alice |   23 |
|  2 |   Bob |   34 |
|  3 | Carol |   50 |
+----+-------+------+

These examples show how to load a CSV file into a Spark DataFrame in each language. The `header` option tells Spark to read column names from the first row of the file, and `inferSchema` asks Spark to detect column types automatically, at the cost of an extra pass over the data.
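
The CSV reader also accepts other options for handling non-default file layouts. Below is a short PySpark sketch of a few commonly used ones; the semicolon delimiter, the `NA` null marker, and the file path are illustrative assumptions, not requirements of your data.


# A sketch of additional CSV options in PySpark
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .option("delimiter", ";")         # columns separated by ";" instead of ","
      .option("nullValue", "NA")        # treat the string "NA" as null
      .option("mode", "DROPMALFORMED")  # silently drop rows that fail to parse
      .csv("path/to/csvfile.csv"))

df.show()


The equivalent `.option(...)` calls work the same way in the Scala and Java examples above, since all three languages use the same underlying CSV data source.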
