Loading CSV files into DataFrames is a common operation in Spark. The reader API is the same across languages, so only the syntax differs slightly. The examples below show how to do this in PySpark, Scala, and Java.
Loading a CSV file in PySpark
In PySpark, you can use `spark.read.csv` to read a CSV file.
```python
from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.appName("CSVExample").getOrCreate()

# Load CSV file as DataFrame
df = spark.read.csv("path/to/csvfile.csv", header=True, inferSchema=True)

# Show DataFrame
df.show()
```
```
+---+-----+---+
| ID| Name|Age|
+---+-----+---+
|  1|Alice| 23|
|  2|  Bob| 34|
|  3|Carol| 50|
+---+-----+---+
```
Loading a CSV file in Scala
In Scala, you can use `spark.read.format("csv")` to read a CSV file.
```scala
import org.apache.spark.sql.SparkSession

// Initialize SparkSession
val spark = SparkSession.builder.appName("CSVExample").getOrCreate()

// Load CSV file as DataFrame
val df = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("path/to/csvfile.csv")

// Show DataFrame
df.show()
```
```
+---+-----+---+
| ID| Name|Age|
+---+-----+---+
|  1|Alice| 23|
|  2|  Bob| 34|
|  3|Carol| 50|
+---+-----+---+
```
Loading a CSV file in Java
In Java, the process is similar, using `spark.read().format("csv")`.
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CSVExample {
    public static void main(String[] args) {
        // Initialize SparkSession
        SparkSession spark = SparkSession.builder()
                .appName("CSVExample")
                .getOrCreate();

        // Load CSV file as DataFrame (a Dataset<Row> in Java)
        Dataset<Row> df = spark.read().format("csv")
                .option("header", "true")
                .option("inferSchema", "true")
                .load("path/to/csvfile.csv");

        // Show DataFrame
        df.show();
    }
}
```
```
+---+-----+---+
| ID| Name|Age|
+---+-----+---+
|  1|Alice| 23|
|  2|  Bob| 34|
|  3|Carol| 50|
+---+-----+---+
```
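In all three languages, the same two options control how the file is interpreted. `header` treats the first line as column names (otherwise Spark names the columns `_c0`, `_c1`, and so on), and `inferSchema` makes Spark scan the data to infer column types (otherwise every column is read as a string). Because inference requires an extra pass over the data, supplying an explicit schema is often preferable for large files.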