To provide a schema while reading a CSV file as a DataFrame in Scala Spark, you can use the `StructType` and `StructField` classes. This can help in specifying the column names, data types, and also enforce data integrity. Below are the steps on how to achieve this:
Providing Schema While Reading a CSV as DataFrame in Scala Spark
Here’s an example demonstrating how to provide a schema while reading a CSV file using Scala in Spark:
import org.apache.spark.sql.{SparkSession, DataFrame}
import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
val spark = SparkSession.builder()
.appName("CSVWithSchemaExample")
.config("spark.master", "local")
.getOrCreate()
// Define the schema
val schema = StructType(Array(
StructField("Name", StringType, true),
StructField("Age", IntegerType, true),
StructField("City", StringType, true)
))
// Read CSV with custom schema
val df: DataFrame = spark.read
.format("csv")
.option("header", "true")
.schema(schema)
.load("path_to_your_csv_file.csv")
df.show()
Explanation:
Let’s break down the code:
- Importing necessary libraries: Import libraries required to work with DataFrame and define schema.
- Creating SparkSession: Create a SparkSession object which is the entry point for reading data and executing Spark queries.
- Defining the schema: Use `StructType` and `StructField` to define the schema for the CSV file. Here:
- `Name` is of `StringType`
- `Age` is of `IntegerType`
- `City` is of `StringType`
- Reading CSV: Use `spark.read.format(“csv”)` to specify that we are reading a CSV file. The `.option(“header”, “true”)` ensures that the first row is used as the header. `.schema(schema)` applies the defined schema to the CSV file.
- Displaying DataFrame: Use `df.show()` to display the content of the DataFrame.
Expected Output:
+-----+---+------+
| Name|Age| City|
+-----+---+------+
| John| 25| York|
| Jane| 30| Paris|
|Doe | 35|Berlin|
+-----+---+------+
In the above output, the data from the CSV file is displayed in a tabular format where the schema defined is enforced.