How Do You Export a DataFrame in PySpark to CSV?

Exporting a DataFrame to a CSV file in PySpark is a straightforward process, but it involves a few steps. Below is a detailed explanation along with a code snippet to demonstrate exporting a DataFrame to CSV.

Exporting a DataFrame to CSV in PySpark

To export a DataFrame to CSV in PySpark, use the DataFrame's `write` attribute, which returns a `DataFrameWriter`. Calling `write.csv` on it writes the DataFrame out in CSV format. Let’s walk through the steps with an example.

Step-by-Step Explanation

1. **Create a PySpark DataFrame**: First, you need to have a DataFrame in PySpark. For this example, we’ll create a simple DataFrame with some sample data.

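2. **Write DataFrame to CSV**: Use the `write.csv` method to export the DataFrame in CSV format. You pass the output `path` and can set options such as `mode` (what to do if the path already exists) and `header` (whether to write the column names as the first row).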

Example Code


# Import necessary modules
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("ExportToCSVExample").getOrCreate()

# Create a sample DataFrame
data = [("Alice", 34), ("Bob", 45), ("Cathy", 29)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, schema=columns)

# Show the DataFrame
df.show()

# Write DataFrame to CSV
# Note: Spark creates a directory named output/data.csv that holds part-*.csv files
df.write.csv(path="output/data.csv", header=True, mode="overwrite")

# Stop the SparkSession
spark.stop()

Code Explanation

1. **Create a SparkSession**: Instantiate a SparkSession object.

2. **Create a DataFrame**: The `data` variable contains sample data, and `columns` specifies the column names. The DataFrame is created using the `createDataFrame` method.

3. **Display the DataFrame**: The `show` method is used to display the DataFrame.

4. **Write DataFrame to CSV**: The `write.csv` method writes the DataFrame in CSV format. Here, `path` specifies the output location, `header=True` writes the column names as the first row, and `mode="overwrite"` replaces any existing output at that path. Note that Spark treats `path` as a directory: it creates `output/data.csv/` and writes one or more `part-*.csv` files inside it, rather than a single file named `data.csv`.

5. **Stop the SparkSession**: The `stop` method is called to stop the SparkSession.
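If a single CSV part file is preferred, one common approach is to reduce the DataFrame to one partition before writing. The following is a minimal sketch rather than a definitive recipe: the helper name `write_single_csv` and the path `output/single_csv` are hypothetical, and the approach assumes the data is small enough to be handled by a single task.

```python
def write_single_csv(df, path, header=True, mode="overwrite"):
    """Write a Spark DataFrame as CSV using a single partition.

    Hypothetical helper for illustration. `df` is expected to be a
    pyspark.sql.DataFrame. The result is still a directory at `path`,
    but it should contain only one part-*.csv file.
    """
    (
        df.coalesce(1)              # collapse to one partition, so one part file
        .write.mode(mode)           # e.g. "overwrite", "append", "ignore", "error"
        .option("header", header)   # write column names as the first row
        .csv(path)
    )


# Usage with the DataFrame from the example above (assumed to exist):
# write_single_csv(df, "output/single_csv")
```

Coalescing to one partition removes write parallelism, so this is best kept for small exports.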

Expected Output


+-----+---+
| Name|Age|
+-----+---+
|Alice| 34|
|  Bob| 45|
|Cathy| 29|
+-----+---+

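Spark writes `output/data.csv` as a directory containing one or more `part-*.csv` files. Each part file starts with the header row, and the rows may be split across several files. Taken together, the CSV contents will look like: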


Name,Age
Alice,34
Bob,45
Cathy,29
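When the output directory holds several part files, they can be combined into one CSV file after the Spark job finishes. Below is a minimal sketch using only the Python standard library. The function name `merge_part_files` is hypothetical, and it assumes the parts were written with `header=True`, live on the local filesystem, and are small enough to process on one machine.

```python
import csv
import glob
import os


def merge_part_files(spark_output_dir, merged_path):
    """Concatenate part-*.csv files into one CSV, keeping a single header.

    Returns the number of data rows written (header excluded).
    """
    part_files = sorted(glob.glob(os.path.join(spark_output_dir, "part-*.csv")))
    header_written = False
    rows_written = 0

    with open(merged_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, lineterminator="\n")
        for part in part_files:
            with open(part, newline="", encoding="utf-8") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty part files
                if not header_written:
                    writer.writerow(header)  # keep only the first header
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Usage (paths are illustrative):
# merge_part_files("output/data.csv", "output/data_merged.csv")
```

The merged file is written outside the `part-*.csv` pattern, so rerunning the function does not pick up its own output.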

By following these steps, you can easily export a DataFrame to a CSV file in PySpark.
