In PySpark, you can sort a DataFrame in descending order with the `orderBy` method combined with the `desc` function. Below is a step-by-step explanation and a code snippet illustrating how it works.
Step-by-Step Explanation
1. Setting Up the Environment
First, ensure you have PySpark installed and your Spark session is correctly set up.
2. Create a Sample DataFrame
For demonstration purposes, let’s create a simple DataFrame.
```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import desc

# Initialize Spark session
spark = SparkSession.builder \
    .appName("SortDescDemo") \
    .getOrCreate()

# Sample data
data = [("Alice", 34), ("Bob", 45), ("Catherine", 29), ("David", 37)]

# Create DataFrame
df = spark.createDataFrame(data, ["Name", "Age"])

# Show the original DataFrame
df.show()
```
```
+---------+---+
|     Name|Age|
+---------+---+
|    Alice| 34|
|      Bob| 45|
|Catherine| 29|
|    David| 37|
+---------+---+
```
3. Sorting the DataFrame in Descending Order
You can sort the DataFrame in descending order by using the `orderBy` function along with the `desc` function from `pyspark.sql.functions`.
```python
# Sort by Age in descending order
sorted_df = df.orderBy(desc("Age"))

# Show the sorted DataFrame
sorted_df.show()
```
```
+---------+---+
|     Name|Age|
+---------+---+
|      Bob| 45|
|    David| 37|
|    Alice| 34|
|Catherine| 29|
+---------+---+
```
In the above example, the DataFrame is sorted based on the “Age” column in descending order. The `desc` function specifies that the ordering should be descending.
Additional Notes
– You can sort by multiple columns by passing one sort expression per column to `orderBy`, mixing ascending and descending orderings as needed.
– `orderBy` is a wide transformation that shuffles data across the cluster, so sorting large DataFrames can be expensive; make sure your Spark session and cluster resources are sized accordingly.