How to Sort by Column in Descending Order in Spark SQL?

To sort by a column in descending order in Spark SQL, use the `ORDER BY` clause with the `DESC` keyword. You can run the query with `spark.sql()` after registering your DataFrame as a temporary view, or you can get the same ordering directly from the DataFrame API in PySpark, Scala, or Java. Below are examples in PySpark and Scala:

Example in PySpark

First, we’ll create a DataFrame and then sort it in descending order using Spark SQL.


from pyspark.sql import SparkSession
from pyspark.sql.functions import desc

# Initialize SparkSession
spark = SparkSession.builder \
    .appName("SortExample") \
    .getOrCreate()

# Sample data
data = [("Alice", 34), ("Bob", 45), ("Cathy", 29)]
columns = ["Name", "Age"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Create a temporary view
df.createOrReplaceTempView("people")

# Sort by Age in descending order using Spark SQL
sorted_df_sql = spark.sql("SELECT * FROM people ORDER BY Age DESC")

# Show result
sorted_df_sql.show()

+-----+---+
| Name|Age|
+-----+---+
|  Bob| 45|
|Alice| 34|
|Cathy| 29|
+-----+---+

Alternatively, you can use the DataFrame API to sort by a column in descending order.


# Sort by Age in descending order using DataFrame API
sorted_df = df.orderBy(desc("Age"))

# Show result
sorted_df.show()

+-----+---+
| Name|Age|
+-----+---+
|  Bob| 45|
|Alice| 34|
|Cathy| 29|
+-----+---+

Example in Scala

We’ll achieve the same result using Scala:


import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

// Initialize SparkSession
val spark = SparkSession.builder()
    .appName("SortExample")
    .getOrCreate()

// Sample data
val data = Seq(("Alice", 34), ("Bob", 45), ("Cathy", 29))
val columns = Seq("Name", "Age")

// Create DataFrame
val df = spark.createDataFrame(data).toDF(columns: _*)

// Create a temporary view
df.createOrReplaceTempView("people")

// Sort by Age in descending order using Spark SQL
val sortedDfSql = spark.sql("SELECT * FROM people ORDER BY Age DESC")

// Show result
sortedDfSql.show()

+-----+---+
| Name|Age|
+-----+---+
|  Bob| 45|
|Alice| 34|
|Cathy| 29|
+-----+---+

You can also use the DataFrame API in Scala:


// Sort by Age in descending order using DataFrame API
val sortedDf = df.orderBy(desc("Age"))

// Show result
sortedDf.show()

+-----+---+
| Name|Age|
+-----+---+
|  Bob| 45|
|Alice| 34|
|Cathy| 29|
+-----+---+

Both approaches return the same rows in the same order, so the choice comes down to whether you prefer SQL syntax or the DataFrame API.
