To sort by a column in descending order in Spark SQL, use the `ORDER BY` clause with the `DESC` keyword. You can either register your DataFrame as a temporary view and query it with SQL through `spark.sql`, or call the DataFrame API directly from PySpark, Scala, or Java. Below are examples in PySpark and Scala:
Example in PySpark
First, we’ll create a DataFrame and then sort it in descending order using Spark SQL.
from pyspark.sql import SparkSession
from pyspark.sql.functions import desc
# Initialize SparkSession
spark = SparkSession.builder \
    .appName("SortExample") \
    .getOrCreate()
# Sample data
data = [("Alice", 34), ("Bob", 45), ("Cathy", 29)]
columns = ["Name", "Age"]
# Create DataFrame
df = spark.createDataFrame(data, columns)
# Create a temporary view
df.createOrReplaceTempView("people")
# Sort by Age in descending order using Spark SQL
sorted_df_sql = spark.sql("SELECT * FROM people ORDER BY Age DESC")
# Show result
sorted_df_sql.show()
+-----+---+
| Name|Age|
+-----+---+
|  Bob| 45|
|Alice| 34|
|Cathy| 29|
+-----+---+
Alternatively, you can use the DataFrame API to sort by a column in descending order.
# Sort by Age in descending order using DataFrame API
sorted_df = df.orderBy(desc("Age"))
# Show result
sorted_df.show()
+-----+---+
| Name|Age|
+-----+---+
|  Bob| 45|
|Alice| 34|
|Cathy| 29|
+-----+---+
Example in Scala
We’ll achieve the same result using Scala:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc
// Initialize SparkSession
val spark = SparkSession.builder()
  .appName("SortExample")
  .getOrCreate()
// Sample data
val data = Seq(("Alice", 34), ("Bob", 45), ("Cathy", 29))
val columns = Seq("Name", "Age")
// Create DataFrame
val df = spark.createDataFrame(data).toDF(columns: _*)
// Create a temporary view
df.createOrReplaceTempView("people")
// Sort by Age in descending order using Spark SQL
val sortedDfSql = spark.sql("SELECT * FROM people ORDER BY Age DESC")
// Show result
sortedDfSql.show()
+-----+---+
| Name|Age|
+-----+---+
|  Bob| 45|
|Alice| 34|
|Cathy| 29|
+-----+---+
You can also use the DataFrame API in Scala:
// Sort by Age in descending order using DataFrame API
val sortedDf = df.orderBy(desc("Age"))
// Show result
sortedDf.show()
+-----+---+
| Name|Age|
+-----+---+
|  Bob| 45|
|Alice| 34|
|Cathy| 29|
+-----+---+
Both approaches are planned by the same Catalyst optimizer and return the same result, so choose whichever fits your code: SQL syntax or the DataFrame API.