To check if a Spark DataFrame is empty, you can use several methods depending on the programming language you are using. I’ll show you examples in PySpark, Scala, and Java.
Method 1: Using the count Method
PySpark
In PySpark, you can use the count method to check if the DataFrame is empty. The count method returns the number of rows in the DataFrame.
from pyspark.sql import SparkSession
# Initialize Spark session
spark = SparkSession.builder.appName("example").getOrCreate()
# Create an empty DataFrame
df = spark.createDataFrame([], schema="id INT, name STRING")
# Check if DataFrame is empty
is_empty = df.count() == 0
print("Is DataFrame empty?", is_empty)
Is DataFrame empty? True
Scala
In Scala, you can also use the count method to check if the DataFrame is empty.
import org.apache.spark.sql.SparkSession
// Initialize Spark session
val spark = SparkSession.builder.appName("example").getOrCreate()
// Create an empty DataFrame
val df = spark.emptyDataFrame
// Check if DataFrame is empty
val isEmpty = df.count() == 0
println(s"Is DataFrame empty? $isEmpty")
Is DataFrame empty? true
Java
In Java, you can use the count method similarly to check if the DataFrame is empty.
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
public class Main {
    public static void main(String[] args) {
        // Initialize Spark session
        SparkSession spark = SparkSession.builder().appName("example").getOrCreate();
        // Create an empty DataFrame
        Dataset<Row> df = spark.emptyDataFrame();
        // Check if DataFrame is empty
        boolean isEmpty = df.count() == 0;
        System.out.println("Is DataFrame empty? " + isEmpty);
    }
}
Is DataFrame empty? true
Method 2: Using the head Method
Checking for an empty DataFrame with the count method can be costly because Spark has to process every row just to produce the total. An alternative is the head method: head(1) asks for at most one row, so Spark can stop as soon as it finds one.
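To see why stopping at the first row is cheaper than counting, here is a plain-Python analogy that runs without Spark. The `rows` generator and the `pulled` counter are hypothetical stand-ins for a data source and the work done against it. Spark's real execution is distributed and more involved, so treat this only as a sketch of the idea.

```python
from itertools import islice

pulled = 0  # how many rows the stand-in source has produced so far


def rows(n):
    """Hypothetical data source that lazily yields n rows, one at a time."""
    global pulled
    for i in range(n):
        pulled += 1
        yield i


# count-style check: every row is consumed just to learn whether any exist
pulled = 0
empty_by_count = sum(1 for _ in rows(1000)) == 0
pulled_by_count = pulled

# head-style check: stop as soon as the first row shows up
pulled = 0
empty_by_head = len(list(islice(rows(1000), 1))) == 0
pulled_by_head = pulled

print(empty_by_count, pulled_by_count)  # False 1000
print(empty_by_head, pulled_by_head)    # False 1
```

Both checks give the same answer, but the count-style check touches all 1000 rows while the head-style check touches only one.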
PySpark
# Check if DataFrame is empty using head method
is_empty = len(df.head(1)) == 0
print("Is DataFrame empty?", is_empty)
Is DataFrame empty? True
Scala
// Check if DataFrame is empty using head method
val isEmpty = df.head(1).isEmpty
println(s"Is DataFrame empty? $isEmpty")
Is DataFrame empty? true
Java
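import java.util.List;
// Check if DataFrame is empty by fetching at most one row.
// From Java, head(n) returns an array that must be cast to Row[], not a List,
// so takeAsList(1) is used here to get the same rows as a List<Row>.
List<Row> rows = df.takeAsList(1);
boolean isEmpty = rows.isEmpty();
System.out.println("Is DataFrame empty? " + isEmpty);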
Is DataFrame empty? true
Both methods work, but the head method is typically more efficient for large DataFrames because it only needs to find one row instead of counting all of them.
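If you run this check in several places, it can be wrapped in a small helper. The sketch below is one possible shape, not an official Spark utility: `is_empty` is a hypothetical name, and `FakeFrame` is a made-up stand-in class so the example runs without a Spark installation. With a real PySpark DataFrame you would pass the DataFrame itself, since `head(1)` returns a list of rows there as well.

```python
def is_empty(df):
    """Return True if df has no rows, fetching at most one row.

    Works with any object whose head(n) returns a list of up to n rows,
    such as a PySpark DataFrame.
    """
    return len(df.head(1)) == 0


class FakeFrame:
    """Hypothetical stand-in for a DataFrame, used only for this demo."""

    def __init__(self, rows):
        self._rows = list(rows)

    def head(self, n):
        # Mimic head(n): return the first n rows as a list
        return self._rows[:n]


print(is_empty(FakeFrame([])))          # True
print(is_empty(FakeFrame([(1, "a")])))  # False
```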