How to Check if a Spark DataFrame is Empty?

Spark provides several ways to check whether a DataFrame is empty, and the same techniques are available in each of its language APIs. I’ll show you examples in PySpark, Scala, and Java.

Method 1: Using the count Method

PySpark

In PySpark, the count method returns the number of rows in the DataFrame, so the DataFrame is empty when count() returns 0.


from pyspark.sql import SparkSession

# Initialize Spark session
spark = SparkSession.builder.appName("example").getOrCreate()

# Create an empty DataFrame
df = spark.createDataFrame([], schema="id INT, name STRING")

# Check if DataFrame is empty
is_empty = df.count() == 0
print("Is DataFrame empty?", is_empty)

Is DataFrame empty? True

Scala

In Scala, the count method works the same way. Here, spark.emptyDataFrame returns a DataFrame with no rows and no columns.


import org.apache.spark.sql.SparkSession

// Initialize Spark session
val spark = SparkSession.builder.appName("example").getOrCreate()

// Create an empty DataFrame
val df = spark.emptyDataFrame

// Check if DataFrame is empty
val isEmpty = df.count() == 0
println(s"Is DataFrame empty? $isEmpty")

Is DataFrame empty? true

Java

In Java, you can use the count method similarly to check if the DataFrame is empty.


import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class Main {
    public static void main(String[] args) {
        // Initialize Spark session
        SparkSession spark = SparkSession.builder().appName("example").getOrCreate();

        // Create an empty DataFrame
        Dataset<Row> df = spark.emptyDataFrame();

        // Check if DataFrame is empty
        boolean isEmpty = df.count() == 0;
        System.out.println("Is DataFrame empty? " + isEmpty);
    }
}

Is DataFrame empty? true

Method 2: Using the head Method

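Checking for an empty DataFrame with the count method can be costly, because Spark generally has to process every partition to total the rows, even though a single row is enough to prove the DataFrame is not empty. A cheaper alternative is to fetch at most one row with head(1) and test whether anything came back. The snippets below reuse the df created in the Method 1 examples.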

PySpark


# Check if DataFrame is empty using head method
is_empty = len(df.head(1)) == 0
print("Is DataFrame empty?", is_empty)

Is DataFrame empty? True

Scala


// Check if DataFrame is empty using head method
val isEmpty = df.head(1).isEmpty
println(s"Is DataFrame empty? $isEmpty")

Is DataFrame empty? true

Java


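import java.util.List;
import org.apache.spark.sql.Row;

// Check if DataFrame is empty by fetching at most one row.
// In Java, head(n) returns a Scala array typed as Object, so use takeAsList(n) to get a List<Row>.
List<Row> rows = df.takeAsList(1);
boolean isEmpty = rows.isEmpty();
System.out.println("Is DataFrame empty? " + isEmpty);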

Is DataFrame empty? true

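Both methods return the same answer, but the head approach is typically more efficient for large DataFrames: Spark can stop once it has found a single row instead of counting every row. Use count only when you also need the actual number of rows.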

About Editorial Team

Our Editorial Team is made up of tech enthusiasts deeply skilled in Apache Spark, PySpark, and Machine Learning, alongside proficiency in Pandas, R, Hive, PostgreSQL, Snowflake, and Databricks. They're not just experts; they're passionate educators, dedicated to demystifying complex data concepts through engaging and easy-to-understand tutorials.
