How to Extract Year, Month, and Day from TimestampType in Spark DataFrame?

The task of extracting the year, month, and day from a `TimestampType` column in an Apache Spark DataFrame can be efficiently handled using built-in functions in Spark SQL. Below, I will provide detailed explanations and examples using PySpark, Scala, and Java.

Using PySpark

In PySpark, the `year()`, `month()`, and `dayofmonth()` functions from `pyspark.sql.functions` extract the year, month, and day from a `TimestampType` column. Because the sample data below starts out as strings, `to_timestamp()` is used first to convert the column to `TimestampType`.


from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp, year, month, dayofmonth

# Initialize SparkSession
spark = SparkSession.builder.master("local").appName("ExtractDatetime").getOrCreate()

# Sample data
data = [("2023-10-05 12:30:00",), ("2022-06-01 08:45:00",)]
columns = ["timestamp"]

# Create DataFrame (the timestamp column starts as a string)
df = spark.createDataFrame(data, columns)

# Convert the string column to TimestampType
df = df.withColumn("timestamp", to_timestamp(df["timestamp"]))

# Extract Year, Month, and Day
df = df.withColumn("year", year(df["timestamp"])) \
       .withColumn("month", month(df["timestamp"])) \
       .withColumn("day", dayofmonth(df["timestamp"]))

df.show()
+-------------------+----+-----+---+
|          timestamp|year|month|day|
+-------------------+----+-----+---+
|2023-10-05 12:30:00|2023|   10|  5|
|2022-06-01 08:45:00|2022|    6|  1|
+-------------------+----+-----+---+
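The same extraction can also be written as SQL expressions with `selectExpr`. The snippet below is a minimal sketch that reuses the `df` created above (with its `timestamp` column already cast to `TimestampType`); the output column aliases are just illustrative names.


# Equivalent extraction using SQL expressions on the DataFrame above
df.selectExpr(
    "timestamp",
    "year(timestamp) AS year",
    "month(timestamp) AS month",
    "dayofmonth(timestamp) AS day"
).show()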

Using Scala

In Scala, we use the same functions, provided by the `org.apache.spark.sql.functions` package, to extract the year, month, and day.


import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{to_timestamp, year, month, dayofmonth}

// Initialize SparkSession
val spark = SparkSession.builder.master("local").appName("ExtractDatetime").getOrCreate()

// Sample data
val data = Seq("2023-10-05 12:30:00", "2022-06-01 08:45:00")
val columns = Seq("timestamp")

// Create DataFrame (the timestamp column starts as a string)
import spark.implicits._
val df = data.toDF(columns: _*)
  .withColumn("timestamp", to_timestamp($"timestamp"))

// Extract Year, Month, and Day
val resultDF = df.withColumn("year", year($"timestamp"))
  .withColumn("month", month($"timestamp"))
  .withColumn("day", dayofmonth($"timestamp"))

resultDF.show()
+-------------------+----+-----+---+
|          timestamp|year|month|day|
+-------------------+----+-----+---+
|2023-10-05 12:30:00|2023|   10|  5|
|2022-06-01 08:45:00|2022|    6|  1|
+-------------------+----+-----+---+

Using Java

In Java, the process is slightly more verbose, but the approach is the same: we use the same functions from the `org.apache.spark.sql.functions` package.


import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.*;

public class ExtractDatetime {
    public static void main(String[] args) {
        // Initialize SparkSession
        SparkSession spark = SparkSession.builder().master("local").appName("ExtractDatetime").getOrCreate();

        // Sample data
        List<Row> data = Arrays.asList(
            RowFactory.create("2023-10-05 12:30:00"),
            RowFactory.create("2022-06-01 08:45:00")
        );
        StructType schema = new StructType(new StructField[] {
            new StructField("timestamp", DataTypes.StringType, false, Metadata.empty())
        });

        // Create DataFrame (the timestamp column starts as a string)
        Dataset<Row> df = spark.createDataFrame(data, schema);

        // Convert the string column to TimestampType
        df = df.withColumn("timestamp", to_timestamp(col("timestamp")));

        // Extract Year, Month, and Day
        df = df.withColumn("year", year(col("timestamp")))
               .withColumn("month", month(col("timestamp")))
               .withColumn("day", dayofmonth(col("timestamp")));

        df.show();
    }
}
+-------------------+----+-----+---+
|          timestamp|year|month|day|
+-------------------+----+-----+---+
|2023-10-05 12:30:00|2023|   10|  5|
|2022-06-01 08:45:00|2022|    6|  1|
+-------------------+----+-----+---+

In these examples, we first created a DataFrame with sample timestamp strings and converted the column to `TimestampType` with `to_timestamp()`. Using the Spark SQL functions `year()`, `month()`, and `dayofmonth()`, we then extracted the relevant date parts and added them as new columns to the DataFrame.
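For completeness, the same result can be obtained with plain Spark SQL by registering the DataFrame as a temporary view. The snippet below is a PySpark sketch, assuming a DataFrame `df` with a `TimestampType` column named `timestamp` like the one built earlier; the view name `events` is only a placeholder.


# Register the DataFrame as a temporary view and query it with Spark SQL
df.createOrReplaceTempView("events")

spark.sql("""
    SELECT timestamp,
           year(timestamp)       AS year,
           month(timestamp)      AS month,
           dayofmonth(timestamp) AS day
    FROM events
""").show()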
