How to Convert a String to a Date in PySpark?

To convert a string column to a date in PySpark, use the `to_date` or `to_timestamp` function from the `pyspark.sql.functions` module. Both take the column to convert and an optional pattern that describes how the input string is laid out. Here’s how you can do it:

Method 1: Using `to_date` function

The `to_date` function parses a string column into a date column, which stores only the calendar date and no time-of-day information.

Example:


from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

# Initialize SparkSession
spark = SparkSession.builder.appName("String to Date Conversion").getOrCreate()

# Sample data
data = [("2023-10-01",), ("2021-05-12",), ("2019-07-25",)]
columns = ["date_string"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Convert string to date
df = df.withColumn("date", to_date(df["date_string"], "yyyy-MM-dd"))

# Show the DataFrame
df.show()

Output:

+-----------+----------+
|date_string|      date|
+-----------+----------+
| 2023-10-01|2023-10-01|
| 2021-05-12|2021-05-12|
| 2019-07-25|2019-07-25|
+-----------+----------+

Method 2: Using `to_timestamp` function

The `to_timestamp` function converts a string to a timestamp type, which includes both date and time information.

Example:


from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

# Initialize SparkSession
spark = SparkSession.builder.appName("String to Timestamp Conversion").getOrCreate()

# Sample data
data = [("2023-10-01 12:45:30",), ("2021-05-12 04:23:50",), ("2019-07-25 19:30:00",)]
columns = ["timestamp_string"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Convert string to timestamp
df = df.withColumn("timestamp", to_timestamp(df["timestamp_string"], "yyyy-MM-dd HH:mm:ss"))

# Show the DataFrame
df.show()

Output:

+-------------------+-------------------+
|   timestamp_string|          timestamp|
+-------------------+-------------------+
|2023-10-01 12:45:30|2023-10-01 12:45:30|
|2021-05-12 04:23:50|2021-05-12 04:23:50|
|2019-07-25 19:30:00|2019-07-25 19:30:00|
+-------------------+-------------------+

Use `to_date` when you only need the calendar date, and `to_timestamp` when the time of day matters as well.
