How to Rename Column Names in a DataFrame Using Spark Scala?

Renaming column names in a DataFrame is a common task in Spark Scala data processing. You can achieve this with the `withColumnRenamed` method. Below is a detailed explanation along with runnable code snippets.

Renaming Column Names in a DataFrame Using Spark Scala

Suppose you have the following DataFrame:


import org.apache.spark.sql.SparkSession

// Create (or reuse) a local SparkSession.
val spark = SparkSession.builder
  .appName("Rename Columns Example")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Build a small sample DataFrame; toDF assigns the column names.
val df = Seq(
  (1, "John", 28),
  (2, "Mike", 35),
  (3, "Sara", 22)
).toDF("Id", "Name", "Age")

df.show()

Output:


+---+----+---+
| Id|Name|Age|
+---+----+---+
|  1|John| 28|
|  2|Mike| 35|
|  3|Sara| 22|
+---+----+---+

To rename columns, use the `withColumnRenamed` method. Here, we will rename the “Id” column to “Identifier”, the “Name” column to “FullName”, and the “Age” column to “Years”.


// Each withColumnRenamed call returns a new DataFrame,
// so the calls can be chained.
val renamedDf = df
  .withColumnRenamed("Id", "Identifier")
  .withColumnRenamed("Name", "FullName")
  .withColumnRenamed("Age", "Years")

renamedDf.show()

Output:


+----------+--------+-----+
|Identifier|FullName|Years|
+----------+--------+-----+
|         1|    John|   28|
|         2|    Mike|   35|
|         3|    Sara|   22|
+----------+--------+-----+
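
You can also confirm the renames by printing the schema; `withColumnRenamed` changes only the column name, not the data or its type:


// Verify the renamed columns in the schema.
renamedDf.printSchema()

Output:


root
 |-- Identifier: integer (nullable = false)
 |-- FullName: string (nullable = true)
 |-- Years: integer (nullable = false)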

Explanation

In the given example:

  • We start by creating a Spark session and a DataFrame with some sample data.
  • The `toDF` method is used to create a DataFrame from a sequence of tuples and assign column names: “Id”, “Name”, and “Age”.
  • We then use the `withColumnRenamed` method to rename each column one by one (see the caveat after this list).
  • After renaming, the columns appear under their new names: “Identifier”, “FullName”, and “Years”.
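
One caveat worth knowing: `withColumnRenamed` is a no-op when the old column name does not exist, so a typo fails silently instead of throwing an error. A minimal sketch (the “Salary” and “Income” names here are hypothetical, not part of the sample data):


// "Salary" is not a column of df, so Spark silently returns
// the DataFrame unchanged instead of raising an error.
val unchanged = df.withColumnRenamed("Salary", "Income")

unchanged.columns  // Array(Id, Name, Age)

Checking `columns` or `printSchema()` after a rename is a quick way to catch such silent misses.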

This is how you can rename column names in a DataFrame using Spark Scala. You can chain multiple `withColumnRenamed` calls to rename several columns, as shown in the example above.
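
If you have many columns, chaining calls gets verbose. A common pattern is to fold a map of old-to-new names over the DataFrame; the sketch below reuses the mapping from the example (the `renames` value is illustrative, not part of Spark's API):


// Map of old name -> new name; fold over it, applying one
// withColumnRenamed call per entry.
val renames = Map("Id" -> "Identifier", "Name" -> "FullName", "Age" -> "Years")

val renamedAll = renames.foldLeft(df) { case (acc, (oldName, newName)) =>
  acc.withColumnRenamed(oldName, newName)
}

renamedAll.show()

When renaming every column positionally, `df.toDF("Identifier", "FullName", "Years")` does the same job in one call, and on Spark 3.4+ the built-in `withColumnsRenamed` accepts such a map directly.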
