How to Rename Column Names in a DataFrame Using Spark Scala?

Renaming columns in a DataFrame is a common task in Spark Scala data processing. You can achieve it with the `withColumnRenamed` method. Below is a detailed walkthrough with code snippets.


Suppose you have the following DataFrame:


import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Create a local Spark session
val spark = SparkSession.builder
  .appName("Rename Columns Example")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._

// Build a sample DataFrame from a sequence of tuples
val df = Seq(
  (1, "John", 28),
  (2, "Mike", 35),
  (3, "Sara", 22)
).toDF("Id", "Name", "Age")

df.show()

Output:


 +---+----+---+
 | Id|Name|Age|
 +---+----+---+
 |  1|John| 28|
 |  2|Mike| 35|
 |  3|Sara| 22|
 +---+----+---+

To rename columns, you can use the `withColumnRenamed` method. Here, we will rename the “Id” column to “Identifier”, “Name” column to “FullName”, and “Age” column to “Years”.


val renamedDf = df
  .withColumnRenamed("Id", "Identifier")
  .withColumnRenamed("Name", "FullName")
  .withColumnRenamed("Age", "Years")

renamedDf.show()

Output:


 +----------+--------+-----+
 |Identifier|FullName|Years|
 +----------+--------+-----+
 |         1|    John|   28|
 |         2|    Mike|   35|
 |         3|    Sara|   22|
 +----------+--------+-----+
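If you want to rename every column at once rather than chaining calls, reassigning the full name list with `toDF` is a concise alternative, and a `select` with aliases works when only some columns change. This is a sketch; the `mapping` value and the result names are illustrative, not part of the example above:

```scala
import org.apache.spark.sql.functions.col

// Rename all columns in one call by supplying the complete list of new
// names; the order must match the existing column order.
val renamedAll = df.toDF("Identifier", "FullName", "Years")

// Rename selectively from an old-name -> new-name map, leaving any
// unmapped columns untouched.
val mapping = Map("Id" -> "Identifier", "Name" -> "FullName", "Age" -> "Years")
val renamedViaSelect = df.select(
  df.columns.map(c => col(c).as(mapping.getOrElse(c, c))): _*
)
```

The `select`-with-alias form is handy when the rename rules come from configuration or when you want to rename many columns programmatically.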

Explanation

In the given example:

  • We start by creating a Spark session and a DataFrame with some sample data.
  • The `toDF` method is used to create a DataFrame from a sequence of tuples and assign column names: “Id”, “Name”, and “Age”.
  • We then use the `withColumnRenamed` method to rename each column one by one.
  • After renaming, the columns appear with their new names: “Identifier”, “FullName”, and “Years”.

This is how you rename columns in a DataFrame using Spark Scala. You can chain multiple `withColumnRenamed` calls to rename several columns, as shown in the example above.
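One caveat worth knowing: `withColumnRenamed` is a no-op when the old column name does not exist, so a typo fails silently. A small guard can surface that early. The `renameOrFail` helper below is a hypothetical sketch, not a Spark API:

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper: fail fast instead of silently returning the
// DataFrame unchanged when the source column is missing.
def renameOrFail(df: DataFrame, oldName: String, newName: String): DataFrame = {
  require(df.columns.contains(oldName), s"Column '$oldName' not found")
  df.withColumnRenamed(oldName, newName)
}

// Spark 3.4+ also offers withColumnsRenamed to rename several columns
// in a single call:
// df.withColumnsRenamed(Map("Id" -> "Identifier", "Name" -> "FullName"))
```

If you are on Spark 3.4 or later, `withColumnsRenamed` with a map is the tidiest option for multiple renames.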
