How to Retrieve the Name of a DataFrame Column in PySpark?

Retrieving the name of a DataFrame column in PySpark is straightforward. PySpark DataFrames have a `columns` attribute that returns a list of the names of all columns in the DataFrame.

Using the `columns` Attribute

You can use the `columns` attribute directly on the DataFrame object. Here is an example:


from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.appName("example").getOrCreate()

# Create a DataFrame
data = [(1, "Alice"), (2, "Bob")]
columns = ["ID", "Name"]
df = spark.createDataFrame(data, columns)

# Retrieve column names
column_names = df.columns
print(column_names)

The output will be a list of column names:


['ID', 'Name']
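Because `columns` returns an ordinary Python list, you can index it, check membership, or look up a column's position with standard list operations. A small sketch using the output shown above:

```python
# df.columns is a plain Python list, so normal list operations apply.
# Using the list from the example output above:
column_names = ['ID', 'Name']

first_column = column_names[0]           # name of the first column
has_name = 'Name' in column_names        # does a 'Name' column exist?
name_index = column_names.index('Name')  # position of the 'Name' column
```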

Using the `dtypes` Attribute

Another way to retrieve columns, along with their data types, is by using the `dtypes` attribute:


# Retrieve column names along with their data types
column_info = df.dtypes
print(column_info)

The output will be a list of tuples where each tuple contains the column name and its data type:


[('ID', 'bigint'), ('Name', 'string')]
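Since `dtypes` is a list of `(name, type)` tuples, a list comprehension is enough to recover just the names, or to filter columns by type. A sketch using the output shown above:

```python
# dtypes returns (column name, type string) tuples.
# Using the list from the example output above:
column_info = [('ID', 'bigint'), ('Name', 'string')]

# Just the column names
names = [name for name, _ in column_info]

# Only the columns of a given type, e.g. string columns
string_cols = [name for name, dtype in column_info if dtype == 'string']
```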

Using Schema

If you want more detailed information about columns, you can use the schema attribute of the DataFrame:


# Retrieve schema
schema_info = df.schema
print(schema_info)

The output will look similar to the following (the exact formatting varies by Spark version):


StructType(List(StructField(ID,LongType,true),StructField(Name,StringType,true)))

You can also iterate through `schema` to get each column’s details separately:


for field in df.schema.fields:
    print(f"Column Name: {field.name}, Data Type: {field.dataType}")

The output (formatting may vary slightly by Spark version) will be:


Column Name: ID, Data Type: LongType
Column Name: Name, Data Type: StringType

These are the common methods for retrieving DataFrame column names in PySpark. Choose the one that matches your use case and the level of detail you need: `columns` for names alone, `dtypes` for names with type strings, and `schema` for the full structural details.
