How to Retrieve the Name of a DataFrame Column in PySpark?

Retrieving the name of a DataFrame column in PySpark is straightforward. PySpark DataFrames expose a `columns` attribute that returns the names of all columns as a Python list.

Using the `columns` Attribute

You can use the `columns` attribute directly on the DataFrame object. Here is an example:


from pyspark.sql import SparkSession

# Initialize SparkSession
spark = SparkSession.builder.appName("example").getOrCreate()

# Create a DataFrame
data = [(1, "Alice"), (2, "Bob")]
columns = ["ID", "Name"]
df = spark.createDataFrame(data, columns)

# Retrieve column names
column_names = df.columns
print(column_names)

The output will be a list of column names:


['ID', 'Name']
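Because `df.columns` is an ordinary Python list, the usual list operations apply: membership checks, positional lookups, and so on. A minimal sketch using the output shown above as a stand-in for `df.columns`:

```python
# df.columns returns a plain Python list; using the output shown above:
column_names = ["ID", "Name"]

# Check whether a column exists before referencing it
has_name = "Name" in column_names

# Find a column's position
name_index = column_names.index("Name")

# Case-insensitive lookup (column names keep their original case)
lookup = {c.lower(): c for c in column_names}
original = lookup.get("name")  # -> "Name"

print(has_name, name_index, original)
```

This pattern is handy for validating user-supplied column names before calling `select` or `drop`.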

Using the `dtypes` Attribute

Another way to retrieve columns, along with their data types, is by using the `dtypes` attribute:


# Retrieve column names along with their data types
column_info = df.dtypes
print(column_info)

The output will be a list of tuples where each tuple contains the column name and its data type:


[('ID', 'bigint'), ('Name', 'string')]
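Since `dtypes` returns plain `(name, type)` tuples, it is convenient for selecting columns by type. A small sketch using the list shown above as a stand-in for `df.dtypes`:

```python
# dtypes returns a list of (column name, type string) tuples;
# using the output shown above as a stand-in:
column_info = [("ID", "bigint"), ("Name", "string")]

# Collect the names of all string columns
string_columns = [name for name, dtype in column_info if dtype == "string"]

# Or build a name -> type mapping for quick lookups
type_of = dict(column_info)

print(string_columns)  # -> ['Name']
print(type_of["ID"])   # -> bigint
```

A list comprehension like this is a common way to feed only the numeric or string columns into a downstream transformation.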

Using Schema

If you want more detailed information about columns, you can use the `schema` attribute of the DataFrame:


# Retrieve schema
schema_info = df.schema
print(schema_info)

The output will look similar to the following (the exact formatting varies between Spark versions):


StructType([StructField('ID', LongType(), True), StructField('Name', StringType(), True)])

You can also iterate through `schema` to get each column’s details separately:


for field in df.schema.fields:
    print(f"Column Name: {field.name}, Data Type: {field.dataType}")

The output will be similar to the following (older Spark versions print the types without parentheses):


Column Name: ID, Data Type: LongType()
Column Name: Name, Data Type: StringType()

These are the most common ways to retrieve DataFrame column names in PySpark. Choose whichever one matches the level of detail your use case requires: `columns` for names only, `dtypes` for names plus types, and `schema` for full field metadata.
