How to Add and Delete Columns in Pandas DataFrames

Working with data in Python often involves Pandas DataFrames, flexible data structures for manipulating structured data. Two common operations on DataFrames are adding and deleting columns. Adding columns is useful for inserting computed fields or bringing in data from other sources, while deleting columns removes unnecessary data and simplifies the dataset. This guide covers several ways to add and delete columns in Pandas DataFrames.

Prerequisites

Before we begin, make sure you have the Pandas library installed in your Python environment. If you haven’t done so, you can install it using pip:


pip install pandas

After installation, you can import Pandas in your script and create a simple DataFrame to work with throughout this guide:


import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

print(df)

Printing the DataFrame should produce the following output:


   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

Adding Columns to a DataFrame

Adding a New Column with Scalars

You can add a new column to a DataFrame by simply assigning a scalar value to a column that does not exist yet. This will populate the entire column with the scalar value you provide.


df['D'] = 10

print(df)

The DataFrame now includes the new column ‘D’:


   A  B  C   D
0  1  4  7  10
1  2  5  8  10
2  3  6  9  10

Adding a Column With a List or Array

If you want to assign different values to the entries in the new column, you can use a list or an array. The length of the list or array must match the number of rows in the DataFrame; otherwise Pandas raises a ValueError.


df['E'] = [20, 30, 40]

print(df)

And the DataFrame now includes the column ‘E’ with specified values:


   A  B  C   D   E
0  1  4  7  10  20
1  2  5  8  10  30
2  3  6  9  10  40
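
The length requirement can be checked with a small, self-contained sketch. The DataFrame name `demo` and the column name `bad` are illustrative and not part of the guide's running example. It assumes NumPy is installed, which is normally the case because Pandas depends on it.

```python
import numpy as np
import pandas as pd

# Illustrative frame, separate from the guide's `df`.
demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# A NumPy array works the same way as a list when its length matches.
demo['E'] = np.array([20, 30, 40])

# A list of the wrong length is rejected, and no column is added.
try:
    demo['bad'] = [1, 2]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(demo.columns.tolist())
print(error_message)
```

Catching the exception here is only for demonstration; in real code it is usually better to fix the length of the data than to suppress the error.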

Adding a Column from Another Column

You can also create a new column by performing operations on existing columns. For instance, this is how you can add a new column ‘F’ as a sum of columns ‘A’ and ‘B’:


df['F'] = df['A'] + df['B']

print(df)

As a result, the DataFrame has a new column ‘F’ that holds the sum of columns ‘A’ and ‘B’:


   A  B  C   D   E   F
0  1  4  7  10  20   5
1  2  5  8  10  30   7
2  3  6  9  10  40   9

Using the assign() Method

The assign() method offers a more functional approach to adding columns: it returns a new DataFrame with the added columns and leaves the original DataFrame unchanged.


new_df = df.assign(G=lambda x: x['A'] * 2)

print(new_df)

The new DataFrame ‘new_df’ has an additional column ‘G’:


   A  B  C   D   E   F   G
0  1  4  7  10  20   5   2
1  2  5  8  10  30   7   4
2  3  6  9  10  40   9   6
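
As a further sketch, assign() also accepts several keyword arguments in one call, and a callable can refer to a column created earlier in the same call. The names `base`, `extended`, and the column ‘H’ are hypothetical and used only for illustration.

```python
import pandas as pd

# Illustrative frame, separate from the guide's `df`.
base = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Each keyword argument becomes a column. A callable receives the
# DataFrame built so far, so H can use the freshly created G.
extended = base.assign(
    G=lambda x: x['A'] * 2,
    H=lambda x: x['G'] + x['B'],
)

print(extended)

# The original frame still has only its two columns.
print(base.columns.tolist())
```

Because assign() returns a new object, it chains well with other methods when you prefer not to mutate the original DataFrame.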

Deleting Columns from a DataFrame

Using the del Keyword

The simplest way to delete a column is the del keyword, which removes the column from the DataFrame in place:


del df['D']

print(df)

Column ‘D’ has been removed from our DataFrame:


   A  B  C   E   F
0  1  4  7  20   5
1  2  5  8  30   7
2  3  6  9  40   9
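
One thing to watch for with del is that the column must exist. The sketch below, which uses an illustrative DataFrame named `frame`, shows the KeyError raised on a second deletion and one possible way to guard against it.

```python
import pandas as pd

# Illustrative frame, separate from the guide's `df`.
frame = pd.DataFrame({'A': [1, 2, 3], 'D': [10, 10, 10]})

# del modifies `frame` in place and returns nothing.
del frame['D']

# Deleting a column that is no longer there raises KeyError.
try:
    del frame['D']
    second_delete_failed = False
except KeyError:
    second_delete_failed = True

# A simple guard: only delete the column if it is present.
if 'D' in frame.columns:
    del frame['D']

print(frame.columns.tolist())
print(second_delete_failed)
```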

Using the pop() Method

Another way to delete a column is to use the pop() method. This method not only deletes the column but also returns it in case you need to keep the removed data.


column_e = df.pop('E')

print(df)
print(column_e)

The first output shows the DataFrame after ‘E’ has been popped; the second shows the contents of ‘E’, returned as a Series:


   A  B  C   F
0  1  4  7   5
1  2  5  8   7
2  3  6  9   9


0    20
1    30
2    40
Name: E, dtype: int64
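
Because pop() hands back the removed column as a Series, that Series can be used on its own or assigned back later. The following sketch uses an illustrative DataFrame named `data`. Assigning the popped column back appends it as the last column, which is one possible way to move a column to the end.

```python
import pandas as pd

# Illustrative frame with 'E' in the middle.
data = pd.DataFrame({'A': [1, 2, 3], 'E': [20, 30, 40], 'B': [4, 5, 6]})

# pop() removes 'E' from `data` and returns it as a Series.
popped = data.pop('E')
remaining = data.columns.tolist()

# The returned Series can be used independently of the DataFrame.
total = popped.sum()

# Assigning it back adds it as the last column.
data['E'] = popped

print(remaining)
print(total)
print(data.columns.tolist())
```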

Using the drop() Method

The drop() method is the most flexible way to remove columns. Pass axis=1 to drop columns or axis=0 to drop rows. By default, drop() returns a new DataFrame and leaves the original unchanged; to modify the original DataFrame in place, pass inplace=True.


df = df.drop('F', axis=1)

print(df)

Column ‘F’ is now gone from the DataFrame:


   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
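
The other options of drop() can be sketched on a small illustrative DataFrame named `frame`. The variable names and the label ‘missing’ are hypothetical. The columns= and errors= keywords are part of the drop() signature but are not covered elsewhere in this guide.

```python
import pandas as pd

# Illustrative frame, separate from the guide's `df`.
frame = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'F': [5, 7, 9]})

# columns= is equivalent to passing the label with axis=1.
without_f = frame.drop(columns='F')

# Several columns can be dropped at once by passing a list.
only_a = frame.drop(['B', 'F'], axis=1)

# errors='ignore' skips labels that do not exist instead of raising KeyError.
same_cols = frame.drop(columns=['missing'], errors='ignore')

# axis=0 drops rows by index label.
no_first_row = frame.drop(0, axis=0)

# inplace=True modifies `frame` itself and returns None.
result = frame.drop('F', axis=1, inplace=True)

print(without_f.columns.tolist())
print(only_a.columns.tolist())
print(no_first_row.index.tolist())
print(result, frame.columns.tolist())
```

Reassigning the result, as in df = df.drop(...), and passing inplace=True both leave df without the column; the reassignment form is often easier to read and to chain.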

In conclusion, adding and deleting columns are essential operations when manipulating data in Pandas DataFrames. By using the methods described above, you can easily tailor your DataFrames to your specific requirements, making data analysis and processing both efficient and intuitive. Keep in mind that while some operations modify the DataFrame in place, others return a modified copy, allowing you to preserve the original dataset when necessary. With these techniques in your data wrangling toolkit, you’re well-equipped to handle a wide variety of data management tasks.
