How to get column names in Pandas Dataframe

Table of Contents

Get Column names in Pandas DataFrame
1. Pandas Get column names using column attribute
2. Get a list from Pandas DataFrame column headers
3. Get Pandas Column names with datatype
4. Get the list of columns from Pandas Dataframe based on specific Datatype
5. Get Pandas Dataframe Column names sorted
6. Pandas Get Column Names With NaN
Conclusion

A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure. It stores data in rows and columns, and each column has its own header name (label) that identifies it.

This tutorial will explore different methods available to get column names in Pandas Dataframe with examples.

Get Column names in Pandas DataFrame

Let us consider a simple dataframe that we will be using throughout the tutorial.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
print(df)

Output

       Software_Names  Rating  Total_Qty_In_Stock  Unit_Price  Total_Sales
0    Windows Defender     4.2                  10       23.55            3
1       AVG Antivirus     3.7                   4         NaN            1
2    Mcafee Antivirus     4.0                   8       32.78            7
3  Kaspersky Security     4.5                   3       33.00            5
4    Norton Antivirus     3.0                   5         NaN           11
5        Bit Defender     4.7                  20       45.00           14

Pandas Get column names using column attribute

The easiest way to get the column names in a Pandas DataFrame is the columns attribute. df.columns returns all the column labels of the DataFrame as a pandas Index object.

Syntax

df.columns

Let us check how it works with an example.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
# print all the columns in the dataframe
print(df.columns)

Output

Index(['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales'], dtype='object')
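Because df.columns is a pandas Index rather than a plain list, it supports a few lookups directly. The sketch below reuses the tutorial's sample DataFrame; the variable name cols is only illustrative, and the operations shown (the in operator, len(), positional access and Index.get_loc()) are standard Index operations.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

cols = df.columns  # a pandas Index holding the column labels

print(isinstance(cols, pd.Index))   # True
print('Rating' in cols)             # True: membership checks column labels
print(len(cols))                    # 5: number of columns
print(cols.get_loc('Unit_Price'))   # 3: zero-based position of the label
print(cols[0])                      # Software_Names: positional access
```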

Get a list from Pandas DataFrame column headers

The df.columns.values attribute returns the column labels as a NumPy array instead of a pandas Index.

Syntax

df.columns.values

Let us check how it works with an example.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
column_list = df.columns.values
# print all the columns in the dataframe
print(column_list)

Output

['Software_Names' 'Rating' 'Total_Qty_In_Stock' 'Unit_Price' 'Total_Sales']
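pandas also provides Index.to_numpy(), which the pandas documentation recommends over .values when you specifically want a NumPy array. A minimal sketch, assuming the same sample DataFrame; cols_array is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

# to_numpy() converts the column Index into a one-dimensional NumPy array
cols_array = df.columns.to_numpy()

print(type(cols_array).__name__)   # ndarray
print(cols_array.shape)            # (5,)
print(cols_array[-1])              # Total_Sales
```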

df.columns.values gives a NumPy array, not a Python list. When you need a plain list, convert the array with its tolist() method.

Syntax

df.columns.values.tolist()

Let us check how it works with an example.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
column_list = df.columns.values.tolist()
# print all the columns in the dataframe
print(column_list)

Output

['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']
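The intermediate NumPy array is not strictly needed: the Index itself has a tolist() method, and list(df.columns) gives the same result. The sketch below, assuming the tutorial's sample DataFrame, checks that the three spellings agree; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

via_values = df.columns.values.tolist()   # Index -> NumPy array -> list
via_index = df.columns.tolist()           # Index -> list directly
via_list = list(df.columns)               # built-in list() over the Index

print(via_values == via_index == via_list)   # True
print(type(via_index).__name__)              # list
```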

Another way to get the list of column headers from a Pandas DataFrame is the built-in list() function.

Iterating over a DataFrame yields its column labels, so passing the DataFrame object to list() returns all the column headers as a list.

Syntax

columns_list = list(df)

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
column_list = list(df)
# pandas print column names
print(column_list)

Output

['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']
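The same iteration behavior makes it easy to filter column names with a list comprehension. A sketch, assuming the sample DataFrame; the 'Total' prefix filter and the names total_cols and mask are only an illustration. The vectorized df.columns.str.startswith() gives the same selection.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

# Iterating over a DataFrame yields column labels, not rows
total_cols = [name for name in df if name.startswith('Total')]
print(total_cols)   # ['Total_Qty_In_Stock', 'Total_Sales']

# Equivalent, using the vectorized string methods on the Index
mask = df.columns.str.startswith('Total')
print(df.columns[mask].tolist())   # ['Total_Qty_In_Stock', 'Total_Sales']
```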

Get Pandas Column names with datatype

In some situations we need each column name together with its data type. For that, we can use the dtypes attribute, which returns a Series indexed by column name that holds the data type of each column.

Syntax

df.dtypes

Let us check how it works with an example.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
# pandas print column names with datatype
print(df.dtypes)

Output

Software_Names         object
Rating                float64
Total_Qty_In_Stock      int64
Unit_Price            float64
Total_Sales             int64
dtype: object
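Because df.dtypes is a Series indexed by column name, its items() method pairs every name with its dtype. A minimal sketch, assuming the sample DataFrame, that builds a plain dict mapping name to dtype string (dtype_map is a hypothetical name). The dtype string reported for the text column may differ between pandas versions, so the comments only show the numeric columns.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

# items() yields (column name, dtype) pairs in column order
dtype_map = {name: str(dtype) for name, dtype in df.dtypes.items()}

print(list(dtype_map))            # the column names, in DataFrame order
print(dtype_map['Rating'])        # float64
print(dtype_map['Total_Sales'])   # int64
```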

Get the list of columns from Pandas Dataframe based on specific Datatype

Here let us check how to get a list of DataFrame column headers based on the data type of the column.

For instance, suppose we need the names of all columns of type int64. We can use the select_dtypes() method of the DataFrame, which returns a subset of the DataFrame’s columns based on their dtypes. The columns attribute of that subset then gives the matching names.

Syntax

DataFrame.select_dtypes(include=None, exclude=None)

Let us check how it works with an example.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
# pandas print column names based on datatype
print(df.select_dtypes('int64').columns.values)

Output

['Total_Qty_In_Stock' 'Total_Sales']
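select_dtypes() also accepts the generic selector 'number' and an exclude argument, which avoids listing every numeric dtype by hand. A sketch, assuming the sample DataFrame; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

# All numeric columns (integers and floats)
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)       # ['Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']

# Everything that is not numeric
non_numeric_cols = df.select_dtypes(exclude='number').columns.tolist()
print(non_numeric_cols)   # ['Software_Names']

# A list of dtypes is accepted as well
float_cols = df.select_dtypes(include=['float64']).columns.tolist()
print(float_cols)         # ['Rating', 'Unit_Price']
```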

Get Pandas Dataframe Column names sorted

The built-in sorted() function accepts the DataFrame and returns a list of its column names sorted alphabetically. This works because iterating over a DataFrame yields its column labels.

Syntax

sorted(df)

Let us check how it works with an example.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
# pandas print column names sorted alphabetically
print(sorted(df))

Output

['Rating', 'Software_Names', 'Total_Qty_In_Stock', 'Total_Sales', 'Unit_Price']
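sorted() only returns a new list of names; it does not reorder the DataFrame. One way to reorder the columns themselves is to index the DataFrame with that list, and df.sort_index(axis=1) does the same in one call. sorted() also accepts reverse=True for descending order. A sketch, assuming the sample DataFrame; desc and df_sorted are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

# Descending alphabetical order of the column names
desc = sorted(df, reverse=True)
print(desc)   # ['Unit_Price', 'Total_Sales', 'Total_Qty_In_Stock', 'Software_Names', 'Rating']

# Reorder the DataFrame's columns alphabetically (returns a new DataFrame)
df_sorted = df[sorted(df)]
print(df_sorted.columns.tolist())
# ['Rating', 'Software_Names', 'Total_Qty_In_Stock', 'Total_Sales', 'Unit_Price']

# sort_index(axis=1) sorts by the column labels and gives the same order
print(df.sort_index(axis=1).columns.tolist() == df_sorted.columns.tolist())   # True
```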

Pandas Get Column Names With NaN

We can also find the columns that contain NaN values. In Pandas, missing values are denoted by NaN.

We can use the isna() or isnull() method to find the columns with missing data. Chaining any() reduces the result to one value per column: True if that column contains at least one missing value.

The isna() method returns a boolean object of the same size indicating whether each value is NA. NA values, such as None or numpy.nan, get mapped to True; everything else gets mapped to False.

Syntax

df.isna().any()

Let us check how it works with an example.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
# check which columns contain at least one NaN value
print(df.isna().any())

Output

Software_Names        False
Rating                False
Total_Qty_In_Stock    False
Unit_Price             True
Total_Sales           False
dtype: bool
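any() only tells whether a column has missing values. Replacing it with sum() counts them per column, because each True is counted as 1. A sketch, assuming the sample DataFrame; nan_counts is an illustrative name.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

# Series: column name -> number of NaN values in that column
nan_counts = df.isna().sum()

print(nan_counts['Unit_Price'])     # 2
print(nan_counts[nan_counts > 0])   # only the columns that have missing values
```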

Syntax

df.isnull().any()

DataFrame.isnull() is an alias of isna() and behaves identically: it indicates whether values are missing (NaN in numeric arrays, None or NaN in object arrays, NaT in datetimelike arrays).

Let us check how it works with an example.

# import pandas library
import numpy as np
import pandas as pd
# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus', 'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })
# check which columns contain at least one NaN value
print(df.isnull().any())

Output

Software_Names        False
Rating                False
Total_Qty_In_Stock    False
Unit_Price             True
Total_Sales           False
dtype: bool
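Both isna().any() and isnull().any() return a boolean Series rather than the column names themselves. To get the names as a list, one common approach is to use that Series as a mask. A sketch, assuming the sample DataFrame; nan_flags, nan_columns and complete_columns are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample data as used throughout the tutorial
df = pd.DataFrame({
    'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                       'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
    'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
    'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
    'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
    'Total_Sales': [3, 1, 7, 5, 11, 14],
})

nan_flags = df.isna().any()   # True for each column with at least one NaN

# Keep only the True entries and read their index (the column names)
nan_columns = nan_flags[nan_flags].index.tolist()
print(nan_columns)   # ['Unit_Price']

# The same boolean mask can index df.columns directly
print(df.columns[df.isnull().any()].tolist())   # ['Unit_Price']

# Columns with no missing values at all
complete_columns = df.columns[df.notna().all()].tolist()
print(complete_columns)   # ['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Total_Sales']
```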

Conclusion

A Pandas DataFrame stores data in rows and columns, and each column has its own header name that identifies it.

We have used several ways to get the column names of a Pandas DataFrame: the df.columns, df.columns.values and df.dtypes attributes, as well as df.columns.values.tolist(), list(df), select_dtypes(), sorted(), and isna() or isnull() combined with any().

How to get column names in Pandas Dataframe - ItsMyCode (2024)
