Table of Contents
- Get Column names in Pandas DataFrame
- Pandas Get column names using column attribute
- Get a list from Pandas DataFrame column headers
- Get Pandas Column names with datatype
- Get the list of columns from Pandas Dataframe based on specific Datatype
- Get Pandas Dataframe Columns names sorted
- Pandas Get Column Names With NaN
- Conclusion
A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure. It consists of rows and columns that store the data, and each column has its own header name that can be used to identify it.
This tutorial will explore different methods available to get column names in Pandas Dataframe with examples.
Get Column names in Pandas DataFrame
Let us consider a simple dataframe that we will be using throughout the tutorial.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

print(df)
Output
       Software_Names  Rating  Total_Qty_In_Stock  Unit_Price  Total_Sales
0    Windows Defender     4.2                  10       23.55            3
1       AVG Antivirus     3.7                   4         NaN            1
2    Mcafee Antivirus     4.0                   8       32.78            7
3  Kaspersky Security     4.5                   3       33.00            5
4    Norton Antivirus     3.0                   5         NaN           11
5        Bit Defender     4.7                  20       45.00           14
Pandas Get column names using column attribute
The easiest way to get the column names in a Pandas DataFrame is the columns attribute. The df.columns attribute returns all the column labels of the dataframe as an Index object.
Syntax
df.columns
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

# print all the columns in the dataframe
print(df.columns)
Output
Index(['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales'], dtype='object')
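Because df.columns returns an Index object rather than a plain list, it can help to see how that object behaves in everyday use. The following is a minimal sketch on a shortened, illustrative version of the tutorial's dataframe (the variable names has_rating, first_column, and column_count are made up for this example).

```python
import pandas as pd

# Shortened, illustrative version of the tutorial's dataframe
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Rating': [4.2, 3.7],
                   'Unit_Price': [23.55, 32.78]})

# An Index can be iterated like any other sequence of labels
for name in df.columns:
    print(name)

# Membership test: is there a column with this label?
has_rating = 'Rating' in df.columns

# Positional access and length work as they do on a list
first_column = df.columns[0]
column_count = len(df.columns)

print(has_rating, first_column, column_count)
```

For many tasks the Index can be used as is, so converting it to a list is only necessary when a real list object is required.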
Get a list from Pandas DataFrame column headers
To get the column labels as a NumPy array instead of an Index object, you can use df.columns.values. Note that this returns a NumPy array, not a Python list.
Syntax
df.columns.values
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

column_list = df.columns.values

# print all the columns in the dataframe
print(column_list)
Output
['Software_Names' 'Rating' 'Total_Qty_In_Stock' 'Unit_Price' 'Total_Sales']
If you need a plain Python list rather than a NumPy array, convert the array into a list using the tolist() method.
Syntax
df.columns.values.tolist()
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

column_list = df.columns.values.tolist()

# print all the columns in the dataframe
print(column_list)
Output
['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']
Another way to get the list of column headers from a Pandas DataFrame is Python's built-in list() function.
Iterating over a DataFrame yields its column labels, so passing the DataFrame object to list() returns all the column headers as a list.
Syntax
columns_list = list(df)
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

column_list = list(df)

# pandas print column names
print(column_list)
Output
['Software_Names', 'Rating', 'Total_Qty_In_Stock', 'Unit_Price', 'Total_Sales']
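The list-producing approaches above should give the same result, and the Index object also has its own tolist() method, so the intermediate .values step can be skipped. The sketch below compares them on a shortened, illustrative dataframe; the names names_a, names_b, and names_c are placeholders chosen for this example.

```python
import pandas as pd

# Shortened, illustrative version of the tutorial's dataframe
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Rating': [4.2, 3.7],
                   'Total_Sales': [3, 1]})

names_a = df.columns.tolist()          # Index.tolist(), no NumPy array in between
names_b = df.columns.values.tolist()   # via the NumPy array
names_c = list(df)                     # iterating the DataFrame yields its labels

print(names_a)
print(names_a == names_b == names_c)
```

Which form to use is largely a matter of taste; df.columns.tolist() states the intent most directly.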
Get Pandas Column names with datatype
We may need to fetch the column name with its type in specific situations. In that case, we can use the dtypes attribute. This returns a Series with the data type of each column in the dataframe.
Syntax
df.dtypes
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

# pandas print column names with datatype
print(df.dtypes)
Output
Software_Names         object
Rating                float64
Total_Qty_In_Stock      int64
Unit_Price            float64
Total_Sales             int64
dtype: object
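Since df.dtypes is a Series indexed by column name, it can be turned into an ordinary dictionary when the name-to-type mapping is needed elsewhere in a program. This is a small sketch on numeric columns only; dtype_by_column is a name invented for the example.

```python
import pandas as pd

# Illustrative dataframe with two numeric columns from the tutorial
df = pd.DataFrame({'Rating': [4.2, 3.7, 4.0],
                   'Total_Qty_In_Stock': [10, 4, 8]})

# Map each column name to the string name of its dtype
dtype_by_column = {name: str(dtype) for name, dtype in df.dtypes.items()}

print(dtype_by_column)
```

Converting each dtype with str() keeps the dictionary easy to print or serialize; drop the str() call if the dtype objects themselves are needed.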
Get the list of columns from Pandas Dataframe based on specific Datatype
Here let us check how to get a list of dataframe column headers based on the data type of the column.
For instance, if we need to fetch all the column names of datatype int64, we can use the select_dtypes() method available on the dataframe. The select_dtypes() method returns a subset of the DataFrame's columns based on the column dtypes.
Syntax
DataFrame.select_dtypes(include=None, exclude=None)
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

# pandas print column names based on datatype
print(df.select_dtypes('int64').columns.values)
Output
['Total_Qty_In_Stock' 'Total_Sales']
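The include and exclude parameters also accept broader type selectors, which is useful when the exact dtype does not matter. As a sketch, the example below uses the 'number' selector to pick every numeric column and its complement; the dataframe is a shortened, illustrative one and the variable names are placeholders.

```python
import pandas as pd

# Shortened, illustrative version of the tutorial's dataframe
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus'],
                   'Rating': [4.2, 3.7],
                   'Total_Qty_In_Stock': [10, 4]})

# 'number' selects all numeric dtypes (integers and floats alike)
numeric_columns = df.select_dtypes(include='number').columns.tolist()

# exclude returns everything that does not match the selector
non_numeric_columns = df.select_dtypes(exclude='number').columns.tolist()

print(numeric_columns)
print(non_numeric_columns)
```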
Get Pandas Dataframe Columns names sorted
Python's built-in sorted() function accepts the dataframe and returns a list of column names or headers sorted alphabetically.
Syntax
sorted(df)
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

# pandas print column names sorted alphabetically
print(sorted(df))
Output
['Rating', 'Software_Names', 'Total_Qty_In_Stock', 'Total_Sales', 'Unit_Price']
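One detail worth keeping in mind is that sorted() compares strings case-sensitively, so uppercase names sort before lowercase ones. The sketch below illustrates this with a hypothetical lowercase 'rating' column (not part of the tutorial's dataframe) and shows one way to sort without regard to case, along with reordering the dataframe itself.

```python
import pandas as pd

# Illustrative dataframe; the lowercase 'rating' label is hypothetical
df = pd.DataFrame({'rating': [4.2],
                   'Software_Names': ['AVG Antivirus'],
                   'Unit_Price': [23.55]})

# Default ordering: uppercase letters sort before lowercase ones
default_order = sorted(df)
print(default_order)      # ['Software_Names', 'Unit_Price', 'rating']

# Case-insensitive ordering using a key function
case_insensitive = sorted(df, key=str.lower)
print(case_insensitive)   # ['rating', 'Software_Names', 'Unit_Price']

# Reorder the dataframe's columns to match the sorted list
df_sorted = df[default_order]
print(list(df_sorted.columns))
```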
Pandas Get Column Names With NaN
We can also find all the columns that contain NaN values. In Pandas, missing values are denoted by NaN.
We can use the isna() or isnull() method in Pandas to detect missing data, and chain any() to report, for each column, whether at least one value is missing.
The isna() method returns a boolean object of the same size indicating whether each value is NA. NA values, such as None or numpy.nan, get mapped to True. Everything else gets mapped to False.
Syntax
df.isna().any()
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

# pandas print, for each column, whether it contains NaN
print(df.isna().any())
Output
Software_Names        False
Rating                False
Total_Qty_In_Stock    False
Unit_Price             True
Total_Sales           False
dtype: bool
Syntax
df.isnull().any()
The DataFrame isnull() method is an alias of isna() and returns the same result. Values are treated as missing when they are NaN in numeric arrays, None or NaN in object arrays, or NaT in datetime-like data.
Let us check how it works with an example.
# import pandas library
import numpy as np
import pandas as pd

# create pandas DataFrame
df = pd.DataFrame({'Software_Names': ['Windows Defender', 'AVG Antivirus', 'Mcafee Antivirus',
                                      'Kaspersky Security', 'Norton Antivirus', 'Bit Defender'],
                   'Rating': [4.2, 3.7, 4, 4.5, 3, 4.7],
                   'Total_Qty_In_Stock': [10, 4, 8, 3, 5, 20],
                   'Unit_Price': [23.55, np.nan, 32.78, 33.0, np.nan, 45],
                   'Total_Sales': [3, 1, 7, 5, 11, 14]
                   })

# pandas print, for each column, whether it contains NaN
print(df.isnull().any())
Output
Software_Names        False
Rating                False
Total_Qty_In_Stock    False
Unit_Price             True
Total_Sales           False
dtype: bool
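The outputs above are boolean Series with one entry per column. To get only the names of the columns that contain NaN, that boolean Series can be used as a mask on df.columns. The following is a minimal sketch on a shortened, illustrative dataframe; nan_columns and nan_counts are names made up for this example.

```python
import numpy as np
import pandas as pd

# Shortened, illustrative version of the tutorial's dataframe
df = pd.DataFrame({'Rating': [4.2, 3.7, 4.0],
                   'Unit_Price': [23.55, np.nan, 32.78],
                   'Total_Sales': [3, 1, 7]})

# Use the per-column boolean result as a mask on the column labels
nan_columns = df.columns[df.isna().any()].tolist()
print(nan_columns)   # ['Unit_Price']

# Count the missing values in each column
nan_counts = df.isna().sum()
print(nan_counts)
```

Replacing any() with all() in the mask would instead select columns in which every value is missing.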
Conclusion
A Pandas DataFrame consists of rows and columns to store data. Each column has its own header name to identify it.
We have used multiple ways to get the column names in a Pandas DataFrame, using attributes, methods, and built-in functions such as df.columns, df.columns.values, df.columns.values.tolist(), list(df), etc.