You can select and get rows, columns, and elements in pandas.DataFrame and pandas.Series by index (numbers and names) using [] (square brackets).
Contents
- Select columns by column numbers/names using []
  - [Column name]: Get a single column as pandas.Series
  - [List of column names]: Get single or multiple columns as pandas.DataFrame
  - loc and iloc
- Select rows by row numbers/names using []
  - [Slice of row number/name]: Get single or multiple rows as pandas.DataFrame
  - loc and iloc
  - [Boolean array/Series]: Get True rows as pandas.DataFrame
- Select elements of pandas.Series by numbers/names
  - [Number/name]: Get the value of a single element
  - [List of numbers/names]: Get single or multiple elements as pandas.Series
  - [Slice of numbers/names]: Get single or multiple elements as pandas.Series
  - [Boolean array/Series]: Get True elements as pandas.Series
- Select elements of pandas.DataFrame
- Caution when row and column names are integers
You can use at, iat, loc, and iloc to select a range more explicitly. It is also possible to select columns by slice and rows by row name/number or a list of them.
- pandas: Get/Set values with loc, iloc, at, iat
Use set_index() to assign a column to the index.
- pandas: Assign existing column to the DataFrame index with set_index()
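As a minimal sketch of why set_index() matters for the name-based selection covered in this article: once a column becomes the index, its values can be used as row names. The frame and the names df_raw, df_named, and 'name' are illustrative assumptions, not part of the article's sample data.

```python
import pandas as pd

# Hypothetical frame whose row names are stored in an ordinary column.
df_raw = pd.DataFrame({'name': ['row_0', 'row_1', 'row_2'],
                       'col_0': ['00', '10', '20']})

# Promote the 'name' column to the index (returns a new DataFrame).
df_named = df_raw.set_index('name')

# Row names can now be used for label-based selection.
value = df_named.loc['row_1', 'col_0']
print(value)
# 10
```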
The sample code in this article is based on pandas version 2.0.3. The following DataFrame is used as an example.
import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'],
                   'col_1': ['01', '11', '21', '31', '41'],
                   'col_2': ['02', '12', '22', '32', '42'],
                   'col_3': ['03', '13', '23', '33', '43']},
                  index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])
print(df)
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33
# row_4    40    41    42    43
source: pandas_index.py
Select columns by column numbers/names using []
[Column name]: Get a single column as pandas.Series
You can get a column as a Series by specifying the column name in [].
print(df['col_2'])
print(type(df['col_2']))
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# Name: col_2, dtype: object
# <class 'pandas.core.series.Series'>
source: pandas_index.py
You can also specify a column name as an attribute, like .<column_name>. Note that if the column name conflicts with an existing method or attribute name, the method or attribute takes precedence.
print(df.col_2)
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# Name: col_2, dtype: object
source: pandas_index.py
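The name-conflict caveat above can be illustrated with a short sketch. The frame df_conflict and its column names are hypothetical, chosen so that one column name ('count') collides with an existing DataFrame method.

```python
import pandas as pd

# Hypothetical frame: the column name 'count' collides with DataFrame.count().
df_conflict = pd.DataFrame({'count': [1, 2, 3], 'col_a': [4, 5, 6]})

# Attribute access returns the method, not the column.
attr = df_conflict.count
print(callable(attr))
# True

# Bracket access always returns the column.
col = df_conflict['count']
print(type(col))
# <class 'pandas.core.series.Series'>

# A non-conflicting name gives the same result either way.
print(df_conflict.col_a.equals(df_conflict['col_a']))
# True
```

Using [] is therefore the safer choice when column names are not known in advance.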
[List of column names]: Get single or multiple columns as pandas.DataFrame
You can get multiple columns as a DataFrame by specifying a list of column names in []. The columns will be in the order of the specified list.
print(df[['col_2', 'col_0']])
print(type(df[['col_2', 'col_0']]))
#       col_2 col_0
# row_0    02    00
# row_1    12    10
# row_2    22    20
# row_3    32    30
# row_4    42    40
# <class 'pandas.core.frame.DataFrame'>
source: pandas_index.py
If you specify a list with one element, a single-column DataFrame is returned, not a Series.
print(df[['col_2']])
print(type(df[['col_2']]))
#       col_2
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# <class 'pandas.core.frame.DataFrame'>
source: pandas_index.py
loc and iloc
You can also use loc to specify a slice based on column names, and iloc to select columns by their numbers, either individually or as a range (list or slice).
- pandas: Get/Set values with loc, iloc, at, iat
print(df.loc[:, 'col_1':'col_3'])
#       col_1 col_2 col_3
# row_0    01    02    03
# row_1    11    12    13
# row_2    21    22    23
# row_3    31    32    33
# row_4    41    42    43

print(df.iloc[:, 2])
# row_0    02
# row_1    12
# row_2    22
# row_3    32
# row_4    42
# Name: col_2, dtype: object
source: pandas_index.py
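The example above selects a single column number with iloc. As a small sketch of the list and slice forms mentioned in the text, the following uses a reduced two-row frame (df_demo is an illustrative name, not the article's df).

```python
import pandas as pd

# Reduced stand-in for the sample DataFrame (two rows only).
df_demo = pd.DataFrame({'col_0': ['00', '10'], 'col_1': ['01', '11'],
                        'col_2': ['02', '12'], 'col_3': ['03', '13']},
                       index=['row_0', 'row_1'])

# List of column numbers: columns are returned in the listed order.
by_list = df_demo.iloc[:, [0, 2]]
print(by_list.columns.tolist())
# ['col_0', 'col_2']

# Slice of column numbers: the stop position is excluded, unlike a loc name slice.
by_slice = df_demo.iloc[:, 1:3]
print(by_slice.columns.tolist())
# ['col_1', 'col_2']
```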
Select rows by row numbers/names using []
[Slice of row number/name]: Get single or multiple rows as pandas.DataFrame
You can get multiple rows as a DataFrame by specifying a slice in [].
print(df[1:4])
print(type(df[1:4]))
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33
# <class 'pandas.core.frame.DataFrame'>
source: pandas_index.py
You can specify negative values and a step (start:stop:step), as in a normal slice. For example, you can use slices to extract odd- or even-numbered rows.
- How to slice a list, string, tuple in Python
print(df[:-3])
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13

print(df[::2])
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_2    20    21    22    23
# row_4    40    41    42    43

print(df[1::2])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_3    30    31    32    33
source: pandas_index.py
An error is raised if a row number is specified alone instead of a slice, because a scalar value in [] is interpreted as a column name.
# print(df[1])
# KeyError: 1
source: pandas_index.py
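A short sketch of the error above and two ways around it, using iloc, which this article covers in the next subsection. The frame df_demo is a reduced, illustrative stand-in for the sample DataFrame.

```python
import pandas as pd

df_demo = pd.DataFrame({'col_0': ['00', '10', '20'], 'col_1': ['01', '11', '21']},
                       index=['row_0', 'row_1', 'row_2'])

# A bare integer is looked up as a column name, which does not exist here.
try:
    df_demo[1]
except KeyError as e:
    print('KeyError:', e)
# KeyError: 1

# iloc with a scalar returns the row as a Series.
row_as_series = df_demo.iloc[1]
print(row_as_series.tolist())
# ['10', '11']

# iloc with a one-element list returns a single-row DataFrame.
row_as_frame = df_demo.iloc[[1]]
print(row_as_frame.shape)
# (1, 2)
```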
If only one row is selected, a single-row DataFrame is returned, not a Series.
print(df[1:2])
print(type(df[1:2]))
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# <class 'pandas.core.frame.DataFrame'>
source: pandas_index.py
You can also specify a slice of row names instead of row numbers. In a slice of row names, the stop row is included.
print(df['row_1':'row_3'])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_2    20    21    22    23
# row_3    30    31    32    33
source: pandas_index.py
loc and iloc
You can also use loc to select rows by their names, and iloc to select rows by their numbers, either individually or in a list. See the following article for details.
- pandas: Get/Set values with loc, iloc, at, iat
print(df.loc[['row_1', 'row_3']])
#       col_0 col_1 col_2 col_3
# row_1    10    11    12    13
# row_3    30    31    32    33

print(df.iloc[1])
# col_0    10
# col_1    11
# col_2    12
# col_3    13
# Name: row_1, dtype: object
source: pandas_index.py
[Boolean array/Series]: Get True rows as pandas.DataFrame
By specifying a Boolean array (list or numpy.ndarray) in [], you can extract the True rows as a DataFrame.
l_bool = [True, False, False, True, True]

print(df[l_bool])
print(type(df[l_bool]))
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_3    30    31    32    33
# row_4    40    41    42    43
# <class 'pandas.core.frame.DataFrame'>
source: pandas_index_boolean_indexing.py
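The example above uses a list. As a minimal sketch of the numpy.ndarray case mentioned in the text, the following uses a reduced three-row frame; df_demo and arr_bool are illustrative names.

```python
import numpy as np
import pandas as pd

df_demo = pd.DataFrame({'col_0': ['00', '10', '20'], 'col_1': ['01', '11', '21']},
                       index=['row_0', 'row_1', 'row_2'])

# A Boolean ndarray is applied by position, like a list.
arr_bool = np.array([True, False, True])
picked = df_demo[arr_bool]
print(picked.index.tolist())
# ['row_0', 'row_2']

# An ndarray can be inverted with ~ to select the complementary rows.
rest = df_demo[~arr_bool]
print(rest.index.tolist())
# ['row_1']
```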
If there is only one True, a single-row DataFrame is returned, not a Series.
l_bool = [True, False, False, False, False]

print(df[l_bool])
print(type(df[l_bool]))
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# <class 'pandas.core.frame.DataFrame'>
source: pandas_index_boolean_indexing.py
An error is raised if the number of elements does not match.
# print(df[[True, False, False]])
# ValueError: Item wrong length 3 instead of 5.
source: pandas_index_boolean_indexing.py
You can also specify a Boolean Series. Rows are extracted based on labels, not order.
s_bool = pd.Series([True, False, False, True, True], index=reversed(df.index))
print(s_bool)
# row_4     True
# row_3    False
# row_2    False
# row_1     True
# row_0     True
# dtype: bool

print(df[s_bool])
#       col_0 col_1 col_2 col_3
# row_0    00    01    02    03
# row_1    10    11    12    13
# row_4    40    41    42    43
source: pandas_index_boolean_indexing.py
An error is raised if the number of elements or labels does not match.
s_bool_wrong = pd.Series([True, False, False, True, True], index=['A', 'B', 'C', 'D', 'E'])

# print(df[s_bool_wrong])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
source: pandas_index_boolean_indexing.py
Using a Boolean Series, you can select rows by conditions. Refer to the following article for details.
- pandas: Select rows by multiple conditions
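As a rough sketch of condition-based selection, a comparison on a column yields a Boolean Series that can be passed straight to []. The frame df_num and its columns 'score' and 'group' are hypothetical and are not part of the article's sample data.

```python
import pandas as pd

# Hypothetical numeric data for illustration.
df_num = pd.DataFrame({'score': [55, 80, 72, 91], 'group': ['a', 'b', 'a', 'b']},
                      index=['row_0', 'row_1', 'row_2', 'row_3'])

# A comparison produces a Boolean Series aligned with the DataFrame's index.
mask = df_num['score'] >= 75
print(mask.tolist())
# [False, True, False, True]

selected = df_num[mask]
print(selected.index.tolist())
# ['row_1', 'row_3']

# Conditions are combined with & (and) or | (or); each one needs parentheses.
both = df_num[(df_num['score'] >= 70) & (df_num['group'] == 'a')]
print(both.index.tolist())
# ['row_2']
```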
Select elements of pandas.Series by numbers/names
Consider the following Series as an example.
s = df['col_0']
print(s)
# row_0    00
# row_1    10
# row_2    20
# row_3    30
# row_4    40
# Name: col_0, dtype: object
source: pandas_index_series.py
[Number/name]: Get the value of a single element
You can get the value of an element by specifying its number (position) or name (label).
print(s[3])
print(type(s[3]))
# 30
# <class 'str'>
source: pandas_index_series.py
When specifying by number, you can use a negative value to indicate the position from the end. -1 represents the last element.
You can also specify the name as an attribute, like .<name>. Note that if the name conflicts with an existing method or attribute name, the method or attribute takes precedence.
print(s[-1])
# 40

print(s['row_0'])
# 00

print(s.row_0)
# 00
source: pandas_index_series.py
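The same lookups can be written with iloc (position) and loc (label), which state the intent explicitly. This may matter beyond readability: as far as we know, pandas releases after the 2.0.3 used here (from 2.1) emit a FutureWarning when an integer in [] is treated as a position on a Series with non-integer labels. The following is a minimal sketch; s_demo is an illustrative copy of the sample Series.

```python
import pandas as pd

s_demo = pd.Series(['00', '10', '20', '30', '40'],
                   index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'],
                   name='col_0')

# Position-based access, including negative positions.
fourth = s_demo.iloc[3]
last = s_demo.iloc[-1]
print(fourth, last)
# 30 40

# Label-based access.
first = s_demo.loc['row_0']
print(first)
# 00
```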
[List of numbers/names]: Get single or multiple elements as pandas.Series
You can select multiple values as a Series by specifying a list of numbers/names. The elements will be in the order of the specified list. If a list with one element is specified, a single-element Series is returned.
print(s[[3, 1]])
print(type(s[[3, 1]]))
# row_3    30
# row_1    10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>

print(s[[1]])
print(type(s[[1]]))
# row_1    10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
source: pandas_index_series.py
You can specify a list of names.
print(s[['row_3', 'row_1']])
# row_3    30
# row_1    10
# Name: col_0, dtype: object

print(s[['row_1']])
# row_1    10
# Name: col_0, dtype: object
source: pandas_index_series.py
[Slice of numbers/names]: Get single or multiple elements as pandas.Series
You can also select multiple values as a Series by specifying a slice of numbers/names. If only one element is selected, a single-element Series is returned.
print(s[1:3])
print(type(s[1:3]))
# row_1    10
# row_2    20
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>

print(s[1:2])
print(type(s[1:2]))
# row_1    10
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
source: pandas_index_series.py
In the case of name slices, the stop element of start:stop is also included.
print(s['row_1':'row_3'])
# row_1    10
# row_2    20
# row_3    30
# Name: col_0, dtype: object

print(s['row_1':'row_1'])
# row_1    10
# Name: col_0, dtype: object
source: pandas_index_series.py
[Boolean array/Series]: Get True elements as pandas.Series
By specifying a Boolean array (list or numpy.ndarray) in [], you can extract the True elements as a Series.
l_bool = [True, False, False, True, True]

print(s[l_bool])
print(type(s[l_bool]))
# row_0    00
# row_3    30
# row_4    40
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
source: pandas_index_boolean_indexing.py
If there is only one True, a single-element Series is returned.
l_bool = [True, False, False, False, False]

print(s[l_bool])
print(type(s[l_bool]))
# row_0    00
# Name: col_0, dtype: object
# <class 'pandas.core.series.Series'>
source: pandas_index_boolean_indexing.py
An error is raised if the number of elements does not match.
# print(s[[True, False, False]])
# IndexError: Boolean index has wrong length: 3 instead of 5
source: pandas_index_boolean_indexing.py
You can also specify a Boolean Series. Elements are extracted based on labels, not order.
s_bool = pd.Series(l_bool, index=reversed(df.index))
print(s_bool)
# row_4     True
# row_3    False
# row_2    False
# row_1    False
# row_0    False
# dtype: bool

print(s[s_bool])
# row_4    40
# Name: col_0, dtype: object
source: pandas_index_boolean_indexing.py
An error is raised if the number of elements or labels does not match.
s_bool_wrong = pd.Series([True, False, False, True, True], index=['A', 'B', 'C', 'D', 'E'])

# print(s[s_bool_wrong])
# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
source: pandas_index_boolean_indexing.py
Select elements of pandas.DataFrame
You can get the value of an element from a DataFrame by first extracting a Series and then retrieving the value from that Series.
print(df['col_1']['row_2'])
# 21
source: pandas_index.py
You can also extract any subset of rows and columns by combining slices and lists.
print(df['row_1':'row_3'][['col_1', 'col_3']])
#       col_1 col_3
# row_1    11    13
# row_2    21    23
# row_3    31    33
source: pandas_index.py
However, this approach ([...][...]), known as chained indexing, might trigger a SettingWithCopyWarning when you assign values.
- pandas: How to fix SettingWithCopyWarning: A value is trying to be set on ...
You can select rows and columns at once with at, iat, loc, or iloc.
- pandas: Get/Set values with loc, iloc, at, iat
print(df.at['row_2', 'col_1'])
# 21

print(df.loc['row_1':'row_3', ['col_1', 'col_3']])
#       col_1 col_3
# row_1    11    13
# row_2    21    23
# row_3    31    33
source: pandas_index.py
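Since the chained-indexing caveat concerns assignment, here is a minimal sketch of setting values with a single loc call instead of [...][...]. The frame df_demo and the placeholder values 'XX' and '--' are illustrative assumptions.

```python
import pandas as pd

df_demo = pd.DataFrame({'col_0': ['00', '10', '20'], 'col_1': ['01', '11', '21']},
                       index=['row_0', 'row_1', 'row_2'])

# One indexing operation: the row and column are given together,
# so the assignment targets df_demo itself rather than an intermediate object.
df_demo.loc['row_2', 'col_1'] = 'XX'
print(df_demo.loc['row_2', 'col_1'])
# XX

# A range of rows and a list of columns can be assigned in the same way.
df_demo.loc['row_0':'row_1', ['col_0']] = '--'
print(df_demo['col_0'].tolist())
# ['--', '--', '20']
```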
Caution when row and column names are integers
Be careful when row and column names are integers.
Consider the following DataFrame as an example.
df = pd.DataFrame([[0, 10, 20], [30, 40, 50], [60, 70, 80]],
                  index=[2, 0, 1], columns=[1, 2, 0])
print(df)
#     1   2   0
# 2   0  10  20
# 0  30  40  50
# 1  60  70  80
source: pandas_index_num.py
If you use a [scalar value] or [list], the specified value is interpreted as a column name.
print(df[0])
# 2    20
# 0    50
# 1    80
# Name: 0, dtype: int64

print(df[[0, 2]])
#     0   2
# 2  20  10
# 0  50  40
# 1  80  70
source: pandas_index_num.py
If you use a [slice], the specified values are interpreted as row numbers, not row names. Negative values are also allowed.
print(df[:2])
#     1   2   0
# 2   0  10  20
# 0  30  40  50

print(df[-2:])
#     1   2   0
# 0  30  40  50
# 1  60  70  80
source: pandas_index_num.py
Use loc or iloc to clearly specify whether it is a name (label) or a number (position). In the case of loc, the stop element of start:stop is also included.
print(df.loc[:2])
#    1   2   0
# 2  0  10  20

print(df.iloc[:2])
#     1   2   0
# 2   0  10  20
# 0  30  40  50
source: pandas_index_num.py
For Series:
s = df[2]
print(s)
# 2    10
# 0    40
# 1    70
# Name: 2, dtype: int64
source: pandas_index_num.py
For a Series with integer labels, a value specified in [] is interpreted as a name (label), not a number (position).
print(s[0])
# 40
source: pandas_index_num.py
Use at or iat to clearly specify whether it is a name or a number.
print(s.at[0])
# 40

print(s.iat[0])
# 10
source: pandas_index_num.py
Note that [-1] is interpreted as a label named -1, not as the last element. Use iat to get the last element.
# print(s[-1])
# KeyError: -1

print(s.iat[-1])
# 70
source: pandas_index_num.py
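The same label/position distinction applies to single elements of a DataFrame. The sketch below rebuilds the integer-labeled frame under the illustrative name df_int and compares at with iat, and loc with iloc.

```python
import pandas as pd

df_int = pd.DataFrame([[0, 10, 20], [30, 40, 50], [60, 70, 80]],
                      index=[2, 0, 1], columns=[1, 2, 0])

# at: row labeled 0 (the second row) and column labeled 0 (the third column).
by_label = df_int.at[0, 0]
print(by_label)
# 50

# iat: first row and first column by position.
by_pos = df_int.iat[0, 0]
print(by_pos)
# 0

# loc and iloc make the same distinction for whole rows.
print(df_int.loc[0].tolist())
# [30, 40, 50]
print(df_int.iloc[0].tolist())
# [0, 10, 20]
```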
Thus, it is better to use at, iat, loc, or iloc when the row or column name is an integer.