pandas: Select rows/columns by index (numbers and names) | note.nkmk.me (2024)

You can select and get rows, columns, and elements in pandas.DataFrame and pandas.Series by index (numbers and names) using [] (square brackets).

Contents

  • Select columns by column numbers/names using []
    • [Column name]: Get a single column as pandas.Series
    • [List of column names]: Get single or multiple columns as pandas.DataFrame
    • loc and iloc
  • Select rows by row numbers/names using []
    • [Slice of row number/name]: Get single or multiple rows as pandas.DataFrame
    • loc and iloc
    • [Boolean array/Series]: Get True rows as pandas.DataFrame
  • Select elements of pandas.Series by numbers/names
    • [Number/name]: Get the value of a single element
    • [List of numbers/names]: Get single or multiple elements as pandas.Series
    • [Slice of numbers/names]: Get single or multiple elements as pandas.Series
    • [Boolean array/Series]: Get True elements as pandas.Series
  • Select elements of pandas.DataFrame
  • Caution when row and column names are integers

You can use at, iat, loc, and iloc to select a range more explicitly. It is also possible to select columns by slice and rows by row name/number or a list of them.

  • pandas: Get/Set values with loc, iloc, at, iat

Use set_index() to assign a column to index.

  • pandas: Assign existing column to the DataFrame index with set_index()

The sample code in this article is based on pandas version 2.0.3. The following DataFrame is used as an example.

import pandas as pdprint(pd.__version__)# 2.0.3df = pd.DataFrame({'col_0': ['00', '10', '20', '30', '40'], 'col_1': ['01', '11', '21', '31', '41'], 'col_2': ['02', '12', '22', '32', '42'], 'col_3': ['03', '13', '23', '33', '43']}, index=['row_0', 'row_1', 'row_2', 'row_3', 'row_4'])print(df)# col_0 col_1 col_2 col_3# row_0 00 01 02 03# row_1 10 11 12 13# row_2 20 21 22 23# row_3 30 31 32 33# row_4 40 41 42 43

Select columns by column numbers/names using []

[Column name]: Get a single column as pandas.Series

You can get the column as Series by specifying the column name in [].

print(df['col_2'])print(type(df['col_2']))# row_0 02# row_1 12# row_2 22# row_3 32# row_4 42# Name: col_2, dtype: object# <class 'pandas.core.series.Series'>

You may also specify a column name like .<column_name>. Note that if the column name conflicts with existing method names, the method takes precedence.

print(df.col_2)# row_0 02# row_1 12# row_2 22# row_3 32# row_4 42# Name: col_2, dtype: object

[List of column names]: Get single or multiple columns as pandas.DataFrame

You can get multiple columns as DataFrame by specifying a list of column names in []. The columns will be in the order of the specified list.

print(df[['col_2', 'col_0']])print(type(df[['col_2', 'col_0']]))# col_2 col_0# row_0 02 00# row_1 12 10# row_2 22 20# row_3 32 30# row_4 42 40# <class 'pandas.core.frame.DataFrame'>

If you specify a list with one element, a single-column DataFrame is returned, not a Series.

print(df[['col_2']])print(type(df[['col_2']]))# col_2# row_0 02# row_1 12# row_2 22# row_3 32# row_4 42# <class 'pandas.core.frame.DataFrame'>

loc and iloc

You can also use loc to specify a slice based on column names, and iloc to select columns by their numbers, either individually or as a range (list or slice).

  • pandas: Get/Set values with loc, iloc, at, iat
print(df.loc[:, 'col_1':'col_3'])# col_1 col_2 col_3# row_0 01 02 03# row_1 11 12 13# row_2 21 22 23# row_3 31 32 33# row_4 41 42 43print(df.iloc[:, 2])# row_0 02# row_1 12# row_2 22# row_3 32# row_4 42# Name: col_2, dtype: object

Select rows by row numbers/names using []

[Slice of row number/name]: Get single or multiple rows as pandas.DataFrame

You can get multiple rows as a DataFrame by specifying a slice in [].

print(df[1:4])print(type(df[1:4]))# col_0 col_1 col_2 col_3# row_1 10 11 12 13# row_2 20 21 22 23# row_3 30 31 32 33# <class 'pandas.core.frame.DataFrame'>

You may specify a negative value and step (start:stop:step) as in a normal slice. For example, you can use slices to extract odd or even rows.

  • How to slice a list, string, tuple in Python
print(df[:-3])# col_0 col_1 col_2 col_3# row_0 00 01 02 03# row_1 10 11 12 13print(df[::2])# col_0 col_1 col_2 col_3# row_0 00 01 02 03# row_2 20 21 22 23# row_4 40 41 42 43print(df[1::2])# col_0 col_1 col_2 col_3# row_1 10 11 12 13# row_3 30 31 32 33

An error is raised if a row number is specified alone instead of a slice.

# print(df[1])# KeyError: 1

If only one row is selected, a single-row DataFrame is returned, not a Series.

print(df[1:2])print(type(df[1:2]))# col_0 col_1 col_2 col_3# row_1 10 11 12 13# <class 'pandas.core.frame.DataFrame'>

You may also specify a slice of row name instead of row number. In the case of a slice with row name, the stop row is included.

print(df['row_1':'row_3'])# col_0 col_1 col_2 col_3# row_1 10 11 12 13# row_2 20 21 22 23# row_3 30 31 32 33

loc and iloc

You can also use loc to select rows by their names, and iloc to select rows by their numbers, either individually or in a list. See the following article for details.

  • pandas: Get/Set values with loc, iloc, at, iat
print(df.loc[['row_1', 'row_3']])# col_0 col_1 col_2 col_3# row_1 10 11 12 13# row_3 30 31 32 33print(df.iloc[1])# col_0 10# col_1 11# col_2 12# col_3 13# Name: row_1, dtype: object

[Boolean array/Series]: Get True rows as pandas.DataFrame

By specifying a Boolean array (list or numpy.ndarray) in [], you can extract the True rows as DataFrame.

l_bool = [True, False, False, True, True]print(df[l_bool])print(type(df[l_bool]))# col_0 col_1 col_2 col_3# row_0 00 01 02 03# row_3 30 31 32 33# row_4 40 41 42 43# <class 'pandas.core.frame.DataFrame'>

If there is only one True, a single-row DataFrame is returned, not a Series.

l_bool = [True, False, False, False, False]print(df[l_bool])print(type(df[l_bool]))# col_0 col_1 col_2 col_3# row_0 00 01 02 03# <class 'pandas.core.frame.DataFrame'>

An error is raised if the number of elements does not match.

# print(df[[True, False, False]])# ValueError: Item wrong length 3 instead of 5.

You can also specify the Boolean Series. Rows are extracted based on labels, not order.

s_bool = pd.Series([True, False, False, True, True], index=reversed(df.index))print(s_bool)# row_4 True# row_3 False# row_2 False# row_1 True# row_0 True# dtype: boolprint(df[s_bool])# col_0 col_1 col_2 col_3# row_0 00 01 02 03# row_1 10 11 12 13# row_4 40 41 42 43

An error is raised if the number of elements or labels does not match.

s_bool_wrong = pd.Series([True, False, False, True, True], index=['A', 'B', 'C', 'D', 'E'])# print(df[s_bool_wrong])# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Using a Boolean Series, you can select rows by conditions. Refer to the following article for details.

  • pandas: Select rows by multiple conditions

Select elements of pandas.Series by numbers/names

Consider the following Series as an example.

s = df['col_0']print(s)# row_0 00# row_1 10# row_2 20# row_3 30# row_4 40# Name: col_0, dtype: object

[Number/name]: Get the value of a single element

You can get the value of the element by specifying the numbers (positions) or names (labels).

print(s[3])print(type(s[3]))# 30# <class 'str'>

When specifying by number, you can use a negative value to indicate the position from the end. -1 represents the last element.

You may also specify the name like .<name>. Note that if the name conflicts with existing method names, the method takes precedence.

print(s[-1])# 40print(s['row_0'])# 00print(s.row_0)# 00

[List of numbers/names]: Get single or multiple elements as pandas.Series

You can select multiple values as Series by specifying a list of numbers/names. The elements will be in the order of the specified list. If a list with one element is specified, a single-element Series is returned.

print(s[[3, 1]])print(type(s[[3, 1]]))# row_3 30# row_1 10# Name: col_0, dtype: object# <class 'pandas.core.series.Series'>print(s[[1]])print(type(s[[1]]))# row_1 10# Name: col_0, dtype: object# <class 'pandas.core.series.Series'>

You can specify a list of names.

print(s[['row_3', 'row_1']])# row_3 30# row_1 10# Name: col_0, dtype: objectprint(s[['row_1']])# row_1 10# Name: col_0, dtype: object

[Slice of numbers/names]: Get single or multiple elements as pandas.Series

You can also select multiple values as Series by specifying a slice of numbers/names. If only one element is selected, a single-element Series is returned.

print(s[1:3])print(type(s[1:3]))# row_1 10# row_2 20# Name: col_0, dtype: object# <class 'pandas.core.series.Series'>print(s[1:2])print(type(s[1:2]))# row_1 10# Name: col_0, dtype: object# <class 'pandas.core.series.Series'>

In the case of name slices, the stop element of start:stop is also included.

print(s['row_1':'row_3'])# row_1 10# row_2 20# row_3 30# Name: col_0, dtype: objectprint(s['row_1':'row_1'])# row_1 10# Name: col_0, dtype: object

[Boolean array/Series]: Get True elements as pandas.Series

By specifying a Boolean array (list or numpy.ndarray) in [], you can extract the True elements as Series.

l_bool = [True, False, False, True, True]print(s[l_bool])print(type(s[l_bool]))# row_0 00# row_3 30# row_4 40# Name: col_0, dtype: object# <class 'pandas.core.series.Series'>

If there is only one True, a single-element Series is returned.

l_bool = [True, False, False, False, False]print(s[l_bool])print(type(s[l_bool]))# row_0 00# Name: col_0, dtype: object# <class 'pandas.core.series.Series'>

An error is raised if the number of elements does not match.

# print(s[[True, False, False]])# IndexError: Boolean index has wrong length: 3 instead of 5

You can also specify the Boolean Series. Elements are extracted based on labels, not order.

s_bool = pd.Series(l_bool, index=reversed(df.index))print(s_bool)# row_4 True# row_3 False# row_2 False# row_1 False# row_0 False# dtype: boolprint(s[s_bool])# row_4 40# Name: col_0, dtype: object

An error is raised if the number of elements or labels does not match.

s_bool_wrong = pd.Series([True, False, False, True, True], index=['A', 'B', 'C', 'D', 'E'])# print(s[s_bool_wrong])# IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

Select elements of pandas.DataFrame

You can get the value of an element from a DataFrame by first extracting a Series and then retrieving the value from that Series.

print(df['col_1']['row_2'])# 21

You may also extract any group by slices or lists.

print(df['row_1':'row_3'][['col_1', 'col_3']])# col_1 col_3# row_1 11 13# row_2 21 23# row_3 31 33

However, using this approach ([...][...]), known as chained indexing, might trigger a SettingWithCopyWarning when you assign values.

  • pandas: How to fix SettingWithCopyWarning: A value is trying to be set on ...

You can select rows or columns at once with at, iat, loc, or iloc.

  • pandas: Get/Set values with loc, iloc, at, iat
print(df.at['row_2', 'col_1'])# 21print(df.loc['row_1':'row_3', ['col_1', 'col_3']])# col_1 col_3# row_1 11 13# row_2 21 23# row_3 31 33

Caution when row and column names are integers

Be careful when row and column names are integers.

Consider the following DataFrame as an example.

df = pd.DataFrame([[0, 10, 20], [30, 40, 50], [60, 70, 80]], index=[2, 0, 1], columns=[1, 2, 0])print(df)# 1 2 0# 2 0 10 20# 0 30 40 50# 1 60 70 80

If you use a [scalar value] or [list], the specified value is interpreted as a column name.

print(df[0])# 2 20# 0 50# 1 80# Name: 0, dtype: int64print(df[[0, 2]])# 0 2# 2 20 10# 0 50 40# 1 80 70

If [slice], the specified value is considered a row number, not a row name. Negative values are also allowed.

print(df[:2])# 1 2 0# 2 0 10 20# 0 30 40 50print(df[-2:])# 1 2 0# 0 30 40 50# 1 60 70 80

Use loc or iloc to clearly specify whether it is a name (label) or a number (position). In the case of loc, the stop element of start:stop is also included.

print(df.loc[:2])# 1 2 0# 2 0 10 20print(df.iloc[:2])# 1 2 0# 2 0 10 20# 0 30 40 50

For Series:

s = df[2]print(s)# 2 10# 0 40# 1 70# Name: 2, dtype: int64

In Series, the specified value is considered a name, not a number.

print(s[0])# 40

Use at or iat to clearly specify whether it is a name or a number.

print(s.at[0])# 40print(s.iat[0])# 10

Note that specifying [-1] is considered as referring to a label named -1, not the last element. You can use iat.

# print(s[-1])# KeyError: -1print(s.iat[-1])# 70

Thus, it is better to use at, iat, loc, or iloc when the row or column name is an integer.

pandas: Select rows/columns by index (numbers and names) | note.nkmk.me (2024)
Top Articles
Latest Posts
Article information

Author: Twana Towne Ret

Last Updated:

Views: 5906

Rating: 4.3 / 5 (44 voted)

Reviews: 91% of readers found this page helpful

Author information

Name: Twana Towne Ret

Birthday: 1994-03-19

Address: Apt. 990 97439 Corwin Motorway, Port Eliseoburgh, NM 99144-2618

Phone: +5958753152963

Job: National Specialist

Hobby: Kayaking, Photography, Skydiving, Embroidery, Leather crafting, Orienteering, Cooking

Introduction: My name is Twana Towne Ret, I am a famous, talented, joyous, perfect, powerful, inquisitive, lovely person who loves writing and wants to share my knowledge and understanding with you.