pandas.DataFrame.sort_values#
- DataFrame.sort_values(by, *, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)[source]#
Sort by the values along either axis.
- Parameters:
- bystr or list of str
Name or list of names to sort by.
if axis is 0 or ‘index’ then by may contain index levels and/or column labels.
if axis is 1 or ‘columns’ then by may contain column levels and/or index labels.
- axis“{0 or ‘index’, 1 or ‘columns’}”, default 0
Axis to be sorted.
- ascendingbool or list of bool, default True
Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
- inplacebool, default False
If True, perform operation in-place.
- kind{‘quicksort’, ‘mergesort’, ‘heapsort’, ‘stable’}, default ‘quicksort’
Choice of sorting algorithm. See also
numpy.sort()
for more information. mergesort and stable are the only stable algorithms. For DataFrames, this option is only applied when sorting on a single column or label.- na_position{‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end.
- ignore_indexbool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
- keycallable, optional
Apply the key function to the values before sorting. This is similar to the key argument in the builtin
sorted()
function, with the notable difference that this key function should be vectorized. It should expect aSeries
and return a Series with the same shape as the input. It will be applied to each column in by independently.
- Returns:
- DataFrame or None
DataFrame with sorted values or None if
inplace=True
.
See also
DataFrame.sort_index
Sort a DataFrame by the index.
Series.sort_values
Similar method for a Series.
Examples
>>> df = pd.DataFrame( ... { ... "col1": ["A", "A", "B", np.nan, "D", "C"], ... "col2": [2, 1, 9, 8, 7, 4], ... "col3": [0, 1, 9, 4, 2, 3], ... "col4": ["a", "B", "c", "D", "e", "F"], ... } ... ) >>> df col1 col2 col3 col4 0 A 2 0 a 1 A 1 1 B 2 B 9 9 c 3 NaN 8 4 D 4 D 7 2 e 5 C 4 3 F
Sort by a single column
In this case, we are sorting the rows according to values in
col1
:>>> df.sort_values(by=["col1"]) col1 col2 col3 col4 0 A 2 0 a 1 A 1 1 B 2 B 9 9 c 5 C 4 3 F 4 D 7 2 e 3 NaN 8 4 D
Sort by multiple columns
You can also provide multiple columns to
by
argument, as shown below. In this example, the rows are first sorted according tocol1
, and then the rows that have an identical value incol1
are sorted according tocol2
.>>> df.sort_values(by=["col1", "col2"]) col1 col2 col3 col4 1 A 1 1 B 0 A 2 0 a 2 B 9 9 c 5 C 4 3 F 4 D 7 2 e 3 NaN 8 4 D
Sort in a descending order
The sort order can be reversed using
ascending
argument, as shown below:>>> df.sort_values(by="col1", ascending=False) col1 col2 col3 col4 4 D 7 2 e 5 C 4 3 F 2 B 9 9 c 0 A 2 0 a 1 A 1 1 B 3 NaN 8 4 D
Placing any
NA
firstNote that in the above example, the rows that contain an
NA
value in theircol1
are placed at the end of the dataframe. This behavior can be modified viana_position
argument, as shown below:>>> df.sort_values(by="col1", ascending=False, na_position="first") col1 col2 col3 col4 3 NaN 8 4 D 4 D 7 2 e 5 C 4 3 F 2 B 9 9 c 0 A 2 0 a 1 A 1 1 B
Customized sort order
The
key
argument allows for a further customization of sorting behaviour. For example, you may want to ignore the letter’s case when sorting strings:>>> df.sort_values(by="col4", key=lambda col: col.str.lower()) col1 col2 col3 col4 0 A 2 0 a 1 A 1 1 B 2 B 9 9 c 3 NaN 8 4 D 4 D 7 2 e 5 C 4 3 F
Another typical example is natural sorting. This can be done using
natsort
package, which provides sorted indices according to their natural order, as shown below:>>> df = pd.DataFrame( ... { ... "time": ["0hr", "128hr", "72hr", "48hr", "96hr"], ... "value": [10, 20, 30, 40, 50], ... } ... ) >>> df time value 0 0hr 10 1 128hr 20 2 72hr 30 3 48hr 40 4 96hr 50 >>> from natsort import index_natsorted >>> index_natsorted(df["time"]) [0, 3, 2, 4, 1] >>> df.sort_values( ... by="time", ... key=lambda x: np.argsort(index_natsorted(x)), ... ) time value 0 0hr 10 3 48hr 40 2 72hr 30 4 96hr 50 1 128hr 20