You may use df.sort_values in order to sort Pandas DataFrame.
In this short guide, you’ll see 4 examples of sorting:
- A column in an ascending order
- A column in a descending order
- By multiple columns – Case 1
- By multiple columns – Case 2
To start with a simple example, let’s say that you have the following data about cars:
Brand | Price | Year |
HH | 22000 | 2015 |
TT | 25000 | 2013 |
FF | 27000 | 2018 |
AA | 35000 | 2018 |
You can then capture that data in Python by creating the following DataFrame:
import pandas as pd data = {'Brand': ['HH', 'TT', 'FF', 'AA'], 'Price': [22000, 25000, 27000, 35000], 'Year': [2015, 2013, 2018, 2018] } df = pd.DataFrame(data) print(df)
And if you run the above Python code, you’ll get the following DataFrame:
Brand Price Year
0 HH 22000 2015
1 TT 25000 2013
2 FF 27000 2018
3 AA 35000 2018
Next, you’ll see how to sort that DataFrame using 4 different examples.
Example 1: Sort Pandas DataFrame in an ascending order
Let’s say that you want to sort the DataFrame, such that the Brand will be displayed in an ascending order. In that case, you’ll need to add the following syntax to the code:
df.sort_values(by=['Brand'], inplace=True)
Note that unless specified, the values will be sorted in an ascending order by default.
The full Python code would look like this:
import pandas as pd data = {'Brand': ['HH', 'TT', 'FF', 'AA'], 'Price': [22000, 25000, 27000, 35000], 'Year': [2015, 2013, 2018, 2018] } df = pd.DataFrame(data) # Sort Brand in ascending order df.sort_values(by=['Brand'], inplace=True) print(df)
When you run the code, you’ll notice that the Brand will indeed get sorted in an ascending order, where ‘AA’ would be the first record, while ‘TT’ would be the last:
Brand Price Year
3 AA 35000 2018
2 FF 27000 2018
0 HH 22000 2015
1 TT 25000 2013
Example 2: Sort Pandas DataFrame in a descending order
Alternatively, you can sort the Brand column in a descending order. To do that, simply add the condition of ascending=False in the following manner:
df.sort_values(by=['Brand'], inplace=True, ascending=False)
And the complete Python code would be:
import pandas as pd data = {'Brand': ['HH', 'TT', 'FF', 'AA'], 'Price': [22000, 25000, 27000, 35000], 'Year': [2015, 2013, 2018, 2018] } df = pd.DataFrame(data) # Sort Brand in descending order df.sort_values(by=['Brand'], inplace=True, ascending=False) print(df)
You’ll now notice that ‘TT’ would be the first record, while ‘AA’ would be the last (as you would expect to get when applying a descending order for our sample):
Brand Price Year
1 TT 25000 2013
0 HH 22000 2015
2 FF 27000 2018
3 AA 35000 2018
Example 3: Sort by multiple columns – case 1
But what if you want to sort by multiple columns?
In that case, you may use the following template to sort by multiple columns:
df.sort_values(by=['First Column', 'Second Column',...], inplace=True)
Suppose that you want to sort by both the ‘Year’ and the ‘Price.’ Since you have two records where the Year is 2018 (i.e., for the ‘FF’ and the ‘AA’ brands), then sorting by a second column – the ‘Price’ column – would be useful:
df.sort_values(by=['Year', 'Price'], inplace=True)
Here is the Python code that you may use:
import pandas as pd data = {'Brand': ['HH', 'TT', 'FF', 'AA'], 'Price': [22000, 25000, 27000, 35000], 'Year': [2015, 2013, 2018, 2018] } df = pd.DataFrame(data) # Sort by multiple columns: Year and Price df.sort_values(by=['Year', 'Price'], inplace=True) print(df)
Notice that all the records are now sorted by both the year and the price in an ascending order, so ‘FF’ would appear before ‘AA’:
Brand Price Year
1 TT 25000 2013
0 HH 22000 2015
2 FF 27000 2018
3 AA 35000 2018
Also note that the ‘Year’ column takes the priority when performing the sorting, as it was placed in the df.sort_values before the ‘Price’ column.
Example 4: Sort by multiple columns – case 2
Finally, let’s sort by the columns of ‘Year’ and ‘Brand’ as follows:
df.sort_values(by=['Year', 'Brand'], inplace=True)
The complete Python code would look like this:
import pandas as pd data = {'Brand': ['HH', 'TT', 'FF', 'AA'], 'Price': [22000, 25000, 27000, 35000], 'Year': [2015, 2013, 2018, 2018] } df = pd.DataFrame(data) # Sort by multiple columns: Year and Brand df.sort_values(by=['Year', 'Brand'], inplace=True) print(df)
You’ll now see that all the records are sorted by both the year and the brand in an ascending order, so this time ‘AA’ would appear prior to ‘FF’:
Brand Price Year
1 TT 25000 2013
0 HH 22000 2015
3 AA 35000 2018
2 FF 27000 2018
You may want to check the Pandas Documentation to learn more about sorting values in Pandas DataFrame.