Pandas Sorting

Updated 2022-07-13 10:43

Pandas provides two ways to sort data: by label and by value. This section covers sorting operations in Pandas.

First, create a DataFrame as follows:

import pandas as pd
import numpy as np
# Both the row labels and the column labels are out of order
unsorted_df = pd.DataFrame(np.random.randn(10, 2), index=[1, 6, 4, 2, 3, 5, 9, 8, 0, 7], columns=['col2', 'col1'])
print(unsorted_df)

Output:

       col2      col1
1 -0.053290 -1.442997
6 -0.203066 -0.702727
4  0.111759  0.965251
2 -0.896778  1.100156
3 -0.458899 -0.890152
5 -0.222691 -0.144881
9 -0.921674  0.510045
8 -0.130748 -0.734237
0  0.617717  0.456848
7  0.804284  0.653961

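In the example above, neither the row labels nor the values are sorted. The following sections sort such data first by label and then by value. Because np.random.randn() draws random numbers, the values in your output will differ from those shown here.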

Sorting by label

The sort_index() method sorts a DataFrame by its labels. You can specify the axis parameter and the sort order. By default, it sorts by the row labels in ascending order.

import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7], columns=['col2', 'col1'])
sorted_df = unsorted_df.sort_index()
print(sorted_df)

Output:

       col2      col1
0  2.113698 -0.299936
1 -0.550613  0.501497
2  0.056210  0.451781
3  0.074262 -1.249118
4 -0.038484 -0.078351
5  0.812215 -0.757685
6  0.687233 -0.356840
7 -0.483742  0.632428
8 -1.576988 -1.425604
9  0.776720  1.182877
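The random values above change on every run, so the sketch below uses small fixed values instead. The column values and labels are made up purely for illustration. It shows that sort_index() returns a new, reordered DataFrame and leaves the original unchanged. It also shows that each row's values travel with its label.

```python
import pandas as pd

# Illustrative fixed data: row labels deliberately out of order
unsorted_df = pd.DataFrame(
    {"col2": [10, 20, 30, 40], "col1": [1, 2, 3, 4]},
    index=[3, 1, 0, 2],
)

# sort_index() returns a new DataFrame sorted by row label
sorted_df = unsorted_df.sort_index()

print(sorted_df.index.tolist())                 # [0, 1, 2, 3]
print(sorted_df["col2"].tolist())               # [30, 20, 40, 10]
print(sorted_df.index.is_monotonic_increasing)  # True
# The original DataFrame keeps its original order
print(unsorted_df.index.tolist())               # [3, 1, 0, 2]
```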

1) Sort order

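Pass a Boolean value to the ascending parameter to control the sort order. ascending=True (the default) sorts the row labels in ascending order, and ascending=False sorts them in descending order. For example: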

import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7], columns=['col2', 'col1'])
sorted_df = unsorted_df.sort_index(ascending=False)
print(sorted_df)

Output:

       col2      col1
9  2.389933  1.152328
8 -0.374969  0.182293
7 -0.823322 -0.104431
6 -0.566627 -1.020679
5  1.021873  0.315927
4  0.127070 -1.598591
3  0.258097  0.389310
2 -1.027768 -0.582664
1  0.766471 -0.043638
0  0.482486 -0.512309
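As a quick check on small, hypothetical data, the sketch below compares a descending sort with the default ascending sort. When the row labels are unique, the result of ascending=False is simply the ascending result in reverse.

```python
import pandas as pd

# Hypothetical data with out-of-order integer row labels
df = pd.DataFrame({"col1": [5, 6, 7, 8]}, index=[2, 0, 3, 1])

desc = df.sort_index(ascending=False)
asc = df.sort_index()  # ascending=True is the default

print(desc.index.tolist())                 # [3, 2, 1, 0]
print(desc.index.is_monotonic_decreasing)  # True
# With unique labels, descending order equals the ascending result reversed
print(desc.equals(asc.iloc[::-1]))         # True
```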

2) Sorting by column labels

The axis parameter accepts 0 or 1. The default, axis=0, sorts by the row labels; axis=1 sorts by the column labels instead. For example:

import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame(np.random.randn(10, 2), index=[1, 4, 6, 2, 3, 5, 9, 8, 0, 7], columns=['col2', 'col1'])
sorted_df = unsorted_df.sort_index(axis=1)
print(sorted_df)

Output:

       col1      col2
1 -1.424992 -0.062026
4 -0.083513  1.884481
6 -1.335838  0.838729
2 -0.085384  0.178404
3  1.198965  0.089953
5  1.400264  0.213751
9 -0.992759  0.015740
8  1.586437 -0.406583
0 -0.842969  0.490832
7 -0.310137  0.485835
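The axis and ascending parameters can be combined. The sketch below uses hypothetical columns named b, c and a to sort the column labels in both directions. The row index is left untouched when axis=1.

```python
import pandas as pd

# Hypothetical columns deliberately listed out of alphabetical order
df = pd.DataFrame({"b": [1, 2], "c": [3, 4], "a": [5, 6]})

by_columns = df.sort_index(axis=1)                        # a, b, c
by_columns_desc = df.sort_index(axis=1, ascending=False)  # c, b, a

print(by_columns.columns.tolist())       # ['a', 'b', 'c']
print(by_columns_desc.columns.tolist())  # ['c', 'b', 'a']
# Sorting columns does not reorder the rows
print(by_columns.index.tolist())         # [0, 1]
```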

Sorting by value

Just as sort_index() sorts by label, sort_values() sorts by value. It accepts a by parameter, which is the name of the DataFrame column whose values determine the order. For example:

import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
sorted_df = unsorted_df.sort_values(by='col1')
print(sorted_df)

Output:

   col1  col2
1     1     3
2     1     2
3     1     4
0     2     1
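sort_values() also accepts ascending=False. The sketch below sorts the same data by col2 in descending order. It then checks that every (col1, col2) pair survives the sort, since whole rows are reordered rather than individual columns.

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})

# Sort by col2 from largest to smallest; each row moves as a unit
sorted_df = unsorted_df.sort_values(by='col2', ascending=False)
print(sorted_df.index.tolist())  # [3, 1, 2, 0]

# The (col1, col2) pairs are preserved; only the row order changes
pairs_before = set(zip(unsorted_df['col1'], unsorted_df['col2']))
pairs_after = set(zip(sorted_df['col1'], sorted_df['col2']))
print(pairs_before == pairs_after)  # True
```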

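Note: when col1 is sorted, the corresponding col2 values and row labels move along with it, because sort_values() reorders whole rows. The by parameter also accepts a list of column names. Rows are then sorted by the first column, and ties are broken by the next one. For example: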

import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
sorted_df = unsorted_df.sort_values(by=['col1','col2'])
print(sorted_df)

Output:

   col1  col2
2     1     2
1     1     3
3     1     4
0     2     1
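When by is a list, ascending can also be a list with one Boolean per column. The sketch below sorts col1 in ascending order and then sorts the tied col1 rows by col2 in descending order.

```python
import pandas as pd

unsorted_df = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})

# One flag per column in `by`: col1 ascending, then col2 descending within ties
sorted_df = unsorted_df.sort_values(by=['col1', 'col2'], ascending=[True, False])
print(sorted_df.index.tolist())  # [3, 1, 2, 0]
print(sorted_df['col2'].tolist())  # [4, 3, 2, 1]
```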

Sorting algorithms

sort_values() provides a kind parameter for choosing the sorting algorithm. Three algorithms are available:

mergesort
heapsort
quicksort

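The default is quicksort. Of the three, only mergesort is stable, meaning that rows with equal keys keep their original relative order. For DataFrames, kind is only applied when sorting on a single column or label.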

import pandas as pd
import numpy as np
unsorted_df = pd.DataFrame({'col1':[2,1,1,1],'col2':[1,3,2,4]})
sorted_df = unsorted_df.sort_values(by='col1', kind='mergesort')
print(sorted_df)

Output:

   col1  col2
1     1     3
2     1     2
3     1     4
0     2     1
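Stability is easiest to see with many tied keys. The sketch below uses hypothetical key and row_id columns. After a mergesort, rows sharing a key appear in the same order as in the original data. The second part shows one practical consequence: sorting by a secondary column first and then stable-sorting by the primary column gives the same result as sort_values(by=[primary, secondary]). This two-pass approach is only an illustration of why stability matters, not how pandas implements multi-column sorts.

```python
import pandas as pd

# Hypothetical data: several rows share the same key
df = pd.DataFrame({
    'key': [3, 1, 2, 1, 3, 1],
    'row_id': ['a', 'b', 'c', 'd', 'e', 'f'],
})

# mergesort is stable: rows with equal 'key' keep their original relative order
stable = df.sort_values(by='key', kind='mergesort')
print(stable['row_id'].tolist())  # ['b', 'd', 'f', 'c', 'a', 'e']

# Stability lets two single-column sorts reproduce a two-column sort:
# sort by the secondary key first, then stable-sort by the primary key.
df2 = pd.DataFrame({'col1': [2, 1, 1, 1], 'col2': [1, 3, 2, 4]})
two_pass = df2.sort_values(by='col2').sort_values(by='col1', kind='mergesort')
one_pass = df2.sort_values(by=['col1', 'col2'])
print(two_pass.equals(one_pass))  # True
```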
