BUG: Mergesort is unstable when ascending=False #6399
The Issue
When using `DataFrame.sort_by(kind='mergesort')`, sorting is supposed to be stable. Unfortunately, that is not the case when `ascending=False`.
More Info
Inside the `sort_by()` source code, `argsort()` is called on the column being sorted. Then `ascending` is checked, and if it is False, the indexes are simply reversed. The relevant code is in `pandas/core/frame.py`. Since numpy always sorts in ascending order, reversing a stable ascending result puts rows with equal keys in the reverse of their original order, so a descending sort is unstable whenever keys tie.
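To make the problem concrete, here is a minimal NumPy sketch of the reversal effect. The array `k` and its values are made up for illustration and are not taken from the pandas source; only `np.argsort` with `kind="mergesort"` is assumed.

```python
import numpy as np

# Hypothetical sample keys with ties, chosen only for illustration.
k = np.array([2, 1, 2, 1, 3])

# Stable ascending argsort: tied keys keep their original relative order.
asc = np.argsort(k, kind="mergesort")
print(asc)         # [1 3 0 2 4]

# Naive descending order: just reverse the ascending indices.
naive_desc = asc[::-1]
print(naive_desc)  # [4 2 0 3 1]
```

The two rows with key 2 come out as indices 2, 0 and the two rows with key 1 as 3, 1, so every group of ties is reversed relative to the input. A stable descending order would be `[4 0 2 1 3]`.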
Clearly, simply reversing the indices doesn't work. We need to sort a reversed copy of the key array `k`, reverse the indices, and then subtract the indices from the highest index so they correspond to the original `k`.
The workaround in my code to stable sort descending is to reverse the DataFrame, sort ascending, and reverse again.
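The index arithmetic described above can be sketched roughly as follows. The helper name `stable_argsort_descending` and the sample array are hypothetical, introduced only for illustration; this is not the code in pandas and not necessarily how a fix should be implemented there.

```python
import numpy as np

def stable_argsort_descending(k):
    """Hypothetical helper (not a pandas or NumPy function)."""
    k = np.asarray(k)
    n = len(k)
    # 1. Stable ascending argsort of the reversed keys.
    idx = np.argsort(k[::-1], kind="mergesort")
    # 2. Reverse the index order so that larger keys come first.
    idx = idx[::-1]
    # 3. Positions in the reversed array map back to the original
    #    array by subtracting them from the highest index.
    return (n - 1) - idx

# Hypothetical sample keys with ties.
k = np.array([2, 1, 2, 1, 3])
order = stable_argsort_descending(k)
print(order)     # [4 0 2 1 3]
print(k[order])  # [3 2 2 1 1]
```

Ties now keep their input order (0 before 2, 1 before 3). Reversing the DataFrame, sorting ascending, and reversing again applies the same idea at the row level.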
What is the best way to fix this? This is probably an easy fix, but I've never contributed to pandas, so I need to set up my fork and make sure I can run tests before working on a pull request.
My Versions
Here are my versions: