Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEPR: axis argument in groupby ops #50405

Closed
rhshadrach opened this issue Dec 23, 2022 · 1 comment · Fixed by #52018
Closed

DEPR: axis argument in groupby ops #50405

rhshadrach opened this issue Dec 23, 2022 · 1 comment · Fixed by #52018
Labels
Deprecate Functionality to remove in pandas Groupby
Milestone

Comments

@rhshadrach
Copy link
Member

Some groupby methods have an axis argument in addition to the axis argument of DataFrame.groupby. When these two axis arguments do not agree (e.g. df.groupby(..., axis=0).skew(axis=1)), it provides the same operation that can be achieved more efficiently without using groupby. More explicitly,

  • df.groupby(..., axis=1).op(axis=0) will operate column by column, the same as DataFrame.op(axis=0)
  • df.groupby(..., axis=0).op(axis=1) will operate row by row, the same as DataFrame.op(axis=1)

There is some reshaping that is necessary to get exact equality (see the code below) because groupby shapes the indices in a specific way.

In addition to this, various groupby ops either fail or effectively ignore argument (it is used in the code, but in such a way that it does not influence the result).

I propose we deprecate the axis argument across groupby ops. For those ops where it is necessary, the future behavior will have the op's axis value the same as that provided to groupby. In other words, all ops either act as df.groupby(..., axis=0).op(axis=0) or df.groupby(..., axis=1).op(axis=1).

Code:

`df.groupby(..., axis=0).op(axis=1)`
import inspect

import pandas._testing as tm
from pandas.core.groupby.base import reduction_kernels, groupby_other_methods, transformation_kernels
from pandas.tests.groupby import get_groupby_method_args

size = 10
df = pd.DataFrame(
    {
        'a': np.random.randint(0, 3, size),
        'b': np.random.randint(0, 10, size),
        'c': np.random.randint(0, 10, size),
        'd': np.random.randint(0, 10, size),
    }
)
gb = df.groupby('a', axis=0)

kernels = reduction_kernels | groupby_other_methods | transformation_kernels
for kernel in kernels:
    if kernel in ('plot',):
        continue
    try:
        op = getattr(gb, kernel)
        if not callable(op):
            continue
        arguments = inspect.signature(op).parameters.keys()
        if 'axis' not in arguments:
            continue
        args = get_groupby_method_args(kernel, df)
        result = op(*args, axis=1)
        
        if kernel in ('skew', 'idxmax', 'idxmin', 'fillna', 'take'):
            expected = getattr(df[['b', 'c', 'd']], kernel)(*args, axis=1)
        else:
            expected = getattr(df, kernel)(*args, axis=1)
    except Exception as err:
        print('Failed:', kernel, err)
    else:
        if kernel in ('skew', 'idxmax', 'idxmin'):
            expected = expected.to_frame('values')
        if kernel in ('diff', 'skew', 'idxmax', 'idxmin', 'take'):
            expected = expected.set_index(df['a'], append=True)
            expected = expected.swaplevel(0, 1).sort_index()
        if kernel in ('skew', 'idxmax', 'idxmin'):
            expected = expected['values'].rename(None)
        try:
            tm.assert_equal(result, expected)
        except Exception as err:
            print(kernel)
`df.groupby(..., axis=1).op(axis=0)`
import inspect

import pandas._testing as tm
from pandas.core.groupby.base import reduction_kernels, groupby_other_methods, transformation_kernels
from pandas.tests.groupby import get_groupby_method_args

size = 10
df = pd.DataFrame(
    {
        'a': np.random.randint(0, 3, size),
        'b': np.random.randint(0, 10, size),
        'c': np.random.randint(0, 10, size),
        'd': np.random.randint(0, 10, size),
    }
)
gb = df.groupby([0, 0, 0, 1], axis=1)

kernels = reduction_kernels | groupby_other_methods | transformation_kernels
for kernel in kernels:
    if kernel in ('plot',):
        continue
    if kernel in ('rank', 'cummax', 'cummin', 'cumprod', 'cumsum', 'shift', 'diff', 'pct_change'):
        # Already ignores the axis argument
        continue
    if kernel in ('idxmin', 'idxmax', 'skew'):
        # Raises when using .groupby(..., axis=1).op(axis=0)
        continue
    try:
        op = getattr(gb, kernel)
        if not callable(op):
            continue
        arguments = inspect.signature(op).parameters.keys()
        if 'axis' not in arguments:
            continue
        args = get_groupby_method_args(kernel, df)
        result = op(*args, axis=0)
        expected = getattr(df, kernel)(*args, axis=0)
    except Exception as err:
        print('Failed:', kernel, err)
    else:
        try:
            tm.assert_equal(result, expected)
        except Exception as err:
            print(kernel)
@rhshadrach rhshadrach added Groupby Deprecate Functionality to remove in pandas Needs Discussion Requires discussion from core team before further action labels Dec 23, 2022
@rhshadrach rhshadrach changed the title DEPR: axis argument in groupby DEPR: axis argument in groupby ops Dec 23, 2022
@jbrockmendel
Copy link
Member

+1

@rhshadrach rhshadrach removed the Needs Discussion Requires discussion from core team before further action label Mar 17, 2023
@rhshadrach rhshadrach added this to the 2.1 milestone Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Deprecate Functionality to remove in pandas Groupby
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants