BUG: DataFrameGroupBy.sum ignores min_count for boolean data type #34051

Closed
3 tasks done
dsaxton opened this issue May 7, 2020 · 1 comment · Fixed by #34056
Labels
Bug, Groupby, NA - MaskedArrays (related to pd.NA and nullable extension arrays), Numeric Operations (arithmetic, comparison, and logical operations)
Comments

@dsaxton
Member

dsaxton commented May 7, 2020

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Behavior is from master:

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": pd.array([True, True])})
df.groupby("a").sum(min_count=2)

gives

      b
a      
1  True
2  True

but expected output is

      b
a      
1  <NA>
2  <NA>
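
For reference, the `min_count` rule that the expected output relies on can be sketched in plain Python: a group's sum is reported as missing when the group has fewer than `min_count` non-missing values. This is only an illustration of the rule, not pandas code; the helper name `grouped_sum` and the use of `None` as the missing-value marker are assumptions made for the example.

```python
from collections import defaultdict


def grouped_sum(keys, values, min_count=0):
    """Sum values per key; a group with too few valid values yields None."""
    groups = defaultdict(list)
    for key, value in zip(keys, values):
        valid = groups[key]  # registers the key even if its value is missing
        if value is not None:
            valid.append(value)
    return {
        key: (sum(valid) if len(valid) >= min_count else None)
        for key, valid in groups.items()
    }


# Same shape as the report: two groups with a single True value each.
result = grouped_sum([1, 2], [True, True], min_count=2)
print(result)
```

Each group holds only one valid value, so with `min_count=2` both sums come back as missing, which mirrors the `<NA>` rows expected above.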

It looks to me like there's an attempt to compute a Cythonized result which fails, after which point the min_count argument is forgotten.
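
The suspected failure mode can be illustrated with a small, purely hypothetical sketch. None of the functions below exist in pandas (`fast_sum`, `slow_sum`, `buggy_sum`, and `fixed_sum` are made-up names), and the sketch only models the hypothesis stated above: a fast path rejects the dtype, and the fallback is called without forwarding `min_count`.

```python
def fast_sum(values, min_count):
    # Stand-in for a fast path that rejects some dtypes (bool, in this sketch).
    if any(isinstance(v, bool) for v in values):
        raise NotImplementedError("dtype not supported by the fast path")
    return sum(values) if len(values) >= min_count else None


def slow_sum(values, min_count=0):
    # Generic fallback that honours min_count when it is passed in.
    return sum(values) if len(values) >= min_count else None


def buggy_sum(values, min_count=0):
    try:
        return fast_sum(values, min_count)
    except NotImplementedError:
        # Hypothesised bug: min_count is not forwarded to the fallback.
        return slow_sum(values)


def fixed_sum(values, min_count=0):
    try:
        return fast_sum(values, min_count)
    except NotImplementedError:
        # Forwarding min_count keeps both paths consistent.
        return slow_sum(values, min_count)


print(buggy_sum([True], min_count=2))
print(fixed_sum([True], min_count=2))
```

In this model the buggy variant returns a sum for a one-element group even though `min_count=2`, while the fixed variant returns the missing marker.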

@dsaxton added the Bug, Needs Triage, Groupby, NA - MaskedArrays, and Numeric Operations labels, and removed the Needs Triage label, on May 7, 2020
@jreback added this to the 1.1 milestone on May 10, 2020
@simonjayhawkins
Member

This also occurs for Timedelta:

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1537.g6be51cb65b'
>>>
>>> df = pd.DataFrame(
...     {"foo": ["a", "a", "a"], "bar": [pd.Timedelta("1D"), pd.Timedelta("2D"), None]}
... )
>>>
>>> df
  foo    bar
0   a 1 days
1   a 2 days
2   a    NaT
>>>
>>> grp = df.groupby("foo")["bar"]
>>>
>>> grp.sum(min_count=3)
foo
a   3 days
Name: bar, dtype: timedelta64[ns]
>>>
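
Until the behaviour is fixed, one possible workaround (a sketch, not something proposed in this thread) is to compute the per-group count separately and mask the sum where the count falls below the threshold. The variable names `totals`, `counts`, and `result` are illustrative; the sketch only assumes that `count()` excludes missing values such as `NaT`, which is its documented behaviour.

```python
import pandas as pd

df = pd.DataFrame(
    {"foo": ["a", "a", "a"], "bar": [pd.Timedelta("1D"), pd.Timedelta("2D"), None]}
)
grp = df.groupby("foo")["bar"]

min_count = 3
totals = grp.sum()    # per-group sum, ignoring missing values
counts = grp.count()  # per-group number of non-missing values

# Keep a sum only where the group has at least min_count valid values.
result = totals.where(counts >= min_count)
print(result)
```

Group `a` has only two non-missing values, so the masked result is missing for that group instead of a sum.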

3 participants