BUG: Regression on SeriesGrouper using Timestamp index with pandas 1.3.0 #42390

Closed
3 tasks done
philpep opened this issue Jul 5, 2021 · 2 comments · Fixed by #43054
Labels
Datetime, Groupby, Regression

philpep commented Jul 5, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd
import numpy as np


def agg(series):
    if series.isna().values.all():
        return None
    return np.sum(series)


df = pd.DataFrame([1.0], index=[pd.Timestamp("2018-01-16 00:00:00+00:00")])
df.groupby(lambda x: 1).agg(agg)

Problem description

This raises a ValueError with pandas 1.3.0; the same code succeeded with 1.2.5.

Traceback (most recent call last):
  File "/tmp/t.py", line 12, in <module>
    df.groupby(lambda x: 1).agg(agg)
  File "/home/phil/src/pandas/pandas/core/groupby/generic.py", line 1010, in aggregate
    result = gba.agg()
  File "/home/phil/src/pandas/pandas/core/apply.py", line 164, in agg
    return self.agg_list_like()
  File "/home/phil/src/pandas/pandas/core/apply.py", line 355, in agg_list_like
    new_res = colg.aggregate(arg)
  File "/home/phil/src/pandas/pandas/core/groupby/generic.py", line 249, in aggregate
    ret = self._aggregate_multiple_funcs(func)
  File "/home/phil/src/pandas/pandas/core/groupby/generic.py", line 303, in _aggregate_multiple_funcs                                                                                        
    results[key] = self.aggregate(func)
  File "/home/phil/src/pandas/pandas/core/groupby/generic.py", line 265, in aggregate
    return self._python_agg_general(func, *args, **kwargs)
  File "/home/phil/src/pandas/pandas/core/groupby/groupby.py", line 1310, in _python_agg_general                                                                                             
    result = self.grouper.agg_series(obj, f)
  File "/home/phil/src/pandas/pandas/core/groupby/ops.py", line 1028, in agg_series
    result = self._aggregate_series_fast(obj, func)
  File "/home/phil/src/pandas/pandas/core/groupby/ops.py", line 1053, in _aggregate_series_fast                                                                                              
    result, _ = sgrouper.get_result()
  File "pandas/_libs/reduction.pyx", line 281, in pandas._libs.reduction.SeriesGrouper.get_result                                                                                            
    cached_series, cached_index, islider, vslider)
  File "pandas/_libs/reduction.pyx", line 88, in pandas._libs.reduction._BaseGrouper._apply_to_group                                                                                         
    res = self.f(cached_series)
  File "/home/phil/src/pandas/pandas/core/groupby/groupby.py", line 1296, in <lambda>
    f = lambda x: func(x, *args, **kwargs)
  File "/tmp/t.py", line 6, in agg
    if series.isna().values.all():
  File "/home/phil/src/pandas/pandas/core/series.py", line 5161, in isna
    return generic.NDFrame.isna(self)
  File "/home/phil/src/pandas/pandas/core/generic.py", line 7145, in isna
    return isna(self).__finalize__(self, method="isna")
  File "/home/phil/src/pandas/pandas/core/dtypes/missing.py", line 138, in isna
    return _isna(obj)
  File "/home/phil/src/pandas/pandas/core/dtypes/missing.py", line 177, in _isna
    result, index=obj.index, name=obj.name, copy=False
  File "/home/phil/src/pandas/pandas/core/series.py", line 430, in __init__
    com.require_length_match(data, index)
  File "/home/phil/src/pandas/pandas/core/common.py", line 528, in require_length_match
    "Length of values "
ValueError: Length of values (1) does not match length of index (0)
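A possible workaround on affected versions, offered only as a sketch: the traceback above goes through the Python-level aggregation path (`_python_agg_general`), so replacing the custom `agg` function with the built-in `sum(min_count=1)` should avoid that path. This has not been verified against 1.3.0 here. Note that it returns NaN rather than None for an all-NA group, and the column names `a` and `b` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Same kind of index as in the report, plus an all-NA column to show the
# behaviour the custom ``agg`` function was written for.
df = pd.DataFrame(
    [[1.0, np.nan]],
    index=[pd.Timestamp("2018-01-16 00:00:00+00:00")],
    columns=["a", "b"],  # hypothetical column names
)

# min_count=1 makes sum() yield NaN (not 0.0) when a group has no non-NA values.
result = df.groupby(lambda x: 1).sum(min_count=1)
```

Here `result.loc[1, "a"]` is the group sum and `result.loc[1, "b"]` is NaN, which plays the role of the `None` returned by the original function.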

Output of pd.show_versions()

INSTALLED VERSIONS

commit : c8b79e1bd30fb4d36e5c1f894757d59c57837528
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.0-17-amd64
Version : #1 SMP Debian 4.19.194-2 (2021-06-21)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 1.3.0+4.gc8b79e1bd3
numpy : 1.21.0
pytz : 2021.1
dateutil : 2.8.1
pip : 21.1.3
setuptools : 57.1.0
Cython : 0.29.23
pytest : 6.2.4
hypothesis : 6.14.1
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

philpep added the Bug and Needs Triage labels Jul 5, 2021
philpep added a commit to philpep/pandas that referenced this issue Jul 5, 2021
…v#42390)

This fixes a regression introduced in c355ed1 where cache is not
initialized with correct state of islider and vslider.

On a Timestamp index this triggers a "ValueError Length of values does not match length of index"

Closes pandas-dev#42390

Signed-off-by: Philippe Pepiot <phil@lowatt.fr>
philpep added a commit to philpep/pandas that referenced this issue Jul 6, 2021
…v#42390)

This fixes a regression introduced in c355ed1 where cache is not
initialized with correct state of islider and vslider. The first call of
{v,i}slider.move() must be done before initializing the cache.

On a Timestamp index this triggers a "ValueError Length of values does not match length of index"

Closes pandas-dev#42390

Signed-off-by: Philippe Pepiot <phil@lowatt.fr>
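
The ordering problem described in the commit message can be illustrated with a toy model in plain Python. This is not pandas' Cython code: the `Slider` class, `buggy_order`, and `fixed_order` are hypothetical names, and the sketch only assumes that a slider exposes an empty window until its first `move()` call.

```python
class Slider:
    """Toy sliding window over a buffer (illustrative, not pandas' Slider)."""

    def __init__(self, buf):
        self.buf = buf
        self.start = 0
        self.end = 0  # the window is empty until move() is called

    def move(self, start, end):
        self.start, self.end = start, end

    def window(self):
        return self.buf[self.start:self.end]


def check_lengths(index, values):
    # Mirrors the length check that fails in the traceback.
    if len(values) != len(index):
        raise ValueError(
            f"Length of values ({len(values)}) does not match "
            f"length of index ({len(index)})"
        )
    return index, values


def buggy_order(index_buf, value_buf):
    islider, vslider = Slider(index_buf), Slider(value_buf)
    cached_index = islider.window()  # cache built too early: empty window
    islider.move(0, 1)
    vslider.move(0, 1)
    return check_lengths(cached_index, vslider.window())


def fixed_order(index_buf, value_buf):
    islider, vslider = Slider(index_buf), Slider(value_buf)
    islider.move(0, 1)  # move first ...
    vslider.move(0, 1)
    cached_index = islider.window()  # ... then build the cache
    return check_lengths(cached_index, vslider.window())
```

Calling `buggy_order(["2018-01-16"], [1.0])` raises the same kind of length-mismatch error as in the report, while `fixed_order` with the same arguments returns the one-element index and values.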
rhshadrach added the Regression, Groupby, and Datetime labels and removed the Bug and Needs Triage labels Jul 7, 2021
jreback added this to the 1.3.2 milestone Jul 28, 2021
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jul 28, 2021
@simonjayhawkins
Member

This raise with 1.3.0 and succeeded with 1.2.5

first bad commit: [c355ed1] REF: simplify libreduction (#41262)

cc @jbrockmendel
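
A regression test built from the reporter's code sample might look like the sketch below. The test name is hypothetical and this is not the test added by the actual fix. It only asserts that the aggregation completes and returns the single group's sum.

```python
import numpy as np
import pandas as pd


def agg(series):
    # Same aggregation function as in the report.
    if series.isna().values.all():
        return None
    return np.sum(series)


def test_agg_udf_with_tz_aware_timestamp_index():
    # Hypothetical test name; data taken from the issue's code sample.
    df = pd.DataFrame([1.0], index=[pd.Timestamp("2018-01-16 00:00:00+00:00")])
    result = df.groupby(lambda x: 1).agg(agg)
    assert result.shape == (1, 1)
    assert result.iloc[0, 0] == 1.0
```

Run under pytest, this would be expected to fail with the ValueError above on an affected release and pass on 1.2.5 or on a release that contains the fix.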

@simonjayhawkins
Member

@jbrockmendel I've closed #42391 as stale.
