BUG: Timestamp array comparison may raise RecursionError #15183

Closed
sinhrks opened this issue Jan 21, 2017 · 2 comments · Fixed by #18829
Labels
Bug Timezones Timezone data dtype

@sinhrks
Member

sinhrks commented Jan 21, 2017

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

n1 = np.array([pd.Timestamp('2011-01-01 00:00:00'),
               pd.Timestamp('2011-01-03 00:00:00-0500', tz='US/Eastern')], dtype=object)

pd.Timestamp('2011-01-03 00:00:00-0500', tz='US/Eastern') == n1 
# RecursionError: maximum recursion depth exceeded

Problem description

Related to #12780. This issue causes datetime-like .replace to raise SystemError. Recursion is caused by this line:

https://github.com/pandas-dev/pandas/blob/master/pandas/tslib.pyx#L1040

Expected Output

raises TypeError.

Output of pd.show_versions()

# Paste the output here pd.show_versions() here
@sinhrks sinhrks added Bug Timezones Timezone data dtype labels Jan 21, 2017
@jreback
Contributor

jreback commented Jan 21, 2017

that's not too friendly!

@jreback jreback added this to the 0.20.0 milestone Jan 21, 2017
@jreback jreback modified the milestones: 0.20.0, 0.21.0 Mar 23, 2017
@jbrockmendel
Member

I think the error I just hit is related. In py 3.5.2 (pandas 0.20.2):

>>> periods = pd.Series([pd.Period('2018Q2', 'Q-DEC'), pd.Period('2021', '5A-DEC')], name='Period')
>>> df = pd.DataFrame(periods)
>>> df.set_index('Period', append=True)
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/pandas/core/frame.py", line 2836, in set_index
    index = MultiIndex.from_arrays(arrays, names=names)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/indexes/multi.py", line 1100, in from_arrays
    labels, levels = _factorize_from_iterables(arrays)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in _factorize_from_iterables
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2193, in <listcomp>
    return map(list, lzip(*[_factorize_from_iterable(it) for it in iterables]))
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 2165, in _factorize_from_iterable
    cat = Categorical(values, ordered=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/categorical.py", line 298, in __init__
    codes, categories = factorize(values, sort=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 567, in factorize
    assume_unique=True)
  File "/usr/local/lib/python3.5/site-packages/pandas/core/algorithms.py", line 486, in safe_sort
    sorter = values.argsort()
  File "pandas/_libs/period.pyx", line 723, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11713)
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 375, in __ne__
    return not self == other
  File "/usr/local/lib/python3.5/site-packages/pandas/tseries/offsets.py", line 364, in __eq__
    if isinstance(other, compat.string_types):
SystemError: <built-in function isinstance> returned a result with an error set

In the real-life version when the array of pd.Period objects is longer, the error message includes a ton of these:

During handling of the above exception, another exception occurred:

SystemError: <built-in function isinstance> returned a result with an error set

Halfway through the traceback is a call inside Categorical.__init__ to factorize(values, sort=True). That call is in a try/except that catches a TypeError and falls back to factorize(values, sort=False). Calling it directly, factorize with sort=False works fine, so it is sort=True that causes the problem.
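
To make the fallback behaviour concrete, here is a minimal pure-Python sketch of that pattern. It is not the pandas implementation: `Item`, `IncompatibleSketchError`, `uniques_sketch` and `uniques_with_fallback` are hypothetical names invented for illustration, and modelling the comparison error as a ValueError subclass is an assumption. It only shows that a try/except on TypeError rescues the sort when comparison raises TypeError, and lets any other exception type through.

```python
class IncompatibleSketchError(ValueError):
    """Hypothetical stand-in for a comparison error that is not a TypeError."""


class Item:
    """Hypothetical value that only orders against items of the same group."""

    def __init__(self, group, rank):
        self.group = group
        self.rank = rank

    def __lt__(self, other):
        if self.group != other.group:
            raise IncompatibleSketchError("different groups")
        return self.rank < other.rank


def uniques_sketch(values, sort):
    """Collect unique values in order of appearance, optionally sorted."""
    uniques = []
    for value in values:
        if value not in uniques:
            uniques.append(value)
    return sorted(uniques) if sort else uniques


def uniques_with_fallback(values):
    """Try the sorted path first; fall back to unsorted only on TypeError."""
    try:
        return uniques_sketch(values, sort=True)
    except TypeError:
        # Reached only when sorting raised TypeError (e.g. int vs str).
        return uniques_sketch(values, sort=False)


# int vs str cannot be ordered in Python 3, so the fallback is used.
mixed = uniques_with_fallback([3, "a"])

# Items from different groups raise a non-TypeError, which escapes the fallback.
try:
    uniques_with_fallback([Item("Q", 1), Item("A", 2)])
    escaped = False
except IncompatibleSketchError:
    escaped = True
```

If the real comparison error is likewise not a TypeError, that would be consistent with the sort=False fallback never being reached in the traceback above.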

I don't do much with py3, so I don't know what to make of the "returned a result with an error set" message. In py2 we get a NotImplementedError from the original set_index call. If we focus just on the factorize(values, sort=True) call, we get:

>>> values = periods
>>> codes, categories = factorize(values, sort=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/site-packages/pandas/core/algorithms.py", line 567, in factorize
    assume_unique=True)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/algorithms.py", line 486, in safe_sort
    sorter = values.argsort()
  File "pandas/_libs/period.pyx", line 725, in pandas._libs.period._Period.__richcmp__ (pandas/_libs/period.c:11842)
pandas._libs.period.IncompatibleFrequency: Input has different freq=Q-DEC from Period(freq=5A-DEC)

I'll follow up on this later. For the purposes of this and related IncompatibleFrequency errors, I think defining an ordering has been avoided because there is not One Obvious Way. But for the purposes of making this Just Work, essentially any ordering would work.
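
As a rough illustration of "essentially any ordering would work", the sketch below sorts values that have no ordering of their own by supplying an arbitrary but consistent key. `PeriodLike` and `arbitrary_order_key` are hypothetical names, the ordinals are made-up numbers, and nothing here reflects how pandas does or should order Period objects with different frequencies.

```python
class PeriodLike:
    """Hypothetical stand-in: a frequency label plus an integer ordinal."""

    def __init__(self, freq, ordinal):
        self.freq = freq
        self.ordinal = ordinal


def arbitrary_order_key(period):
    """Order by frequency label first, then by ordinal within a frequency."""
    return (period.freq, period.ordinal)


# Ordinals are arbitrary illustrative values, not real period ordinals.
values = [
    PeriodLike("Q-DEC", 193),
    PeriodLike("5A-DEC", 51),
    PeriodLike("Q-DEC", 190),
]

# sorted(values) alone would raise TypeError because PeriodLike defines no
# ordering; the key function supplies a deterministic one instead.
ordered = sorted(values, key=arbitrary_order_key)
ordered_keys = [(p.freq, p.ordinal) for p in ordered]
```

The key has no semantic meaning across frequencies; it only makes the sort deterministic so that code needing some stable order can proceed.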

@jreback jreback modified the milestones: 0.21.0, Next Major Release Sep 23, 2017
jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Dec 18, 2017
@jreback jreback modified the milestones: Next Major Release, 0.22.0 Dec 19, 2017