Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REGR: Timestamp subtraction raises TypeError, Non informative #31774

Closed
sbrugman opened this issue Feb 7, 2020 · 8 comments · Fixed by #32499
Closed

REGR: Timestamp subtraction raises TypeError, Non informative #31774

sbrugman opened this issue Feb 7, 2020 · 8 comments · Fixed by #32499
Labels
Datetime Datetime data dtype Regression Functionality that used to work in a prior pandas version
Milestone

Comments

@sbrugman
Copy link
Contributor

sbrugman commented Feb 7, 2020

Code Sample

import pandas as pd
print(pd.Timestamp('2101-01-01 00:00:00') - pd.Timestamp('1688-01-01 00:00:00'))

Outputs:

TypeError: unsupported operand type(s) for -: 'Timestamp' and 'Timestamp'

Problem description

The difference between the timestamps is too large for the Timedelta object. This can be traced back to these lines:

return Timedelta(self.value - other.value)

For the example we can see that

print(pd.Timedelta(pd.Timestamp('2101-01-01 00:00:00').value - pd.Timestamp('1688-01-01 00:00:00').value))

results in
OverflowError: int too big to convert

Expected Output

Ideally, the expected output is:

150845 days 00:00:00

Note the expected output matches that of pandas 0.25.3.

Alternatively, the error message should be at least somewhat informative, for example by propagating the OverflowError (removing the try/except block from L302-305) or raising a new TypeError (similar to L296).

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.0
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 42.0.2.post20191201
Cython : None
pytest : 5.3.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : 0.3.2
gcsfs : None
lxml.etree : None
matplotlib : 3.1.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.3.2
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : None
numba : 0.46.0

@sbrugman sbrugman changed the title BUG: Timestamp subtraction raises TypeError, Non informative REGR: Timestamp subtraction raises TypeError, Non informative Feb 7, 2020
@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Feb 7, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.0.2 milestone Feb 7, 2020
@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Feb 7, 2020

So before, this returned a datetime.timedelta, as that can hold such big values.

cc @jbrockmendel do you know if this was removed intentionally?

@jbrockmendel
Copy link
Member

do you know if this was removed intentionally?

I don't think so. Best guess is it was either when changing the behavior to return NotImplemented in more cases, or just a cleanup gone wrong

@TomAugspurger
Copy link
Contributor

Mmm what behavior do we want going forward? Returning a datetime.timedelta only sometimes seems not great... I don't think we would introduce that behavior if it wasn't what we previously did.

@jbrockmendel
Copy link
Member

Returning a timedelta instead of raising also has the problem that in introduces an inconsistency between Timestamp vs DTA/DTI/Series[dt64], so i lean towards not restoring the older behavior.

But we can definitely do a better exception message.

@TomAugspurger
Copy link
Contributor

That sounds good to me. As a workaround, users can explicitly convert to a datetime.timedelta prior to performing the operation.

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Mar 6, 2020
@TomAugspurger
Copy link
Contributor

What behavior do we want long-term? With #32499, I'm (re-)introducing a value-dependent behavior where the return time of Timestamp - datetime.datetime depends on the values,

# out-of-bounds -> datetime.timedelta
In [5]: (pd.Timestamp.max - datetime.datetime(1960, 1, 1))
Out[5]: datetime.timedelta(days=110404, seconds=85636, microseconds=854775)

# in-bounds -> datetime.Timedelta
In [6]: (pd.Timestamp.max - datetime.datetime(2000, 1, 1))
Out[6]: Timedelta('95794 days 23:47:16.854775')

Do we want to deprecate Out[5], so that Timestamp - datetime.datetime is always a Timedelta?

I think for 1.0.2 we're fine to return a datetime.timedelta there without a warning, but we should maybe add a deprecation for 1.1.

@TomAugspurger
Copy link
Contributor

Perhaps since we're subclassing the builtins, it's OK to return a datetime.timedelta depending on the value?

@jbrockmendel
Copy link
Member

If we're issuing a warning i think its fine.

@jreback jreback added the Datetime Datetime data dtype label Mar 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants