Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #17965 datetime64 comparison #18188

Merged
merged 1 commit into from
Nov 14, 2017

Conversation

watercrossing
Copy link
Contributor

def test_comparison_to_datetime64(self):
# GH issue #17965
df = DataFrame({'dates': date_range('2000-01-01', '2000-01-05')})
date = df['dates'].unique()[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the right test? date should be a datetime.date, and I think this is a datetime64[ns].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

date is numpy.datetime64('2000-01-01T00:00:00.000000000'), which is consistent with the output of pd.date_range().values, which are datetime64[ns] too. Do you suggest this should be an array of datetime.date objects?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#17965 is about comparing with a single datetime.date object:

In [20]: pd.Timestamp('20130101').to_pydatetime().date()>=df.A
ValueError: cannot set a Timestamp with a non-timestamp

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paramaterize this test with date, datetime, np.datetime64, and Timestamp.
check for < and ==

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True! I was not actually fixing the Issue, but rather this comment: #17965 (comment) - but it actually applies to both.

@TomAugspurger
Copy link
Contributor

Python is inconsistent here:

python                                                                                                                                                                                                                               Thu Nov  9 05:53:21 2017
Python 3.6.1 (default, Apr  4 2017, 09:40:21)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> datetime.datetime(2017, 1, 1) == datetime.date(2017, 1, 1)
False
>>> datetime.datetime(2017, 1, 1) >= datetime.date(2017, 1, 1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare datetime.datetime to datetime.date

Strange.... We should be a bit clearer on our policy if we're differing from Python. Are we saying that datetime.datetimes compare with a datetime.date by promoting the date to a datetime at midnight?

@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Datetime Datetime data dtype labels Nov 9, 2017
@watercrossing
Copy link
Contributor Author

I don't think your comparison is correct - semantically the test is comparing two dates, not datetimes:
datetime.date(2013,11,4) == datetime.date(2013,11,4) is True - so we are not deviating from Python?

@jreback
Copy link
Contributor

jreback commented Nov 9, 2017

Strange.... We should be a bit clearer on our policy if we're differing from Python. Are we saying that datetime.datetimes compare with a datetime.date by promoting the date to a datetime at midnight?

python is inconcistent here, datetime.date should just compare to datetime.datetime, we just coerce everything to Timestamp in any event.

@@ -61,6 +61,7 @@ Bug Fixes
- Bug in :class:`DatetimeIndex` subtracting datetimelike from DatetimeIndex could fail to overflow (:issue:`18020`)
- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)
- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)
- Bug in ``_libs.index.convert_scalar()`` when comparing datetimes to np.datetime64 (:issue:`17965`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a user useful note

@@ -549,6 +549,8 @@ cpdef convert_scalar(ndarray arr, object value):
if arr.descr.type_num == NPY_DATETIME:
if isinstance(value, np.ndarray):
pass
if isinstance(value, np.datetime64):
return Timestamp(value).value
elif isinstance(value, datetime):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compbine np.datetime64 and datetime into the same clause

def test_comparison_to_datetime64(self):
# GH issue #17965
df = DataFrame({'dates': date_range('2000-01-01', '2000-01-05')})
date = df['dates'].unique()[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

paramaterize this test with date, datetime, np.datetime64, and Timestamp.
check for < and ==

# GH issue #17965
df = DataFrame({'dates': date_range('2000-01-01', '2000-01-05')})
date = df['dates'].unique()[0]
tm.assert_frame_equal(df.loc[df['dates'] == date], df.iloc[:1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use

result =
expected = 
tm.assert_frame_equal(result, expected)

@@ -330,3 +330,9 @@ def test_loc_datetime_length_one(self):

result = df.loc['2016-10-01T00:00:00':]
tm.assert_frame_equal(result, df)

def test_comparison_to_datetime64(self):
# GH issue #17965
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the frame from the original issue (which uses a NaT) as well.

@jreback
Copy link
Contributor

jreback commented Nov 9, 2017

@watercrossing the comparison is a non-converted datetime.date vs a Timestamp, which is what boxed versions of datetime64[ns] are. pls use the original issues exactly.

testing against a column of datetime.date is possible as well, though this is more tricky to construct.

@jbrockmendel
Copy link
Member

Reiterating suggestion to coerce datetime.date to Period with freq=D

@jreback
Copy link
Contributor

jreback commented Nov 9, 2017

Reiterating suggestion to coerce datetime.date to Period with freq=D

not in this PR maybe at some point

@watercrossing watercrossing force-pushed the timestampComparison branch 2 times, most recently from 8b01fe3 to ddfe7dc Compare November 10, 2017 11:06
@@ -61,6 +61,7 @@ Bug Fixes
- Bug in :class:`DatetimeIndex` subtracting datetimelike from DatetimeIndex could fail to overflow (:issue:`18020`)
- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)
- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)
- Bug in ``_libs.index.convert_scalar()``: Comparison between ``datetime64[ns]`` series and ``datetimelike`` (:issue:`17965`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't show internal functions here, rather the end-user effect, something like

a boolean comparison with a datetime.date and a datetime64[ns] dtype Series

comparisonTS = Timestamp('20130101')

@pytest.mark.parametrize('datetimelike,opExp', product(
[comparisonTS, comparisonTS.to_pydatetime(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

directly construct these, e.g. Timestamp('20130101'), datetime.datetime(2013, 1, 1) etc

@pytest.mark.parametrize('datetimelike,opExp', product(
[comparisonTS, comparisonTS.to_pydatetime(),
comparisonTS.to_pydatetime().date(), comparisonTS.to_datetime64()],
[(op.lt, [True, False, False, False]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't pass opExp, rather pass op (e.g. the op.lt) and the expected (the boolean result)

Copy link
Contributor Author

@watercrossing watercrossing Nov 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can do, but i am not sure something like:

@pytest.mark.parametrize('datetimelike,op,expected', 
    [(x, y[0], y[1]) for x, y in product( 
        [pd.Timestamp('20130101'), datetime.datetime(2013, 1, 1),
         datetime.date(2013, 1, 1), np.datetime64('2013-01-01', 'ns')],
        [(op.lt, [True, False, False, False]),
         (op.le, [True, True, False, False]),
         (op.eq, [False, True, False, False]),
         (op.gt, [False, False, False, True])])])

is more readable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry didn't see the product. instead use multiple paramatrize to achieve product.

eg..

@pytest.mark.parametrize(.....)
@pytest.mark.parametrize('datetimelike'

@jreback
Copy link
Contributor

jreback commented Nov 10, 2017

rebase as well

@codecov
Copy link

codecov bot commented Nov 13, 2017

Codecov Report

Merging #18188 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18188      +/-   ##
==========================================
- Coverage   91.42%   91.38%   -0.05%     
==========================================
  Files         163      163              
  Lines       50068    50068              
==========================================
- Hits        45777    45756      -21     
- Misses       4291     4312      +21
Flag Coverage Δ
#multiple 89.19% <ø> (-0.03%) ⬇️
#single 40.35% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9050e38...a56f7fe. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 13, 2017

Codecov Report

Merging #18188 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18188      +/-   ##
==========================================
+ Coverage   91.38%   91.38%   +<.01%     
==========================================
  Files         164      164              
  Lines       49884    49884              
==========================================
+ Hits        45584    45586       +2     
+ Misses       4300     4298       -2
Flag Coverage Δ
#multiple 89.19% <ø> (+0.02%) ⬆️
#single 39.42% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.38% <0%> (-0.1%) ⬇️
pandas/plotting/_converter.py 65.25% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7f4c960...d9f12f4. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small doc-change. ping on green.

@@ -62,6 +62,7 @@ Bug Fixes
- Bug in ``pd.Series.rolling.skew()`` and ``rolling.kurt()`` with all equal values has floating issue (:issue:`18044`)
- Bug in ``pd.DataFrameGroupBy.count()`` when counting over a datetimelike column (:issue:`13393`)
- Bug in ``pd.concat`` when empty and non-empty DataFrames or Series are concatenated (:issue:`18178` :issue:`18187`)
- Bug in a boolean comparison with a ``datetime.datetime``, ``datetime.date``, ``np.datetime64`` and a ``datetime64[ns]`` dtype Series (:issue:`17965`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just list datetime.datetime (as that's the actual bug)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to the indexing section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Done.

@jreback jreback added this to the 0.21.1 milestone Nov 13, 2017
@jreback
Copy link
Contributor

jreback commented Nov 13, 2017

ok ping on green.

@watercrossing
Copy link
Contributor Author

@jreback: I have changed the whatsnew and rebased.

@TomAugspurger
Copy link
Contributor

@watercrossing let us know if / when you see all the CI workers finish and we'll merge.

@jreback
Copy link
Contributor

jreback commented Nov 13, 2017

@watercrossing let us know if / when you see all the CI workers finish and we'll merge.

@TomAugspurger optimist!

@watercrossing
Copy link
Contributor Author

@TomAugspurger / @jreback: Ping!

@jreback jreback merged commit 77f10f0 into pandas-dev:master Nov 14, 2017
@jreback
Copy link
Contributor

jreback commented Nov 14, 2017

thanks @watercrossing

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

COMPAT: datetime.date comparisons with datetime64[ns]
4 participants