BUG: DatetimeIndex has become unhashable in 1.3.1? #42844

Closed
2 of 3 tasks
yohplala opened this issue Aug 1, 2021 · 8 comments · Fixed by #44530
Labels
Bug Datetime Datetime data dtype Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode

yohplala commented Aug 1, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Hi,
The code below worked with pandas 1.2.5.
It now raises an error with pandas 1.3.1.

Code Sample, a copy-pastable example

import pandas as pd
import random

# Right data
ts_open = pd.DatetimeIndex([
                           pd.Timestamp('2021/01/01 00:37'),
                           pd.Timestamp('2021/01/01 00:40'),
                           pd.Timestamp('2021/01/01 01:00'),
                           pd.Timestamp('2021/01/01 03:45'),
                           pd.Timestamp('2021/01/01 03:59'),
                           pd.Timestamp('2021/01/01 05:20')])
length = len(ts_open)
random.seed(1)
volume = random.sample(range(1, length+1), length)
df_smpl = pd.DataFrame({'volume' : volume, 'ts_open': ts_open})

# Left data
ts_full = pd.date_range(start=pd.Timestamp('2021/01/01 00:00'),
                        periods=7, freq='1h')
ts_event = ts_full[:-1]
length_ipr = len(ts_event)
random.seed(2)
volume = random.sample(range(20, length_ipr+20), length_ipr)
df_ipr = pd.DataFrame({'volume' : volume, 'timestamp': ts_event})

at = ts_full[1:]
df_smpl.index = df_smpl['ts_open']

# Test
df_full = pd.merge_asof(df_ipr, df_smpl,
                        left_on=at, right_index=True,
                        allow_exact_matches=False, direction='backward')

Error message:

Traceback (most recent call last):

  File "<ipython-input-152-b5845a330a4b>", line 30, in <module>
    df_full = pd.merge_asof(df_ipr, df_smpl,

  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 581, in merge_asof
    op = _AsOfMerge(

  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1736, in __init__
    _OrderedMerge.__init__(

  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1619, in __init__
    _MergeOperation.__init__(

  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 682, in __init__
    self._validate_specification()

  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py", line 1782, in _validate_specification
    if left_on in self.left.columns

  File "/home/yoh/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 4572, in __contains__
    hash(key)

TypeError: unhashable type: 'DatetimeIndex'

Output of pd.show_versions()

INSTALLED VERSIONS

commit : c7f7443
python : 3.8.8.final.0
python-bits : 64
OS : Linux
OS-release : 5.8.0-63-generic
Version : #71~20.04.1-Ubuntu SMP Thu Jul 15 17:46:08 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.1
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 52.0.0.post20210125
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.22.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 2021.07.0
fastparquet : 0.7.0
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : 0.19.0
xlrd : None
xlwt : None
numba : 0.53.1

@yohplala yohplala added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 1, 2021
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Aug 2, 2021
@simonjayhawkins
Member

Thanks @yohplala for the report

> Following code below was working with pandas 1.2.5.

First bad commit: [cd0224d] BUG: Raise ValueError for non numerical join columns in merge_asof (#34488)

However, it looks like the traceback has changed since then. At that commit it was:

1.3.0.dev0+356.gcd0224d26c
Traceback (most recent call last):
  File "../42844.py", line 37, in <module>
    df_full = pd.merge_asof(
  File "/home/runner/work/pandas/pandas/pandas/core/reshape/merge.py", line 557, in merge_asof
    op = _AsOfMerge(
  File "/home/runner/work/pandas/pandas/pandas/core/reshape/merge.py", line 1670, in __init__
    _OrderedMerge.__init__(
  File "/home/runner/work/pandas/pandas/pandas/core/reshape/merge.py", line 1564, in __init__
    _MergeOperation.__init__(
  File "/home/runner/work/pandas/pandas/pandas/core/reshape/merge.py", line 656, in __init__
    self._validate_specification()
  File "/home/runner/work/pandas/pandas/pandas/core/reshape/merge.py", line 1713, in _validate_specification
    self.left[self.left_on[0]].dtype
  File "/home/runner/work/pandas/pandas/pandas/core/frame.py", line 3037, in __getitem__
    indexer = self.loc._get_listlike_indexer(key, axis=1, raise_missing=True)[1]
  File "/home/runner/work/pandas/pandas/pandas/core/indexing.py", line 1269, in _get_listlike_indexer
    self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
  File "/home/runner/work/pandas/pandas/pandas/core/indexing.py", line 1311, in _validate_read_indexer
    raise KeyError(f"None of [{key}] are in the [{axis_name}]")
KeyError: "None of [DatetimeIndex(['2021-01-01 01:00:00', '2021-01-01 02:00:00',\n               '2021-01-01 03:00:00', '2021-01-01 04:00:00',\n               '2021-01-01 05:00:00', '2021-01-01 06:00:00'],\n              dtype='datetime64[ns]', freq='H')] are in the [columns]"

cc @phofl

@simonjayhawkins simonjayhawkins added Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode Datetime Datetime data dtype and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 2, 2021
@simonjayhawkins simonjayhawkins added this to the 1.3.2 milestone Aug 2, 2021
@phofl
Member

phofl commented Aug 2, 2021

@yohplala Could you please narrow down your example? It should be minimal and reproducible.

@phofl
Member

phofl commented Aug 2, 2021

@simonjayhawkins I don't think this is supposed to work. The docs say:

left_on : label

    Field name to join on in left DataFrame.

whereas an Index is provided here.
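As an illustration of why an Index cannot be used where a label is expected, the sketch below shows that a DatetimeIndex is not hashable, which is what the `hash(key)` call in the reported traceback trips over. The variable names (`idx`, `df`, `index_is_hashable`, `membership_raised`) are made up for this example, and the exact behaviour of the membership test is an assumption that may differ between pandas versions.

```python
import pandas as pd

# Hypothetical data, modelled on the left frame from the report.
idx = pd.date_range("2021-01-01", periods=3, freq="1h")
df = pd.DataFrame({"val": [4, 8, 7], "timestamp": idx})

# A pandas Index is mutable-like and deliberately unhashable.
try:
    hash(idx)
    index_is_hashable = True
except TypeError as exc:
    index_is_hashable = False
    print(f"hash(idx) failed: {exc}")

# A column label (a plain string) is hashable and can be looked up.
label_found = "timestamp" in df.columns

# Looking up the Index itself among the columns is the operation that
# failed in the traceback; on the versions discussed here it raises.
try:
    idx in df.columns
    membership_raised = False
except TypeError:
    membership_raised = True

print(index_is_hashable, label_found, membership_raised)
```

This only reproduces the symptom in isolation; it says nothing about whether `merge_asof` should accept array-like keys.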

@yohplala
Author

yohplala commented Aug 2, 2021

> @yohplala Could you please narrow your example? It should be minimal and reproducible

Hi, yes, I will see to that this evening.

> @simonjayhawkins I don't think that this is supposed to work.

:( (being sad here) It was working with 1.2.5. The failure shows up in test cases of mine that I had double-checked, and the results were correct then.

@yohplala
Author

yohplala commented Aug 2, 2021

> @yohplala Could you please narrow your example? It should be minimal and reproducible

Hi Patrick,
Please find a reduced example below.

import pandas as pd

# Right data
ts_right = pd.DatetimeIndex([pd.Timestamp('2021/01/01 00:37'),
                             pd.Timestamp('2021/01/01 01:40')])
right = pd.DataFrame({'val' : [2,6], 'ts': ts_right}, index=ts_right)

# Left data
ts_left = pd.date_range(start=pd.Timestamp('2021/01/01 00:00'),
                        periods=3, freq='1h')
left = pd.DataFrame({'val' : [4,8,7], 'timestamp': ts_left})

# Test
merged = pd.merge_asof(left, right, left_on = ts_left, right_index=True,
                       allow_exact_matches=False, direction='backward')

You are right about the root cause (a DatetimeIndex being provided). Passing a column of the left DataFrame directly works fine.
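A minimal sketch of that workaround, reusing the reduced example: since `left` already holds the same timestamps in its 'timestamp' column, the column name can be passed as `left_on` instead of the DatetimeIndex. The `_x`/`_y` suffixes on `val` are the default pandas suffixes for overlapping column names, not something chosen here.

```python
import pandas as pd

# Right data: indexed by its own timestamps.
ts_right = pd.DatetimeIndex([pd.Timestamp('2021/01/01 00:37'),
                             pd.Timestamp('2021/01/01 01:40')])
right = pd.DataFrame({'val': [2, 6], 'ts': ts_right}, index=ts_right)

# Left data: the join key lives in the 'timestamp' column.
ts_left = pd.date_range(start=pd.Timestamp('2021/01/01 00:00'),
                        periods=3, freq='1h')
left = pd.DataFrame({'val': [4, 8, 7], 'timestamp': ts_left})

# Pass the column label rather than the DatetimeIndex object.
merged = pd.merge_asof(left, right, left_on='timestamp', right_index=True,
                       allow_exact_matches=False, direction='backward')
print(merged)
```

If the key is not yet a column (as with `at = ts_full[1:]` in the original report), it would first need to be assigned to the left frame under some name, which is then passed as `left_on`.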

@simonjayhawkins
Member

changing milestone to 1.3.5

@simonjayhawkins simonjayhawkins modified the milestones: 1.3.4, 1.3.5 Oct 16, 2021
@jbrockmendel
Member

@phofl looks like you worked on this code recently-ish. Is this a usage we do or should support?

@phofl
Member

phofl commented Nov 18, 2021

Not sure. I have something in the pipeline to restore this, but I would be in favor of deprecating it for 2.0. We have no tests for passing an array into merge_asof and only two for merge, which were added ages ago.
