REGR: Buffer dtype mismatch, expected 'Python object' but got a string #38900

Closed
2 of 3 tasks
forgetso opened this issue Jan 2, 2021 · 5 comments · Fixed by #39065
Labels: Regression, replace, Strings

Comments

@forgetso

forgetso commented Jan 2, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

>>> import pandas as pd
>>> df = pd.DataFrame(['o',"o'doul's",'oad','oaf','oafish'])
>>> df = df.astype('|S')
>>> df
             0
0         b'o'
1  b"o'doul's"
2       b'oad'
3       b'oaf'
4    b'oafish'
>>> import numpy as np
>>> df.replace({None:np.nan})

Problem description

This worked in pandas 1.1.5. The call now raises:

    df[c].replace({None: np.NaN}, inplace=True)
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/series.py", line 4479, in replace
    return super().replace(
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/generic.py", line 6840, in replace
    return self.replace(
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/series.py", line 4479, in replace
    return super().replace(
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/generic.py", line 6889, in replace
    new_data = self._mgr.replace_list(
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 664, in replace_list
    bm = self.apply(
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 910, in _replace_list
    [b.convert(numeric=False, copy=True) for b in result]
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 910, in <listcomp>
    [b.convert(numeric=False, copy=True) for b in result]
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2558, in convert
    values = f(None, self.values.ravel(), None)
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 2542, in f
    values = soft_convert_objects(
  File "/home/user/anaconda3/envs/uniqueness/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1139, in soft_convert_objects
    values = lib.maybe_convert_objects(values, convert_datetime=True)
  File "pandas/_libs/lib.pyx", line 2125, in pandas._libs.lib.maybe_convert_objects
ValueError: Buffer dtype mismatch, expected 'Python object' but got a string
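
The message suggests that a buffer typed for Python objects was handed a fixed-width bytes array instead. A minimal NumPy-only sketch of that difference follows; it assumes nothing about pandas internals beyond what the traceback shows, and the variable names are illustrative only:

```python
import numpy as np

# Casting to '|S' yields a fixed-width bytes array (dtype kind 'S'),
# not an array of Python objects (dtype kind 'O').
values = np.array(["o", "o'doul's"]).astype("|S")
print(values.dtype.kind)

# An explicit cast to object produces an array of Python bytes objects,
# which is the layout an object-typed buffer can accept.
as_object = values.astype(object)
print(as_object.dtype.kind, type(as_object[0]).__name__)
```

Whether casting to object before calling replace avoids the error on an affected pandas version has not been checked here.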

Expected Output

Any None values are replaced with NaN.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit : b5958ee
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-128-generic
Version : #131-Ubuntu SMP Wed Dec 9 06:57:35 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.1.5
numpy : 1.19.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.2
Cython : 3.0a6
pytest : 6.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : 2.8.5 (dt dec pq3 ext lo64)
jinja2 : 3.0.0a1
IPython : 7.17.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.20
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.52.0

forgetso added the Bug and Needs Triage labels Jan 2, 2021
@jreback
Contributor

jreback commented Jan 2, 2021

pls show a complete example
eg how the original frame is constructed

simonjayhawkins added the Needs Info label and removed the Needs Triage label Jan 2, 2021
@forgetso
Author

forgetso commented Jan 2, 2021

> pls show a complete example
> eg how the original frame is constructed

I've added an example df showing how to replicate the bug.

@mzeitlin11
Member

The same example also raises for a simpler frame such as
df = pd.DataFrame(['o'])

First bad commit 3f51060, PR (#38151)

mzeitlin11 added the Regression, Strings, and replace labels and removed the Bug and Needs Info labels Jan 2, 2021
@simonjayhawkins
Member

> First bad commit 3f51060, PR (#38151)

cc @jbrockmendel

simonjayhawkins added this to the 1.2.1 milestone Jan 3, 2021
@jbrockmendel
Member

It isn't clear to me how #38151 broke this, but a fix is in ObjectBlock._maybe_coerce_values, where we need to check issubclass(values.dtype.type, (str, bytes)) instead of issubclass(values.dtype.type, str).
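
The difference between the two checks can be seen with NumPy alone. This is a rough sketch rather than the pandas code: `matches_string_check` is a hypothetical helper written for illustration, and only the `issubclass` expressions come from the comment above.

```python
import numpy as np


def matches_string_check(values, string_types):
    """Hypothetical helper: apply the quoted issubclass test to an array's scalar type."""
    return issubclass(values.dtype.type, string_types)


text_values = np.array(["o"])            # scalar type np.str_, a subclass of str
bytes_values = text_values.astype("|S")  # scalar type np.bytes_, a subclass of bytes

# Check against str only: the bytes array is not recognised.
old_text = matches_string_check(text_values, str)
old_bytes = matches_string_check(bytes_values, str)

# Proposed check against (str, bytes): both arrays are recognised.
new_text = matches_string_check(text_values, (str, bytes))
new_bytes = matches_string_check(bytes_values, (str, bytes))

print(old_text, old_bytes, new_text, new_bytes)
```

Under the str-only check the bytes array falls through, which is consistent with the fixed-width array reaching code that expects object values.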

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jan 6, 2021
simonjayhawkins changed the title from "BUG: Buffer dtype mismatch, expected 'Python object' but got a string" to "REGR: Buffer dtype mismatch, expected 'Python object' but got a string" Jan 6, 2021