BUG: read_csv float_precision="round_trip" parser does not handle initial/trailing spaces #43713

Closed
3 tasks done
ales-erjavec opened this issue Sep 23, 2021 · 0 comments · Fixed by #43714
Labels
Bug IO CSV read_csv, to_csv

Comments

@ales-erjavec
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import io

import numpy as np
import pandas as pd

DATA = """\
id\tnum\n
1\t1.2 \n
1\t 2.1\n
2\t 1 \n
2\t 1.2 \n
"""

df = pd.read_csv(
    io.StringIO(DATA),
    float_precision="round_trip", skipinitialspace=True,
    sep="\t", header=0,
    dtype={1: np.float64},
)
print(df)

Issue Description

read_csv(..., float_precision="round_trip") does not parse fields with initial/trailing spaces and raises an error:

Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1141, in pandas._libs.parsers.TextReader._convert_tokens
    false_values=false_values)
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ales/.config/JetBrains/PyCharmCE2021.2/scratches/scratch_2.py", line 14, in <module>
    df = pd.read_csv(
  File "/home/ales/.envs/orange3/lib/python3.9/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/ales/.envs/orange3/lib/python3.9/site-packages/pandas/io/parsers.py", line 458, in _read
    data = parser.read(nrows)
  File "/home/ales/.envs/orange3/lib/python3.9/site-packages/pandas/io/parsers.py", line 1196, in read
    ret = self._engine.read(nrows)
  File "/home/ales/.envs/orange3/lib/python3.9/site-packages/pandas/io/parsers.py", line 2155, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
    cdef:
  File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
    else:
  File "pandas/_libs/parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
    self.parser.line_fields[i] + \
  File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
    col_res, na_count = self._convert_with_dtype(
  File "pandas/_libs/parsers.pyx", line 1149, in pandas._libs.parsers.TextReader._convert_tokens
    
ValueError: cannot safely convert passed user dtype of float64 for object dtyped data in column 1
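
To check whether a given pandas installation is affected, the same data can be parsed once per float parser and the outcomes compared. This is only a sketch: the helper `try_parse` and the `results` dict are illustrative names, the data is rewritten as a single escaped string, and the column is addressed by name rather than by position. No particular outcome is assumed for "round_trip", since that depends on the installed version.

```python
import io

import numpy as np
import pandas as pd

# Same rows as the report, written as one escaped string (no blank lines).
DATA = "id\tnum\n1\t1.2 \n1\t 2.1\n2\t 1 \n2\t 1.2 \n"


def try_parse(float_precision):
    """Return (True, values) on success or (False, error message) on failure."""
    try:
        df = pd.read_csv(
            io.StringIO(DATA),
            sep="\t",
            header=0,
            skipinitialspace=True,
            dtype={"num": np.float64},
            float_precision=float_precision,
        )
    except (TypeError, ValueError) as exc:
        return False, str(exc)
    return True, df["num"].tolist()


# None selects the default float parser of the installed pandas version.
results = {fp: try_parse(fp) for fp in (None, "high", "round_trip")}
for fp, (ok, payload) in results.items():
    print(f"float_precision={fp!r}: {'ok' if ok else 'failed'} -> {payload}")
```

On an affected version, only the "round_trip" entry is expected to report a failure.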

Expected Behavior

float_precision="round_trip" should work the same as other float parsers.
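
Until the parser behaves that way, one possible workaround is to bypass the built-in float conversion for the affected column with a converter that strips whitespace first. This is a sketch, not the fix applied in pandas: `parse_padded_float` is a hypothetical helper, and it relies on Python's built-in `float`, which is assumed here to give the same correctly rounded result that "round_trip" aims for.

```python
import io

import pandas as pd

# Same rows as the report, written as one escaped string (no blank lines).
DATA = "id\tnum\n1\t1.2 \n1\t 2.1\n2\t 1 \n2\t 1.2 \n"


def parse_padded_float(text: str) -> float:
    """Strip surrounding whitespace, then convert with Python's float()."""
    return float(text.strip())


df = pd.read_csv(
    io.StringIO(DATA),
    sep="\t",
    header=0,
    # The converter replaces dtype/float_precision handling for this column.
    converters={"num": parse_padded_float},
)
print(df["num"].tolist())  # [1.2, 2.1, 1.0, 1.2]
```

A converter runs in Python for every field, so it is likely slower than the C float parsers on large files.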

Installed Versions

INSTALLED VERSIONS
------------------
commit           : e218f059caaf5d921c4c71b7a7889adfe6c74b9d
python           : 3.9.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.14.2-1-MANJARO
Version          : #1 SMP PREEMPT Wed Sep 8 14:11:01 UTC 2021
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.0.dev0+497.ge218f059ca
numpy            : 1.21.2
pytz             : 2021.1
dateutil         : 2.8.2
pip              : 21.1.3
setuptools       : 56.0.0
Cython           : 0.29.24
pytest           : 6.2.5
hypothesis       : 6.21.6
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 3.0.1
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.26.0
pandas_datareader: None
bs4              : None
bottleneck       : 1.3.2
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.3
numexpr          : None
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 2.0.1
xlwt             : None
numba            : None
@ales-erjavec added the Bug and Needs Triage labels Sep 23, 2021
@jreback added this to the 1.4 milestone Sep 28, 2021
@jreback added the IO CSV label and removed the Needs Triage label Sep 28, 2021