BUG: read_csv float_precision="round_trip" parser does not handle initial/trailing spaces #43713

ales-erjavec · 2021-09-23T10:02:23Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import io

import numpy as np
import pandas as pd

DATA = """\
id\tnum\n
1\t1.2 \n
1\t 2.1\n
2\t 1 \n
2\t 1.2 \n
"""

df = pd.read_csv(
    io.StringIO(DATA),
    float_precision="round_trip", skipinitialspace=True,
    sep="\t", header=0,
    dtype={1: np.float64},
)
print(df)

Issue Description

read_csv(..., float_precision="round_trip") does not parse fields with initial/trailing spaces and raises an error:

Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1141, in pandas._libs.parsers.TextReader._convert_tokens
    false_values=false_values)
TypeError: Cannot cast array data from dtype('O') to dtype('float64') according to the rule 'safe'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ales/.config/JetBrains/PyCharmCE2021.2/scratches/scratch_2.py", line 14, in <module>
    df = pd.read_csv(
  File "/home/ales/.envs/orange3/lib/python3.9/site-packages/pandas/io/parsers.py", line 686, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/ales/.envs/orange3/lib/python3.9/site-packages/pandas/io/parsers.py", line 458, in _read
    data = parser.read(nrows)
  File "/home/ales/.envs/orange3/lib/python3.9/site-packages/pandas/io/parsers.py", line 1196, in read
    ret = self._engine.read(nrows)
  File "/home/ales/.envs/orange3/lib/python3.9/site-packages/pandas/io/parsers.py", line 2155, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
    cdef:
  File "pandas/_libs/parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
    else:
  File "pandas/_libs/parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
    self.parser.line_fields[i] + \
  File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
    col_res, na_count = self._convert_with_dtype(
  File "pandas/_libs/parsers.pyx", line 1149, in pandas._libs.parsers.TextReader._convert_tokens
    
ValueError: cannot safely convert passed user dtype of float64 for object dtyped data in column 1
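
To narrow the failure down to the round_trip parser, the same input can be run through each float_precision setting and the outcomes compared. This is only a sketch: the outcomes dict and the column name "num" used as the dtype key are choices made here for illustration, and the round_trip result depends on the installed pandas version (a ValueError on affected versions, a parsed column once the fix is in).

```python
import io

import pandas as pd

# Same rows as the reproducible example, written as one escaped string.
DATA = "id\tnum\n1\t1.2 \n1\t 2.1\n2\t 1 \n2\t 1.2 \n"

# Illustrative container: maps each float_precision value to the parsed
# column, or to the error message if parsing failed.
outcomes = {}
for precision in (None, "high", "round_trip"):
    try:
        df = pd.read_csv(
            io.StringIO(DATA),
            float_precision=precision,
            skipinitialspace=True,
            sep="\t",
            header=0,
            dtype={"num": "float64"},
        )
        outcomes[precision] = df["num"].tolist()
    except ValueError as exc:
        # Affected versions raise ValueError for round_trip only.
        outcomes[precision] = f"ValueError: {exc}"

for precision, outcome in outcomes.items():
    print(f"float_precision={precision!r}: {outcome}")
```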

Expected Behavior

float_precision="round_trip" should work the same as other float parsers.
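
Until a fix is available, one possible workaround is to strip the padding in a converter and let Python's float() do the parsing. This is a sketch under assumptions, not the fix itself: parse_padded_float is a hypothetical helper, it assumes the column has no empty or missing fields, and the dtype entry for the column is dropped because a converter is supplied for it instead.

```python
import io

import pandas as pd

DATA = "id\tnum\n1\t1.2 \n1\t 2.1\n2\t 1 \n2\t 1.2 \n"


def parse_padded_float(text: str) -> float:
    # Hypothetical helper: remove leading/trailing whitespace, then parse.
    # float("") raises ValueError, so empty fields are not supported here.
    return float(text.strip())


df = pd.read_csv(
    io.StringIO(DATA),
    sep="\t",
    header=0,
    # The converter receives each raw field of the "num" column as a string.
    converters={"num": parse_padded_float},
)
print(df)
print(df.dtypes)
```

A converter runs a Python function per field, so it is slower than the C float parsers on large files.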

Installed Versions

INSTALLED VERSIONS
------------------
commit           : e218f059caaf5d921c4c71b7a7889adfe6c74b9d
python           : 3.9.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.14.2-1-MANJARO
Version          : #1 SMP PREEMPT Wed Sep 8 14:11:01 UTC 2021
machine          : x86_64
processor        : 
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 1.4.0.dev0+497.ge218f059ca
numpy            : 1.21.2
pytz             : 2021.1
dateutil         : 2.8.2
pip              : 21.1.3
setuptools       : 56.0.0
Cython           : 0.29.24
pytest           : 6.2.5
hypothesis       : 6.21.6
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : 3.0.1
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 3.0.1
IPython          : 7.26.0
pandas_datareader: None
bs4              : None
bottleneck       : 1.3.2
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.4.3
numexpr          : None
odfpy            : None
openpyxl         : 3.0.7
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.1
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : 2.0.1
xlwt             : None
numba            : None


ales-erjavec added the Bug and Needs Triage labels on Sep 23, 2021

ales-erjavec mentioned this issue on Sep 23, 2021 in "BUG: round_trip parser initial/trailing whitespace" #43714 (merged)

jreback added this to the 1.4 milestone Sep 28, 2021

jreback added the IO CSV label and removed the Needs Triage label on Sep 28, 2021

jreback closed this as completed in #43714 Oct 2, 2021
