Bug on .to_string(index=False) #24980

Closed
simontorres opened this issue Jan 28, 2019 · 15 comments · Fixed by #36094
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string Regression Functionality that used to work in a prior pandas version Strings String extension data type and string data
@simontorres

Code Sample, a copy-pastable example if possible

Related issue (very old though) #11833

import pandas

print("pandas.__version__ == {:s}".format(pandas.__version__))

columns = ['name', 'age']
values = [
    ['name_one', '31'],
    ['name_two', '32']]
data_frame = pandas.DataFrame(values, columns=columns)

# filter
filtered = data_frame[(data_frame['name'] == 'name_one') & (data_frame['age'] == '31')]

# single value comes with a blank at the beginning when running 0.24.0
print("==={:s}===".format(filtered.name.to_string(index=False)))

The output of this running on pandas 0.24.0 is:
(I put this in a file called pandasbug.py)

(goodman_pipeline) [simon@ctioy9 sandbox]$ python3.7 pandasbug.py 
pandas.__version__ == 0.24.0
=== name_one===

Problem description

My automated builds started to fail exactly when 0.24.0 was released. The symptom was a file-not-found error.
I erroneously reported this at first on astropy/ccdproc#658
I use a pandas data frame to filter a set of reference files. When I extract the file name and join it to a full path, the result is a non-existent path such as:
/full/path/to/[unwanted-blank-here]file_name.fits (/full/path/to/ file_name.fits)

This travis build will show the real-world error caused, if you can access it.
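
For readers hitting the same failure, here is a minimal sketch of how the stray blank turns into a non-existent path, along with one defensive workaround. It uses only the standard library. The directory and file name are the placeholders from the description above, and `build_path` is a hypothetical helper written for illustration, not part of pandas.

```python
import posixpath


def build_path(directory, name_from_to_string):
    """Join a directory and a file name obtained from .to_string(index=False).

    Hypothetical helper: strips surrounding whitespace so that the leading
    blank added by pandas 0.24.0 does not end up inside the path.
    """
    return posixpath.join(directory, name_from_to_string.strip())


# What 0.24.0 returns for a single value, per the output shown earlier.
raw = " file_name.fits"

broken = posixpath.join("/full/path/to", raw)
fixed = build_path("/full/path/to", raw)

print(repr(broken))  # '/full/path/to/ file_name.fits'
print(repr(fixed))   # '/full/path/to/file_name.fits'
```

Stripping would also remove whitespace that is genuinely part of a file name, so reading the value directly from the Series (for example with `.iloc[0]`) avoids string formatting altogether.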

Expected Output

On my system, pandas 0.20.3 with Python 3.6 works as expected, and the latest working build ran pandas 0.23.4.

(goodman_pipeline) [simon@ctioy9 sandbox]$ python3.6 pandasbug.py 
pandas.__version__ == 0.20.3
===name_one===

Output of pd.show_versions()

>>> import pandas
>>> pandas.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-1.el7.elrepo.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.2
numpy: 1.15.2
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: None
sphinx: 1.8.3
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@simontorres simontorres changed the title from "Bug on ``.to_string(index=False)" to "Bug on .to_string(index=False)" Jan 28, 2019
@simontorres simontorres changed the title Bug on .to_string(index=False) Bug on .to_string(index=False) Jan 28, 2019
@mroeschke
Member

Here's a simpler example. Agreed that the extra whitespace is not expected. Investigations and PRs welcome!

In [28]: pd.Series('a').to_string(index=False)
Out[28]: ' a'

In [29]: pd.__version__
Out[29]: '0.25.0.dev0+12.g2b16e2e6c.dirty'

@mroeschke mroeschke added Bug Output-Formatting __repr__ of pandas objects, to_string Strings String extension data type and string data labels Jan 28, 2019
@jorisvandenbossche jorisvandenbossche added the Regression Functionality that used to work in a prior pandas version label Jan 28, 2019
@jorisvandenbossche jorisvandenbossche added this to the 0.24.1 milestone Jan 28, 2019
@charlesdong1991
Member

charlesdong1991 commented Jan 29, 2019

It seems this was introduced on purpose in pandas/io/formats/format.py; the default for leading_space looks like it is None:

if leading_space is False:
    # False specifically, so that the default is
    # to include a space if we get here.
    tpl = u'{v}'
else:
    tpl = u' {v}'

@TomAugspurger
Contributor

@charlesdong1991 that was added for a different reason (formatting ExtensionArrays IIRC).

We could maybe pass leading_space=self.leading_space in SeriesFormatter._get_formatted_values in io/formats/format.py.

@charlesdong1991
Member

charlesdong1991 commented Jan 29, 2019

@TomAugspurger thanks for your reply. Yes, I did try setting leading_space to False when self.index is False in SeriesFormatter._get_formatted_values, and this error did go away.
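
The idea discussed in the last few comments can be sketched outside pandas. The following is a standalone illustration, not the actual pandas code: `format_value` mirrors the template choice quoted from format.py, and `format_series_values` is a hypothetical stand-in for `SeriesFormatter._get_formatted_values` that requests no leading space when the index is hidden.

```python
def format_value(value, leading_space=None):
    # Mirrors the quoted snippet: only an explicit False drops the space;
    # None (the default) and True both keep it.
    if leading_space is False:
        tpl = "{v}"
    else:
        tpl = " {v}"
    return tpl.format(v=value)


def format_series_values(values, index=True):
    # Hypothetical stand-in for SeriesFormatter._get_formatted_values:
    # when the index is not printed, ask for no leading space.
    leading_space = None if index else False
    return [format_value(v, leading_space=leading_space) for v in values]


print(format_series_values(["name_one"], index=True))   # [' name_one']
print(format_series_values(["name_one"], index=False))  # ['name_one']
```

Because the template checks for `False` by identity, leaving the argument at `None` keeps the old behaviour, which is why the space shows up unless a caller opts out explicitly.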

@TomAugspurger
Contributor

Pushing to 0.24.2

@simontorres
Author

Any news on this? I'm still having the same issue on version 0.25.3

@jreback
Contributor

jreback commented Jan 7, 2020

There is an open PR, #29670, but I believe changing this was quite tricky. Help welcome.

@simontorres
Author

Thanks for your reply. I was just checking it and it seems there is only one conflict. I will wait to see whether someone more experienced fixes it, or I will become an expert myself and try to help :)

@onshek
Contributor

onshek commented Sep 2, 2020

Is anyone here still working on this?

@simontorres
Author

Not really, I just pinned my pandas version to one in which this bug does not exist. I wish I could fix it but I haven't had the chance.

@charlesdong1991
Member

@onshek Hi, feel free to work on it, there is a stale PR #29670 which you could reference as a starting point. #29670 i recall could solve the issue correctly, however, there are still several comments you need to address, the comments are in #25000 . Hope it helps you to start!

@onshek
Contributor

onshek commented Sep 3, 2020

> @onshek Hi, feel free to work on it, there is a stale PR #29670 which you could reference as a starting point. #29670 i recall could solve the issue correctly, however, there are still several comments you need to address, the comments are in #25000 . Hope it helps you to start!

Hi @charlesdong1991, it seems that in #29670 you had already solved the errors, so what was the problem at that time?

@charlesdong1991
Member

it was quite long ago, I cannot remember the details, I think the main concern back then was the solution in #29670 is kind of a hotfix, and might not be generalized.

There are some comments in #25000 and you can take a look @onshek A PR to solve it is certainly welcome!

@onshek
Contributor

onshek commented Sep 3, 2020

> it was quite long ago, I cannot remember the details, I think the main concern back then was the solution in #29670 is kind of a hotfix, and might not be generalized.
>
> There are some comments in #25000 and you can take a look @onshek A PR to solve it is certainly welcome!

I've changed every leading_space = 'compat' to leading_space = True, and this passed all tests (223 in total) modified and added in #29670. It seems there's no need to change leading_space: Optional[bool] = None to leading_space: Union[str, bool] = "compat". I'll make a PR after I take care of the remaining code in test_to_latex.py, if you don't mind.

@charlesdong1991
Member

go-ahead for opening a PR, and make any changes you need. I don't mind at all ^^ @onshek
