BUG: Error in read_excel() function with some ods files #45598

Closed
2 of 3 tasks
f4kt opened this issue Jan 24, 2022 · 7 comments · Fixed by #46050
Labels
Bug IO Excel read_excel, to_excel

@f4kt

f4kt commented Jan 24, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

pd.read_excel('exported_with_qgis.ods')

Issue Description

An ODS file generated by QGIS (by exporting a vector layer in ODS format) has newlines between the XML elements of its content.xml (the internal file inside the ODS archive), like this:

<table:table-row>
<table:table-cell office:value-type="string">
<text:p>CODIGO</text:p>
</table:table-cell>
...

instead of being like this:

<table:table-row><table:table-cell office:value-type="string"><text:p>CODIGO</text:p></table:table-cell>
...

The file opens without problems in LibreOffice Calc, but running:

pd.read_excel(ods_file)

raises this error:

AttributeError: 'Text' object has no attribute 'qname'
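
To illustrate why the newlines matter, here is a minimal sketch using the standard library's xml.dom.minidom as a stand-in for the DOM that odfpy builds (an assumption: odfpy has its own node classes, and the tag names below are simplified, without the ODS namespaces). Whitespace between elements is parsed into Text nodes that sit next to the cell elements in childNodes:

```python
from xml.dom import minidom

# Simplified row with newlines between elements (like the QGIS export).
PRETTY = """<row>
<cell><p>CODIGO</p></cell>
<cell><p>NOMBRE</p></cell>
</row>"""

# The same row written on a single line (like a LibreOffice export).
COMPACT = "<row><cell><p>CODIGO</p></cell><cell><p>NOMBRE</p></cell></row>"


def child_node_types(xml_text):
    """Return the class name of each direct child of the root element."""
    root = minidom.parseString(xml_text).documentElement
    return [type(node).__name__ for node in root.childNodes]


pretty_types = child_node_types(PRETTY)
compact_types = child_node_types(COMPACT)

print(pretty_types)   # Text nodes are interleaved with the Element nodes
print(compact_types)  # only Element nodes
```

Any code that assumes every child of a row is an element, as the list comprehension in the reader does, will therefore hit a Text node first when the file is formatted this way.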

Proposed solution:
change line 103 of https://github.com/pandas-dev/pandas/blob/main/pandas/io/excel/_odfreader.py as follows:

-sheet_cells = [x for x in sheet_row.childNodes if x.qname in cell_names]
+sheet_cells = [x for x in sheet_row.childNodes if 'qname' in dir(x) and x.qname in cell_names]
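
The same filtering idea can be sketched in isolation. The classes FakeElement and FakeText and the CELL_NAMES values below are hypothetical stand-ins, not odfpy's real classes or namespaces; the only thing they model is that element nodes carry a qname attribute and text nodes do not. The sketch uses hasattr rather than a dir() lookup, which avoids building the full attribute list for every node:

```python
class FakeElement:
    """Hypothetical stand-in for an element node: it has a qname."""

    def __init__(self, qname):
        self.qname = qname


class FakeText:
    """Hypothetical stand-in for a whitespace text node: no qname."""

    def __init__(self, data):
        self.data = data


# Illustrative (namespace, local name) pairs, not the real ODS namespaces.
CELL_NAMES = {("table-ns", "table-cell"), ("table-ns", "covered-table-cell")}


def select_cells(child_nodes, cell_names):
    """Keep only nodes that have a qname and whose qname is a cell name."""
    return [
        node
        for node in child_nodes
        if hasattr(node, "qname") and node.qname in cell_names
    ]


# A row as it would look with newlines between the cell elements.
row_children = [
    FakeText("\n"),
    FakeElement(("table-ns", "table-cell")),
    FakeText("\n"),
    FakeElement(("table-ns", "table-cell")),
    FakeText("\n"),
]

cells = select_cells(row_children, CELL_NAMES)
print(len(cells))
```

Because hasattr is checked first, the qname lookup is never attempted on a text node, so the whitespace nodes are skipped instead of raising AttributeError.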

Expected Behavior

The expected behaviour is for read_excel() to ignore the newlines between elements. LibreOffice Calc opens this kind of ODS file correctly.

Installed Versions

INSTALLED VERSIONS
------------------
commit           : 73c68257545b5f8530b7044f56647bd2db92e2ba
python           : 3.9.7.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.13.0-22-generic
Version          : #22-Ubuntu SMP Fri Nov 5 13:21:36 UTC 2021
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : es_ES.UTF-8
LOCALE           : es_ES.UTF-8

pandas           : 1.3.3
numpy            : 1.19.5
pytz             : 2021.1
dateutil         : 2.8.1
pip              : 20.3.4
setuptools       : 52.0.0
Cython           : None
pytest           : 6.0.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : 4.6.3
html5lib         : 1.1
pymysql          : None
psycopg2         : 2.8.6 (dt dec pq3 ext lo64)
jinja2           : 3.0.0
IPython          : None
pandas_datareader: None
bs4              : 4.9.3
bottleneck       : None
fsspec           : None
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.4
numexpr          : None
odfpy            : None
openpyxl         : 3.0.3
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.7.3
sqlalchemy       : 1.3.22
tables           : None
tabulate         : 0.8.9
xarray           : None
xlrd             : 1.2.0
xlwt             : 1.2.0
numba            : 0.54.1

f4kt added the "Bug" and "Needs Triage" labels on Jan 24, 2022
@rhshadrach
Member

Thanks for the report - can you provide a sample file?

rhshadrach added the "IO Excel" and "Needs Info" labels and removed the "Needs Triage" label on Jan 27, 2022
@f4kt
Author

f4kt commented Jan 27, 2022

Of course! sample.ods

rhshadrach removed the "Needs Info" label on Jan 28, 2022
rhshadrach added this to the Contributions Welcome milestone on Jan 28, 2022
@rhshadrach
Member

Thanks for the sample file - as far as I can tell as long as the XML is valid, it satisfies the format for ODS, and the sample file contains valid XML. Agreed this should be fixed, are you interested in putting up a PR @f4kt?

@f4kt
Author

f4kt commented Jan 31, 2022

I'm too busy for that these days, Richard, but thanks for the offer :)

@dimitra-karadima
Contributor

Hello @rhshadrach, I am interested in contributing to pandas. Can I take this issue, since it is still open? If you have any suggestions in mind, please let me know!

@dimitra-karadima
Contributor

take

@dimitra-karadima
Contributor

Hello @f4kt, if you can provide me with a sample file containing just a 3x3 table, it would be perfect for the test case I want to add.

dimitra-karadima added a commit to dimitra-karadima/pandas that referenced this issue Feb 18, 2022
@jreback jreback modified the milestones: Contributions Welcome, 1.5 Mar 1, 2022
phofl pushed a commit that referenced this issue Mar 3, 2022
* BUG: error in read_excel with some ods files #45598

* BUG: use hasattr instead of dir

* DOC: add issue number in new test case

* DOC: remove comment

Co-authored-by: Dimitra Karadima <dkaradima@convertgroup.com>
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this issue Jul 13, 2022
…-dev#46050)

* BUG: error in read_excel with some ods files pandas-dev#45598

* BUG: use hasattr instead of dir

* DOC: add issue number in new test case

* DOC: remove comment

Co-authored-by: Dimitra Karadima <dkaradima@convertgroup.com>