Non-unique column names in fixed format HDF5 #7761

Closed
filmor opened this issue Jul 15, 2014 · 5 comments · Fixed by #7788
Labels
Bug · Error Reporting (incorrect or improved errors from pandas) · IO HDF5 (read_hdf, HDFStore)
Milestone

Comments

@filmor
Contributor

filmor commented Jul 15, 2014

Writing a DataFrame with non-unique column names in the fixed format succeeds, but reading it back raises an error. The format has a fundamental problem with this case: if columns share a name but have different data types, this is not fixable.

However, df.to_hdf should simply refuse to write the DataFrame if its column index is not unique.

@jreback
Contributor

jreback commented Jul 15, 2014

Can you put up an example for the tests?

@jreback jreback added this to the 0.15.0 milestone Jul 15, 2014
@filmor
Contributor Author

filmor commented Jul 17, 2014

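import pandas as pd

df = pd.DataFrame(columns=["a"])
df2 = pd.concat([df, df], axis=1)  # two columns, both named "a"
df2.to_hdf("test.h5", "test")      # writing succeeds (fixed format is the default)
pd.read_hdf("test.h5", "test")     # reading raises InvalidIndexError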

Only the last call fails, with an InvalidIndexError ("Reindexing only valid with uniquely valued Index objects") raised in Index.get_indexer. You could silence the error by using get_indexer_non_unique instead, but as described above, I think reading will still be broken if the duplicated columns have different data types.

It can be fixed by adding a check for the uniqueness of the column index in the fixed format. Note that this problem does not occur for the table format.
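
As a rough illustration of the kind of check meant here, the sketch below validates a sequence of column labels before a fixed-format write. The helper name validate_unique_columns, its fmt parameter, and the error message are hypothetical and are not pandas' actual code; with a real DataFrame, df.columns.is_unique answers the same question directly.

```python
from collections import Counter


def validate_unique_columns(columns, fmt="fixed"):
    """Raise ValueError if `columns` contains duplicates and `fmt` is "fixed".

    Hypothetical helper for illustration only; not part of pandas.
    """
    if fmt != "fixed":
        # The table format does not show this problem, so nothing to check.
        return
    # Counter keeps first-seen order, so duplicates are reported in column order.
    duplicates = [label for label, count in Counter(columns).items() if count > 1]
    if duplicates:
        raise ValueError(
            f"Column index must be unique for the fixed format; duplicated: {duplicates}"
        )


if __name__ == "__main__":
    validate_unique_columns(["a", "b"])               # passes silently
    validate_unique_columns(["a", "a"], fmt="table")  # skipped for table format
    try:
        validate_unique_columns(["a", "a"])
    except ValueError as exc:
        print(exc)
```

Failing at write time means the user sees the problem where it is introduced, rather than ending up with a file that can be written but not read back.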

@jreback
Contributor

jreback commented Jul 17, 2014

Perfect. Want to do a PR that validates on writing and raises ValueError (could also be NotImplementedError)?

@filmor
Contributor Author

filmor commented Jul 17, 2014

I'll build one with tests, yes. Is it enough to fix this in BlockManagerFixed.write?

@jreback
Contributor

jreback commented Jul 17, 2014

@filmor I believe so. Put in a test with the table format as well (if you can't find an existing test that is similar; if you do, put the tests together).


2 participants