DataFrame.__finalize__ not called in pd.concat #6927

Closed
wcbeard opened this issue Apr 22, 2014 · 2 comments · Fixed by #6931
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@wcbeard
Contributor

wcbeard commented Apr 22, 2014

When I assign metadata to a df

import numpy as np
import pandas as pd
np.random.seed(10)

pd.DataFrame._metadata = ['filename']
df1 = pd.DataFrame(np.random.randint(0, 4, (3, 2)), columns=list('ab'))
df1.filename = {'a': 'f1', 'b': 'f2'}  # must match the name listed in _metadata
df1
       a  b
    0  1  1
    1  0  3
    2  0  1

and define a __finalize__ that prints when it's called

def finalize_df(self, other, method=None, **kwargs):
    print('finalize called')
    # copy every attribute listed in _metadata from `other` onto the result
    for name in self._metadata:
        object.__setattr__(self, name, getattr(other, name, None))
    return self

pd.DataFrame.__finalize__ = finalize_df

nothing is preserved when pd.concat is called:

stacked = pd.concat([df1, df1])  # Nothing printed
stacked
       a  b
    0  1  1
    1  0  3
    2  0  1
    0  1  1
    1  0  3
    2  0  1
stacked.filename  # => AttributeError

For this specific case it seems reasonable that __finalize__ should be used, since all of the pieces come from the same DataFrame, though I'm not sure about the general case, since concat can also take types other than a DataFrame. But should we have (or do we already have) some method to stack DataFrames that preserves metadata?

Similar to #6923.
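Until pd.concat propagates metadata itself, one workaround is to copy the attributes onto the result after concatenating. The sketch below is a hypothetical helper (`concat_with_metadata` is not a pandas function). It assumes metadata lives in plain instance attributes such as `filename` whose names are not also column labels, and it only propagates a value when every input carries the attribute and all values compare equal.

```python
import pandas as pd

_MISSING = object()  # sentinel: tells "attribute absent" apart from a value of None


def concat_with_metadata(objs, names, **kwargs):
    """Hypothetical helper: pd.concat, then copy agreed-upon metadata.

    For each attribute in `names`, the value is copied onto the result only
    if every input has that attribute and all values compare equal.
    Assumes scalar-like metadata (str, dict, ...) where `==` returns a bool.
    """
    objs = list(objs)  # materialize so generators can be iterated twice
    result = pd.concat(objs, **kwargs)
    for name in names:
        values = [getattr(o, name, _MISSING) for o in objs]
        first = values[0]
        if first is _MISSING:
            continue
        if all(v is not _MISSING and v == first for v in values[1:]):
            # store a plain instance attribute, bypassing DataFrame.__setattr__
            object.__setattr__(result, name, first)
    return result


# Usage, mirroring the example above (without patching pd.DataFrame._metadata)
df1 = pd.DataFrame({'a': [1, 0, 0], 'b': [1, 3, 1]})
object.__setattr__(df1, 'filename', {'a': 'f1', 'b': 'f2'})
stacked = concat_with_metadata([df1, df1], names=['filename'])
print(stacked.filename)  # {'a': 'f1', 'b': 'f2'}
```

Requiring agreement keeps the helper conservative: if the inputs carry different filenames, the result simply gets no `filename` attribute rather than an arbitrary one.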

@jreback
Contributor

jreback commented Apr 22, 2014

#6931 will fix/test this, but at this point it is all up to the user; for example, the default does nothing. If there is some reasonable set of rules, we can adopt them in the future. The problem is that just about everything is arbitrary at this point.
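To make "some reasonable set of rules" concrete, here is a hypothetical sketch in plain Python (not a pandas API; `combine_metadata` and its policy names are made up for illustration) of a few candidate rules for merging one metadata field across the inputs of a concat: keep the value only if all inputs agree, take the first input's value, or join non-empty string values the way the `'+'.join` example in this thread does.

```python
from typing import Any, Iterable, Optional


def combine_metadata(values: Iterable[Any], policy: str = "agree") -> Optional[Any]:
    """Hypothetical rule for merging one metadata field across concat inputs.

    policy:
      "agree" -> the shared value if every input has the same one, else None
      "first" -> the first input's value
      "join"  -> '+'-joined non-empty values (None if there are none)
    """
    vals = list(values)
    if not vals:
        return None
    if policy == "agree":
        first = vals[0]
        return first if all(v == first for v in vals[1:]) else None
    if policy == "first":
        return vals[0]
    if policy == "join":
        parts = [str(v) for v in vals if v]  # skip None / empty values
        return "+".join(parts) if parts else None
    raise ValueError(f"unknown policy: {policy!r}")


print(combine_metadata(['foo', 'bar']))                 # None
print(combine_metadata(['foo', 'bar'], policy='join'))  # foo+bar
```

Which policy fits depends on the attribute, which is exactly the arbitrariness described above; a user-defined `__finalize__` could call a helper like this per attribute name when `method == 'concat'`.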

@jreback
Contributor

jreback commented Apr 22, 2014

In [1]: o = Series(range(3),range(3))

In [2]: o.name = 'foo'

In [3]: o2 = Series(range(3),range(3))

In [4]: o2.name = 'bar'

In [5]: Series._metadata = ['name','filename']

In [6]: o.filename = 'foo'

In [7]: o2.filename = 'bar'

:def finalize(self, other, method=None, **kwargs):
:    for name in self._metadata:
:        if method == 'concat' and name == 'filename':
:            value = '+'.join([ getattr(o,name) for o in other.objs if getattr(o,name,None) ])
:            object.__setattr__(self, name, value)
:        else:
:            object.__setattr__(self, name, getattr(other, name, None))
:
:    return self

In [13]: Series.__finalize__ = finalize

In [14]: result = pd.concat([o, o2])

In [15]: result.name

In [16]: result.filename
Out[16]: 'foo+bar'

2 participants