concat doesn't work with mixed Series/DataFrames #2385

Closed
dhirschfeld opened this issue Nov 29, 2012 · 16 comments · Fixed by #6438
Labels
API Design Docs Enhancement Error Reporting Incorrect or improved errors from pandas

@dhirschfeld
Contributor

concat throws an AssertionError when passed a collection of mixed Series and DataFrames, e.g.

import numpy as np
import pandas as pd

# Shared hourly index for all three objects
idx = pd.date_range('01-Jan-2013', periods=100, freq='H')

s1 = pd.TimeSeries(np.sin(np.linspace(0, 2*np.pi, 100)), index=idx)

s2 = pd.TimeSeries(np.cos(np.linspace(0, 2*np.pi, 100)), index=idx)

# Single-column DataFrame on the same index
df = pd.DataFrame(np.cos(np.linspace(0, 2*np.pi, 100)).reshape(-1, 1),
                  index=idx)

In [23]: pd.concat([df,df], axis=1).shape 
Out[23]: (100, 2) 

In [24]: pd.concat([s1,s2], axis=1).shape 
Out[24]: (100, 2) 

In [25]: pd.concat([s1,s2,s1], axis=1).shape 
Out[25]: (100, 3) 

In [26]: pd.concat([s1,df,s2], axis=1).shape 
Traceback (most recent call last):

  File "<ipython-input-9-4512588e71f2>", line 1, in <module>
    pd.concat([s1,df,s2], axis=1).shape

  File "c:\dev\code\pandas\pandas\tools\merge.py", line 881, in concat
    return op.get_result()

  File "c:\dev\code\pandas\pandas\tools\merge.py", line 960, in get_result
    columns=self.new_axes[1])

  File "c:\dev\code\pandas\pandas\core\frame.py", line 376, in __init__
    mgr = self._init_dict(data, index, columns, dtype=dtype)

  File "c:\dev\code\pandas\pandas\core\frame.py", line 505, in _init_dict
    dtype=dtype)

  File "c:\dev\code\pandas\pandas\core\frame.py", line 5181, in _arrays_to_mgr
    mgr = BlockManager(blocks, axes)

  File "c:\dev\code\pandas\pandas\core\internals.py", line 499, in __init__
    self._verify_integrity()

  File "c:\dev\code\pandas\pandas\core\internals.py", line 584, in _verify_integrity
    raise AssertionError('Block shape incompatible with manager')

AssertionError: Block shape incompatible with manager

In [27]: pd.__version__
Out[27]: '0.10.0.dev-fbd77d5'
@wesm
Member

wesm commented Jan 19, 2013

Well that's not very helpful. Should be improved at some point

@hayd
Contributor

hayd commented Mar 28, 2013

Am I right in saying these give better errors in dev?

pd.concat([s1,df,s2])
ValueError: arrays must have same number of dimensions

pd.concat([s1,df,s2], axis=1)
ValueError: Shape of passed values is (3,), indices imply (3, 100)

@jreback
Contributor

jreback commented Mar 28, 2013

I like the first one. We should probably catch the error in concat and raise a better message for the second one, though.

@hayd
Contributor

hayd commented Mar 28, 2013

Agreed.

@jreback
Contributor

jreback commented Mar 28, 2013

ok...let's move to 0.12 then...

@dhirschfeld
Contributor Author

It's not the incorrect error which is the problem, but the fact that an error is thrown at all. This makes it difficult/tedious to write generic code that works whether or not the input is a DataFrame or a TimeSeries.

In this case the TimeSeries should be treated as a DataFrame with one column, where the column label is taken from the TimeSeries name. If the axis parameter were 0, the TimeSeries should be treated as a DataFrame with one row, where the row label is taken from the TimeSeries name.

@jreback
Contributor

jreback commented Apr 2, 2013

@dhirschfeld I think we will have a detailed look at this, but nothing is stopping you from pre-conditioning the inputs, e.g.

 pd.concat([ pd.DataFrame(x) for x in [s1,df,s2] ], axis=1).shape

The DataFrame constructor does exactly what you want for a Series with axis=1, and a DataFrame passed to it comes back as a DataFrame.
(For axis=0 I think you would need to pass the orient so that it transposes.)
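
The pre-conditioning approach above can be sketched as a small wrapper. This is only a minimal sketch, not how pandas implements concat: `to_frame_for_concat` and `concat_mixed` are hypothetical names, and it assumes the behaviour proposed earlier in the thread (a Series becomes one column for axis=1 and one row for axis=0, labelled by its name).

```python
import pandas as pd


def to_frame_for_concat(obj, axis):
    """Return obj as a DataFrame oriented for concatenation along `axis`.

    Hypothetical helper for illustration; not part of pandas.
    """
    if isinstance(obj, pd.DataFrame):
        return obj
    frame = pd.DataFrame(obj)  # a Series becomes a single column
    # For axis=0 the Series should act as a single row, so transpose it.
    return frame if axis == 1 else frame.T


def concat_mixed(objs, axis=0):
    """Concatenate a mix of Series and DataFrames (hypothetical wrapper)."""
    return pd.concat([to_frame_for_concat(o, axis) for o in objs], axis=axis)


df = pd.DataFrame([[1, 2], [3, 4]])
s = pd.Series([5, 6], name=3)

rows = concat_mixed([df, s], axis=0)  # s is appended as the row labelled 3
cols = concat_mixed([df, s], axis=1)  # s is appended as the column labelled 3
```

The transpose is the simplest way to get the row orientation here; an unnamed Series would end up labelled 0, which may collide with existing labels.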

@jorisvandenbossche
Member

I also stumbled on this, but I saw there is already this issue for it.

I think in current master it is still very confusing. An example (with the series being a column from a dataframe, full example: http://nbviewer.ipython.org/5868420/bug_pandas-concat-series.ipynb ):

  • concat with [df1, df2[col]]: IndexError: list index out of range
  • concat with [df1[col], df2]: AttributeError: 'DataFrame' object has no attribute 'name'

I think at the moment it is not clear from the docs that this is not allowed (mixing dataframes with series/columns), and when you do it, you don't get a very informative error message.

@hayd
Contributor

hayd commented Sep 1, 2013

So should work like this:

In [1]: df = pd.DataFrame([[1, 2], [3, 4]])

In [2]: s = pd.Series([5, 6], name=3)

In [3]: pd.concat([df, pd.DataFrame(s).T])  # but with [df, s]
Out[3]:
   0  1
0  1  2
1  3  4
3  5  6

In [4]: pd.concat([df, pd.DataFrame(s)], axis=1)  # but with [df, s]
Out[4]:
   0  1  3
0  1  2  5
1  3  4  6

# also both with [s, df]

@jreback
Contributor

jreback commented Sep 1, 2013

series must have a name FYI (otherwise needs to raise)

@hayd
Contributor

hayd commented Sep 1, 2013

atm it's a bit sketchy with name:

In [5]: s1 = pd.Series([5, 6])

In [6]: pd.concat([s, s1], axis=1)   # loses name of s
Out[6]:
   0  1
0  5  5
1  6  6

Not sure it should always raise without a name.

@jreback
Contributor

jreback commented Sep 1, 2013

@hayd that's the problem: it "works" but really should not (as it's assigned as if the column index doesn't exist). I would just be careful about it, maybe only allowing it if as_index=False or the series has a name

@jtratner
Contributor

jtratner commented Sep 1, 2013

not exactly the same situation, but append raises if the appended series doesn't have a name and axis=0, so certainly precedent for it.
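
The "raise if the Series has no name" policy discussed above could look roughly like the following. This is a sketch of one option from the discussion, not what pandas does: `require_named` is a hypothetical helper and the error message is made up for illustration.

```python
import pandas as pd


def require_named(objs):
    """Raise ValueError if any Series in objs has no name.

    Hypothetical pre-check for illustration; not part of pandas.
    """
    for i, obj in enumerate(objs):
        if isinstance(obj, pd.Series) and obj.name is None:
            raise ValueError(
                f"Series at position {i} has no name; set one before concatenating"
            )
    return list(objs)


named = pd.Series([5, 6], name=3)
unnamed = pd.Series([5, 6])

checked = require_named([named])  # passes, returns the inputs as a list

try:
    require_named([named, unnamed])
    raised = False
except ValueError:
    raised = True  # the unnamed Series is rejected
```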

@hayd
Contributor

hayd commented Feb 22, 2014

The suggestion above was that these should be the same (not quite there in master): #2385 (comment)

df = pd.DataFrame([[1, 2], [3, 4]])
s = pd.Series([5, 6], name=3)

In [6]: pd.concat([df, s])
Out[6]:
   0   1
0  1   2
1  3   4
0  5 NaN
1  6 NaN

In [7]: pd.concat([df, pd.DataFrame(s).T])
Out[7]:
   0  1
0  1  2
1  3  4
3  5  6

open as sep issue?

Definitely looking good though!

@jreback
Contributor

jreback commented Feb 22, 2014

I'll add them as tests

@jreback
Contributor

jreback commented Feb 22, 2014

@hayd It seems that your above example should work, but how does concat know to transpose the input? (e.g. should the series be aligned column-wise (as is the default), or row-wise as you suggest?)

I could put a note that all alignment would be column-wise. (Maybe we could also add a keyword, align, to deal with this, e.g. you could specify align='index' if you want row-wise alignment, but that seems like overkill.)
