REGR: fix error caused by deprecation warning in pd.merge when joining two Series #47946

nsarang · 2022-08-03T18:02:05Z

Tests added and passed if fixing a bug or adding a new feature
All code checks passed.

There is a small bug/typo in the warning of message of pd.merge when merging two Series of different levels which results in an exception.

Thrown error message:
AttributeError: 'Series' object has no attribute 'columns'

phofl

Could you post the current warning?

nsarang · 2022-08-03T18:14:46Z

@phofl There's no warning. An exception occurs while processing the warning, and I've included the message in my original post.

The reason for the exception is that left and right might be a Series and they don't have the columns attribute. While _left and _right are the converted DataFrames.

phofl · 2022-08-03T18:16:20Z

Sorry, did not understand you then.

could you please add a test?

phofl · 2022-08-03T18:17:36Z

And a whatsnew in 1.4.4, this is a regression

nsarang · 2022-08-03T18:19:17Z

@phofl No worries. I'll add one in a few hours.

jorisvandenbossche · 2022-08-12T09:28:27Z

@nsarang would you have time to add a test ?

nsarang · 2022-08-12T15:05:52Z

@jorisvandenbossche Forgot about this one!
I've added the test now. The explanation for the test is that using to_frame on a Series with a multi-level name will result in a multi-level column DataFrame (to_frame is used in pd.merge._validate_operand). So it'll create the right condition where the levels of left and right in pd.merge are not equal.

phofl · 2022-08-12T15:10:18Z

pandas/tests/reshape/merge/test_merge.py

@@ -2167,7 +2169,7 @@ def test_merge_series(on, left_on, right_on, left_index, right_index, nm):
        name=nm,
    )
    expected = DataFrame(
-        {"A": [2, 4], "B": [1, 3]},
+        {"A": [2, 4], nm: [1, 3]},


Could you please add a dedicated test for this?

I'll add one

pandas/tests/reshape/merge/test_merge.py

phofl

Thx, looks good now. Could you add a whatsnew to v1.4.4,rst?

nsarang · 2022-08-14T15:15:27Z

Sure

nsarang · 2022-08-14T15:21:15Z

@phofl I can't reference the PR in the release notes since its path is not added to sphinx.ext.extlinks. Do you want me to add it?

phofl · 2022-08-14T15:46:29Z

It is sufficient if you add something like Fixed regression in :func:`merge` when...

to https://github.com/pandas-dev/pandas/blob/main/doc/source/whatsnew/v1.4.4.rst

You don't have to do anything else. Maybe you did not merge main in a while?

nsarang · 2022-08-14T16:35:07Z

Fine by me. Just wanted to make sure it's okay that the release note is not having a reference.

Maybe you did not merge main in a while?

I don't follow. Can you explain?

jorisvandenbossche · 2022-08-19T08:35:57Z

I pushed a small patch to ensure we catch the warning in the test (ideally, our tests don't cause warnings from pandas, if a warning is expected, it should be either asserted or explicitly ignored).

However, doing that also made we wonder why we are actually raising a deprecation warning here. The case from the test:

In [3]: merge(a, b, on=["outer", "inner"])
<ipython-input-3-4caf266e26f3>:1: FutureWarning: merging between different levels is deprecated and will be removed in a future version. (1 levels on the left, 2 on the right)
  merge(a, b, on=["outer", "inner"])
Out[3]: 
             A  (B, C)
outer inner           
a     1      2       1
b     1      4       3

In [4]: a
Out[4]: 
             A
outer inner   
a     0      1
      1      2
b     0      3
      1      4

In [5]: b
Out[5]: 
outer  inner
a      1        1
       2        2
b      1        3
       2        4
Name: (B, C), dtype: int64

So the original a and b don't have different number of levels in the columns. It's only because the Series is converted to a DataFrame that pandas itself introduces that. It's of course quite a corner case ..

jorisvandenbossche

Regardless of my previous comment, this PR certainly already avoids the error, so this looks good to merge. Thanks @nsarang!

phofl · 2022-08-19T12:18:27Z

@jorisvandenbossche

We wanted to get rid of value dependent behaviour. Your op looks "correct", but if "a" also has a midx, then the tuple name is converted into a MultiIndex itself:

a = DataFrame(
    {("A", "B"): 1},
    index=MultiIndex.from_product([["a"], [0, 1]], names=["o", "i"]),
)
b = Series(
    1,
    index=MultiIndex.from_product([["a"], [1, 2]], names=["o", "i"]),
    name=("B", "C"),
)
pd.merge(b, a, on=["o", "i"])

And the more complicated case:

a = DataFrame(
    {("A", "B"): 1},
    index=MultiIndex.from_product([["a"], [0, 1]], names=["o", "i"]),
)
b = Series(
    1,
    index=MultiIndex.from_product([["a"], [1, 2]], names=["o", "i"]),
    name=("B", "C", "D"),
)
pd.merge(a, b, on=["o", "i"])

raises: ValueError: Length of names must match number of levels in MultiIndex.

If the DataFrame has more levels than our tuple name has elements, we flatten the DataFrame MultiIndex columns to a tuple too:

a = DataFrame(
    {("A", "B", "C"): 1},
    index=MultiIndex.from_product([["a"], [0, 1]], names=["o", "i"]),
)
b = Series(
    1,
    index=MultiIndex.from_product([["a"], [1, 2]], names=["o", "i"]),
    name=("B", "C"),
)
pd.merge(a, b, on=["o", "i"])

     (A, B, C)  (B, C)
o i                   
a 1          1       1

phofl · 2022-08-19T12:57:21Z

thx @nsarang

simonjayhawkins · 2022-08-19T13:58:22Z

doc/source/whatsnew/v1.4.4.rst

@@ -19,6 +19,7 @@ Fixed regressions
 - Fixed regression in :meth:`DataFrame.loc` not updating the cache correctly after values were set (:issue:`47867`)
 - Fixed regression in :meth:`DataFrame.loc` not aligning index in some cases when setting a :class:`DataFrame` (:issue:`47578`)
 - Fixed regression in setting ``None`` or non-string value into a ``string``-dtype Series using a mask (:issue:`47628`)
+- Fixed regression in :func:`merge` throwing an error when passing a :class:`Series` with a multi-level name


if there is no issue then the PR number should be added as the issue instead

(imo that should never happen though as an issue should always be raised to document the issue. there are many reasons to do this, help users find it, avoid duplicate issues, allow proper triage and give others an opportunity for discussion, and keep these out of the PR discussion as appears to have happen here)

and also sometimes PRs go stale so the discussion in those cases is effectively lost.

Good point, thanks for pointing out Simon!

I agree with the sentiment. The only reason that I didn't create an issue in the first place was that I thought the bug happened because of a typo and that this PR should be straightforward. Aside from this, I want to point out two things:

Now that you've decided to reference the PR instead of the issue, you may want to add the base path of Github PRs to sphinx.ext.extlinks. From what I'm seeing, only the base path of issues is provided in the configuration file.

FYI, I ran into this bug when I was applying a complex function to a Series after a multi-level GroupBy.

Now that you've decided to reference the PR instead of the issue, you may want to add the base path of Github PRs to sphinx.ext.extlinks. From what I'm seeing, only the base path of issues is provided in the configuration file.

GitHub itself handles those links fine. For example the link for this PR (#47946), also works (redirects) if used with "issues": #47946

Uh I see. Thanks @jorisvandenbossche !

…g two Series (pandas-dev#47946) * Fix bug in warning * add test * add note to whatsnew * Add gh ref * catch warning in the test Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

simonjayhawkins · 2022-08-19T14:39:18Z

And a whatsnew in 1.4.4, this is a regression

and to be clear, this is a regression from 1.0.5?

phofl · 2022-08-19T14:41:17Z

I've added the deprecation in the 1.3 series, which causes the error

simonjayhawkins · 2022-08-19T14:45:13Z

the code sample (test) was raising an exception before that

~/miniconda3/envs/pandas-1.1.5/lib/python3.9/site-packages/pandas/core/reshape/merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    639                 "merging between different levels can give an unintended "
    640                 f"result ({left.columns.nlevels} levels on the left,"
--> 641                 f"{right.columns.nlevels} on the right)"
    642             )
    643             warnings.warn(msg, UserWarning)

~/miniconda3/envs/pandas-1.1.5/lib/python3.9/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5139             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5140                 return self[name]
-> 5141             return object.__getattribute__(self, name)
   5142 
   5143     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'columns'

phofl · 2022-08-19T14:53:48Z

Ok this is weird. I remember testing this and I thought I installed 1.2.5 for testing purposes. But can not reproduce now. Should we avoid backporting then?

simonjayhawkins · 2022-08-19T14:58:33Z

the fix seems low risk so (imo) ok with the backport.

however, if there is an outstanding issue about whether we should be raising a warning here #47946 (comment), then should open an issue about that to ensure it is not forgotten. Whether any further changes get backported would be part of that discussion.

phofl · 2022-08-19T15:02:55Z

Ok, got it.

I think the warning is oldish, the current behaviour is definitely buggy (#47946 (comment)), so in principal I am ok with he warning, but we could also avoid treating tuples as MultiIndexes. Disallowing this would avoid any potential value dependent bugs

…pd.merge when joining two Series (#48154) Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com> Co-authored-by: Nima Sarang <nimasarang@gmail.com>

jorisvandenbossche · 2022-08-20T07:03:23Z

Yes, indeed with the current behaviour in other cases, the warning seems correct.
It's maybe worth a more general question whether we want to continue interpreting a tuple label as MultiIndex (but if you select a Series from a DataFrame with MI columns, you also get such tuple label, so it makes some sense to do that)

…g two Series (pandas-dev#47946) * Fix bug in warning * add test * add note to whatsnew * Add gh ref * catch warning in the test Co-authored-by: Patrick Hoefler <61934744+phofl@users.noreply.github.com> Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

Fix bug in warning

3cf2dd4

phofl reviewed Aug 3, 2022

View reviewed changes

phofl added Regression Functionality that used to work in a prior pandas version Warnings Warnings that appear or should be added to pandas labels Aug 3, 2022

phofl added this to the 1.4.4 milestone Aug 3, 2022

phofl reviewed Aug 12, 2022

View reviewed changes

pandas/tests/reshape/merge/test_merge.py Outdated Show resolved Hide resolved

phofl reviewed Aug 12, 2022

View reviewed changes

pandas/tests/reshape/merge/test_merge.py Outdated Show resolved Hide resolved

add test

45eae35

nsarang force-pushed the patch-1 branch from a755775 to 45eae35 Compare August 12, 2022 22:38

phofl reviewed Aug 13, 2022

View reviewed changes

nsarang and others added 2 commits August 14, 2022 12:35

add note to whatsnew

e2d21e3

Add gh ref

336ec74

phofl approved these changes Aug 18, 2022

View reviewed changes

jorisvandenbossche added 2 commits August 19, 2022 10:25

Merge remote-tracking branch 'upstream/main' into patch-1

ef38210

catch warning in the test

436a0ad

jorisvandenbossche changed the title ~~Fix bug in pd.merge when joining two Series~~ REGR: fix error caused by deprecation warning in pd.merge when joining two Series Aug 19, 2022

jorisvandenbossche approved these changes Aug 19, 2022

View reviewed changes

Merge branch 'main' into patch-1

7e69d02

phofl merged commit 236cedf into pandas-dev:main Aug 19, 2022

This comment was marked as resolved.

Sign in to view

lumberbot-app bot added the Still Needs Manual Backport label Aug 19, 2022

simonjayhawkins reviewed Aug 19, 2022

View reviewed changes

phofl mentioned this pull request Aug 19, 2022

Backport PR #47946: REGR: fix error caused by deprecation warning in pd.merge when joining two Series #48154

Merged

phofl removed the Still Needs Manual Backport label Aug 19, 2022

phofl mentioned this pull request Aug 19, 2022

DOC: Add pr number to whatsnew #48155

Merged

simonjayhawkins added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Aug 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REGR: fix error caused by deprecation warning in pd.merge when joining two Series #47946

REGR: fix error caused by deprecation warning in pd.merge when joining two Series #47946

nsarang commented Aug 3, 2022

phofl left a comment

nsarang commented Aug 3, 2022

phofl commented Aug 3, 2022

phofl commented Aug 3, 2022

nsarang commented Aug 3, 2022

jorisvandenbossche commented Aug 12, 2022

nsarang commented Aug 12, 2022

phofl Aug 12, 2022

nsarang Aug 12, 2022

phofl left a comment

nsarang commented Aug 14, 2022

nsarang commented Aug 14, 2022

phofl commented Aug 14, 2022

nsarang commented Aug 14, 2022

jorisvandenbossche commented Aug 19, 2022

jorisvandenbossche left a comment

phofl commented Aug 19, 2022 •

edited

Loading

phofl commented Aug 19, 2022

This comment was marked as resolved.

simonjayhawkins Aug 19, 2022

simonjayhawkins Aug 19, 2022

jorisvandenbossche Aug 20, 2022

nsarang Aug 21, 2022

jorisvandenbossche Aug 22, 2022 •

edited

Loading

nsarang Aug 22, 2022

simonjayhawkins commented Aug 19, 2022

phofl commented Aug 19, 2022 •

edited

Loading

simonjayhawkins commented Aug 19, 2022

phofl commented Aug 19, 2022

simonjayhawkins commented Aug 19, 2022

phofl commented Aug 19, 2022 •

edited

Loading

jorisvandenbossche commented Aug 20, 2022

REGR: fix error caused by deprecation warning in pd.merge when joining two Series #47946

REGR: fix error caused by deprecation warning in pd.merge when joining two Series #47946

Conversation

nsarang commented Aug 3, 2022

phofl left a comment

Choose a reason for hiding this comment

nsarang commented Aug 3, 2022

phofl commented Aug 3, 2022

phofl commented Aug 3, 2022

nsarang commented Aug 3, 2022

jorisvandenbossche commented Aug 12, 2022

nsarang commented Aug 12, 2022

phofl Aug 12, 2022

Choose a reason for hiding this comment

nsarang Aug 12, 2022

Choose a reason for hiding this comment

phofl left a comment

Choose a reason for hiding this comment

nsarang commented Aug 14, 2022

nsarang commented Aug 14, 2022

phofl commented Aug 14, 2022

nsarang commented Aug 14, 2022

jorisvandenbossche commented Aug 19, 2022

jorisvandenbossche left a comment

Choose a reason for hiding this comment

phofl commented Aug 19, 2022 • edited Loading

phofl commented Aug 19, 2022

This comment was marked as resolved.

simonjayhawkins Aug 19, 2022

Choose a reason for hiding this comment

simonjayhawkins Aug 19, 2022

Choose a reason for hiding this comment

jorisvandenbossche Aug 20, 2022

Choose a reason for hiding this comment

nsarang Aug 21, 2022

Choose a reason for hiding this comment

jorisvandenbossche Aug 22, 2022 • edited Loading

Choose a reason for hiding this comment

nsarang Aug 22, 2022

Choose a reason for hiding this comment

simonjayhawkins commented Aug 19, 2022

phofl commented Aug 19, 2022 • edited Loading

simonjayhawkins commented Aug 19, 2022

phofl commented Aug 19, 2022

simonjayhawkins commented Aug 19, 2022

phofl commented Aug 19, 2022 • edited Loading

jorisvandenbossche commented Aug 20, 2022

phofl commented Aug 19, 2022 •

edited

Loading

jorisvandenbossche Aug 22, 2022 •

edited

Loading

phofl commented Aug 19, 2022 •

edited

Loading

phofl commented Aug 19, 2022 •

edited

Loading