Using df.at() for assignment changes views of certain columns, but not all of them #22372

liesb · 2018-08-15T17:44:42Z

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd

def f2(seed=789, niter=10):
    rng = np.random.RandomState(seed=seed)
    df = pd.DataFrame(index=np.arange(5))
    df['x'] = rng.uniform(0, 2, size=5)
    df['y'] = rng.uniform(0, 2, size=5)
    df['cost'] = rng.uniform(0, 10, size=5)
    # print the initial df
    print(df)
    # for loop: replace the worst param combo and evaluate
    for i in range(niter):
        # idx of param combo with highest cost
        worst_idx = np.argmax(df.cost.values)
        # IF line below is commented out, I don't get the discrepancy
        centroid = df.loc[rng.choice(np.arange(5), size=1)].mean(axis=0)
        for p in ['x', 'y']:
            new_param = 4
            # THIS IS THE OFFENDING CALL to df.at! 
            # Change it to .loc and it doesn't give the same issue
            df.at[worst_idx, p] = new_param
#             df.loc[worst_idx, p] = new_param
        # The at below is not the offending at
        df.at[worst_idx, 'cost'] = rng.uniform(low=0, high=10)
#         df.loc[worst_idx, 'cost'] = rng.uniform(low=0, high=10, size=(1,))
    return df

test = f2()
test
test.cost
test[['cost']]
test.iloc[4]

Problem description

The cost column is displayed incorrectly when we display test. It is correct when we use test.cost, but not when we use test[['cost']] or test.iloc[4]. The other columns seem to be displayed correctly.
This doesn't happen when we use .loc to assign new_param to the x and y columns.
It also doesn't happen when we comment out centroid = df.loc[rng.choice(np.arange(5), size=1)].mean(axis=0).

The current behaviour is problematic because different ways of viewing the data yield different answers, which shouldn't be the case.

Expected Output

The cost column shown by test matches test.cost.
test[['cost']] and test.cost contain the same values.
test.iloc[4] shows the same cost value as test.cost at position 4.
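
The three expectations above can be phrased as a single check. This is a minimal sketch that uses only public accessors; `views_agree` is a hypothetical helper name, not part of pandas, and it assumes the column holds no NaN values (which never compare equal).

```python
import numpy as np
import pandas as pd


def views_agree(df, col):
    """Return True if column `col` reads the same through three access paths."""
    as_series = df[col].to_numpy()             # Series access, like test.cost
    as_frame = df[[col]].to_numpy().ravel()    # one-column frame, like test[['cost']]
    # Row-wise access, like test.iloc[i]
    by_row = np.array([df.iloc[i][col] for i in range(len(df))])
    return bool(np.array_equal(as_series, as_frame)
                and np.array_equal(as_series, by_row))


# Usage on a small frame built in one step
frame = pd.DataFrame({"x": [1.0, 2.0], "cost": [3.0, 4.0]})
print(views_agree(frame, "cost"))
```

Calling `views_agree(test, 'cost')` on the frame returned by f2() would be one way to turn the observation in this report into a yes/no answer.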

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 36.4.0
Cython: None
numpy: 1.15.0
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

WillAyd · 2018-08-15T18:08:45Z

Quite a few things are being thrown out there, but unfortunately it's not explicit enough as to what issue(s) you are trying to address. Also, the expectation that test[['cost']] and test.cost are the same is incorrect: the former returns a frame, whereas the latter returns a series.

Please refer to the contributing guide on how to open bug reports. The biggest thing here is to post a minimally reproducible example:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#bug-reports-and-enhancement-requests

If you can provide the minimally reproducible example and clarify what exactly the issue is, feel free to reopen.

liesb · 2018-08-15T18:18:14Z

I'm sorry, did you even run the code? While I understand that `test[['cost']]` and test.cost are different objects, they still should contain _the same values_. They don't. I'll work on a more minimal example.
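
To make the distinction concrete: the two expressions return different container types, yet they are expected to hold the same data. A small sketch on an illustrative frame (the frame itself is made up for this example):

```python
import pandas as pd

df = pd.DataFrame({"x": [1], "cost": [2]})

as_series = df["cost"]      # pandas Series
as_frame = df[["cost"]]     # one-column pandas DataFrame

print(type(as_series).__name__, type(as_frame).__name__)

# Different types, but the one-column frame should carry the same values.
same_values = as_frame["cost"].equals(as_series)
print(same_values)
```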

jreback · 2018-08-15T18:23:09Z

@liesb pls post a minimal example - you have lots of things going on which are unrelated
it’s actually very very hard to tell if anything is amiss

liesb · 2018-08-15T19:05:09Z

Minimal example: I'm changing the value in the cost column of df from 2 to 789. However, I don't see this when I use df, df[['cost']] or df.iloc[0]; it is visible when using df.cost.

import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0])
df['x'] = 1
df['cost'] = 2
print(df)
idx = np.argmax(df.cost.values)
centroid = df.loc[[0]]
new_param = 4
df.at[idx, 'x'] = new_param
df.at[idx, 'cost'] = 789
print(df)
df.cost 
df[['cost']]
df.iloc[0]
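
For comparison, here is the same sequence with `.loc` in place of `.at`. The original report says `.loc` avoided the discrepancy in the longer example; that the same holds for this minimal version on pandas 0.23.4 is an assumption, so the sketch prints both views rather than stating a result.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0])
df['x'] = 1
df['cost'] = 2
idx = np.argmax(df.cost.values)
centroid = df.loc[[0]]

# Same assignments as above, but through .loc instead of .at
df.loc[idx, 'x'] = 4
df.loc[idx, 'cost'] = 789

print(df['cost'])    # Series view
print(df[['cost']])  # one-column frame view
```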

liesb · 2018-08-15T19:13:57Z

I'm adding the pd.show_versions() output again as I used a different machine:

INSTALLED VERSIONS

commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 40.0.0
Cython: None
numpy: 1.15.0
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

WillAyd · 2018-08-15T21:35:09Z

OK thanks for the example. Just to remove unnecessary prints and make it more concise:

import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0])
df['x'] = 1
df['cost'] = 2
idx = np.argmax(df.cost.values)
centroid = df.loc[[0]]
new_param = 4
df.at[idx, 'x'] = new_param
df.at[idx, 'cost'] = 789

print(df['cost'])  # Prints 789
print(df[['cost']])  # Prints 2

It is strange that the frame shows the original value before the .at assignment.

Note that almost any change that makes the construction more idiomatic, or that removes elements, makes the problem go away. For example:

df = pd.DataFrame([[1, 2]], columns=['x', 'cost'], index=[0])
idx = np.argmax(df.cost.values)
centroid = df.loc[[0]]
new_param = 4
df.at[idx, 'x'] = new_param
df.at[idx, 'cost'] = 789

print(df['cost'])  # Prints 789
print(df[['cost']])  # Prints 789

My guess is that there is some wonky state management going on. Investigation and PRs are always welcome.

jbrockmendel · 2020-09-22T03:48:01Z

df._item_cache needs to be cleared, not sure exactly where

jbrockmendel · 2022-01-09T07:24:17Z

Looks like the centroid = df.loc[[0]] line is triggering a "consolidate_inplace" without clearing the _item_cache.

jbrockmendel · 2022-01-09T07:27:50Z

_getitem_iterable calls self.obj._reindex_with_indexers which calls mgr.reindex_indexer which calls _consolidate_inplace
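
One plausible reading of the analysis above, offered as an assumption rather than something confirmed in this thread: a frame built column by column stores its columns in separate internal blocks, so the consolidation triggered by `df.loc[[0]]` has work to do, whereas a frame built in one step is already consolidated. The sketch below peeks at the block count through `_mgr.nblocks`, which is private, version-dependent pandas internals and is therefore read defensively; `block_count` is a hypothetical helper.

```python
import pandas as pd


def block_count(df):
    """Best-effort count of internal blocks; returns None if internals differ."""
    mgr = getattr(df, "_mgr", None)        # private attribute, may change or vanish
    return getattr(mgr, "nblocks", None)   # private attribute, may change or vanish


# Built column by column, as in the failing example
incremental = pd.DataFrame(index=[0])
incremental['x'] = 1
incremental['cost'] = 2

# Built in one step, as in the variant that did not reproduce the problem
one_shot = pd.DataFrame([[1, 2]], columns=['x', 'cost'], index=[0])

print("column by column:", block_count(incremental))
print("one step:", block_count(one_shot))
```

If the two counts differ on a given pandas version, that would be consistent with the observation that only the column-by-column construction reproduced the stale value.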

WillAyd added the Usage Question and Needs Info (Clarification about behavior needed to assess issue) labels Aug 15, 2018

WillAyd closed this as completed Aug 15, 2018

WillAyd reopened this Aug 15, 2018

WillAyd added the Indexing (Related to indexing on series/frames, not to indexes themselves) label and removed the Needs Info (Clarification about behavior needed to assess issue) and Usage Question labels Aug 15, 2018

WillAyd added the Bug label and removed the Indexing (Related to indexing on series/frames, not to indexes themselves) label Aug 15, 2018

mroeschke added the Indexing (Related to indexing on series/frames, not to indexes themselves) label Nov 2, 2019

jbrockmendel mentioned this issue Jan 9, 2022

BUG: frame[x].loc[y] inconsistent with frame.at[x, y] GH#22372 #45287 (Merged)

jreback added this to the 1.5 milestone Jan 10, 2022

jreback closed this as completed in #45287 Jan 16, 2022
