BUG: Can't use iloc to set a subset of a dataframe to a two-dimensional categorical data array. #44703

mvashishtha · 2021-12-01T09:17:55Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
from numpy import array
df = pd.DataFrame({"status": ["a", "b", "c"]},  dtype="category")
df.iloc[array([0, 1]), array([0])] = array([['a'], ['a']])

Issue Description

I can't use iloc to set a subset of a dataframe to a two-dimensional categorical data array. The reproducible example raises a TypeError with this stack trace:

TypeError                                 Traceback (most recent call last)
<ipython-input-10-d6ae260d158c> in <module>
    2 from numpy import array
    3 df = pd.DataFrame({"status": ["a", "b", "c"]},  dtype="category")
----> 4 df.iloc[array([0, 1]), array([0])] = array([['a'], ['a']])

~/pandas/pandas/core/indexing.py in __setitem__(self, key, value)
  708
  709         iloc = self if self.name == "iloc" else self.obj.iloc
--> 710         iloc._setitem_with_indexer(indexer, value, self.name)
  711
  712     def _validate_key(self, key, axis: int):

~/pandas/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value, name)
 1671             self._setitem_with_indexer_split_path(indexer, value, name)
 1672         else:
-> 1673             self._setitem_single_block(indexer, value, name)
 1674
 1675     def _setitem_with_indexer_split_path(self, indexer, value, name: str):

~/pandas/pandas/core/indexing.py in _setitem_single_block(self, indexer, value, name)
 1919
 1920         # actually do the set
-> 1921         self.obj._mgr = self.obj._mgr.setitem(indexer=indexer, value=value)
 1922         self.obj._maybe_update_cacher(clear=True, inplace=True)
 1923

~/pandas/pandas/core/internals/managers.py in setitem(self, indexer, value)
  335         For SingleBlockManager, this backs s[indexer] = value
  336         """
--> 337         return self.apply("setitem", indexer=indexer, value=value)
  338
  339     def putmask(self, mask, new, align: bool = True):

~/pandas/pandas/core/internals/managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
  302                     applied = b.apply(f, **kwargs)
  303                 else:
--> 304                     applied = getattr(b, f)(**kwargs)
  305             except (TypeError, NotImplementedError):
  306                 if not ignore_failures:

~/pandas/pandas/core/internals/blocks.py in setitem(self, indexer, value)
 1513
 1514         check_setitem_lengths(indexer, value, self.values)
-> 1515         self.values[indexer] = value
 1516         return self
 1517

~/pandas/pandas/core/arrays/_mixins.py in __setitem__(self, key, value)
  249     def __setitem__(self, key, value):
  250         key = check_array_indexer(self, key)
--> 251         value = self._validate_setitem_value(value)
  252         self._ndarray[key] = value
  253

~/pandas/pandas/core/arrays/categorical.py in _validate_setitem_value(self, value)
 1445         if not is_hashable(value):
 1446             # wrap scalars and hashable-listlikes in list
-> 1447             return self._validate_listlike(value)
 1448         else:
 1449             return self._validate_scalar(value)

~/pandas/pandas/core/arrays/categorical.py in _validate_listlike(self, value)
 2080         # something to np.nan
 2081         if len(to_add) and not isna(to_add).all():
-> 2082             raise TypeError(
 2083                 "Cannot setitem on a Categorical with a new "
 2084                 "category, set the categories first"

TypeError: Cannot setitem on a Categorical with a new category, set the categories first

The problem seems to only come up when the dataframe has a single column. If I just add another column to the dataframe, it works:

import pandas as pd
from numpy import array
df = pd.DataFrame({"status": ["a", "b", "c"], "status2": ["d", "e", "f"]},  dtype="category")
df.iloc[array([0, 1]), array([0])] = array([['a'], ['a']])
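
For anyone hitting this on an affected version, one workaround that should sidestep the error is to avoid the 2-D value altogether: address the single column by scalar position and pass a 1-D array. This is only a sketch, not something from the pandas patch; the names `rows` and `values` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"status": ["a", "b", "c"]}, dtype="category")
rows = np.array([0, 1])
values = np.array([["a"], ["a"]])  # the (2, 1) value from the report

# Select the column with a scalar position and flatten the value to 1-D,
# so the categorical column receives a plain list-like of existing categories.
df.iloc[rows, 0] = values[:, 0]
```

The category dtype of the column is kept, since only existing categories are assigned.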

Expected Behavior

I expect to see that the dataframe has the status column set to "a" for the first two rows.

  status
0      a
1      a
2      c
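
The same expectation can be written as a check. This is a sketch that assumes a pandas build where the assignment succeeds; `expected` is an illustrative name, and it reuses the column's own dtype so the categories still include "b".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"status": ["a", "b", "c"]}, dtype="category")
df.iloc[np.array([0, 1]), np.array([0])] = np.array([["a"], ["a"]])

# Build the expected frame with the same categorical dtype (categories a, b, c).
expected = pd.DataFrame({"status": ["a", "a", "c"]}, dtype=df["status"].dtype)
pd.testing.assert_frame_equal(df, expected)
```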

Installed Versions

INSTALLED VERSIONS

commit : 4e139f6
python : 3.8.12.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Wed Jun 23 00:26:31 PDT 2021; root:xnu-7195.141.2~5/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.0.dev0+1273.g4e139f69ff
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 59.4.0
Cython : 0.29.24
pytest : 6.2.5
hypothesis : 6.29.0
sphinx : 4.3.1
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.4
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.30.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.11.0
fastparquet : 0.7.2
gcsfs : 2021.11.0
matplotlib : 3.5.0
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : 6.0.1
pyxlsb : None
s3fs : 2021.11.0
scipy : 1.7.3
sqlalchemy : 1.4.27
tables : 3.6.1
tabulate : 0.8.9
xarray : 0.18.2
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1


jbrockmendel · 2021-12-01T19:19:17Z

This would be fixed by having 2D EAs. Until then I think we can kludge together a patch in ExtensionBlock.setitem
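
To illustrate the shape mismatch behind that suggestion: an extension array such as a Categorical is 1-D, while the value in the report has shape (2, 1). A minimal sketch of the squeezing idea follows. It uses a hypothetical helper, `squeeze_single_column_value`, which is not a pandas function, and it is not the actual patch.

```python
import numpy as np
import pandas as pd


def squeeze_single_column_value(value):
    """Hypothetical helper: collapse an (n, 1) ndarray to shape (n,)."""
    if isinstance(value, np.ndarray) and value.ndim == 2 and value.shape[1] == 1:
        return value[:, 0]
    return value


cat = pd.Categorical(["a", "b", "c"])
value = np.array([["a"], ["a"]])  # 2-D value aimed at a single column

# Setting the 1-D Categorical with the squeezed value only uses existing categories.
cat[np.array([0, 1])] = squeeze_single_column_value(value)
```

Values that are already 1-D, or that span more than one column, pass through the helper unchanged.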

jbrockmendel · 2021-12-01T19:38:34Z

Yes, can fix this following #44514

mvashishtha · 2022-01-03T17:17:19Z

@jreback Thank you for fixing this! Once the fix is in the pandas version that Modin requires, I'll clean up the workaround modin-project/modin#3801.

jreback · 2022-01-03T23:15:29Z

hah @jbrockmendel actually made the patch here

mvashishtha · 2022-01-04T15:47:38Z

Thanks @jbrockmendel !

mvashishtha added the Bug and Needs Triage labels Dec 1, 2021

mvashishtha changed the title ~~BUG: Can't use iloc set a subset of a dataframe to a two-dimensional categorical data array.~~ BUG: Can't use iloc to set a subset of a dataframe to a two-dimensional categorical data array. Dec 1, 2021

mvashishtha mentioned this issue Dec 1, 2021

Inconsistent behavior with Categorical.add_categories modin-project/modin#3736

Closed

jbrockmendel mentioned this issue Dec 2, 2021

BUG: df.iloc[ndarray, ndarray] = 2dvals raising with EA column #44714

Merged

This was referenced Dec 3, 2021

FIX-#3736: Don't use a 2-d array to set a categorical column. modin-project/modin#3777

Merged

Clean up the workaround for setting a single categorical column to a scalar. modin-project/modin#3801

Closed

jreback added this to the 1.4 milestone Dec 22, 2021

jreback added the ExtensionArray and Indexing labels and removed the Needs Triage label Dec 22, 2021

jreback closed this as completed in #44714 Dec 24, 2021

mvashishtha mentioned this issue Feb 4, 2022

FIX-#3981, FIX-#3801, FIX-#4149: Stop broadcasting scalars to set items. modin-project/modin#4160

Merged
