ENH: Support logical and/or (bitwise ops) for nullable IntegerArray #34463

Closed
3 tasks done
eubene opened this issue May 29, 2020 · 4 comments · Fixed by #46104
Labels
Enhancement, NA - MaskedArrays (Related to pd.NA and nullable extension arrays)

Comments

@eubene

eubene commented May 29, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas: cc6348


Logical and/or (&, |) on integer Series returns an integer Series:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [1, 1, 0, 0], 'B': [1, 0, 1, 0]})
>>> df.A & df.B
0    1
1    0
2    0
3    0
dtype: int64

Doing this for a nullable IntegerArray, I expect the result to be another IntegerArray, but it raises a TypeError:

>>> df.A.astype('Int64') & df.B.astype('Int64')

Traceback

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "(snip)/pandas/core/ops/common.py", line 65, in new_method
    return method(self, other)
  File "(snip)/pandas/core/ops/__init__.py", line 393, in wrapper
    res_values = logical_op(lvalues, rvalues, op)
  File "(snip)/pandas/core/ops/array_ops.py", line 336, in logical_op
    res_values = op(lvalues, rvalues)
TypeError: unsupported operand type(s) for &: 'IntegerArray' and 'IntegerArray'
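
Until the operators are supported on IntegerArray directly, the expected result can be approximated by hand. The following is only a minimal sketch, not how pandas implements it: `masked_bitwise` is a hypothetical helper name, and it assumes both Series share the same index and that a missing value on either side should give a missing result.

```python
import operator

import pandas as pd


def masked_bitwise(left, right, op):
    """Apply a bitwise operator to two nullable-integer Series (hypothetical helper).

    Assumes both Series share the same index and that a missing value on
    either side should yield a missing result.
    """
    # Positions where either operand is missing.
    mask = (left.isna() | right.isna()).to_numpy(dtype=bool)
    # Fill missing slots with 0 so NumPy can operate on plain int64 data;
    # the filled positions are masked out again below.
    lvalues = left.to_numpy(dtype="int64", na_value=0)
    rvalues = right.to_numpy(dtype="int64", na_value=0)
    result = op(lvalues, rvalues)
    return pd.Series(pd.arrays.IntegerArray(result, mask), index=left.index)


left = pd.Series([1, 1, 0, None], dtype="Int64")
right = pd.Series([1, 0, 1, 1], dtype="Int64")
print(masked_bitwise(left, right, operator.and_))
print(masked_bitwise(left, right, operator.or_))
```

The result keeps the Int64 dtype, and the last row stays missing because one of its operands is missing.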

Output of pd.show_versions()

INSTALLED VERSIONS

commit : cc63484
python : 3.7.7.final.0
python-bits : 64
OS : Darwin
OS-release : 19.4.0
Version : Darwin Kernel Version 19.4.0: Wed Mar 4 22:28:40 PST 2020; root:xnu-6153.101.6~15/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0.dev0+1710.gcc634842a
numpy : 1.18.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 41.2.0
Cython : 0.29.19
pytest : 5.4.2
hypothesis : 5.16.0
sphinx : 3.0.4
blosc : 1.9.1
feather : None
xlsxwriter : 1.2.8
lxml.etree : 4.5.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.14.0
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fastparquet : 0.4.0
gcsfs : None
matplotlib : 3.2.1
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.17
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.49.1

@eubene eubene added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels May 29, 2020
@jorisvandenbossche jorisvandenbossche added the NA - MaskedArrays (Related to pd.NA and nullable extension arrays) label and removed the Bug and Needs Triage labels May 29, 2020
@dsaxton
Member

dsaxton commented May 29, 2020

Maybe I'm missing something, but even the output for int64 seems buggy / strange (mostly the third element for each):

import pandas as pd

left = pd.Series([0, 1, 2, 3])
right = pd.Series([0, 0, 1, 2])
print(left | right)
# 0    0
# 1    1
# 2    3
# 3    3
# dtype: int64
print(left & right)
# 0    0
# 1    0
# 2    0
# 3    2
# dtype: int64

@eubene
Author

eubene commented May 29, 2020

Looks like the strangeness comes from numpy. In any case, I think that is a separate issue from the error with Int64 arrays.

>>> import numpy as np
>>> np.array([0, 1, 2, 3]) | np.array([0, 0, 1, 2])
array([0, 1, 3, 3])
>>> np.array([0, 1, 2, 3]) & np.array([0, 0, 1, 2])
array([0, 0, 0, 2])
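
For comparison, NumPy also provides element-wise logical functions that treat every nonzero value as True. A small sketch contrasting the two on the same arrays (the variable names are only for illustration):

```python
import numpy as np

left = np.array([0, 1, 2, 3])
right = np.array([0, 0, 1, 2])

# Bitwise: combines the binary digits of each pair of integers.
bitwise_or = left | right    # same as np.bitwise_or(left, right)
bitwise_and = left & right   # same as np.bitwise_and(left, right)

# Logical: treats every nonzero value as True and returns booleans.
logical_or = np.logical_or(left, right)
logical_and = np.logical_and(left, right)

print(bitwise_or, bitwise_and)
print(logical_or, logical_and)
```

The logical variants return boolean arrays, so the integer results above come from combining bit patterns, not truth values.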

@jorisvandenbossche
Member

And not even numpy, as python does the same (example for the third elements):

In [2]: 2 | 1
Out[2]: 3

In [3]: 2 & 1
Out[3]: 0

Note that these are bitwise operators.
Refreshing my knowledge of bits gives (showing 4 bits, to be read from right to left):

2 = 0010
1 = 0001
2 | 1 = 0011 -> 3
2 & 1 = 0000 -> 0
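
The same table can be reproduced in plain Python, alongside the built-in `or` / `and`, which work on truthiness instead of bits. This is just an illustrative sketch; `bits` is a hypothetical helper for zero-padded binary formatting.

```python
a, b = 2, 1


def bits(n, width=4):
    """Return n as a zero-padded binary string (hypothetical helper)."""
    return format(n, f"0{width}b")


print(bits(a), bits(b))      # 0010 0001
print(bits(a | b), a | b)    # 0011 3
print(bits(a & b), a & b)    # 0000 0

# The keywords `or` / `and` return one of the operands based on truthiness.
print(a or b, a and b)       # 2 1
```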

So given that this is quite a specific operation, I think it was simply not a priority to implement and test it for the new nullable integer dtype.

@jorisvandenbossche jorisvandenbossche changed the title from "BUG: Logical and/or raises TypeError for nullable IntegerArray" to "ENH: Support logical and/or (bitwise ops) for nullable IntegerArray" May 30, 2020
@dsaxton
Member

dsaxton commented May 30, 2020

Note that these are bitwise operators.

Ah, that makes sense. I was comparing to the built-in "or".

@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone May 30, 2020
3 participants