Plotting Int64 columns with nulled integers (NAType) fails #32073

Khris777 · 2020-02-18T09:31:52Z

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [7, 5, np.nan, 3, 2]})
df.plot(x='A', y='B')
df = df.astype('Int64')
df.plot(x='A', y='B')

Problem description

The first plotting command works; the second raises the following error:

TypeError: float() argument must be a string or a number, not 'NAType'

Expected Output

NAType should be treated the same way as numpy nan in plotting. Maybe transformed on the fly?

(I'm unsure if this is a pandas, a numpy, or a matplotlib issue, I'm starting here)

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 2.4.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.5.0
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.12.0
pandas_datareader: None
bs4 : 4.8.2
bottleneck : None
fastparquet : 0.3.3
gcsfs : None
lxml.etree : 4.5.0
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.16.0
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
xlsxwriter : 1.2.7
numba : 0.48.0


TomAugspurger · 2020-02-18T13:50:39Z

Can you post the full traceback?

Khris777 · 2020-02-19T07:51:25Z

Gladly:

Traceback (most recent call last):

  File "C:\Users\My.Name\Documents\Python_Projects\GIT\miscellaneous\temp.py", line 14, in <module>
    df.plot(x='A', y='B')

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\plotting\_core.py", line 847, in __call__
    return plot_backend.plot(data, kind=kind, **kwargs)

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\plotting\_matplotlib\__init__.py", line 61, in plot
    plot_obj.generate()

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 263, in generate
    self._make_plot()

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 1085, in _make_plot
    **kwds,

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 1104, in _plot
    lines = MPLPlot._plot(ax, x, y_values, style=style, **kwds)

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\plotting\_matplotlib\converter.py", line 66, in wrapper
    return func(*args, **kwargs)

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\pandas\plotting\_matplotlib\core.py", line 656, in _plot
    return ax.plot(*args, **kwds)

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\matplotlib\axes\_axes.py", line 1667, in plot
    self.add_line(line)

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\matplotlib\axes\_base.py", line 1902, in add_line
    self._update_line_limits(line)

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\matplotlib\axes\_base.py", line 1924, in _update_line_limits
    path = line.get_path()

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\matplotlib\lines.py", line 1027, in get_path
    self.recache()

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\matplotlib\lines.py", line 675, in recache
    y = _to_unmasked_float_array(yconv).ravel()

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\matplotlib\cbook\__init__.py", line 1388, in _to_unmasked_float_array
    return np.ma.asarray(x, float).filled(np.nan)

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\numpy\ma\core.py", line 7849, in asarray
    subok=False, order=order)

  File "C:\Users\My.Name\AppData\Local\Continuum\miniconda3\envs\main\lib\site-packages\numpy\ma\core.py", line 2795, in __new__
    order=order, subok=True, ndmin=ndmin)

TypeError: float() argument must be a string or a number, not 'NAType'

jorisvandenbossche · 2020-02-19T07:54:51Z

We need to convert to floats with NaNs before passing the data to matplotlib, I suppose.

jeandersonbc · 2020-02-19T16:00:49Z

Hi! Can I make an attempt on this one? I'm looking for issues that can be useful to the community and help me to understand better how pandas works in depth :)

jorisvandenbossche · 2020-02-19T20:46:41Z

@jeandersonbc yes, that's welcome!
If you have any questions, don't hesitate to ask

AnnaDaglis · 2020-02-20T16:37:27Z

take

jorisvandenbossche · 2020-02-20T22:39:33Z

@AnnaDaglis someone else (@jeandersonbc) already just commented he would like to look at it, so I would first give him some time before taking this issue

jeandersonbc · 2020-02-21T09:07:43Z

Thanks @jorisvandenbossche, I haven't had time to work on the issue since my last post, which is why I didn't reply earlier.
My understanding from your suggestion is that there should be verification for NA values before calling plot_backend.plot(data, kind=kind, **kwargs), right? I'll be submitting a PR later today with some tests to see how it goes.

jorisvandenbossche · 2020-02-21T09:42:42Z

there should be verification for NA values before calling plot_backend.plot(data, kind=kind, **kwargs)

Yes, I think so. But the check might need to be done lower in the stack, as when calling plot_backend.plot(), data is still a dataframe. I think we can focus here on the matplotlib backend included in pandas, and thus we need to ensure converting to float dtype (with to_numpy(float, na_value=np.nan) if it is nullable integer dtype) before passing it to matplotlib's plot().
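A minimal sketch of the conversion described above, applied by hand on the user side. It only illustrates `to_numpy(float, na_value=np.nan)` on the data from the original report; the helper name `to_float_array` is made up for this example and is not part of pandas or of any fix discussed in this thread.

```python
import numpy as np
import pandas as pd


def to_float_array(series):
    """Return a float64 ndarray with pd.NA mapped to np.nan (hypothetical helper)."""
    return series.to_numpy(dtype=float, na_value=np.nan)


# Same data as the report: column B holds one missing value.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [7, 5, np.nan, 3, 2]}).astype("Int64")

# Both arrays are plain NumPy float arrays that matplotlib can consume.
x = to_float_array(df["A"])
y = to_float_array(df["B"])
print(y)
```

The resulting `x` and `y` can then be passed to matplotlib directly, for example `ax.plot(x, y)` on an existing axes; the NaN shows up as a gap in the line, as it does for a regular float column.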

jeandersonbc · 2020-02-21T12:11:03Z

take

jeandersonbc · 2020-03-02T00:10:38Z

So, I approached the problem by checking the numerical data in _compute_plot_data in the matplotlib backend. All tests pass, but I wonder if more tests should be added (e.g., in "pandas/tests/plotting/test_frame.py"). Any thoughts? Thanks already!

MarcoGorelli · 2020-05-22T14:13:21Z

Hi @AnnaDaglis - looks like the linked PR went stale (unfortunately), are you still interested in working on this?

cvanweelden · 2020-06-20T10:10:38Z

take

cvanweelden · 2020-07-07T09:34:50Z

I'm no longer working on this, as my minimal fix wasn't accepted and this might need a more structural solution for 3rd party lib operations on nullable types.

vetedde · 2020-07-13T20:20:01Z

take

rkc007 · 2020-10-19T06:03:23Z

It's been 2 months since nobody is working on it. I am interested to work on this issue.

rkc007 · 2020-10-19T06:04:02Z

take

MarcoGorelli · 2020-10-19T08:40:39Z

It's been 2 months since nobody is working on it. I am interested to work on this issue.

Awesome! Let us know if you want/need help

jankaWIS · 2022-07-02T13:45:11Z

Hi, I have a comment on this issue. It happened to me in v1.2.4, and it seems to have been fixed by now (v1.4.3), but I have not found where or when the fix landed. I believe it's connected, and this could save time for someone who encounters the following error while plotting with matplotlib:

ValueError: values must be a 1D array

If one runs the following code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'x' : [1,2,3,4,5],
    'y' : [1.,2.,3.,4.1,5]
})

print(df.dtypes)
# x      int64
# y    float64
# dtype: object

# plot
plt.plot(df["x"],df["y"])

# convert types
df = df.convert_dtypes()

print(df.dtypes)
# x      Int64
# y    Float64
# dtype: object

# plot
plt.plot(df["x"],df["y"])

I.e., if one converts the columns to pandas nullable dtypes, plotting with matplotlib suddenly fails. The code above produces the first plot, but after the dtype conversion the second call fails. Note that it doesn't matter whether the variable is Float64 or Int64: plotting plt.plot(df["x"],df["x"]) or plt.plot(df["y"],df["y"]) fails the same way.
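For anyone stuck on an affected version, one possible workaround (a sketch, not something proposed elsewhere in this thread) is to cast the nullable columns back to NumPy-backed float64 just before plotting. The variable names `nullable` and `plain` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [1.0, 2.0, 3.0, 4.1, 5.0]})

# Nullable extension dtypes, as in the example above.
nullable = df.convert_dtypes()

# Cast to plain float64 columns for plotting only;
# `nullable` itself keeps its dtypes.
plain = nullable.astype("float64")
print(plain.dtypes)
```

With that, `plt.plot(plain["x"], plain["y"])` receives ordinary float64 Series, which is the case that already worked before the conversion.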

The full output is below:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values(self, indexer)
936 try:
--> 937 return self._constructor(self._mgr.get_slice(indexer)).__finalize__(self)
938 except ValueError:

~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/managers.py in get_slice(self, slobj, axis)
1606 blk = self._block
-> 1607 array = blk._slice(slobj)
1608 block = blk.make_block_same_class(array, placement=slice(0, len(array)))

~/anaconda3/lib/python3.8/site-packages/pandas/core/internals/blocks.py in _slice(self, slicer)
1923
-> 1924 return self.values[slicer]
1925

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __getitem__(self, item)
114
--> 115 return type(self)(self._data[item], self._mask[item])
116

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
347 )
--> 348 super().__init__(values, mask, copy=copy)
349

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __init__(self, values, mask, copy)
89 if values.ndim != 1:
---> 90 raise ValueError("values must be a 1D array")
91 if mask.ndim != 1:

ValueError: values must be a 1D array

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
/var/folders/bx/tb4883l53hdd3zp2y0nyy_4m0000gp/T/ipykernel_92470/2350386431.py in
16 print(df.dtypes)
17 # plot
---> 18 plt.plot(df["x"],df["y"])

~/anaconda3/lib/python3.8/site-packages/matplotlib/pyplot.py in plot(scalex, scaley, data, *args, **kwargs)
3017 @_copy_docstring_and_deprecators(Axes.plot)
3018 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 3019 return gca().plot(
3020 *args, scalex=scalex, scaley=scaley,
3021 **({"data": data} if data is not None else {}), **kwargs)

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_axes.py in plot(self, scalex, scaley, data, *args, **kwargs)
1603 """
1604 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1605 lines = [*self._get_lines(*args, data=data, **kwargs)]
1606 for line in lines:
1607 self.add_line(line)

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in __call__(self, data, *args, **kwargs)
313 this += args[0],
314 args = args[1:]
--> 315 yield from self._plot_args(this, kwargs)
316
317 def get_next_color(self):

~/anaconda3/lib/python3.8/site-packages/matplotlib/axes/_base.py in _plot_args(self, tup, kwargs, return_kwargs)
488
489 if len(xy) == 2:
--> 490 x = _check_1d(xy[0])
491 y = _check_1d(xy[1])
492 else:

~/anaconda3/lib/python3.8/site-packages/matplotlib/cbook/__init__.py in _check_1d(x)
1360 message='Support for multi-dimensional indexing')
1361
-> 1362 ndim = x[:, None].ndim
1363 # we have definitely hit a pandas index or series object
1364 # cast to a numpy array.

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
875 return self._get_values(key)
876
--> 877 return self._get_with(key)
878
879 def _get_with(self, key):

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_with(self, key)
890 )
891 elif isinstance(key, tuple):
--> 892 return self._get_values_tuple(key)
893
894 elif not is_list_like(key):

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values_tuple(self, key)
920 # mpl hackaround
921 if com.any_none(*key):
--> 922 result = self._get_values(key)
923 deprecate_ndim_indexing(result, stacklevel=5)
924 return result

~/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_values(self, indexer)
940 # see tests.series.timeseries.test_mpl_compat_hack
941 # the asarray is needed to avoid returning a 2D DatetimeArray
--> 942 return np.asarray(self._values[indexer])
943
944 def _get_value(self, label, takeable: bool = False):

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __getitem__(self, item)
113 item = check_array_indexer(self, item)
114
--> 115 return type(self)(self._data[item], self._mask[item])
116
117 def _coerce_to_array(self, values) -> Tuple[np.ndarray, np.ndarray]:

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
346 "the 'pd.array' function instead"
347 )
--> 348 super().__init__(values, mask, copy=copy)
349
350 def __neg__(self):

~/anaconda3/lib/python3.8/site-packages/pandas/core/arrays/masked.py in __init__(self, values, mask, copy)
88 )
89 if values.ndim != 1:
---> 90 raise ValueError("values must be a 1D array")
91 if mask.ndim != 1:
92 raise ValueError("mask must be a 1D array")

ValueError: values must be a 1D array

In case it was relevant, this was my installation at that time:

INSTALLED VERSIONS

commit : 2cb9652
python : 3.8.11.final.0
python-bits : 64
OS : Darwin
OS-release : 21.5.0
Version : Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:22 PDT 2022; root:xnu-8020.121.3~4/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.22.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.2
setuptools : 52.0.0.post20210125
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.26.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.3
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.19.0
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

TomAugspurger added the Needs Info and Visualization labels Feb 18, 2020

jorisvandenbossche added the Bug and NA - MaskedArrays labels and removed the Needs Info label Feb 19, 2020

jorisvandenbossche added this to the Contributions Welcome milestone Feb 19, 2020

jorisvandenbossche added the good first issue label Feb 19, 2020

raphacosta27 mentioned this issue Feb 19, 2020

Issues de sala de aula 2020/1 Insper/open-dev#198

Closed

github-actions bot assigned AnnaDaglis Feb 20, 2020

AnnaDaglis removed their assignment Feb 21, 2020

github-actions bot assigned jeandersonbc Feb 21, 2020

This was referenced Feb 28, 2020

BUG: fixes unhandled NAType when plotting (#32073) #32337

Closed

Plotting Int64 columns with nulled integers (NAType) fails #32073 #32387

Closed

jeandersonbc mentioned this issue Mar 3, 2020

BUG: fixes plotting with nullable integers (#32073) #32410

Closed

5 tasks

SultanOrazbayev mentioned this issue Jun 14, 2020

BUG: problem converting between new dtypes ('string' and 'Int8' / 'Int64') #34759

Closed

2 tasks

github-actions bot assigned cvanweelden Jun 20, 2020

cvanweelden mentioned this issue Jun 20, 2020

BUG: Fixes plotting with nullable integers (#32073) #34896

Closed

5 tasks

cvanweelden removed their assignment Jul 7, 2020

github-actions bot assigned vetedde Jul 13, 2020

github-actions bot assigned rkc007 Oct 19, 2020

rkc007 mentioned this issue Nov 23, 2020

BUG: Fixes plotting with nullable integers (#32073) #38014

Merged

5 tasks

jreback modified the milestones: Contributions Welcome, 1.2 Dec 4, 2020

rkc007 mentioned this issue Dec 7, 2020

STYLE use types_or in pre-commit #38022

Closed

jreback closed this as completed in #38014 Dec 7, 2020

jreback pushed a commit that referenced this issue Dec 7, 2020

BUG: Fixes plotting with nullable integers (#32073) (#38014)

0aa598a

jeandersonbc removed their assignment Dec 8, 2020
