read_json with dtype=False infers Missing Values as None #28501

WillAyd · 2019-09-18T15:24:02Z

Run against master:

In [13]: pd.read_json("[null]", dtype=True)
Out[13]:
    0
0 NaN

In [14]: pd.read_json("[null]", dtype=False)
Out[14]:
      0
0  None

I think the second result above is a bug: it should probably return np.nan instead of None.
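A minimal sketch, assuming a caller who wants NaN for a missing numeric value no matter which pandas version is installed: read with `dtype=False`, then cast to float. Casting an object column holding None to float yields NaN, so the result is the same before and after any change to `read_json`. The JSON text is wrapped in `io.StringIO` because newer pandas releases deprecate passing literal JSON strings to `read_json`.

```python
import io

import numpy as np
import pandas as pd

# Read without dtype inference; depending on the pandas version the
# missing value comes back as None (object dtype) or NaN (float64).
df = pd.read_json(io.StringIO("[null]"), dtype=False)

# Casting to float turns None into NaN, so both versions agree afterwards.
df = df.astype(float)

print(df.dtypes.iloc[0])        # float64
print(np.isnan(df.iloc[0, 0]))  # True
```

This only makes sense for columns known to be numeric; a string column cast to float would raise.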


chrisstpierre · 2019-10-02T16:13:05Z

I don't think this is a bug. NaN is a numerical value, so if dtype=False we shouldn't infer the value to be numerical.

But if it is an issue, I can take this one.
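To make the numeric-versus-missing distinction concrete: `np.nan` is an ordinary Python `float`, while `None` is not a number at all, yet `pd.isna` treats both as missing. A small illustration:

```python
import numpy as np
import pandas as pd

# np.nan is a float value; None is Python's "no value" singleton.
print(type(np.nan).__name__)  # float
print(type(None).__name__)    # NoneType

# pandas treats both as missing when checking for NA.
print(pd.isna(np.nan), pd.isna(None))  # True True
```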

WillAyd · 2019-10-03T03:04:10Z

Yeah, there is certainly some ambiguity here, and @jorisvandenbossche might have some thoughts, but I don't necessarily think JSON null maps better to None than to np.nan.

read_csv would also convert this to NaN:

>>> pd.read_csv(io.StringIO("1\nnull"))
    1
0 NaN
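For comparison, a sketch of what happens when inference is turned off in `read_csv` with `dtype=object`: the `null` token is in the default `na_values` list, so it is still parsed as NaN, just stored in an object column instead of a float64 one.

```python
import io

import pandas as pd

csv_text = "1\nnull"

# Default inference: "null" is parsed as missing, giving a float64 column.
inferred = pd.read_csv(io.StringIO(csv_text))

# dtype=object turns off inference, but the missing value is still NaN.
as_object = pd.read_csv(io.StringIO(csv_text), dtype=object)

print(inferred["1"].dtype, as_object["1"].dtype)  # float64 object
print(pd.isna(as_object.iloc[0, 0]))              # True
```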

scratchmex · 2020-03-27T05:29:26Z

Complementing what @chrisstpierre said: a JSON null can stand in for a missing value of any type (number, string, list, ...), as seen in https://stackoverflow.com/questions/21120999/representing-null-in-json.

So converting null to np.nan even when dtype is explicitly set to False may help when working only with numerical data, but not with strings, for example. I don't think np.nan is desirable in a column of strings.
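The trade-off can be sketched without pandas: Python's `json` module decodes `null` to `None` regardless of the surrounding values, so a reader has to decide per column whether `None` should become NaN. The helper `normalize_missing` below is hypothetical (not a pandas API); it substitutes `float('nan')` only when every non-null value in a column is numeric, and leaves string columns with `None`.

```python
import json
import math


def normalize_missing(values):
    """Hypothetical helper: map None to NaN only for numeric columns."""
    present = [v for v in values if v is not None]
    # bool is a subclass of int in Python, so exclude it explicitly.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if present and numeric:
        return [math.nan if v is None else v for v in values]
    # Strings, lists, booleans, or all-null columns keep None as the marker.
    return list(values)


numbers = json.loads("[1.5, null, 3]")
strings = json.loads('["a", null, "c"]')

print(numbers)                     # [1.5, None, 3]
print(normalize_missing(numbers))  # [1.5, nan, 3]
print(normalize_missing(strings))  # ['a', None, 'c']
```

An all-null column is the ambiguous case: this sketch keeps None there, whereas #37834 produces a float NaN for the `[null]` example.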

mar-muel · 2020-06-01T14:26:46Z

Hi all, just curious: is stringifying None in the following example expected behaviour?

df = pd.read_json('[null]', dtype={0: str})
print(df.values)
# [['None']]

If yes, how can I avoid this while still specifying the column to be of str type?
Thanks for your help!
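One possible workaround, sketched under the assumption that pandas' nullable `"string"` dtype is available (pandas 1.0 and later): skip the `str` conversion inside `read_json` and cast afterwards with `astype("string")`, which stores missing entries as `pd.NA` instead of the text `'None'`.

```python
import io

import pandas as pd

# Let read_json keep the missing value as a missing marker (None or NaN).
df = pd.read_json(io.StringIO('[null, "a"]'))

# The nullable string dtype keeps missing entries as pd.NA rather than
# converting them to the literal string 'None'.
df = df.astype("string")

print(df)
```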

avinashpancham · 2020-11-14T16:38:13Z

take

jorisvandenbossche · 2020-11-20T14:39:29Z

Quoting @WillAyd: "read_csv would also convert this to NaN"

It's true that read_csv converts it to NaN with a float dtype. With dtype=object (the way to turn off dtype inference in read_csv), you still get NaN, but in an object-dtype column. So that behaviour differs from the float dtype that #37834 now implements.

Another point of comparison, when creating a Series from None without specified dtype, we also keep it as None with object dtype:

In [27]: pd.Series([None])
Out[27]: 
0    None
dtype: object
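Extending that comparison, a short sketch of when the constructor does switch `None` to NaN: only when a float dtype is requested explicitly or inferred from other numeric values.

```python
import pandas as pd

# No dtype and nothing else to infer from: None is kept, object dtype.
s_none = pd.Series([None])

# Requesting float64 explicitly converts None to NaN.
s_float = pd.Series([None], dtype="float64")

# Mixed with numbers, None is treated as missing in a float64 Series.
s_mixed = pd.Series([None, 1.0])

print(s_none.dtype, s_float.dtype, s_mixed.dtype)  # object float64 float64
```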

WillAyd added the IO JSON (read_json, to_json, json_normalize) and Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) labels Sep 18, 2019

WillAyd added this to the Contributions Welcome milestone Sep 18, 2019

mroeschke added the Bug label May 8, 2020

simonjayhawkins mentioned this issue Jul 30, 2020 in "BUG: Type mismatch in read_json #35464" (open)

github-actions bot assigned avinashpancham Nov 14, 2020

avinashpancham mentioned this issue Nov 14, 2020 in "BUG: Parse missing values using read_json with dtype=False to NaN instead of None (GH28501) #37834" (merged)

jreback modified the milestones: Contributions Welcome, 1.2 Nov 14, 2020

jreback closed this as completed in #37834 Nov 20, 2020
