read_json with dtype=False infers Missing Values as None #28501

Closed
WillAyd opened this issue Sep 18, 2019 · 6 comments · Fixed by #37834
Labels: Bug; IO JSON (read_json, to_json, json_normalize); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@WillAyd
Member

WillAyd commented Sep 18, 2019

Run against master:

In [13]: pd.read_json("[null]", dtype=True)
Out[13]:
    0
0 NaN

In [14]: pd.read_json("[null]", dtype=False)
Out[14]:
      0
0  None

I think the second result above is an issue: it should probably return np.nan instead of None.
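The report can be checked locally with a minimal sketch like the one below. It assumes only that `pd.read_json` accepts a file-like object, so the payload is wrapped in `io.StringIO`; the variable names are illustrative. Because the value returned for `dtype=False` depends on the installed pandas version, the sketch prints it rather than assuming it, and relies on `pd.isna`, which treats `None` and `NaN` alike.

```python
import io

import pandas as pd

# "[null]" is the payload from the report above.
payload = "[null]"

inferred = pd.read_json(io.StringIO(payload), dtype=True)
raw = pd.read_json(io.StringIO(payload), dtype=False)

# Which sentinel comes back for dtype=False depends on the pandas version,
# so inspect it instead of assuming None or NaN.
print(type(raw.iloc[0, 0]), raw[0].dtype)

# pd.isna treats None and NaN alike, so this check holds either way.
both_missing = bool(pd.isna(inferred.iloc[0, 0])) and bool(pd.isna(raw.iloc[0, 0]))
print(both_missing)
```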

@WillAyd WillAyd added IO JSON read_json, to_json, json_normalize Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Sep 18, 2019
@WillAyd WillAyd added this to the Contributions Welcome milestone Sep 18, 2019
@chrisstpierre

chrisstpierre commented Oct 2, 2019

I think that this isn't a bug. NaN is a numerical value. If dtype=False we shouldn't infer it to be a numerical value.

But if it is an issue I can do this one.
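The distinction this comment draws can be illustrated with plain Python and NumPy. This is only a sketch of the type difference between the two sentinels, not a statement about what read_json should return.

```python
import math

import numpy as np

# NaN is a floating-point value; None is Python's null object.
print(isinstance(np.nan, float))  # True
print(type(None).__name__)        # NoneType

# NaN compares unequal to itself, so use an explicit check to detect it.
print(np.nan == np.nan)           # False
print(math.isnan(np.nan))         # True
```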

@WillAyd
Member Author

WillAyd commented Oct 3, 2019

Yeah, there is certainly some ambiguity here, and @jorisvandenbossche might have some thoughts, but I don't necessarily think the JSON null maps better to None than to np.nan.

read_csv would also convert this to NaN:

>>> import io
>>> import pandas as pd
>>> pd.read_csv(io.StringIO("1\nnull"))
    1
0 NaN

@scratchmex

Complementing what @chrisstpierre said: a null in JSON can stand in for a numerical value, a string, a list, ..., as discussed in https://stackoverflow.com/questions/21120999/representing-null-in-json.

So converting null to np.nan even though dtype is explicitly set to False may be beneficial when working with numerical data only, but not with strings, for example. I don't think an np.nan in a column of strings is desirable.

@mroeschke mroeschke added the Bug label May 8, 2020
@mar-muel

mar-muel commented Jun 1, 2020

Hi all - Just curious whether stringifying None in the following example is expected behaviour?

df = pd.read_json('[null]', dtype={0: str})
print(df.values)
# [['None']]

If yes, how can I avoid this while at the same time specifying the column to be of str type?
Thanks for your help!
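One possible workaround, offered as an assumption rather than an answer given in this thread, is to convert the column after parsing with the nullable "string" extension dtype (available in pandas 1.0 and later), which keeps the missing entry missing instead of turning it into text. The `parsed` Series below is a hypothetical stand-in for a parsed JSON column, so the sketch does not depend on how read_json itself behaves.

```python
import pandas as pd

# Hypothetical stand-in for a parsed JSON column that contains a null.
parsed = pd.Series(["a", None], dtype=object)

# The nullable "string" dtype stores the missing entry as a missing value
# rather than as the text "None".
as_string = parsed.astype("string")

print(as_string.isna().tolist())  # [False, True]
print(as_string[0])               # a
```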

@avinashpancham
Contributor

take

@jorisvandenbossche
Member

> read_csv would also convert this to NaN:

It's true that read_csv converts to NaN in a float dtype. With dtype=object (the way to turn off dtype inference in read_csv), you still get a NaN, but with object dtype (so this differs from the float dtype now implemented by #37834).
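The read_csv comparison can be reproduced with a short sketch that reuses the CSV text from the earlier comment; the variable names are illustrative.

```python
import io

import pandas as pd

csv_text = "1\nnull"

# Default inference: "null" is read as a missing value in a float column.
default = pd.read_csv(io.StringIO(csv_text))

# dtype=object turns off inference; the entry is still missing.
as_object = pd.read_csv(io.StringIO(csv_text), dtype=object)

print(default["1"].dtype)    # float64
print(as_object["1"].dtype)  # object
print(pd.isna(as_object["1"].iloc[0]))
```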

Another point of comparison, when creating a Series from None without specified dtype, we also keep it as None with object dtype:

In [27]: pd.Series([None])
Out[27]: 
0    None
dtype: object
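For contrast, here is a small sketch of the same constructor with and without an explicit dtype. It only illustrates the Series behaviour described above and says nothing about what read_json should do.

```python
import pandas as pd

# Without a dtype, None is kept as-is in an object-dtype Series.
no_dtype = pd.Series([None])

# With an explicit float dtype, None is converted to NaN.
as_float = pd.Series([None], dtype="float64")

print(no_dtype.dtype, no_dtype[0] is None)
print(as_float.dtype, bool(as_float.isna()[0]))
```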

8 participants