ENH: Enable append mode to JSON lines #35849

Labels
Enhancement IO JSON read_json, to_json, json_normalize

Comments

@nullhack

nullhack commented Aug 22, 2020

Is your feature request related to a problem?

The JSON Lines format allows appending to an existing file in an easy way: each new record is a new line containing a valid JSON string. Historically, DataFrame.to_json didn't allow mode="a" because appending to pure JSON would require reading, parsing, and changing the existing JSON string. For JSON Lines, however, appending is done in an elegant way and is as easy as it is for CSV files.
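As a minimal illustration of that property, using only the standard library (the helper names and the file name are made up for this sketch and are not part of pandas):

```python
import json
import os
import tempfile


def append_records(path, records):
    """Append each record as one JSON document per line."""
    with open(path, "a", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_records(path):
    """Parse a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "events.jsonl")
    append_records(path, [{"id": 1}, {"id": 2}])
    # The second call never reads or rewrites the lines already on disk.
    append_records(path, [{"id": 3}])
    records = read_records(path)
```

A plain JSON array cannot be extended this way, because the closing bracket would have to be removed or the whole document rewritten.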

The pandas way of writing JSON Lines is to set orient='records' together with lines=True, but to_json lacks a mode="a" option for appending. My feature proposal (the PR is already done and just needs review) is simple: allow append mode (mode="a") in to_json if, and only if, orient='records' and lines=True.

Describe the solution you'd like

The PR: #35832

If I got it right, the solution is simple:

  1. Add the argument mode: str = "w" to to_json. This will NOT break anything, because the default behavior remains write mode.
  2. Add a check for mode="a" that verifies orient='records' and lines=True, and raises an exception otherwise.
  3. If mode="a" and the file already exists and is not empty, add a new line to the JSON string (s = convert_to_line_delimits(s)) before sending it to the handler.
  4. Create the handler using the requested mode (get_handle(path_or_buf, mode, compression=compression)) instead of the hardcoded get_handle(path_or_buf, "w", compression=compression).
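
The four steps can be sketched from user code as follows. This is not the code from the PR: to_json_lines is a hypothetical wrapper around the existing DataFrame.to_json, it only handles uncompressed local paths, and checking the last byte of the file (rather than only "exists and is not empty") is an assumption made here so that no blank line is produced when the file already ends with a newline.

```python
import os

import pandas as pd


def to_json_lines(df, path, orient="records", lines=True, mode="w"):
    """Hypothetical stand-in for the proposed to_json(..., mode=...)."""
    # Step 1: mode defaults to "w", so existing callers see no change.
    if mode not in ("w", "a"):
        raise ValueError(f"mode must be 'w' or 'a', got {mode!r}")
    # Step 2: appending is only allowed for JSON Lines output.
    if mode == "a" and not (orient == "records" and lines):
        raise ValueError("mode='a' requires orient='records' and lines=True")

    s = df.to_json(orient=orient, lines=lines)

    # Step 3: when appending to a non-empty file, make sure the new records
    # start on their own line.
    if mode == "a" and os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as existing:
            existing.seek(-1, os.SEEK_END)
            if existing.read(1) != b"\n":
                s = "\n" + s

    # Step 4: open the target with the requested mode instead of a fixed "w".
    with open(path, mode, encoding="utf-8") as handle:
        handle.write(s)
```

With this sketch, to_json_lines(df, "data.jsonl", mode="a") adds rows to an existing file, while mode="a" combined with any other orient raises a ValueError before anything is written.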
@nullhack nullhack added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 22, 2020
@bashtage
Contributor

How much code is required to do the append yourself? Is it not just a case of passing an open file handle to an existing json file?
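
For reference, a do-it-yourself append on an uncompressed file could look roughly like the sketch below. append_frame is a made-up helper name; the sketch serializes to a string first and normalizes the trailing newline itself, on the assumption that whether lines=True output ends with a newline may depend on the pandas version, and it assumes the existing file already ends with a newline.

```python
import pandas as pd


def append_frame(df, path):
    """Append df to path as JSON Lines using a file opened in append mode."""
    text = df.to_json(orient="records", lines=True)
    if not text.strip():
        return  # nothing to write for an empty frame
    if not text.endswith("\n"):
        text += "\n"  # keep the file newline-terminated for the next append
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text)
```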

@nullhack
Author

nullhack commented Aug 22, 2020

The parsing part is not much extra code. The part I struggled with when creating an external module was that pandas can read and write multiple compression formats on the fly, and I would need to re-implement this case by case if I wanted to support the same formats. It was a natural choice to improve pandas instead. But...
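
To make the "case by case" point concrete, here is a rough standard-library sketch of what an external helper has to do for just three formats. The names _OPENERS, append_text and read_text are invented for this illustration, and choosing the codec from the file extension is an assumption of the sketch, not a description of how pandas does it internally.

```python
import bz2
import gzip
import lzma
import os

# One opener per compression format; every extra format needs its own entry.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def _opener_for(path):
    """Pick an opener from the file extension, falling back to plain open."""
    return _OPENERS.get(os.path.splitext(path)[1], open)


def append_text(path, text):
    """Append JSON Lines text to path, compressing according to its extension."""
    if not text.endswith("\n"):
        text += "\n"
    # Appending to a compressed file adds a new compressed stream at the end.
    with _opener_for(path)(path, "at", encoding="utf-8") as handle:
        handle.write(text)


def read_text(path):
    """Read the whole file back as text, decompressing if needed."""
    with _opener_for(path)(path, "rt", encoding="utf-8") as handle:
        return handle.read()
```

A caller would pass the serialized frame, for example append_text("data.jsonl.gz", df.to_json(orient="records", lines=True)). Formats without a compatible open function, such as zip archives, would need separate handling.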

The point here is not that I have a specific scenario and want to push it into pandas. Instead, I believe it's a missing capability of pandas for this specific (already supported) format. JSON Lines should have an append mode for the same reasons append mode makes sense for CSV files. The format was created with easy appends as a feature.

For example, a common scenario would be gathering data from a batch script and appending the changes to an existing file daily.

@nullhack nullhack mentioned this issue Aug 22, 2020
@nullhack nullhack changed the title ENH: Enable append mode on to_json if orient='records' and lines=True ENH: Enable append mode to JSON lines Aug 22, 2020
@gfyoung gfyoung added the IO JSON read_json, to_json, json_normalize label Aug 23, 2020
@nullhack
Author

@gfyoung do you know any maintainers or contributors who might be interested in commenting on or reviewing this change?

As suggested, I created this issue just to raise discussion (as the PR is already done). But since not many comments have been made, I'm trying to get some reviews here before conflicts affect the PR and it becomes obsolete.

@jbrockmendel jbrockmendel removed the Needs Triage Issue that has not been reviewed by a pandas team member label Sep 3, 2020
@charizard-knows

Hi. Can I get an update on whether this issue is being fixed?

@jreback
Contributor

jreback commented Jul 7, 2022

@charizard-knows pandas is all volunteer

issue will be fixed when a community member does a pull request

pandas core will provide code review

@SFuller4
Contributor

take

@SFuller4
Contributor

@jreback I opened a pull request for the code change. It looks like it failed a typing check, but I can't see any information on why it failed. Do you have any insight into what the issue is?
