Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix array deletion bug #2662

Closed
wants to merge 3 commits into from
Closed

Conversation

nicowilliams
Copy link
Contributor

@nicowilliams nicowilliams commented Jul 6, 2023

TBD:

  • fix the man page tests
  • add more tests than just the man page tests
    • in particular add some sanity checks that nesting is correct
  • maybe instead of adding opcodes add an immediate 16-bit integer to express the default/forward/reverse option for EACH and EACH_OPT, and for PATH_BEGIN

@nicowilliams
Copy link
Contributor Author

@pkoppstein a small gift. I forget which issue (number) this relates to...

@itchyny
Copy link
Contributor

itchyny commented Jul 6, 2023

The issue is #2051.

@itchyny
Copy link
Contributor

itchyny commented Jul 6, 2023

But just reversing the path of EACH does not resolve all the issues here; [range(5)] | (.[1,3]) |= empty | . == [0,2,4] or even more mean example [range(5)] | (.[1,3,2]) |= empty | . == [0,4]. Following example is a bit surprising for users.

 $ ./jq -nc '[range(5)] | path_reverse(.[1],.[3],.[2])'
[1]
[3]
[2]

@nicowilliams
Copy link
Contributor Author

But just reversing the path of EACH does not resolve all the issues here; [range(5)] | (.[1,3]) |= empty | . == [0,2,4] or even more mean example [range(5)] | (.[1,3,2]) |= empty | . == [0,4]. [...]

Excellent observation.

If you have literal array indices going forwards like that, then we might could consider that user error. The real problem is that one might write ... | (.[indices_to_delete] |= empty), and then the user would really have to know to reverse array indices (... | (.[[indices_to_delete]|reverse[]] |= empty), or do it in the function indices_to_delete).

I'm not sure how to fix this more generally because the jq/jv model is that jv values are notionally copy-on-write. This is the most critical internals detail of the jq implementation. |= is really a function called _modify/2 (def _modify(paths, update): ...;) that reduces over the paths in the original input applying update to the values at each path then replacing the value in the reduction state with the new value (or deleting the value at that path, if the new value is empty). What we're doing when we say reduce path_reverse(paths) as $p (.; update) (a simplified form of |=/_modify) is iterate over the paths of the input, yes, but the value we update is a copy that we keep mutating. We don't and can't mutate the value in place that we are finding paths in. So we'd have to do something very different.

But I can't see how to jq-code _modify w/o having some form of this problem crop up. We'd need a completely different implementation of path/1 that takes a value, a path expression, and the previous path found (so, path/2) and produces the next path in that value, then we could share a one-refcount value between path iteration and mutation. But such a path/2 would have to be even more special than the current and very magical path/1. This strikes me as quite challenging.

But certainly, fixing |= so that array iteration is reversed is probably the simplest way to make any progress here for now.

[...]. Following example is a bit surprising for users.

 $ ./jq -nc '[range(5)] | path_reverse(.[1],.[3],.[2])'
[1]
[3]
[2]

Maybe, but the description for path_reverse/1 is quite specific about it only reversing .[] and .[]?, and only for array values.

I suppose I could add a note about how .[exp] is not reversed, and that users should do that manually.

It is true that one can avoid .[] in |= paths and use a function which iterates arrays backwards, so technically speaking we could just document that that is what one should do and close this PR :) But I'd rather finish this PR and get the 80% solution that it probably is.

In the meantime, the PATH_that tests/mantest` fails with:

Test #66: 'map(select(. >= 2))' at line number 274
*** Expected [5,3,7], but got [7,3,5] for test at line number 276: map(select(. >= 2))

That's this test:

      - title: "`select(boolean_expression)`"
        body: |

          The function `select(f)` produces its input unchanged if
          `f` returns true for that input, and produces no output
          otherwise.

          It's useful for filtering lists: `[1,2,3] | map(select(. >= 2))`
          will give you `[2,3]`.

        examples:
          - program: 'map(select(. >= 2))'
            input: '[1,5,3,0,7]'
            output: ['[5,3,7]']

and this is due to my logic in PATH_END handling being a bit buster. I need to re-push the jq->path_reverse state after the stack_save() so that on backtrack we can restore it.

@nicowilliams
Copy link
Contributor Author

I saw a comment earlier in some issue about how jaq solves this by finding paths in the mutated value. I've not looked at the inernals of jaq at all.

@itchyny
Copy link
Contributor

itchyny commented Jul 6, 2023

So years ago I gave up ordering the paths for deleting correctly, and the idea of the fix #2133 is collecting the deletion paths and use delpaths at last.

@nicowilliams
Copy link
Contributor Author

nicowilliams commented Jul 6, 2023

Ay, I did not read before writing, but yes, collecting the paths to delete works.

@nicowilliams
Copy link
Contributor Author

I'll abandon this tack, though I might keep .[-] -- seems potentially useful. I'll take up the collecting-paths-to-delete-at-the-end approach.

@emanuele6 emanuele6 deleted the nico/fix_array_deletion_bug branch July 6, 2023 04:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants