Cannot update against null #19

Closed
itchyny opened this issue Jun 25, 2022 · 16 comments

Comments

@itchyny
Contributor

itchyny commented Jun 25, 2022

jq allows constructing an object or array from null by updating, but jaq throws errors.

 $ jq -n '.x = 0'
{
  "x": 0
}

 $ jq -n '.[0] = 0'
[
  0
]

 $ jaq -n '.x = 0'
Error: cannot index null

 $ jaq -n '.[0] = 0'
Error: cannot index null
@itchyny
Contributor Author

itchyny commented Jun 25, 2022

Is this not just an issue with the update operator, but with indexing in general?

 $ jq -n '.x'
null

 $ jaq -n '.x'
Error: cannot index null

@01mf02
Owner

01mf02 commented Jun 27, 2022

Ok, so this history is a bit complicated.
Let us consider a slightly different example: 0 | .x. What does jq say to that?

$ jq -c -n '0 | .x'
jq: error (at <unknown>): Cannot index number with string "x"

Ok, so we cannot index numbers with strings. That is, there is no single number for which that operation makes sense.
Now comes the question: What sense does it make to index a null? I believe, none. That's why I throw an error there, because I believe it is most likely that the user performed this operation by accident. (And that's why I also throw an error when updating against null.)
On the other hand, why does jaq yield null for something like {a: 1} | .b, for example? That's because there is nothing wrong in principle with indexing an object with a string, so even though the object does not contain the key "b", jaq returns null here instead of throwing an error.
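
The distinction drawn here can be modelled in a few lines of Python. This is only a sketch of the semantics described in this comment, not the actual code of jaq or jq. The names `CannotIndex`, `index_strict`, and `index_lenient` are made up for illustration, and JSON null is represented by Python's `None`.

```python
class CannotIndex(Exception):
    """Raised when a value of this type can never be indexed with a string."""


def index_strict(value, key):
    """Model of the strict behaviour described above: only objects can be indexed."""
    if isinstance(value, dict):
        # Indexing an object is fine in principle; a missing key gives null.
        return value.get(key)
    raise CannotIndex(f"cannot index {value!r} with {key!r}")


def index_lenient(value, key):
    """Model of jq's behaviour: null is additionally treated like an empty object."""
    if value is None:
        return None
    return index_strict(value, key)


print(index_strict({"a": 1}, "b"))  # None, i.e. null
print(index_lenient(None, "x"))     # None, i.e. null
for bad in (0, None):
    try:
        index_strict(bad, "x")
    except CannotIndex as err:
        print("error:", err)
```

In this model, the only difference between the two behaviours is the extra `None` case in `index_lenient`; numbers are rejected by both.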

While I am on the subject of update operators: jq has a history of inventing values "out of thin air". For example:

$ jq -c -n '[1] | .[2] = 0'
[1,null,0]

Here, we update a single value of the array and implicitly grow the array at the same time.
I'll be frank here: I do not like this. Implicit behaviour like that, in my experience, bites you at some point, or is bad for performance.
That's why jaq is more strict than jq here and disallows that.
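
To make the two behaviours concrete, here is a small Python sketch. It is a model of the semantics discussed here, not the implementation of either tool. The function names `assign_growing` and `assign_strict` are hypothetical, null is represented by `None`, and only non-negative indices are considered.

```python
def assign_growing(array, index, value):
    """Model of jq's `.[index] = value`: pad with null when index is past the end."""
    result = list(array)
    if index >= len(result):
        # Values are invented here: the gap is filled with null (None).
        result.extend([None] * (index + 1 - len(result)))
    result[index] = value
    return result


def assign_strict(array, index, value):
    """Model of the stricter behaviour: refuse to grow the array implicitly."""
    if not 0 <= index < len(array):
        raise IndexError(f"index {index} out of bounds for length {len(array)}")
    result = list(array)
    result[index] = value
    return result


print(assign_growing([1], 2, 0))  # [1, None, 0], matching jq's [1,null,0]
try:
    assign_strict([1], 2, 0)
except IndexError as err:
    print("error:", err)
```

The strict variant makes the out-of-bounds write visible at the point where it happens instead of silently padding the array.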

Now comes the question: Do you think that I should document these differences (of which there might be quite some) somewhere, and if so, where and in what form? In the README? In the tests? In the source?

@itchyny
Contributor Author

itchyny commented Jun 27, 2022

I respect your design because this is your product. But README.md states "while preserving compatibility with jq in most cases", so users who encounter a behavioral difference in the most basic filter may get confused. I don't think the behavior of jq is surprising. Adding notes on the strictness to README.md will help people understand how the tool differs from jq.

@01mf02
Owner

01mf02 commented Jun 27, 2022

Ok, so I am currently drafting a section in the README about smaller differences between jaq and jq.
While doing that, I thought more about the case null | .a = 0. I can see that if you interpret null as a neutral element, this makes sense. For me the question is: Does this pattern appear in idiomatic jq code? Did you ever use this pattern?

@01mf02
Owner

01mf02 commented Jun 27, 2022

I updated the README with a few differences in 0d00650, but this is surely not yet exhaustive.

@itchyny
Contributor Author

itchyny commented Jun 28, 2022

Thanks for listing the differences from jq in README.md. Another difference worth noting is the behavior of updating with multiple paths: {} | (.x, .y) = 0. Generalizing this simple example is what makes jq a much more powerful JSON processing tool, IMO; (.. | strings) |= gsub("pattern"; "replacer") is very useful for replacing strings in JSON recursively. As for null indexing, I think there is a lot of code depending on this behavior, especially with the alternative operator: [some complex indexing] // [default value] (note that the alternative operator does not catch errors, so null | .x // 1 yields a different result in jaq than in jq).

@01mf02
Owner

01mf02 commented Jul 2, 2022

Another difference worth noting is the behavior of updating with multiple paths: {} | (.x, .y) = 0. Generalizing this simple example is what makes jq a much more powerful JSON processing tool, IMO; (.. | strings) |= gsub("pattern"; "replacer") is very useful for replacing strings in JSON recursively.

I agree, this feature makes jq quite powerful. I have already documented the difference. In the long run, I think it would be nice for jaq to also support this, but so far, I have not found a nice way to implement it; that is, a way that is performant and that does not duplicate lots of code.

As for null indexing, I think there is a lot of code depending on this behavior, especially with the alternative operator: [some complex indexing] // [default value] (note that the alternative operator does not catch errors, so null | .x // 1 yields a different result in jaq than in jq).

Fair point. So I just drafted some code that makes null | .x yield null instead of an error.
My current rationale to allow this is that null can be interpreted as empty array or as empty object, depending on the circumstance. However, that interpretation of null is not satisfied in jq, because running null | .[] in jq yields an error --- instead of an empty sequence! (Both [] | .[] and {} | .[] yield an empty sequence, so I would find it logical if null | .[] would yield that as well.)
So how would you go about resolving this conflict? Would you find it acceptable if null | .[] yields the empty sequence, diverging from jq's behaviour? Or do you see another rationale why null | .x should not fail?

@01mf02
Owner

01mf02 commented Jul 2, 2022

And on a related note: What should null | .[] = 1 yield? With my proposed interpretation of null, I believe that the result should be null. But jq yields an error ...

@01mf02
Owner

01mf02 commented Jul 2, 2022

Also, null | keys yields an error in jq, while I think that [] would be a perfectly fine result.

@01mf02
Owner

01mf02 commented Jul 2, 2022

By the way, my rationale is also consistent with the behaviour that [] + null yields [] and {} + null yields {}.

@01mf02
Owner

01mf02 commented Jul 2, 2022

Here is a somewhat more formal specification of what I believe should hold for null values:

I call the following values neutral: false, 0, "", [], and {}.
I say that a filter f yields a value if it does not yield an error.

Proposal for operations on null: An operation f(null) should yield a value v if and only if:

  1. for at least one neutral value x, f(x) does yield a value, and
  2. if there are two neutral values x1 and x2 such that f(x1) yields a value y1 and f(x2) yields a value y2, then y1 must be equal to y2.

For example, what about null | .x yielding null? In this case, we can consider def f(n): n | .x. Property 1 is satisfied because f({}) (expanding to {} | .x) yields null. Property 2 is satisfied because there is no other neutral value x than {} that makes f(x) yield a value.

What about null | .[] yielding the empty sequence? In this case, def f(n): n | .[]. Property 1 is satisfied because f([]) yields the empty sequence. Furthermore, property 2 is satisfied because also f({}) (expanding to {} | .[]) yields the empty sequence. No other neutral value yields a value; for example, false | .[] yields an error.
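
The two properties can be checked mechanically. The following Python sketch is one possible reading of the proposal, not code from jaq. Everything in it is an assumption made for illustration: filters are modelled as Python functions that return the list of their outputs, `JqError` stands for a filter yielding an error, null is `None`, and the names `index_x`, `iterate`, and `proposed_result_on_null` are hypothetical.

```python
import json


class JqError(Exception):
    """Stands for a filter yielding an error."""


NEUTRAL_VALUES = [False, 0, "", [], {}]


def index_x(value):
    """Model of `.x`: only objects can be indexed; a missing key gives null."""
    if isinstance(value, dict):
        return [value.get("x")]
    raise JqError("cannot index")


def iterate(value):
    """Model of `.[]`: the outputs are the elements of an array or object."""
    if isinstance(value, list):
        return list(value)
    if isinstance(value, dict):
        return list(value.values())
    raise JqError("cannot iterate")


def proposed_result_on_null(f):
    """Apply the two properties: return the agreed output sequence, or None
    if f(null) should yield an error under the proposal."""
    results = []
    for x in NEUTRAL_VALUES:
        try:
            # Serialise so that False and 0 are not confused (False == 0 in Python).
            results.append(json.dumps(f(x), sort_keys=True))
        except JqError:
            pass  # this neutral value does not make f yield a value
    if not results:            # property 1 fails
        return None
    if len(set(results)) > 1:  # property 2 fails
        return None
    return json.loads(results[0])


print(proposed_result_on_null(index_x))  # [None]: a single output, null
print(proposed_result_on_null(iterate))  # []: the empty sequence
```

Both examples from this comment come out as described: `.x` yields a single null, and `.[]` yields the empty sequence.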

@itchyny
Contributor Author

itchyny commented Jul 4, 2022

Making null iterable as well as indexable feels like very consistent semantics, but I'm afraid it changes the behavior of various filters:

  • def keys: [path(.[])[]]; emits [] on null (as you noticed, looks reasonable),
  • def to_entries: [keys[] as $k | {key: $k, value: .[$k]}]; emits [] on null (looks ok)
  • def map(f): [.[] | f] emits [] on null (not sure it's ok or not)
  • def join($x): reduce .[] ...; emits "" on null (maybe ok?)
  • def recurse: recurse(.[]?); (does not change the behavior on null?).

Your formal specification looks like a sophisticated way of reasoning about the changes, but I can give a counterexample with no explicit branching: def f(n): n * n > n;. Both f(0) and f({}) emit false, but f(null) should throw an error (there is no multiplication on null). Another example is def f(n): 0 * n > n;, for which f(0) == f("").
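
The first counterexample can be replayed with a deliberately reduced Python model. This is a sketch under stated assumptions, not jq's implementation: `multiply` covers only number times number and object times object, `greater` covers only the comparisons needed here, null is `None`, and all names (`JqError`, `multiply`, `greater`, `f`, `outcome`) are made up for illustration.

```python
class JqError(Exception):
    """Stands for a filter yielding an error."""


def multiply(a, b):
    """Reduced model of `*`: numbers multiply, objects merge, the rest errors."""
    if isinstance(a, bool) or isinstance(b, bool):
        raise JqError("cannot multiply")
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a * b
    if isinstance(a, dict) and isinstance(b, dict):
        return {**a, **b}  # shallow stand-in for jq's recursive merge
    raise JqError("cannot multiply")


def greater(a, b):
    """Reduced model of `>`: equal values are never greater; numbers compare."""
    if a == b:
        return False
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a > b
    raise NotImplementedError("ordering between these values is not modelled")


def f(n):
    """Model of `def f(n): n * n > n;`."""
    return greater(multiply(n, n), n)


def outcome(value):
    try:
        return f(value)
    except JqError:
        return "error"


for name, value in [("0", 0), ("{}", {}), ("null", None)]:
    print(name, "->", outcome(value))
```

In this model, the neutral values 0 and {} agree on false, so the two properties would assign false to f(null), while evaluating f on null directly fails in the multiplication.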

@01mf02
Owner

01mf02 commented Jul 5, 2022

Your formal specification looks like a sophisticated way of reasoning about the changes, but I can give a counterexample with no explicit branching: def f(n): n * n > n;. Both f(0) and f({}) emit false, but f(null) should throw an error (there is no multiplication on null). Another example is def f(n): 0 * n > n;, for which f(0) == f("").

Thank you for your counterexample! I will have to think about this a bit more ...

On a different note, I have implemented in ab5c764 some initial support for updating with non-path expressions! This enables support for {} | (.x, .y) = 0, for example. Currently supported on the left-hand side are paths, |, , and if ... then ... else ... end. I am optimistic that recurse / .. should be possible to implement too, but I did not get around to it yet.

The implementation of the update operation should be significantly faster than in jq, because jaq does not explicitly construct paths. Furthermore, jaq aims to give more "intuitive" answers; for example, [1, 2, 3, 4, 5] | .[] |= empty yields [] in jaq, whereas it yields [2, 4] in jq. The latter is because jq first builds the paths for .[], which are 0, 1, 2, 3, 4 (the indices of the list). Then, it deletes the 0th element, yielding [2, 3, 4, 5]. Then it deletes the 1st element of the new list, which is not 2, but 3! This yields [2, 4, 5]. Then, jq deletes the 2nd element, which is 5, yielding [2, 4]. Finally, it tries to delete the 3rd and 4th elements, which leaves the list unchanged, because it no longer has elements at these indices. I consider this behaviour to be deeply flawed.
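
The difference between the two strategies can be reproduced with a short Python model. This is a sketch of the behaviour described in this comment, not the source of jq or jaq. The function names are hypothetical, and keeping at most the first output of the update is an assumption of this sketch.

```python
def delete_by_precomputed_paths(items):
    """Model of the path-based strategy: collect all indices first, then delete
    them one after another from the shrinking list."""
    paths = list(range(len(items)))  # 0, 1, 2, 3, 4 for a five-element list
    result = list(items)
    for index in paths:
        if index < len(result):
            del result[index]  # later elements shift left after each deletion
        # indices past the end refer to nothing and are ignored
    return result


def update_directly(items, update):
    """Model of updating without paths: run the update on every element.
    An element is dropped if the update produces no output; otherwise it is
    replaced by the first output (assumption of this sketch)."""
    result = []
    for item in items:
        result.extend(update(item)[:1])
    return result


def empty(_value):
    return []  # the jq filter `empty` produces no outputs


print(delete_by_precomputed_paths([1, 2, 3, 4, 5]))  # [2, 4]
print(update_directly([1, 2, 3, 4, 5], empty))       # []
```

The path-based model skips every other element because each deletion shifts the remaining elements while the precomputed indices stay fixed.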

Still, the current implementation of updating in jaq is quite new, so if you find some bugs there, I'll be happy to fix them. :)

@itchyny
Contributor Author

itchyny commented Jul 5, 2022

That deletion issue of jq is well known (jqlang/jq#2051) and I suggested a fix two years ago (jqlang/jq#2133).

@01mf02
Owner

01mf02 commented Jul 15, 2022

Oh, that is interesting to read! I think it is a pity that your pull request was never integrated ...

@itchyny
Contributor Author

itchyny commented Nov 4, 2022

I'm closing because this is a design decision of jaq rather than a compatibility issue.

@itchyny itchyny closed this as completed Nov 4, 2022