Fold over queues #2791

Maria-12648430 · 2020-10-09T12:23:38Z

The first commit in this PR introduces a fold/3 function for the queue module. Up to now, this had to be done by converting a queue to a list and folding over that, which is both cumbersome and more performance-intensive (O(length(Rear)) for the conversion, and O(length(List)) for the fold over that list) than a fold over the internal lists making up the Rear and Front, inside the queue module (O(len(Queue)), roughly equivalent to only O(length(List))). The queue is traversed in queue order (front to rear), which is what would be expected.

The second commit in this PR addresses a "glitch" in the filter/2 function, namely that while returning true or false from the filter function is semantically equivalent to returning [Item] or [] respectively, the latter generates more garbage. By handling the two special cases "single-element list" and "empty list" the same as true and false respectively, this is no longer the case, and the paragraph in the function documentation has been removed accordingly.

Also, to align queue:filter/2 more closely with lists:filtermap/2 (queue:filter/2 is really a filtermap with insert capability), the filter function may return {true, NewItem} to signal that the current item should be replaced with NewItem in the result queue.

CLAassistant · 2020-10-09T12:23:43Z

All committers have signed the CLA.

jhogberg · 2020-10-12T11:06:36Z

Thanks for the PR!

We'll need to run it past the technical board (next week) as it changes an interface, but I think it looks great.

Maria-12648430 · 2020-10-12T11:21:34Z

We'll need to run it past the technical board (next week) as it changes an interface, but I think it looks great.

You mean because of the {true, NewItem} as a new possible return for the filter function?

jhogberg · 2020-10-12T11:41:30Z

You mean because of the {true, NewItem} as a new possible return for the filter function?

Yes, though we'll discuss fold/3 as well. We want as many eyeballs as possible on the documentation.

josevalim · 2020-10-13T07:58:53Z

Nice, I would definitely have used queue:fold/3 in the past!

I am not convinced about the return type for queue:filter though. I understand the rationale in aligning it with filtermap but given they are already different (filter allows a list to be returned), a new return type complicates the API and doesn't add anything over returning [NewItem] directly.

RaimoNiskanen · 2020-10-13T12:57:09Z

In that case, @josevalim & @Maria-12648430, maybe write a new queue:filtermap that aligns with lists:filtermap and silently let queue:filter keep its pecularity of being able to also do filtermap. Oddly enough queue:filter is actually more general than lists:filtermap since the former allows replacing an element with any number of elements.

@Maria-12648430 - Note that your change to optimize the code handling of [Item] does not change the amount of created garbage. The garbage is created when creating the return value [Item] which is one cons cell referring to Item, and that cons cell (2 VM words) becomes garbage when building the queue internal result list. Both [Item | filter_f(Fun, F)], and L ++ filter_f(Fun, F) does exactly the same: every cons cell in L (1 pcs) becomes garbage and a new one is created to hold the list that is appended to. Returning true does not produce garbage because an atom is an immediate value.

Your change optimizes away external calls to lists:append and lists:reverse for the important case of replacing an item (filtermap), which I think is a good optimization - keep it. But you can actually not remove the documentation about more created garbage.

Side note: returning {true, NewItem} produces 3 VM words of garbage since a tuple has a header word and then one per item, which is 50% more than when returning [NewItem].

A question for decision: would we need both queue:foldl and queue:foldr instead of just queue:fold? I think maybe not, since a queue:reverse is O(1).

josevalim · 2020-10-13T14:04:59Z

A question for decision: would we need both queue:foldl and queue:foldr instead of just queue:fold? I think maybe not, since a queue:reverse is O(1).

I pondered the same and my conclusion was that if there is ever a need for a “reverse fold”, it could be called “fold_r”, which aligns well with the other functions in the module. WDYT?

okeuday · 2020-10-13T17:47:38Z

You may want to consider the need for a separate function (possibly in a separate pull request) that would avoid processing the whole queue. I have found it necessary, due to testing of the time taken in queue:filter/2, to create a separate function to remove a single unique value from a queue (https://github.com/okeuday/pqueue/blob/ebfe8e76e87f05caae84827334662726629f5f54/src/pqueue4.erl#L11602-L11619). With a situation like that, queue:fold is unable to help. It is definitely good to have a queue:fold function. When removing a unique value from a queue, the order doesn't matter, so the queue lists have minimal impact (i.e., no reverses or concatenations).

RaimoNiskanen · 2020-10-14T13:51:39Z

@josevalim: fold and fold_r (the latter if needed). I buy that!

RaimoNiskanen · 2020-10-14T14:04:00Z

@okeuday: A napkin calculation of a possible queue:remove/2 function: It would, on average, only have to traverse half of the queue, and only have to rebuild on quarter of the cons cells (the ones preceeding the found element in either the front or the rear list), so it would be between 2..4 times faster than queue:filter/2, but still O(N). Sounds like a nice feature anyway.

Would one want both a queue:remove(PredicateFun, Queue) and a queue:delete(Item, Queue) for a predicate fun and a =:= match respectively? Matching equality is significantly faster than calling a fun().

okeuday · 2020-10-14T15:42:40Z

@RaimoNiskanen I needed the predicate fun to access a tuple element but the =:= equality match function should be helpful for other data.

Maria-12648430 · 2020-10-15T12:03:25Z

Oh, quite a response :)

@RaimoNiskanen & @josevalim:

In that case, @josevalim & @Maria-12648430, maybe write a new queue:filtermap that aligns with lists:filtermap and silently let queue:filter keep its pecularity of being able to also do filtermap. Oddly enough queue:filter is actually more general than lists:filtermap since the former allows replacing an element with any number of elements.

Ok, I second that. So, the filtermap function may return true to keep, false to drop, and {true, NewItem} to replace. No insert capability, right?

Maria-12648430 · 2020-10-15T12:04:22Z

@RaimoNiskanen

@Maria-12648430 - Note that your change to optimize the code handling of [Item] does not change the amount of created garbage. The garbage is created when creating the return value [Item] which is one cons cell referring to Item, and that cons cell (2 VM words) becomes garbage when building the queue internal result list. Both [Item | filter_f(Fun, F)], and L ++ filter_f(Fun, F) does exactly the same: every cons cell in L (1 pcs) becomes garbage and a new one is created to hold the list that is appended to. Returning true does not produce garbage because an atom is an immediate value.

Ah, got that wrong then.

Your change optimizes away external calls to lists:append and lists:reverse for the important case of replacing an item (filtermap), which I think is a good optimization - keep it. But you can actually not remove the documentation about more created garbage.

Ok, I'll add it back.

Side note: returning {true, NewItem} produces 3 VM words of garbage since a tuple has a header word and then one per item, which is 50% more than when returning [NewItem].

That would be obsoleted with a separate filtermap function.

Maria-12648430 · 2020-10-15T12:05:48Z

@RaimoNiskanen & @josevalim

A question for decision: would we need both queue:foldl and queue:foldr instead of just queue:fold? I think maybe not, since a queue:reverse is O(1).

TBH, I had both foldl and foldr in the beginning, but then removed them for fold because of the reason you described.

@josevalim: fold and fold_r (the latter if needed). I buy that!

Decided then? :)

Maria-12648430 · 2020-10-15T12:18:19Z

@okeuday

You may want to consider the need for a separate function (possibly in a separate pull request) that would avoid processing the whole queue. I have found it necessary, due to testing of the time taken in queue:filter/2, to create a separate function to remove a single unique value from a queue

I need something like that every now and then, too. Maybe later, in a seperate PR :)

With a situation like that, queue:fold is unable to help.

Yes

When removing a unique value from a queue, the order doesn't matter, so the queue lists have minimal impact (i.e., no reverses or concatenations).

Hm, I would argue that it is not guaranteed that an item is unique, at least not from the perspective of the queue.

I would suggest a function (or two) that removes the first element matching the condition (either predicate function or exact match), then stops traversing further.

The front list can be traversed without reversal, so best case would be if the element is in that list (ie, near the front). The rear lists has to be reversed to find the first element (instead of the last), so that would be a worse case. Worst case is if no element is found, which means full traversal plus a reversal in the middle.

A more general version of this would be something like remove(N, PredicateFun, Queue) or delete(N, Item, Queue), meaning "remove (at most) N matching elements".

Maria-12648430 · 2020-10-15T13:38:49Z

@okeuday & @RaimoNiskanen

Another function that could be handy would be contains(PredicateFun, Queue), as the existing member/ function only finds exact matches. Slower than member/2 which employs lists:member/2, yes, but removes the need for converting to list and such (and member/2 would remain of course, so...). More general forms that allow more flexibility even would be any and all, like their counterparts in lists. What do you think?

(Probably material for another PR, anyway ;))

RaimoNiskanen · 2020-10-15T14:39:49Z

@Maria-12648430

Side note: returning {true, NewItem} produces 3 VM words of garbage since a tuple has a header word and then one per item, which is 50% more than when returning [NewItem].

That would be obsoleted with a separate filtermap function.

Obsolete or not, it would still be so that queue:filter/2 with a Fun returning [Item] would produce slightly less garbage thanqueue:filtermap/2 with a Fun returning {true, NewItem}. Certainly irrelevant.
Nevertheless, queue:filtermap/2 should be compatible with lists:filtermap/2, just as you suggest.

RaimoNiskanen · 2020-10-15T14:40:28Z

@Maria-12648430

@josevalim: fold and fold_r (the latter if needed). I buy that!
Decided then? :)

I agree. And I think fold_r can wait for its demand, since queue:reverse(queue:fold(Fun, queue:reverse(Q))) is not that inefficient. Unless you just can't wait to implement it...

Maria-12648430 · 2020-10-15T14:49:44Z

Obsolete or not, it would still be so that queue:filter/2 with a Fun returning [Item] would produce slightly less garbage thanqueue:filtermap/2 with a Fun returning {true, NewItem}. Certainly irrelevant.

I meant that my addition of {true, NewItem} as a possible return for the filter function would be obsoleted by filtermap, ie should be removed from filter again ;)

Maria-12648430 · 2020-10-15T14:51:08Z

Unless you just can't wait to implement it...

I'll see ;)

essen · 2020-10-15T14:58:01Z

More functions in queue is more than welcome but fold_r sounds unnecessary at this point. Perhaps wait to get feedback on the normal fold first.

RaimoNiskanen · 2020-10-15T15:59:16Z

@Maria-12648430

The front list can be traversed without reversal, so best case would be if the element is in that list (ie, near the front). The rear lists has to be reversed to find the first element (instead of the last), so that would be a worse case. Worst case is if no element is found, which means full traversal plus a reversal in the middle.

Ouch! Of course, I did not think of that!

I did some coding in my head so these functions maybe cannot even be compiled, and may be incorrect, but anyway...

This function should traverse the whole list, but only rebuild the cons cells of the start of the list up to the last matching Item, the list after is untouched. If no matching Item is found the it returns false, so that must be handled by the caller. It minimizes rebuild but still has to traverse the whole reverse list once:

delete_last(Item, [Item | Rest]) ->
   case delete_last(Item, Rest) of
    false -> Rest;
    R -> [Item | R]
  end;
delete_last(Item, [X | Rest]) ->
  case delete_last(Item, Rest) of
    false -> false;
    R -> [X | R]
  end;
delete_last(_, []) -> false.

A similar trick can be used to minimize rebuild of the front list. It returns false if no matching Item is found, so the caller should then keep the original F list and try the R list instead. It only has to traverse and rebuild the start of the F list up to the first matching Item:

delete_first(Item, [Item | Rest]) -> Rest;
delete_first(Item, [X | Rest]) ->
  case delete_first(Item, Rest) of
    false -> false;
    F -> [X | F]
end;
delete_first(_, []) -> false.

Regarding "remove at most N of these items" I think is an odd bird, and there is also nothing like that in lists.

RaimoNiskanen · 2020-10-15T16:00:17Z

@Maria-12648430

I meant that my addition of {true, NewItem} as a possible return for the filter function would be obsoleted by filtermap, ie should be removed from filter again ;)

Ah, yes.

RaimoNiskanen · 2020-10-15T16:04:58Z

@Maria-12648430: contains/1, and any/2 and all/2 would be welcome additions. Then lists:contains/2 would be missing.

That would be a future PR...

I think there is a greater need for high level functions in queue because unlike lists there is no nice way in the language to traverse the elements with own code.

RaimoNiskanen · 2020-10-15T16:23:05Z

Here are two variants more shaped for using a PredicateFun:

delete_last(Item, [X | Rest]) ->
    case delete_last(Item, Rest) of
        false ->
            case Item of
                X -> Rest;
                _ -> false
            end;
        R -> [X | R]
    end;
delete_last(_, []) -> false.

delete_first(Item, [X | Rest]) ->
    case Item of
        X -> Rest;
        _ ->
            case delete_first(Item, Rest) of
                false -> false;
                F -> [X | F]
            end
    end;
delete_first(_, []) -> false.

Maria-12648430 · 2020-10-16T08:25:57Z

Ok, I'll recap:

this PR:
- remove {true, NewItem} from the return values of the filter function (and accordingly, from the documentation)
- add back the paragraph regarding garbage in the documentation
- add a new filtermap/2 function to mirror lists:filtermap
remove, delete, (remove_first, delete_first?), any and all are material for a future PR or PRs; maybe contains, also
fold_r in a future PR if needed

Agree so far?

Maria-12648430 · 2020-10-16T13:44:47Z

remove {true, NewItem} from the return values of the filter function (and accordingly, from the documentation)

add back the paragraph regarding garbage in the documentation

add a new filtermap/2 function to mirror lists:filtermap

Ok, all done :)

A note on queue:filtermap/2: I'm running lists:filtermap/2 on the front list, and iterate a custom function mimicking queue:filter_r/2 over the rear list, in order to traverse in queue order. The latter could also be done via lists:foldr/3, but it would require another function enveloping the filtermap function. Also I wasn't sure about "garbage" etc (my knowledge doesn't run that deep =^^=).

RaimoNiskanen · 2020-10-21T06:38:47Z

lib/stdlib/doc/src/queue.xml

+        <p>Calls <c><anno>Fun</anno>(<anno>Item</anno>, <anno>AccIn</anno>)</c>
+          on successive items <c>Item</c> of <c>Queue</c>, starting
+          with <c><anno>AccIn</anno> == <anno>Acc0</anno></c>. The queue is
+          traversed in queue order, i.e. from front to rear.


We do not use such abbreviations in documentation. Write "i.e." as "that is".

RaimoNiskanen · 2020-10-21T06:50:50Z

@Maria-12648430: We brought this up on our Technical Board, and have a few style comments.

The commit message on the third commit is too long so it causes a line break in GitHub's overview.
Our guidelines says max 50 characters, then a blank line and after that more explanation if needed.
I have commented inline about not using abbreviations like e.g. i.e. in our documentation.
A personal opinion; we had some for, and some against, so do as you like: I would prefer that since you had to write an own function for the reverse list for filtermap/2, write one for the front list too, so it is obvious from the code that both lists are handled the same way and to avoid an external call.
Those against say that since the code exists in lists:filtermap/2, it is better to use it.

Maria-12648430 · 2020-10-24T11:52:34Z

I'll do all that on monday, afk atm :)

If the filter function returns a single-item list or an empty list, it is treated the same way as the respective semantically equivalents of true and false.

A fold function in the queue module removes the need for a conversion to a list in order to achieve the same.

Maria-12648430 · 2020-10-26T09:47:03Z

@RaimoNiskanen everything done :) Please review again.

A personal opinion; we had some for, and some against, so do as you like: I would prefer that since you had to write an own function for the reverse list for filtermap/2, write one for the front list too, so it is obvious from the code that both lists are handled the same way and to avoid an external call.
Those against say that since the code exists in lists:filtermap/2, it is better to use it.

I decided to leave it as it is (front list via lists:filtermap/2), for the same reason as "those against" bring up ;)

Maria-12648430 · 2020-10-26T11:37:01Z

A more general question/request for opinions (especially in view of future additions to the queue module):

Should it be explicitly mandatory for the queue module to perform any traversals in queue order, at least when predicate functions are involved? It may (fold/3) or may not (filter/2, filtermap/2, or the future any/2 and all/2) matter for the return value of those functions, but it does always matter when it comes to side effects of the predicate function.

juhlig · 2020-10-26T16:29:08Z

Should it be explicitly mandatory for the queue module to perform any traversals in queue order, at least when predicate functions are involved? It may (fold/3) or may not (filter/2, filtermap/2, or the future any/2 and all/2) matter for the return value of those functions, but it does always matter when it comes to side effects of the predicate function.

My two cents: it should always be done in queue order when user functions are involved, for the possible side effects. filter/2 does it (so does your implementation of filtermap/2), and I think it is just what one would expect.

RaimoNiskanen · 2020-11-02T09:26:00Z

Merged. Thank you for your pull request!

Maria-12648430 · 2020-11-02T15:33:41Z

Merged. Thank you for your pull request!

You're welcome :)
I'm preparing another PR for any/2, all/2, delete/2, delete_r/2 right now.

queue:fold/3 erlang/otp#2791 queue:all/2, queue:any/2 erlang/otp#2850

Maria-12648430 force-pushed the queue_fold branch from 62d4bf0 to af098c5 Compare October 9, 2020 13:43

rickard-green added the team:VM Assigned to OTP team VM label Oct 12, 2020

rickard-green assigned jhogberg Oct 12, 2020

jhogberg added the testing currently being tested, tag is used by OTP internal CI label Oct 12, 2020

Maria-12648430 force-pushed the queue_fold branch from af098c5 to cf76f3e Compare October 16, 2020 13:14

RaimoNiskanen reviewed Oct 21, 2020

View reviewed changes

Maria-12648430 added 3 commits October 26, 2020 09:13

Handle special cases of filter fun returns

2e16609

If the filter function returns a single-item list or an empty list, it is treated the same way as the respective semantically equivalents of true and false.

Add a fold function to the queue module

025789f

A fold function in the queue module removes the need for a conversion to a list in order to achieve the same.

Add a filtermap function to the queue module

496e490

Maria-12648430 force-pushed the queue_fold branch from cf76f3e to 496e490 Compare October 26, 2020 09:09

RaimoNiskanen merged commit d07b4d7 into erlang:master Nov 2, 2020

RaimoNiskanen removed the testing currently being tested, tag is used by OTP internal CI label Nov 2, 2020

Maria-12648430 mentioned this pull request Nov 3, 2020

Add queue:all/2, queue:any/2, queue:delete/2, and queue:delete_r/2 #2850

Merged

Maria-12648430 added a commit to hnc-agency/hnc-pq that referenced this pull request Mar 5, 2021

Use new queue functions in OTP/24

3b63a00

queue:fold/3 erlang/otp#2791 queue:all/2, queue:any/2 erlang/otp#2850

Maria-12648430 mentioned this pull request Apr 6, 2021

queue, maps, proplists: add since information to docs for new functions #4715

Merged

Fold over queues #2791

Fold over queues #2791

Conversation

Maria-12648430 commented Oct 9, 2020

CLAassistant commented Oct 9, 2020 • edited Loading

jhogberg commented Oct 12, 2020

Maria-12648430 commented Oct 12, 2020

jhogberg commented Oct 12, 2020

josevalim commented Oct 13, 2020

RaimoNiskanen commented Oct 13, 2020

josevalim commented Oct 13, 2020

okeuday commented Oct 13, 2020

RaimoNiskanen commented Oct 14, 2020

RaimoNiskanen commented Oct 14, 2020

okeuday commented Oct 14, 2020

Maria-12648430 commented Oct 15, 2020

Maria-12648430 commented Oct 15, 2020

Maria-12648430 commented Oct 15, 2020

Maria-12648430 commented Oct 15, 2020

Maria-12648430 commented Oct 15, 2020 • edited Loading

RaimoNiskanen commented Oct 15, 2020

RaimoNiskanen commented Oct 15, 2020

Maria-12648430 commented Oct 15, 2020

Maria-12648430 commented Oct 15, 2020

essen commented Oct 15, 2020

RaimoNiskanen commented Oct 15, 2020

RaimoNiskanen commented Oct 15, 2020

RaimoNiskanen commented Oct 15, 2020

RaimoNiskanen commented Oct 15, 2020

Maria-12648430 commented Oct 16, 2020 • edited Loading

Maria-12648430 commented Oct 16, 2020

RaimoNiskanen Oct 21, 2020

Choose a reason for hiding this comment

RaimoNiskanen commented Oct 21, 2020 • edited Loading

Maria-12648430 commented Oct 24, 2020

Maria-12648430 commented Oct 26, 2020

Maria-12648430 commented Oct 26, 2020

juhlig commented Oct 26, 2020

RaimoNiskanen commented Nov 2, 2020

Maria-12648430 commented Nov 2, 2020

CLAassistant commented Oct 9, 2020 •

edited

Loading

Maria-12648430 commented Oct 15, 2020 •

edited

Loading

Maria-12648430 commented Oct 16, 2020 •

edited

Loading

RaimoNiskanen commented Oct 21, 2020 •

edited

Loading