Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fold over queues #2791

Merged
merged 3 commits into from
Nov 2, 2020
Merged

Fold over queues #2791

merged 3 commits into from
Nov 2, 2020

Conversation

Maria-12648430
Copy link
Contributor

The first commit in this PR introduces a fold/3 function for the queue module. Up to now, this had to be done by converting a queue to a list and folding over that, which is both cumbersome and more performance-intensive (O(length(Rear)) for the conversion, and O(length(List)) for the fold over that list) than a fold over the internal lists making up the Rear and Front, inside the queue module (O(len(Queue)), roughly equivalent to only O(length(List))). The queue is traversed in queue order (front to rear), which is what would be expected.


The second commit in this PR addresses a "glitch" in the filter/2 function, namely that while returning true or false from the filter function is semantically equivalent to returning [Item] or [] respectively, the latter generates more garbage. By handling the two special cases "single-element list" and "empty list" the same as true and false respectively, this is no longer the case, and the paragraph in the function documentation has been removed accordingly.

Also, to align queue:filter/2 more closely with lists:filtermap/2 (queue:filter/2 is really a filtermap with insert capability), the filter function may return {true, NewItem} to signal that the current item should be replaced with NewItem in the result queue.

@CLAassistant
Copy link

CLAassistant commented Oct 9, 2020

CLA assistant check
All committers have signed the CLA.

@rickard-green rickard-green added the team:VM Assigned to OTP team VM label Oct 12, 2020
@jhogberg jhogberg added the testing currently being tested, tag is used by OTP internal CI label Oct 12, 2020
@jhogberg
Copy link
Contributor

Thanks for the PR!

We'll need to run it past the technical board (next week) as it changes an interface, but I think it looks great.

@Maria-12648430
Copy link
Contributor Author

We'll need to run it past the technical board (next week) as it changes an interface, but I think it looks great.

You mean because of the {true, NewItem} as a new possible return for the filter function?

@jhogberg
Copy link
Contributor

You mean because of the {true, NewItem} as a new possible return for the filter function?

Yes, though we'll discuss fold/3 as well. We want as many eyeballs as possible on the documentation.

@josevalim
Copy link
Contributor

Nice, I would definitely have used queue:fold/3 in the past!

I am not convinced about the return type for queue:filter though. I understand the rationale in aligning it with filtermap but given they are already different (filter allows a list to be returned), a new return type complicates the API and doesn't add anything over returning [NewItem] directly.

@RaimoNiskanen
Copy link
Contributor

In that case, @josevalim & @Maria-12648430, maybe write a new queue:filtermap that aligns with lists:filtermap and silently let queue:filter keep its pecularity of being able to also do filtermap. Oddly enough queue:filter is actually more general than lists:filtermap since the former allows replacing an element with any number of elements.

@Maria-12648430 - Note that your change to optimize the code handling of [Item] does not change the amount of created garbage. The garbage is created when creating the return value [Item] which is one cons cell referring to Item, and that cons cell (2 VM words) becomes garbage when building the queue internal result list. Both [Item | filter_f(Fun, F)], and L ++ filter_f(Fun, F) does exactly the same: every cons cell in L (1 pcs) becomes garbage and a new one is created to hold the list that is appended to. Returning true does not produce garbage because an atom is an immediate value.

Your change optimizes away external calls to lists:append and lists:reverse for the important case of replacing an item (filtermap), which I think is a good optimization - keep it. But you can actually not remove the documentation about more created garbage.

Side note: returning {true, NewItem} produces 3 VM words of garbage since a tuple has a header word and then one per item, which is 50% more than when returning [NewItem].

A question for decision: would we need both queue:foldl and queue:foldr instead of just queue:fold? I think maybe not, since a queue:reverse is O(1).

@josevalim
Copy link
Contributor

A question for decision: would we need both queue:foldl and queue:foldr instead of just queue:fold? I think maybe not, since a queue:reverse is O(1).

I pondered the same and my conclusion was that if there is ever a need for a “reverse fold”, it could be called “fold_r”, which aligns well with the other functions in the module. WDYT?

@okeuday
Copy link
Contributor

okeuday commented Oct 13, 2020

You may want to consider the need for a separate function (possibly in a separate pull request) that would avoid processing the whole queue. I have found it necessary, due to testing of the time taken in queue:filter/2, to create a separate function to remove a single unique value from a queue (https://github.com/okeuday/pqueue/blob/ebfe8e76e87f05caae84827334662726629f5f54/src/pqueue4.erl#L11602-L11619). With a situation like that, queue:fold is unable to help. It is definitely good to have a queue:fold function. When removing a unique value from a queue, the order doesn't matter, so the queue lists have minimal impact (i.e., no reverses or concatenations).

@RaimoNiskanen
Copy link
Contributor

@josevalim: fold and fold_r (the latter if needed). I buy that!

@RaimoNiskanen
Copy link
Contributor

@okeuday: A napkin calculation of a possible queue:remove/2 function: It would, on average, only have to traverse half of the queue, and only have to rebuild on quarter of the cons cells (the ones preceeding the found element in either the front or the rear list), so it would be between 2..4 times faster than queue:filter/2, but still O(N). Sounds like a nice feature anyway.

Would one want both a queue:remove(PredicateFun, Queue) and a queue:delete(Item, Queue) for a predicate fun and a =:= match respectively? Matching equality is significantly faster than calling a fun().

@okeuday
Copy link
Contributor

okeuday commented Oct 14, 2020

@RaimoNiskanen I needed the predicate fun to access a tuple element but the =:= equality match function should be helpful for other data.

@Maria-12648430
Copy link
Contributor Author

Oh, quite a response :)


@RaimoNiskanen & @josevalim:

In that case, @josevalim & @Maria-12648430, maybe write a new queue:filtermap that aligns with lists:filtermap and silently let queue:filter keep its pecularity of being able to also do filtermap. Oddly enough queue:filter is actually more general than lists:filtermap since the former allows replacing an element with any number of elements.

Ok, I second that. So, the filtermap function may return true to keep, false to drop, and {true, NewItem} to replace. No insert capability, right?

@Maria-12648430
Copy link
Contributor Author

@RaimoNiskanen

@Maria-12648430 - Note that your change to optimize the code handling of [Item] does not change the amount of created garbage. The garbage is created when creating the return value [Item] which is one cons cell referring to Item, and that cons cell (2 VM words) becomes garbage when building the queue internal result list. Both [Item | filter_f(Fun, F)], and L ++ filter_f(Fun, F) does exactly the same: every cons cell in L (1 pcs) becomes garbage and a new one is created to hold the list that is appended to. Returning true does not produce garbage because an atom is an immediate value.

Ah, got that wrong then.

Your change optimizes away external calls to lists:append and lists:reverse for the important case of replacing an item (filtermap), which I think is a good optimization - keep it. But you can actually not remove the documentation about more created garbage.

Ok, I'll add it back.

Side note: returning {true, NewItem} produces 3 VM words of garbage since a tuple has a header word and then one per item, which is 50% more than when returning [NewItem].

That would be obsoleted with a separate filtermap function.

@Maria-12648430
Copy link
Contributor Author

@RaimoNiskanen & @josevalim

A question for decision: would we need both queue:foldl and queue:foldr instead of just queue:fold? I think maybe not, since a queue:reverse is O(1).

TBH, I had both foldl and foldr in the beginning, but then removed them for fold because of the reason you described.

@josevalim: fold and fold_r (the latter if needed). I buy that!

Decided then? :)

@Maria-12648430
Copy link
Contributor Author

@okeuday

You may want to consider the need for a separate function (possibly in a separate pull request) that would avoid processing the whole queue. I have found it necessary, due to testing of the time taken in queue:filter/2, to create a separate function to remove a single unique value from a queue

I need something like that every now and then, too. Maybe later, in a seperate PR :)

With a situation like that, queue:fold is unable to help.

Yes

When removing a unique value from a queue, the order doesn't matter, so the queue lists have minimal impact (i.e., no reverses or concatenations).

Hm, I would argue that it is not guaranteed that an item is unique, at least not from the perspective of the queue.

I would suggest a function (or two) that removes the first element matching the condition (either predicate function or exact match), then stops traversing further.

The front list can be traversed without reversal, so best case would be if the element is in that list (ie, near the front). The rear lists has to be reversed to find the first element (instead of the last), so that would be a worse case. Worst case is if no element is found, which means full traversal plus a reversal in the middle.

A more general version of this would be something like remove(N, PredicateFun, Queue) or delete(N, Item, Queue), meaning "remove (at most) N matching elements".

@Maria-12648430
Copy link
Contributor Author

Maria-12648430 commented Oct 15, 2020

@okeuday & @RaimoNiskanen

Another function that could be handy would be contains(PredicateFun, Queue), as the existing member/ function only finds exact matches. Slower than member/2 which employs lists:member/2, yes, but removes the need for converting to list and such (and member/2 would remain of course, so...). More general forms that allow more flexibility even would be any and all, like their counterparts in lists. What do you think?

(Probably material for another PR, anyway ;))

@RaimoNiskanen
Copy link
Contributor

@Maria-12648430

Side note: returning {true, NewItem} produces 3 VM words of garbage since a tuple has a header word and then one per item, which is 50% more than when returning [NewItem].

That would be obsoleted with a separate filtermap function.

Obsolete or not, it would still be so that queue:filter/2 with a Fun returning [Item] would produce slightly less garbage thanqueue:filtermap/2 with a Fun returning {true, NewItem}. Certainly irrelevant.
Nevertheless, queue:filtermap/2 should be compatible with lists:filtermap/2, just as you suggest.

@RaimoNiskanen
Copy link
Contributor

@Maria-12648430

@josevalim: fold and fold_r (the latter if needed). I buy that!
Decided then? :)

I agree. And I think fold_r can wait for its demand, since queue:reverse(queue:fold(Fun, queue:reverse(Q))) is not that inefficient. Unless you just can't wait to implement it...

@Maria-12648430
Copy link
Contributor Author

Obsolete or not, it would still be so that queue:filter/2 with a Fun returning [Item] would produce slightly less garbage thanqueue:filtermap/2 with a Fun returning {true, NewItem}. Certainly irrelevant.

I meant that my addition of {true, NewItem} as a possible return for the filter function would be obsoleted by filtermap, ie should be removed from filter again ;)

@Maria-12648430
Copy link
Contributor Author

Unless you just can't wait to implement it...

I'll see ;)

@essen
Copy link
Contributor

essen commented Oct 15, 2020

More functions in queue is more than welcome but fold_r sounds unnecessary at this point. Perhaps wait to get feedback on the normal fold first.

@RaimoNiskanen
Copy link
Contributor

@Maria-12648430

The front list can be traversed without reversal, so best case would be if the element is in that list (ie, near the front). The rear lists has to be reversed to find the first element (instead of the last), so that would be a worse case. Worst case is if no element is found, which means full traversal plus a reversal in the middle.

Ouch! Of course, I did not think of that!

I did some coding in my head so these functions maybe cannot even be compiled, and may be incorrect, but anyway...

This function should traverse the whole list, but only rebuild the cons cells of the start of the list up to the last matching Item, the list after is untouched. If no matching Item is found the it returns false, so that must be handled by the caller. It minimizes rebuild but still has to traverse the whole reverse list once:

delete_last(Item, [Item | Rest]) ->
   case delete_last(Item, Rest) of
    false -> Rest;
    R -> [Item | R]
  end;
delete_last(Item, [X | Rest]) ->
  case delete_last(Item, Rest) of
    false -> false;
    R -> [X | R]
  end;
delete_last(_, []) -> false.

A similar trick can be used to minimize rebuild of the front list. It returns false if no matching Item is found, so the caller should then keep the original F list and try the R list instead. It only has to traverse and rebuild the start of the F list up to the first matching Item:

delete_first(Item, [Item | Rest]) -> Rest;
delete_first(Item, [X | Rest]) ->
  case delete_first(Item, Rest) of
    false -> false;
    F -> [X | F]
end;
delete_first(_, []) -> false.

Regarding "remove at most N of these items" I think is an odd bird, and there is also nothing like that in lists.

@RaimoNiskanen
Copy link
Contributor

@Maria-12648430

I meant that my addition of {true, NewItem} as a possible return for the filter function would be obsoleted by filtermap, ie should be removed from filter again ;)

Ah, yes.

@RaimoNiskanen
Copy link
Contributor

@Maria-12648430: contains/1, and any/2 and all/2 would be welcome additions. Then lists:contains/2 would be missing.

That would be a future PR...

I think there is a greater need for high level functions in queue because unlike lists there is no nice way in the language to traverse the elements with own code.

@RaimoNiskanen
Copy link
Contributor

Here are two variants more shaped for using a PredicateFun:

delete_last(Item, [X | Rest]) ->
    case delete_last(Item, Rest) of
        false ->
            case Item of
                X -> Rest;
                _ -> false
            end;
        R -> [X | R]
    end;
delete_last(_, []) -> false.

delete_first(Item, [X | Rest]) ->
    case Item of
        X -> Rest;
        _ ->
            case delete_first(Item, Rest) of
                false -> false;
                F -> [X | F]
            end
    end;
delete_first(_, []) -> false.

@Maria-12648430
Copy link
Contributor Author

Maria-12648430 commented Oct 16, 2020

Ok, I'll recap:

  • this PR:
    • remove {true, NewItem} from the return values of the filter function (and accordingly, from the documentation)
    • add back the paragraph regarding garbage in the documentation
    • add a new filtermap/2 function to mirror lists:filtermap
  • remove, delete, (remove_first, delete_first?), any and all are material for a future PR or PRs; maybe contains, also
  • fold_r in a future PR if needed

Agree so far?

@Maria-12648430
Copy link
Contributor Author

  • remove {true, NewItem} from the return values of the filter function (and accordingly, from the documentation)
  • add back the paragraph regarding garbage in the documentation
  • add a new filtermap/2 function to mirror lists:filtermap

Ok, all done :)

A note on queue:filtermap/2: I'm running lists:filtermap/2 on the front list, and iterate a custom function mimicking queue:filter_r/2 over the rear list, in order to traverse in queue order. The latter could also be done via lists:foldr/3, but it would require another function enveloping the filtermap function. Also I wasn't sure about "garbage" etc (my knowledge doesn't run that deep =^^=).

<p>Calls <c><anno>Fun</anno>(<anno>Item</anno>, <anno>AccIn</anno>)</c>
on successive items <c>Item</c> of <c>Queue</c>, starting
with <c><anno>AccIn</anno> == <anno>Acc0</anno></c>. The queue is
traversed in queue order, i.e. from front to rear.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not use such abbreviations in documentation. Write "i.e." as "that is".

@RaimoNiskanen
Copy link
Contributor

RaimoNiskanen commented Oct 21, 2020

@Maria-12648430: We brought this up on our Technical Board, and have a few style comments.

  • The commit message on the third commit is too long so it causes a line break in GitHub's overview.
    Our guidelines says max 50 characters, then a blank line and after that more explanation if needed.
  • I have commented inline about not using abbreviations like e.g. i.e. in our documentation.
  • A personal opinion; we had some for, and some against, so do as you like: I would prefer that since you had to write an own function for the reverse list for filtermap/2, write one for the front list too, so it is obvious from the code that both lists are handled the same way and to avoid an external call.
    Those against say that since the code exists in lists:filtermap/2, it is better to use it.

@Maria-12648430
Copy link
Contributor Author

I'll do all that on monday, afk atm :)

If the filter function returns a single-item list
or an empty list, it is treated the same way as
the respective semantically equivalents of true
and false.
A fold function in the queue module removes the need for
a conversion to a list in order to achieve the same.
@Maria-12648430
Copy link
Contributor Author

@RaimoNiskanen everything done :) Please review again.

  • A personal opinion; we had some for, and some against, so do as you like: I would prefer that since you had to write an own function for the reverse list for filtermap/2, write one for the front list too, so it is obvious from the code that both lists are handled the same way and to avoid an external call.
    Those against say that since the code exists in lists:filtermap/2, it is better to use it.

I decided to leave it as it is (front list via lists:filtermap/2), for the same reason as "those against" bring up ;)

@Maria-12648430
Copy link
Contributor Author

A more general question/request for opinions (especially in view of future additions to the queue module):

Should it be explicitly mandatory for the queue module to perform any traversals in queue order, at least when predicate functions are involved? It may (fold/3) or may not (filter/2, filtermap/2, or the future any/2 and all/2) matter for the return value of those functions, but it does always matter when it comes to side effects of the predicate function.

@juhlig
Copy link
Contributor

juhlig commented Oct 26, 2020

Should it be explicitly mandatory for the queue module to perform any traversals in queue order, at least when predicate functions are involved? It may (fold/3) or may not (filter/2, filtermap/2, or the future any/2 and all/2) matter for the return value of those functions, but it does always matter when it comes to side effects of the predicate function.

My two cents: it should always be done in queue order when user functions are involved, for the possible side effects. filter/2 does it (so does your implementation of filtermap/2), and I think it is just what one would expect.

@RaimoNiskanen RaimoNiskanen merged commit d07b4d7 into erlang:master Nov 2, 2020
@RaimoNiskanen
Copy link
Contributor

Merged. Thank you for your pull request!

@RaimoNiskanen RaimoNiskanen removed the testing currently being tested, tag is used by OTP internal CI label Nov 2, 2020
@Maria-12648430
Copy link
Contributor Author

Merged. Thank you for your pull request!

You're welcome :)
I'm preparing another PR for any/2, all/2, delete/2, delete_r/2 right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants