Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decentralized counters for all ETS table types and an option to disable decentralized counters #2229

Conversation

kjellwinblad
Copy link
Contributor

This pull request contains two commits. The first commit (ad2ce60) adds support for using decentralized counters to count the number of items and the memory consumption in the ETS table types that are implemented with hashing (set, bag and duplicate_bag). All tables with the write_concurrency option activated use decentralized counters by default after this commit has been applied. The second commit (efe06e4) in this pull request adds an ETS table option that can be used to turn off decentralized counters in tables. Such an option can be useful in applications that frequently call ets:info(T), ets:info(T,size) or ets:info(T,memory) because such calls are substantially slower with decentralized counters compared to with centralized counters.

This pull request has been experimentally evaluated on a machine with 64 hardware threads and a machine with 8 hardware threads. The results from the machine with 64 hardware threads indicate that decentralized counters give ETS tables a major scalability improvement in scenarios with many write operations (i.e., insert/2 and delete/2) when many cores are being utilized. Furthermore, decentralized counters do not cause a major performance penalty in any of the scenarios that are included in the benchmarks.

@max-au
Copy link
Contributor

max-au commented Jul 18, 2019

Thank you for this PR! We've hit the issue (ets:info got really slow for ordered_set, as if 10x regression), reproduced it with erlperf and had to fall back for {write_concurrency, false}.

@kjellwinblad
Copy link
Contributor Author

kjellwinblad commented Jul 22, 2019

Thank you for this PR! We've hit the issue (ets:info got really slow for ordered_set, as if 10x regression), reproduced it with erlperf and had to fall back for {write_concurrency, false}.

Thanks for the feedback. The functions ets:info/1, ets:info(T,size) and ets:info(T,memory) are slow when decentralized counters are activated because they use the thread progress functionality to get atomic snapshots of the decentralized counters. We want to keep ets:info(size) and ets:info(memory) linearizable even when decentralized counters are used as we want to keep the same semantics as before.

Is it important that ets:info(T,size) is exact and atomic for you? If it is acceptable to get a good estimate of the table size, you may still be able to get the scalability benefit of decentralized counters by using the counters API to "manually" maintain the size of the table in a counter with the write_concurrency option enabled. This will, however, add extra overhead to all operations that remove and insert items. Such overhead could be avoided if we made it possible to get the approximate size of a table by calling e.g., ets:info(T,approximate_size). Would that be useful?

@max-au
Copy link
Contributor

max-au commented Jul 22, 2019

As we found out, our use-case should be covered with a simple call to ets:whereis(), so slow ets:info(size) does not present an issue for us.

@kjellwinblad kjellwinblad force-pushed the kjell/stdlib/ets_hash_decentralized_counters branch from ac3dc52 to 268c340 Compare August 6, 2019 14:43
@kjellwinblad
Copy link
Contributor Author

The benchmark results have been updated.

The following changes have been made since the previous benchmark run:

  1. @sverker has made the shrinking code more scalable and made the shrinking happen less eagerly (3f4d0db).
  2. I have disabled shrinking when there are 100 or less buckets per lock and when decentralized counters are activated in the last commit of this pull request. This avoids frequent grows and shrinks due to bad estimates of the number of items in the table when the tables are small.

Results:

This commit enables decentralized counters by default in all public
ETS tables when the `write_concurrency` option is turned on. Tables of
type `ordered_set` already had support for decentralized counters so
this commit only affects tables of type `set`, `bag` and
`duplicate_bag`.

[Experiments][1] indicate that this change substantially improves the
scalability in ETS table scenarios with frequent `ets:insert/2` and
`ets:delete/2` calls when many processor cores are being utilized.

[1]: http://winsh.me/ets_catree_benchmark/decent_ctrs_hash.html
This commit adds an option that can be passed to the `ets:new/2`
function to create an ETS table that uses centralized counters to keep
track of the number of items in the table and the memory consumption
of the table. Tables with the `write_concurrency` option enabled use
decentralized counters by default.

The functions `ets:info/1` and `ets:info/2` (with `size` or `memory`
as the second parameter) get substantially slower and less scalable in
a table that use decentralized counters compared to a table that use
centralized counters so this option is useful for applications that
frequently call `ets:info/1` or `ets:info/2`.
This commit changes the shrinking logic so that ETS hash tables with
decentralized counters only shrink when there is enough buckets per
lock to get a good estimate of the total number of items from an item
counter that is associated with a lock.
@kjellwinblad kjellwinblad force-pushed the kjell/stdlib/ets_hash_decentralized_counters branch from 268c340 to 9b8f91c Compare September 25, 2019 14:40
This commit changes the default value for the ETS table option
decentralized_counters. The default value for decentralized_counters
in ETS tables of type ordered_set with the write_concurrency option
enabled is true and the default value for decentralized_counters in
all other tables is false. This commit also changes the ETS
documentation accordingly.
@kjellwinblad kjellwinblad force-pushed the kjell/stdlib/ets_hash_decentralized_counters branch from 9b8f91c to 597d3a2 Compare September 27, 2019 06:06
@sverker sverker merged commit 5bf0a8a into erlang:master Oct 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team:VM Assigned to OTP team VM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants