Add hex encoding functions on the binary module #3014
Conversation
LGTM. My only objection is the naming, I think […]
An option to add a space in between pairs or quads of hex values in
Once you have that you can replace the hex implementation in ssl used for debugging.
Couple of points:
lib/stdlib/src/binary.erl

-spec enc_hex_digit_upper(0..15) -> byte().
enc_hex_digit_upper(Char) when Char =< 9 ->
    Char + $0;
enc_hex_digit_upper(Char) when Char =< 15 ->
The when Char =< 15 guard is redundant and should IMO be dropped for performance reasons.
After benchmarking, it appears that deleting the clause does not significantly improve performance. Without a real performance improvement, I prefer to keep the code readable. Don't hesitate if you have benchmarks to support your opinion.
I've benchmarked with […]
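The benchmark itself is not shown above; as a rough illustration only, the two variants could be compared with timer:tc along these lines (the bench_hex module and the digit_* names are invented here, not taken from the PR):

-module(bench_hex).
-export([run/0]).

%% Compare the clause with the redundant upper-bound guard against the
%% clause without it, on 1 MiB of random input.
run() ->
    Input = crypto:strong_rand_bytes(1 bsl 20),
    {UsGuarded, _} = timer:tc(fun() -> encode(Input, fun digit_guarded/1) end),
    {UsPlain, _} = timer:tc(fun() -> encode(Input, fun digit_plain/1) end),
    io:format("guarded: ~p us, plain: ~p us~n", [UsGuarded, UsPlain]).

%% Encode each 4-bit nibble with the given digit function.
encode(Bin, Digit) ->
    << <<(Digit(N))>> || <<N:4>> <= Bin >>.

digit_guarded(N) when N =< 9 -> N + $0;
digit_guarded(N) when N =< 15 -> N - 10 + $A.

digit_plain(N) when N =< 9 -> N + $0;
digit_plain(N) -> N - 10 + $A.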
So right now Erlang is going to have a […]. What about having a separate […]?
While base16 and hex encodings are indeed the same, most developers know what hex encoding is, but have no idea about base16. Having simple hex encoding functions available would avoid multiple re-implementations everywhere.
That would be nice to have indeed, but the scope of such a contribution would be way larger, especially since it would involve a new module (and therefore a new name in the global module namespace) and the deprecation of an existing module. If someone were to come forward to write, test and document a […]
My opinion is that conversion to/from hex is essentially like any other number conversion (binary, octal, decimal) while base64 conversion is substantially different, so I would not want to conflate them.
I tend to use […]. Giving this a good name will be hard; the best/silliest I came up with was […]. Maybe we should focus less on the base that we convert to and more on the transformation we are doing? i.e. […]
I find this line of reasoning detrimental in general. Improve documentation perhaps to make those developers aware of what base16 is, but I do not believe that we should introduce inconsistencies and whatnot because of some developers who lack particular knowledge. We should not change the libraries and such to appeal to them; instead they should learn. We can help them with that through documentation, tutorials, and/or examples, etc.
The last one looks good enough. I am not entirely sure about the naming (although it looks OK to me), but being able to specify the base seems good, in which case I am not sure what to do with base64. If we only add hex, then adding base16 might make more sense for consistency. Personally, I consider these two options good: either […] or […].

The second probably will not happen overnight; it has to be flagged as deprecated first and then slowly phased out. We can make base in the meantime and recommend people to use that.
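As a purely hypothetical illustration of the base-parameterised idea (no such module exists in OTP; base_sketch and its shape are invented here, and binary:encode_hex/1 only exists from OTP 24 on), such a front end could be as thin as:

%% Hypothetical sketch: dispatch on the target base and delegate to
%% existing functions. Decode, options, and other bases are left out.
-module(base_sketch).
-export([encode/2]).

encode(Bin, 16) when is_binary(Bin) -> binary:encode_hex(Bin);
encode(Bin, 64) when is_binary(Bin) -> base64:encode(Bin).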
@essen Interesting idea, feel free to send a patch.
This one sounds great! We are also using base32 and base62, which would be trivial to add given a common API.
base62??
The OTP Technical Board spent some time today talking about this PR and has come to the conclusion that we like the inclusion of this functionality as a simple conversion function for binaries in the binary module. The naming is consistent with the naming of the binary module (i.e. […]).

We do however want to keep this function simple, which means no options for configuration, and the default should be uppercase (the same as […]).

On a more general note: we would not mind the addition of a […].

As with most things, a decision is never final and we may very well change opinions about whether there should be an option to […].
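For concreteness, the decided-upon behaviour boils down to something like the following shell session (assuming uppercase output by default, as stated above; the example values are chosen here for illustration):

1> binary:encode_hex(<<"f">>).
<<"66">>
2> binary:decode_hex(<<"66">>).
<<"f">>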
Just realized that this part of my previous comment makes no sense. Not sure what happened there, as the rest of the comment talks about the correct things.
Thanks for the updates! The stdlib binary_module_SUITE:error_info testcase fails. You can run the tests like this: […]
@garazdawi Fixed :)
Oh, I've missed this PR, but I have some input when it comes to algorithms. Currently I'm using this implementation, https://github.com/esl/base16/blob/master/src/base16.erl, which is based on lookup tables, and on OTP 23 (non-JIT) it benchmarks like this:

Benchee.run(
  %{
    "encode optimised" => fn input -> :base16.encode(input) end,
    "encode otp" => fn input -> :bench_erl.encode_hex(input) end
  },
  inputs: %{
    "1. Tiny" => :crypto.strong_rand_bytes(8),
    "2. Small" => :crypto.strong_rand_bytes(64),
    "3. Medium" => :crypto.strong_rand_bytes(256),
    "4. Medium Big" => :crypto.strong_rand_bytes(512),
    "5. Big" => :crypto.strong_rand_bytes(1024),
    "6. Large" => :crypto.strong_rand_bytes(64000),
    "7. Random" => :crypto.strong_rand_bytes(:rand.uniform(10000))
  },
  parallel: 12,
  memory_time: 5
)
Benchee.run(
  %{
    "decode optimised" => fn input -> :base16.decode(input) end,
    "decode otp" => fn input -> :bench_erl.my_decode(input) end
  },
  inputs: %{
    "1. Tiny" => :bench_erl.encode(:crypto.strong_rand_bytes(8)),
    "2. Small" => :bench_erl.encode(:crypto.strong_rand_bytes(64)),
    "3. Medium" => :bench_erl.encode(:crypto.strong_rand_bytes(256)),
    "4. Medium Big" => :bench_erl.encode(:crypto.strong_rand_bytes(512)),
    "5. Big" => :bench_erl.encode(:crypto.strong_rand_bytes(1024)),
    "6. Large" => :bench_erl.encode(:crypto.strong_rand_bytes(64000)),
    "7. Random" => :bench_erl.encode(:crypto.strong_rand_bytes(:rand.uniform(10000)))
  },
  parallel: 12,
  memory_time: 5
)

The results: always constant memory consumption vs. linear, plus usually 2x-3x faster, regardless of the input size. I haven't built benchmarks using a BEAM with a JIT, but what do you think of such an algorithm? I could adapt it to always return uppercase and errors as in OTP and make a PR 🙂 Edit: the .beam file gets bigger of course, and actually the difference is in encoding; decoding is pretty much exactly the same.
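For readers who have not opened the linked repository, the lookup-table idea can be sketched in a few lines of Erlang. This is not the esl/base16 code, just an illustration, and hex_lut is a made-up name:

-module(hex_lut).
-export([encode/1]).

encode(Bin) when is_binary(Bin) ->
    LUT = lut(),
    << <<(element(B + 1, LUT))/binary>> || <<B>> <= Bin >>.

%% 256-tuple mapping every byte value to its two-character uppercase hex
%% pair. A serious implementation would build this once (compile-time
%% constants or persistent_term) instead of on every call.
lut() ->
    list_to_tuple([iolist_to_binary(io_lib:format("~2.16.0B", [N]))
                   || N <- lists:seq(0, 255)]).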
I got all too curious about this, and adding Elixir's […]
I propose to add the encode_hex/1, encode_hex/2 and decode_hex/1 functions to the binary module.

Dealing with the hex format is frequent in many codebases. At this time, we all copy-paste similar helpers into our codebases to encode and decode a hex-encoded binary. That is why I suggest enriching the binary module with these functions.

I've added these functions to the binary module because it makes sense to me, and it avoids polluting the global namespace with a new module. Maybe creating a new module called hex, for example, would be a better solution.

I tried as much as possible to follow the contribution guide. Don't hesitate to tell me if I made a mistake somewhere.
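For readers who have not looked at the patch itself, helpers of this kind usually boil down to a few lines. The sketch below is not the code in this PR (hex_sketch and its internal names are made up), and the proposed encode_hex/2 variant is omitted:

-module(hex_sketch).
-export([encode_hex/1, decode_hex/1]).

%% Encode each 4-bit nibble as one uppercase hex character.
encode_hex(Bin) when is_binary(Bin) ->
    << <<(hex_digit(N))>> || <<N:4>> <= Bin >>.

hex_digit(N) when N =< 9 -> N + $0;
hex_digit(N) -> N - 10 + $A.

%% Decode two hex characters back into one byte; only even-length
%% input is accepted here.
decode_hex(Hex) when is_binary(Hex), byte_size(Hex) rem 2 =:= 0 ->
    << <<(hex_value(C)):4>> || <<C>> <= Hex >>.

hex_value(C) when C >= $0, C =< $9 -> C - $0;
hex_value(C) when C >= $A, C =< $F -> C - $A + 10;
hex_value(C) when C >= $a, C =< $f -> C - $a + 10.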