uint8_t parsing #349

shikharish · 2025-12-22T11:39:44Z

Related to #226.

shikharish · 2025-12-22T11:45:29Z

First implementation.
The following points still need to be addressed:

std::memcpy() must always be valid (currently it is UB)
the current code only works on little-endian targets

I have added a separate benchmark which I will merge to the main benchmark later.
The benchmark results vary significantly with compiler and architecture.

Apple M1 (8) @ 3.20 GHz using clang (arm64)

loaded db: a14 (Apple A14/M1)
parse_ip_fromchars                       :   0.23 GB/s   15.9 Ma/s  62.78 ns/d   1.96 GHz  122.81 c/d  372.87 i/d   8.60 c/b  26.11 i/b   3.04 i/c 
parse_ip_fastswar                        :   0.38 GB/s   26.3 Ma/s  37.96 ns/d   1.96 GHz  74.25 c/d  186.00 i/d   5.20 c/b  13.02 i/b   2.51 i/c 
sink=3767029944

Apple M1 (8) @ 3.20 GHz using gcc (arm64)

loaded db: a14 (Apple A14/M1)
parse_ip_fromchars                       :   0.41 GB/s   28.6 Ma/s  34.95 ns/d   1.96 GHz  68.43 c/d  224.40 i/d   4.79 c/b  15.71 i/b   3.28 i/c 
parse_ip_fastswar                        :   0.36 GB/s   25.2 Ma/s  39.64 ns/d   1.96 GHz  77.54 c/d  228.47 i/d   5.43 c/b  16.00 i/b   2.95 i/c 
sink=3738477744

Intel(R) Core(TM) i7-5500U (4) @ 3.00 GHz using clang (x86_64)

parse_ip_fromchars                       :   0.40 GB/s   27.7 Ma/s  36.07 ns/d   2.68 GHz  96.82 c/d  222.65 i/d   6.78 c/b  15.59 i/b   2.30 i/c 
parse_ip_fastswar                        :   0.42 GB/s   29.1 Ma/s  34.34 ns/d   2.68 GHz  92.05 c/d  223.86 i/d   6.45 c/b  15.68 i/b   2.43 i/c 
sink=3738477744

Intel(R) Core(TM) i7-5500U (4) @ 3.00 GHz using gcc (x86_64)

parse_ip_fromchars                       :   0.38 GB/s   26.4 Ma/s  37.85 ns/d   2.68 GHz  101.25 c/d  231.36 i/d   7.09 c/b  16.20 i/b   2.29 i/c 
parse_ip_fastswar                        :   0.43 GB/s   30.1 Ma/s  33.23 ns/d   2.68 GHz  89.05 c/d  185.92 i/d   6.24 c/b  13.02 i/b   2.09 i/c 
sink=3738477744

shikharish · 2025-12-22T11:47:54Z

Request for review @lemire

shikharish · 2025-12-22T11:50:08Z

A constexpr branch should be efficient and sufficient for handling uint8_t.
What could perhaps be improved are the separate branches for nd==0 and nd>3. This is the best I could come up with: when I tried to minimize branches, the code sometimes performed significantly worse.
Please review.
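
For readers following the thread, here is a minimal sketch of the SWAR idea under discussion, assuming nd denotes the number of leading digits. It is not the PR's code: `pack4` and `parse_u8_from_word` are hypothetical names, and the word is assumed to hold the first character in its lowest byte, as a little-endian 4-byte load would produce.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: packs four characters with the first one in the lowest
// byte, i.e. the layout a 4-byte little-endian load would produce.
inline std::uint32_t pack4(char c0, char c1, char c2, char c3) {
  return static_cast<std::uint32_t>(static_cast<unsigned char>(c0)) |
         static_cast<std::uint32_t>(static_cast<unsigned char>(c1)) << 8 |
         static_cast<std::uint32_t>(static_cast<unsigned char>(c2)) << 16 |
         static_cast<std::uint32_t>(static_cast<unsigned char>(c3)) << 24;
}

// Hypothetical sketch: parses the leading 1 to 3 decimal digits of `word` as a
// uint8_t. Returns the number of digits consumed, or 0 if this path rejects it.
inline std::size_t parse_u8_from_word(std::uint32_t word, std::uint8_t &out) {
  std::uint32_t x = word ^ 0x30303030u; // '0'..'9' become 0..9
  // A byte is flagged when its high nibble is set, either directly or after
  // adding 6. Carries can only come out of a non-digit byte, so the first
  // flagged byte is always the first non-digit.
  const std::uint32_t nondigit = ((x + 0x06060606u) | x) & 0xF0F0F0F0u;
  std::size_t nd = 0; // number of leading digits
  while (nd < 4 && ((nondigit >> (8 * nd)) & 0xFFu) == 0) {
    ++nd;
  }
  // nd == 0: no digit at all. nd > 3: a real parser would take a general path.
  if (nd == 0 || nd > 3) {
    return 0;
  }
  x <<= 8 * (4 - nd); // right-align the digits; the last digit lands in the top byte
  const std::uint32_t value =
      100 * ((x >> 8) & 0xFFu) + 10 * ((x >> 16) & 0xFFu) + (x >> 24);
  if (value > 255) {
    return 0; // does not fit in uint8_t
  }
  out = static_cast<std::uint8_t>(value);
  return nd;
}
```

The two early returns correspond to the nd==0 and nd>3 cases mentioned above; the digit count is found with a plain loop here for portability, where a tuned version would more likely use a count-trailing-zeros instruction.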

lemire · 2025-12-22T17:04:45Z

I recommend adopting your benchmark immediately: see #350. It does not use your SWAR approach; it just benchmarks fast_float against the standard library.

For the memcpy, it is not usable in a constexpr context, but bit_cast is.
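
A small sketch of that difference, assuming a C++20 toolchain for the `std::bit_cast` branch. The names `bytes4` and `word_from_bytes` are hypothetical and only serve the illustration; the fallback shows the `std::memcpy` form, which yields the same value at run time but cannot be evaluated in a constant expression.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#if __cplusplus >= 202002L
#include <bit>
#define SKETCH_HAS_BIT_CAST 1
#else
#define SKETCH_HAS_BIT_CAST 0
#endif

using bytes4 = std::array<unsigned char, 4>;

#if SKETCH_HAS_BIT_CAST
// std::bit_cast is constexpr, so this function can run at compile time.
constexpr std::uint32_t word_from_bytes(const bytes4 &b) {
  return std::bit_cast<std::uint32_t>(b);
}
static_assert(word_from_bytes(bytes4{0, 0, 0, 0}) == 0,
              "evaluated at compile time");
#else
// Pre-C++20 fallback: same result at run time, but not a constant expression.
inline std::uint32_t word_from_bytes(const bytes4 &b) {
  std::uint32_t w;
  std::memcpy(&w, b.data(), sizeof w);
  return w;
}
#endif
```

Both variants reinterpret the bytes in host order, so the resulting value still depends on endianness.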

shikharish · 2025-12-22T19:01:12Z

> I recommend adopting your benchmark immediately: see #350. It does not use your SWAR approach; it just benchmarks fast_float against the standard library.

Alright. I will rebase this branch after the PR gets merged.

> For the memcpy, it is not usable in a constexpr context, but bit_cast is.

Ah, we only choose the branch at compile time. I should rather do:

  constexpr bool is_uint8 = std::is_same_v<T, std::uint8_t>;

  if (is_uint8) {
    const size_t len = (size_t)(pend - p);
...

About the memcpy UB, I cannot find a fast (and generally-applicable) way to fix this. Adding a branch on len measurably slows the hot path. And we cannot assume the buffer will be padded properly.

Do we document a precondition: at least 4 readable bytes from p, otherwise it is not safe for uint8?

lemire · 2025-12-22T21:36:31Z

@shikharish

> Ah, we only choose the branch at compile time. I should rather do

I think that this should be

if constexpr (is_uint8) {

although some care is needed not to break backward compatibility with earlier versions of C++.

> About the memcpy UB, I cannot find a fast (and generally-applicable) way to fix this. Adding a branch on len measurably slows the hot path. And we cannot assume the buffer will be padded properly. Do we document a precondition: at least 4 readable bytes from p, otherwise it is not safe for uint8?

No. We don't do that. We can get creative, but we don't want to read beyond the buffer in general as this might cause a fatal crash in some instances.
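
As an illustration of the constraint only (not the fix adopted in the PR), a bounded load might look like the sketch below. `load_at_most_4` is a hypothetical name. It copies no more than the bytes that are actually readable, at the cost of a length-dependent copy, which the thread reports as measurably slower on the hot path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PR): copies at most 4 bytes from [p, pend)
// into a zero-initialized word, so it never reads past pend.
inline std::uint32_t load_at_most_4(const char *p, const char *pend) {
  assert(p <= pend);
  const std::size_t len = static_cast<std::size_t>(pend - p);
  const std::size_t n = len < 4 ? len : 4;
  std::uint32_t word = 0; // bytes that are not copied stay 0x00, not a digit
  if (n != 0) {
    std::memcpy(&word, p, n); // n never exceeds the readable range
  }
  return word;
}
```

The copy places the first character at the lowest address of the word, so the numeric value still differs between little-endian and big-endian hosts.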

shikharish · 2025-12-23T00:34:57Z

There is an issue with the benchmarking library. It is adding too much overhead (due to templating, I think).

❯ sudo ./build/benchmarks/bench_ip
parse_ip_std_fromchars                   :   0.21 GB/s   14.6 Ma/s  68.67 ns/d   1.96 GHz  134.33 c/d  488.66 i/d   9.41 c/b  34.21 i/b   3.64 i/c 
parse_ip_fastfloat                       :   0.22 GB/s   15.5 Ma/s  64.54 ns/d   1.96 GHz  126.19 c/d  358.92 i/d   8.84 c/b  25.13 i/b   2.84 i/c 
sink=3749209294

This is inaccurate compared to the actual raw speed. I wrote a simple benchmark which just measures the throughput:

❯ ./simple_bench
std::from_chars                :  0.25 GB/s   56.8 ns/d
fast_float::from_chars         :  0.38 GB/s   37.9 ns/d
sink=3763786764
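
A throughput-only measurement of this kind can be sketched as follows. This is a hypothetical harness, not the `simple_bench` program from the comment: `throughput` and `measure` are illustrative names, and the `sink` accumulator plays the same role as the `sink=` value printed above, keeping the compiler from discarding the parsing work.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct throughput {
  double gb_per_s;    // decimal gigabytes per second
  double ns_per_item; // nanoseconds per parsed input
};

// Hypothetical harness: times `rounds` passes of parse(s) over all inputs and
// adds every result to `sink` so the calls cannot be optimized away.
template <typename Parse>
throughput measure(const std::vector<std::string> &inputs, Parse parse,
                   int rounds, std::uint64_t &sink) {
  std::size_t bytes = 0;
  for (const auto &s : inputs) {
    bytes += s.size();
  }
  const auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < rounds; ++r) {
    for (const auto &s : inputs) {
      sink += parse(s);
    }
  }
  const auto stop = std::chrono::steady_clock::now();
  const double ns =
      std::chrono::duration<double, std::nano>(stop - start).count();
  const double total_bytes = static_cast<double>(bytes) * rounds;
  const double total_items = static_cast<double>(inputs.size()) * rounds;
  return {total_bytes / ns, ns / total_items}; // bytes per ns equals GB/s
}
```

Unlike the counter-based output above, a wall-clock harness like this cannot report cycles or instructions per input, and it needs enough rounds for the elapsed time to be well above the clock resolution.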

lemire · 2025-12-23T02:11:11Z

@shikharish Please see #351

shikharish · 2025-12-24T19:18:59Z

Rebased.
Benchmark on Apple M1 with clang 14:

memcpy baseline                          :  45.71 GB/s  2857.1 Mip/s   0.35 ns/ip   2.77 GHz   0.97 c/ip   3.90 i/ip   0.06 c/b   0.24 i/b   4.02 i/c 
just_seek_ip_end (no parse)              :   0.61 GB/s   38.2 Mip/s  26.16 ns/ip   1.97 GHz  51.44 c/ip  123.91 i/ip   3.22 c/b   7.74 i/b   2.41 i/c 
parse_ip_std_fromchars                   :   0.38 GB/s   24.0 Mip/s  41.69 ns/ip   1.96 GHz  81.82 c/ip  289.01 i/ip   5.11 c/b  18.06 i/b   3.53 i/c 
parse_ip_fastfloat                       :   0.81 GB/s   50.7 Mip/s  19.71 ns/ip   1.97 GHz  38.83 c/ip  177.86 i/ip   2.43 c/b  11.12 i/b   4.58 i/c

@lemire
This can be merged now I think.

lemire · 2025-12-24T20:49:50Z

@shikharish Impressive...

main:

memcpy baseline                          :  97.64 GB/s  6102.5 Mip/s   0.16 ns/ip
just_seek_ip_end (no parse)              :   2.10 GB/s  131.0 Mip/s   7.64 ns/ip
parse_ip_std_fromchars                   :   1.07 GB/s   67.1 Mip/s  14.91 ns/ip
parse_ip_fastfloat                       :   1.26 GB/s   78.9 Mip/s  12.67 ns/ip

this PR

memcpy baseline                          :  99.30 GB/s  6206.0 Mip/s   0.16 ns/ip
just_seek_ip_end (no parse)              :   2.12 GB/s  132.5 Mip/s   7.55 ns/ip
parse_ip_std_fromchars                   :   1.12 GB/s   70.3 Mip/s  14.22 ns/ip
parse_ip_fastfloat                       :   2.21 GB/s  138.1 Mip/s   7.24 ns/ip

I am not sure how we are beating just_seek_ip_end (no parse). This seems... extraordinary.

Your code looks very good to me, but I'd want more tests before merging this.

Thus, please see shikharish#1

(The tests do pass, but I'd like you to review them and make sure you feel that we have enough.)

lemire · 2025-12-24T23:37:54Z

Merged. Will be in next release. I will blog about it too (with credit to you).

shikharish · 2025-12-24T23:48:25Z

@lemire Thanks a lot! This was a great learning experience for me.

I want to work on some more issues/projects. Is there anything you have in mind I can work on? I have been looking at other projects (simdutf, simdjson, etc.), so anything in that general space would be exciting. Honestly, I am only starting to get into projects focusing on performance and latency (I've realized I love fast code), so anything would be a learning experience.

Either way, thanks again. Looking forward to the next release (and the blog post!).

lemire · 2025-12-25T00:43:50Z

@shikharish I have no secrets.

Because this seems to greatly speed up the common case when parsing IPv4 addresses, it seems that taking this code into ada-url/ada would be low-hanging fruit. IPv4 can come in different forms, but we might be able to do optimistic parsing followed by a fallback. The ada library is used by millions of people. Now we don’t want to bring fast_float into this… but the code is simple enough.
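
The optimistic-then-fallback shape could be sketched like this. It uses `std::from_chars` rather than the new fast_float path, and every function name is hypothetical; in particular `parse_ipv4_general` is an empty stand-in for a slower, more permissive parser and is not ada's API. The sketch assumes the "different forms" include hexadecimal or octal parts, so the fast path defers anything with a leading zero.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Optimistic path: exactly four plain decimal octets separated by dots.
inline std::optional<std::uint32_t> parse_ipv4_dotted(std::string_view s) {
  const char *p = s.data();
  const char *pend = p + s.size();
  std::uint32_t ip = 0;
  for (int i = 0; i < 4; ++i) {
    std::uint8_t octet = 0;
    const auto r = std::from_chars(p, pend, octet);
    if (r.ec != std::errc()) {
      return std::nullopt; // no digits, or value above 255
    }
    if (r.ptr - p > 1 && *p == '0') {
      return std::nullopt; // leading zero: leave it to the general parser
    }
    ip = (ip << 8) | octet;
    p = r.ptr;
    if (i < 3) {
      if (p == pend || *p != '.') {
        return std::nullopt;
      }
      ++p;
    }
  }
  if (p != pend) {
    return std::nullopt; // trailing characters
  }
  return ip;
}

// Hypothetical stand-in for a slower parser that accepts the other forms.
inline std::optional<std::uint32_t> parse_ipv4_general(std::string_view) {
  return std::nullopt;
}

inline std::optional<std::uint32_t> parse_ipv4(std::string_view s) {
  if (auto fast = parse_ipv4_dotted(s)) {
    return fast;
  }
  return parse_ipv4_general(s);
}
```

The fast path only has to be correct for the inputs it accepts; anything it rejects is handed to the general parser unchanged, so the two can be developed and tested separately.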

lemire mentioned this pull request Dec 22, 2025

adding IP address benchmark #350

Merged

shikharish force-pushed the uint8 branch from 4646cc0 to 2b208cd Compare December 23, 2025 00:29

shikharish marked this pull request as ready for review December 23, 2025 00:44

shikharish force-pushed the uint8 branch from 037f59d to d70e84b Compare December 23, 2025 00:58

shikharish changed the title ~~wip: uint8_t parsing~~ uint8_t parsing Dec 24, 2025

shikharish added 3 commits December 25, 2025 00:45

uint8_t parsing

fce0ab6

Signed-off-by: Shikhar <[email protected]>

c++14 constexpr

fdb0edd

Signed-off-by: Shikhar <[email protected]>

fix macro

780c341

Signed-off-by: Shikhar <[email protected]>

shikharish force-pushed the uint8 branch from d70e84b to 780c341 Compare December 24, 2025 19:17

shikharish closed this Dec 24, 2025

shikharish reopened this Dec 24, 2025

adding some ipv4 test

120bdfd

shikharish and others added 2 commits December 25, 2025 02:57

Merge pull request #1 from lemire/add_test

e076a81

adding some ipv4 test

lint

97cb3ec

Signed-off-by: Shikhar <[email protected]>

lemire merged commit 1ad224e into fastfloat:main Dec 24, 2025
37 checks passed
