Cache parsed object names #675
Conversation
Looking up the fstring table isn't that cheap. If we assume most object
names are likely to re-appear, it makes sense to keep a small cache of them on
the stack.
e.g. `[{"foo": 1, "bar": 2}, {"foo": 3, "bar": 4}]`
In terms of implementation, we use a simple sorted array with binary search,
as it's the most compact option and guarantees a decent `O(log n)` lookup. The
comparison is first done on the string length, and then falls back to `memcmp`.
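Here is a minimal, self-contained sketch of that idea in plain C. The names (`name_cache`, `name_cache_fetch`, `NAME_CACHE_MAX_ENTRIES`) are hypothetical, and the real extension stores Ruby String `VALUE`s created with `rb_enc_interned_str` rather than `malloc`'ed copies:

```c
#include <stdlib.h>
#include <string.h>

#define NAME_CACHE_MAX_ENTRIES 63

typedef struct {
    const char *str;
    long length;
} cached_name;

typedef struct {
    int length;                                  /* number of used slots */
    cached_name entries[NAME_CACHE_MAX_ENTRIES]; /* kept sorted */
} name_cache;

/* Compare by length first so most mismatches are decided without reading
 * the key bytes, then fall back to memcmp for keys of equal length. */
static int name_cache_cmp(const char *str, long length, const cached_name *entry)
{
    if (length != entry->length) {
        return length < entry->length ? -1 : 1;
    }
    return memcmp(str, entry->str, (size_t)length);
}

/* Return the canonical copy of `str`, inserting it on a miss if there is room. */
static const char *name_cache_fetch(name_cache *cache, const char *str, long length)
{
    int low = 0, high = cache->length;

    while (low < high) {
        int mid = low + (high - low) / 2;
        int cmp = name_cache_cmp(str, length, &cache->entries[mid]);
        if (cmp == 0) {
            return cache->entries[mid].str;      /* cache hit */
        } else if (cmp < 0) {
            high = mid;
        } else {
            low = mid + 1;
        }
    }

    /* Cache miss: make a canonical copy and insert it at `low` to keep order. */
    char *copy = malloc((size_t)length + 1);
    if (!copy) return NULL;
    memcpy(copy, str, (size_t)length);
    copy[length] = '\0';

    if (cache->length < NAME_CACHE_MAX_ENTRIES) {
        memmove(&cache->entries[low + 1], &cache->entries[low],
                (size_t)(cache->length - low) * sizeof(cached_name));
        cache->entries[low].str = copy;
        cache->entries[low].length = length;
        cache->length++;
    }
    return copy;
}
```

A cache like this would typically live in the parser's stack frame and be zero-initialized, e.g. `name_cache cache = {0};`, so that it is discarded when parsing finishes.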
This helps quite a bit on the `activitypub` and `twitter` benchmarks; however,
it is worse on `citm_catalog` because that document has many unique object names
of the same size, so it behaves poorly with the chosen ordering.
It's a bit of a double-edged sword, but perhaps a smarter comparison function
may solve it (this is solved later in the PR).
Before:
```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 714.000 i/100ms
oj 801.000 i/100ms
Oj::Parser 943.000 i/100ms
rapidjson 632.000 i/100ms
Calculating -------------------------------------
json 7.135k (± 0.7%) i/s (140.16 μs/i) - 35.700k in 5.003978s
oj 7.991k (± 0.2%) i/s (125.14 μs/i) - 40.050k in 5.012044s
Oj::Parser 9.611k (± 0.2%) i/s (104.04 μs/i) - 48.093k in 5.003723s
rapidjson 6.318k (± 0.2%) i/s (158.29 μs/i) - 31.600k in 5.001896s
Comparison:
json: 7134.7 i/s
Oj::Parser: 9611.5 i/s - 1.35x faster
oj: 7990.8 i/s - 1.12x faster
rapidjson: 6317.6 i/s - 1.13x slower
== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 57.000 i/100ms
oj 62.000 i/100ms
Oj::Parser 78.000 i/100ms
rapidjson 56.000 i/100ms
Calculating -------------------------------------
json 573.527 (± 1.6%) i/s (1.74 ms/i) - 2.907k in 5.070094s
oj 619.368 (± 1.6%) i/s (1.61 ms/i) - 3.100k in 5.006550s
Oj::Parser 770.095 (± 0.9%) i/s (1.30 ms/i) - 3.900k in 5.064768s
rapidjson 560.601 (± 0.4%) i/s (1.78 ms/i) - 2.856k in 5.094597s
Comparison:
json: 573.5 i/s
Oj::Parser: 770.1 i/s - 1.34x faster
oj: 619.4 i/s - 1.08x faster
rapidjson: 560.6 i/s - 1.02x slower
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 31.000 i/100ms
oj 34.000 i/100ms
Oj::Parser 46.000 i/100ms
rapidjson 38.000 i/100ms
Calculating -------------------------------------
json 319.842 (± 0.3%) i/s (3.13 ms/i) - 1.612k in 5.040026s
oj 329.315 (± 2.4%) i/s (3.04 ms/i) - 1.666k in 5.061887s
Oj::Parser 452.725 (± 1.1%) i/s (2.21 ms/i) - 2.300k in 5.080996s
rapidjson 358.160 (± 0.8%) i/s (2.79 ms/i) - 1.824k in 5.093054s
Comparison:
json: 319.8 i/s
Oj::Parser: 452.7 i/s - 1.42x faster
rapidjson: 358.2 i/s - 1.12x faster
oj: 329.3 i/s - 1.03x faster
```
After:
```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 810.000 i/100ms
oj 794.000 i/100ms
Oj::Parser 936.000 i/100ms
rapidjson 630.000 i/100ms
Calculating -------------------------------------
json 7.759k (± 1.2%) i/s (128.88 μs/i) - 38.880k in 5.011657s
oj 7.945k (± 1.2%) i/s (125.87 μs/i) - 40.494k in 5.097765s
Oj::Parser 9.658k (± 1.3%) i/s (103.54 μs/i) - 48.672k in 5.040614s
rapidjson 6.321k (± 0.3%) i/s (158.20 μs/i) - 32.130k in 5.082985s
Comparison:
json: 7759.2 i/s
Oj::Parser: 9657.8 i/s - 1.24x faster
oj: 7944.6 i/s - same-ish: difference falls within error
rapidjson: 6321.1 i/s - 1.23x slower
== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 66.000 i/100ms
oj 62.000 i/100ms
Oj::Parser 78.000 i/100ms
rapidjson 56.000 i/100ms
Calculating -------------------------------------
json 661.653 (± 0.6%) i/s (1.51 ms/i) - 3.366k in 5.087447s
oj 621.674 (± 1.3%) i/s (1.61 ms/i) - 3.162k in 5.087171s
Oj::Parser 775.859 (± 0.3%) i/s (1.29 ms/i) - 3.900k in 5.026718s
rapidjson 563.961 (± 0.4%) i/s (1.77 ms/i) - 2.856k in 5.064230s
Comparison:
json: 661.7 i/s
Oj::Parser: 775.9 i/s - 1.17x faster
oj: 621.7 i/s - 1.06x slower
rapidjson: 564.0 i/s - 1.17x slower
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 30.000 i/100ms
oj 34.000 i/100ms
Oj::Parser 47.000 i/100ms
rapidjson 38.000 i/100ms
Calculating -------------------------------------
json 303.318 (± 0.3%) i/s (3.30 ms/i) - 1.530k in 5.044285s
oj 328.832 (± 2.7%) i/s (3.04 ms/i) - 1.666k in 5.069899s
Oj::Parser 458.235 (± 0.4%) i/s (2.18 ms/i) - 2.303k in 5.025895s
rapidjson 381.204 (± 0.5%) i/s (2.62 ms/i) - 1.938k in 5.084065s
Comparison:
json: 303.3 i/s
Oj::Parser: 458.2 i/s - 1.51x faster
rapidjson: 381.2 i/s - 1.26x faster
oj: 328.8 i/s - 1.08x faster
```
Based on how JSON documents are generally constructed, a numeric
object name is much less likely to be repeated in the document.
```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 779.000 i/100ms
oj 793.000 i/100ms
Oj::Parser 956.000 i/100ms
rapidjson 620.000 i/100ms
Calculating -------------------------------------
json 7.849k (± 0.3%) i/s (127.41 μs/i) - 39.729k in 5.061875s
oj 7.953k (± 0.3%) i/s (125.74 μs/i) - 40.443k in 5.085293s
Oj::Parser 9.652k (± 0.2%) i/s (103.61 μs/i) - 48.756k in 5.051573s
rapidjson 6.198k (± 0.1%) i/s (161.35 μs/i) - 31.000k in 5.001787s
Comparison:
json: 7848.7 i/s
Oj::Parser: 9651.7 i/s - 1.23x faster
oj: 7953.0 i/s - 1.01x faster
rapidjson: 6197.8 i/s - 1.27x slower
== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 66.000 i/100ms
oj 62.000 i/100ms
Oj::Parser 79.000 i/100ms
rapidjson 56.000 i/100ms
Calculating -------------------------------------
json 676.145 (± 0.3%) i/s (1.48 ms/i) - 3.432k in 5.075895s
oj 631.864 (± 0.3%) i/s (1.58 ms/i) - 3.162k in 5.004287s
Oj::Parser 791.409 (± 0.3%) i/s (1.26 ms/i) - 4.029k in 5.090947s
rapidjson 566.597 (± 0.2%) i/s (1.76 ms/i) - 2.856k in 5.040641s
Comparison:
json: 676.1 i/s
Oj::Parser: 791.4 i/s - 1.17x faster
oj: 631.9 i/s - 1.07x slower
rapidjson: 566.6 i/s - 1.19x slower
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 34.000 i/100ms
oj 32.000 i/100ms
Oj::Parser 43.000 i/100ms
rapidjson 37.000 i/100ms
Calculating -------------------------------------
json 356.414 (± 0.3%) i/s (2.81 ms/i) - 1.802k in 5.055963s
oj 349.397 (± 2.0%) i/s (2.86 ms/i) - 1.760k in 5.039216s
Oj::Parser 478.342 (± 0.4%) i/s (2.09 ms/i) - 2.408k in 5.034158s
rapidjson 395.222 (± 0.8%) i/s (2.53 ms/i) - 1.998k in 5.055669s
Comparison:
json: 356.4 i/s
Oj::Parser: 478.3 i/s - 1.34x faster
rapidjson: 395.2 i/s - 1.11x faster
oj: 349.4 i/s - same-ish: difference falls within error
```
4a04252 solves the regression. It's simple: we just don't cache names that don't start with a letter. This may sound like benchmark gaming, but I think it makes sense as a heuristic given how most JSON documents are constructed.
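A minimal sketch of that heuristic (the function name is hypothetical and the real check may differ):

```c
#include <stdbool.h>

/* Hypothetical sketch: only cache object names that start with an ASCII
 * letter. Numeric-looking keys (as in citm_catalog.json) rarely repeat,
 * so they are interned directly and skip the cache. */
static bool rstring_cache_should_cache(const char *str, long length)
{
    if (length == 0) return false;
    char first = str[0];
    return (first >= 'a' && first <= 'z') || (first >= 'A' && first <= 'Z');
}
```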
ext/json/ext/parser/name_cache.h (outdated)
    static VALUE rstring_cache_fetch(rstring_cache *cache, const char *str, const long length)
    {
        if (RB_UNLIKELY(length > JSON_RSTRING_CACHE_MAX_ENTRY_LENGTH)) {
            rb_enc_interned_str(str, length, rb_utf8_encoding());
Is this missing a return?
Fixed in the next commit, but I wonder if it affects the benchmark results for this commit.
Yeah, it was a mistake. I don't think we ever hit that branch on any of the benchmarks.
@eregon while you're here, I'm surprised to see TruffleRuby is missing `rb_enc_interned_str`.
https://github.com/oracle/truffleruby/blob/07b978e4e0d43827ca13c600af447c98ac477a71/src/main/c/cext/string.c#L435-L442: looks like TruffleRuby has rb_enc_interned_str_cstr() and rb_str_to_interned_str() but not rb_enc_interned_str() yet -> truffleruby/truffleruby#3703
Also micro-optimize a bit further.
```
== Parsing activitypub.json (58160 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 832.000 i/100ms
oj 799.000 i/100ms
Oj::Parser 969.000 i/100ms
rapidjson 636.000 i/100ms
Calculating -------------------------------------
json 8.020k (± 0.3%) i/s (124.70 μs/i) - 40.768k in 5.083607s
oj 7.942k (± 1.7%) i/s (125.92 μs/i) - 39.950k in 5.031909s
Oj::Parser 9.515k (± 4.4%) i/s (105.10 μs/i) - 47.481k in 5.001202s
rapidjson 6.282k (± 2.1%) i/s (159.20 μs/i) - 31.800k in 5.064719s
Comparison:
json: 8019.6 i/s
Oj::Parser: 9514.5 i/s - 1.19x faster
oj: 7941.9 i/s - same-ish: difference falls within error
rapidjson: 6281.6 i/s - 1.28x slower
== Parsing twitter.json (567916 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 67.000 i/100ms
oj 62.000 i/100ms
Oj::Parser 79.000 i/100ms
rapidjson 55.000 i/100ms
Calculating -------------------------------------
json 670.935 (± 2.7%) i/s (1.49 ms/i) - 3.417k in 5.096850s
oj 618.937 (± 3.2%) i/s (1.62 ms/i) - 3.100k in 5.014800s
Oj::Parser 768.894 (± 1.7%) i/s (1.30 ms/i) - 3.871k in 5.036093s
rapidjson 556.882 (± 2.7%) i/s (1.80 ms/i) - 2.805k in 5.040970s
Comparison:
json: 670.9 i/s
Oj::Parser: 768.9 i/s - 1.15x faster
oj: 618.9 i/s - 1.08x slower
rapidjson: 556.9 i/s - 1.20x slower
== Parsing citm_catalog.json (1727030 bytes)
ruby 3.3.4 (2024-07-09 revision be1089c8ec) +YJIT [arm64-darwin23]
Warming up --------------------------------------
json 35.000 i/100ms
oj 32.000 i/100ms
Oj::Parser 43.000 i/100ms
rapidjson 39.000 i/100ms
Calculating -------------------------------------
json 382.262 (± 2.6%) i/s (2.62 ms/i) - 1.925k in 5.039336s
oj 348.721 (± 0.6%) i/s (2.87 ms/i) - 1.760k in 5.047265s
Oj::Parser 478.294 (± 0.6%) i/s (2.09 ms/i) - 2.408k in 5.034798s
rapidjson 398.740 (± 0.8%) i/s (2.51 ms/i) - 2.028k in 5.086365s
Comparison:
json: 382.3 i/s
Oj::Parser: 478.3 i/s - 1.25x faster
rapidjson: 398.7 i/s - 1.04x faster
oj: 348.7 i/s - 1.10x slower
```
I fixed the TruffleRuby and Ruby 2.7 support, and squeezed in some more micro-optimizations. I'll squash and merge.
This is somewhat dead code, as unless you are using `JSON::Parser.new` directly we never allocate `JSON::Ext::Parser` anymore. But still, we should mark all its references in case some code out there uses that. Followup: ruby#675
Heavily inspired from ruby/json#675. When parsing documents with the same structure repeatedly, a lot of time can be saved by keeping a small cache of map keys encountered. Using the existing `bench/bench.rb`, comparing with and without the cache shows a 30% performance improvement:
```
Calculating -------------------------------------
unpack-pooled 960.380k (± 1.4%) i/s - 4.865M in 5.066600s
unpack-key-cache 1.245M (± 1.6%) i/s - 6.232M in 5.009060s
Comparison:
unpack-pooled: 960379.8 i/s
unpack-key-cache: 1244517.6 i/s - 1.30x (± 0.00) faster
```
However, on the same benchmark, but with the cache filled with other keys, the performance is notably degraded:
```
Calculating -------------------------------------
unpack-pooled 926.849k (± 2.1%) i/s - 4.639M in 5.007333s
unpack-key-cache 822.266k (± 2.4%) i/s - 4.113M in 5.004645s
Comparison:
unpack-pooled: 926849.2 i/s
unpack-key-cache: 822265.6 i/s - 1.13x (± 0.00) slower
```
So this feature is powerful but situational.