Pipeline Benchmarks
ndn-rs ships a Criterion-based benchmark suite that measures individual pipeline stage costs and end-to-end forwarding latency. The benchmarks live in crates/engine/ndn-engine/benches/pipeline.rs.
Running Benchmarks
```sh
# Run the full suite
cargo bench -p ndn-engine

# Run a specific benchmark group (the argument filters benchmark IDs)
cargo bench -p ndn-engine -- "cs/"
cargo bench -p ndn-engine -- "fib/lpm"
cargo bench -p ndn-engine -- "interest_pipeline"

# View HTML reports after a run (macOS; use xdg-open on Linux)
open target/criterion/report/index.html
```
Criterion generates HTML reports with statistical analysis, throughput charts, and comparison against previous runs in target/criterion/.
Approximate Relative Cost of Pipeline Stages
```mermaid
%%{init: {'theme': 'default'}}%%
pie title Pipeline Stage Cost Breakdown (approximate)
    "TLV Decode" : 30
    "CS Lookup (miss)" : 10
    "PIT Check" : 15
    "FIB LPM" : 20
    "Strategy" : 10
    "Dispatch" : 15
```
The chart above shows approximate relative costs for a typical Interest pipeline traversal (CS miss path). TLV decode and FIB longest-prefix match dominate because they involve parsing variable-length names and traversing trie nodes, while CS lookup on a miss and strategy execution are comparatively cheap. These shares are rough estimates. The isolated-stage CI results at the end of this page show different ratios on that runner; for example, fib/lpm measures 35–96 ns while pit/new_entry measures about 1.2 µs. Actual proportions depend on name length, table sizes, and cache state, so run the benchmarks to get precise numbers for your workload.
Benchmark Harness Architecture
```mermaid
graph LR
    subgraph "Setup (per iteration)"
        PB["Pre-built wire packets<br/>(realistic names, ~100 B content)"]
    end
    subgraph "Benchmark Loop (Criterion)"
        PB --> S1["Stage under test<br/>(e.g. TlvDecodeStage)"]
        S1 --> M["Measure:<br/>latency (ns/op)<br/>throughput (ops/sec, bytes/sec)"]
    end
    subgraph "Full Pipeline Benchmarks"
        PB --> FP["All stages in sequence<br/>(decode -> CS -> PIT -> FIB -> strategy -> dispatch)"]
        FP --> M2["End-to-end latency"]
    end
    RT["Tokio current-thread runtime<br/>(no I/O, no scheduling jitter)"] -.->|"runs"| S1
    RT -.->|"runs"| FP
    style PB fill:#e8f4fd,stroke:#2196F3
    style M fill:#c8e6c9,stroke:#4CAF50
    style M2 fill:#c8e6c9,stroke:#4CAF50
    style RT fill:#fff3e0,stroke:#FF9800
```
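The diagram splits each iteration into untimed setup and a timed stage. Criterion offers this split through Bencher::iter_batched, where the setup closure runs outside the timed region. The sketch below is a dependency-free illustration of the same idea. It is not Criterion and not the ndn-rs harness, and Summary, median, and bench_batched are hypothetical names. It builds a fresh input for every sample, times only the routine, and reports the median in ns/op plus the derived ops/sec.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Result of one run of this hypothetical harness.
#[derive(Debug)]
pub struct Summary {
    pub median_ns: f64,
    pub ops_per_sec: f64,
}

/// Median of a non-empty sample set (sorts the slice in place).
pub fn median(samples: &mut [f64]) -> f64 {
    assert!(!samples.is_empty(), "need at least one sample");
    samples.sort_by(f64::total_cmp);
    let mid = samples.len() / 2;
    if samples.len() % 2 == 0 {
        (samples[mid - 1] + samples[mid]) / 2.0
    } else {
        samples[mid]
    }
}

/// Builds a fresh input with `setup` for every sample (untimed), then
/// times only `routine`, mirroring the setup / benchmark-loop split.
pub fn bench_batched<I, O>(
    samples: usize,
    mut setup: impl FnMut() -> I,
    mut routine: impl FnMut(I) -> O,
) -> Summary {
    let mut times = Vec::with_capacity(samples);
    for _ in 0..samples {
        let input = setup(); // e.g. pre-built wire bytes or a fresh PIT
        let start = Instant::now();
        // black_box keeps the optimiser from deleting or hoisting the work.
        black_box(routine(black_box(input)));
        let elapsed: Duration = start.elapsed();
        times.push(elapsed.as_nanos() as f64);
    }
    let median_ns = median(&mut times);
    // A very cheap routine can round to 0 ns on a coarse clock.
    let ops_per_sec = if median_ns > 0.0 { 1e9 / median_ns } else { f64::INFINITY };
    Summary { median_ns, ops_per_sec }
}
```

The median is used here because a few slow samples (page faults, preemption) shift a mean much more than a median.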
What Is Benchmarked
TLV Decode
Groups: decode/interest, decode/data
Measures the cost of TlvDecodeStage – parsing raw wire bytes into a decoded Interest or Data struct and setting ctx.name. Tested with 4-component and 8-component names to show scaling with name length.
Throughput is reported in bytes/sec to make comparisons across packet sizes meaningful.
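Decode cost grows with name length: decode/interest/4 takes 481 ns and decode/interest/8 takes 556 ns in the CI table. To see why, consider a hypothetical zero-copy Name TLV decoder written from the NDN packet format rules: TLV type 7 for Name, type 8 for GenericNameComponent, and the 1/3/5/9-byte VAR-NUMBER encoding. It is not ndn-rs's TlvDecodeStage. It only shows the per-component header parse that makes the work linear in the component count.

```rust
/// NDN TLV type numbers (NDN Packet Format v0.3).
const TLV_NAME: u64 = 0x07;
const TLV_GENERIC_NAME_COMPONENT: u64 = 0x08;

/// Reads an NDN VAR-NUMBER starting at `buf[*pos]` and advances `pos`.
/// 0..=252 is the value itself; 253/254/255 prefix a 2/4/8-byte big-endian value.
fn read_var_number(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let first = *buf.get(*pos)?;
    *pos += 1;
    let width = match first {
        0..=252 => return Some(u64::from(first)),
        253 => 2,
        254 => 4,
        255 => 8,
    };
    let bytes = buf.get(*pos..*pos + width)?;
    *pos += width;
    Some(bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b)))
}

/// Decodes a Name TLV into borrowed component values without copying.
/// One TLV header parse per component: the work is linear in name length.
pub fn decode_name(wire: &[u8]) -> Option<Vec<&[u8]>> {
    let mut pos = 0;
    if read_var_number(wire, &mut pos)? != TLV_NAME {
        return None;
    }
    let len = usize::try_from(read_var_number(wire, &mut pos)?).ok()?;
    let end = pos.checked_add(len)?;
    let inner = wire.get(..end)?; // the Name value must fit in the buffer
    let mut components = Vec::new();
    while pos < end {
        let typ = read_var_number(inner, &mut pos)?;
        let clen = usize::try_from(read_var_number(inner, &mut pos)?).ok()?;
        let value = inner.get(pos..pos.checked_add(clen)?)?;
        pos += clen;
        // Sketch limitation: only GenericNameComponent is accepted here;
        // real names also carry other component types.
        if typ != TLV_GENERIC_NAME_COMPONENT {
            return None;
        }
        components.push(value);
    }
    Some(components)
}
```

For example, the wire bytes 07 07 08 01 61 08 02 62 63 decode to the two components "a" and "bc".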
Content Store Lookup
Group: cs
- cs/hit: lookup of a name that exists in the CS. Measures the fast path where a cached Data is returned and the Interest pipeline short-circuits (no PIT or strategy involved).
- cs/miss: lookup of a name not in the CS. Measures the overhead added to every Interest that proceeds past the CS stage.
Uses a 64 MiB LruCs with a pre-populated entry for the hit case.
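The hit and miss paths, and the replace and new-name paths that cs_insert measures, can be pictured with a deliberately simple, hypothetical model: a byte-budgeted LRU keyed by the name's string form. ToyLruCs is an illustrative name, and ndn-rs's LruCs and its NameTrie are not modelled. In this model a miss is a single failed hash probe, while a hit also refreshes the entry's recency. That is one plausible contributor to cs/hit costing more than cs/miss, but the real gap depends on what LruCs actually does on a hit.

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy byte-budgeted LRU content store (hypothetical, not ndn-rs's LruCs).
pub struct ToyLruCs {
    capacity_bytes: usize,
    used_bytes: usize,
    tick: u64,
    entries: HashMap<String, (Vec<u8>, u64)>, // name -> (Data wire, last-use tick)
    order: BTreeMap<u64, String>,             // last-use tick -> name, oldest first
}

impl ToyLruCs {
    pub fn new(capacity_bytes: usize) -> Self {
        Self {
            capacity_bytes,
            used_bytes: 0,
            tick: 0,
            entries: HashMap::new(),
            order: BTreeMap::new(),
        }
    }

    pub fn used_bytes(&self) -> usize {
        self.used_bytes
    }

    /// A miss is one failed hash probe; a hit also refreshes recency,
    /// which costs an extra ordered-map remove and insert.
    pub fn get(&mut self, name: &str) -> Option<&[u8]> {
        let (data, last_used) = self.entries.get_mut(name)?;
        self.tick += 1;
        self.order.remove(&*last_used);
        *last_used = self.tick;
        self.order.insert(self.tick, name.to_string());
        Some(data.as_slice())
    }

    /// Replacing an existing name updates its entry in place; a new name
    /// adds one. Either way, least-recently-used entries are then evicted
    /// until the byte budget is met.
    pub fn insert(&mut self, name: &str, data: Vec<u8>) {
        self.tick += 1;
        let new_len = data.len();
        if let Some((old, old_tick)) = self.entries.insert(name.to_string(), (data, self.tick)) {
            self.used_bytes -= old.len();
            self.order.remove(&old_tick);
        }
        self.used_bytes += new_len;
        self.order.insert(self.tick, name.to_string());
        while self.used_bytes > self.capacity_bytes {
            let Some((_, victim)) = self.order.pop_first() else { break };
            if let Some((evicted, _)) = self.entries.remove(&victim) {
                self.used_bytes -= evicted.len();
            }
        }
    }
}
```

With a 10-byte budget, inserting /a and /b (4 bytes each), reading /a, and then inserting /c evicts /b, because /b is the least recently used entry.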
PIT Check
Group: pit
- pit/new_entry: inserting a new PIT entry for a never-seen name. Uses a fresh PIT per iteration to isolate insert cost.
- pit/aggregate: a second Interest with a different nonce hitting an existing PIT entry. This is the aggregation path, where the Interest is suppressed (returned as Action::Drop).
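A hypothetical exact-name PIT can sketch the two Interest paths together with the Data-side match that pit_match measures. Action, PitEntry, and ToyPit are illustrative names, not the ndn-rs types. A first Interest creates an entry and continues. A later Interest with a different nonce is recorded and dropped. A matching Data consumes the entry. Real PITs also handle CanBePrefix and entry expiry, and both are omitted here.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

/// Hypothetical stage outcome for this sketch.
#[derive(Debug, PartialEq)]
pub enum Action {
    Forward, // new PIT entry: continue towards FIB and strategy
    Drop,    // aggregated (or looped) Interest: suppress it
}

/// One pending Interest name: nonces seen and downstream faces.
pub struct PitEntry {
    nonces: HashSet<u32>,
    in_faces: Vec<u32>,
}

/// Exact-name PIT (no CanBePrefix, no expiry timers).
#[derive(Default)]
pub struct ToyPit {
    entries: HashMap<String, PitEntry>,
}

impl ToyPit {
    /// Interest path: the pit/new_entry vs pit/aggregate cases.
    pub fn on_interest(&mut self, name: &str, nonce: u32, face: u32) -> Action {
        match self.entries.entry(name.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert(PitEntry { nonces: HashSet::from([nonce]), in_faces: vec![face] });
                Action::Forward
            }
            Entry::Occupied(mut slot) => {
                let e = slot.get_mut();
                // A new nonce is aggregation: remember who asked.
                // A repeated nonce suggests a loop and is dropped without recording.
                if e.nonces.insert(nonce) && !e.in_faces.contains(&face) {
                    e.in_faces.push(face);
                }
                Action::Drop
            }
        }
    }

    /// Data path: a hit consumes the entry and returns the faces to send
    /// the Data to; None means unsolicited Data, which the caller drops.
    pub fn on_data(&mut self, name: &str) -> Option<Vec<u32>> {
        self.entries.remove(name).map(|e| e.in_faces)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```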
FIB Longest-Prefix Match
Group: fib/lpm
Measures LPM lookup time with 10, 100, and 1000 routes in the FIB. Routes have 2-component prefixes; the lookup name has 4 components (2 matching + 2 extra). This isolates trie traversal cost from name parsing.
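A hypothetical component trie shows why LPM cost can be bounded by name depth rather than route count. The walk below visits at most one node per component of the lookup name and remembers the deepest route it has passed. This may be one reason fib/lpm/100 and fib/lpm/1000 both report 96 ns in the CI table. FibNode, add_route, and lpm are illustrative names, not the ndn-rs FIB.

```rust
use std::collections::HashMap;

/// Toy name-component trie for longest-prefix match (hypothetical).
#[derive(Default)]
pub struct FibNode {
    children: HashMap<String, FibNode>,
    nexthop: Option<u32>, // face id if a route ends at this node
}

impl FibNode {
    /// Creates trie nodes for any missing components, then sets the route.
    pub fn add_route(&mut self, prefix: &[&str], face: u32) {
        let mut node = self;
        for comp in prefix {
            node = node.children.entry(comp.to_string()).or_default();
        }
        node.nexthop = Some(face);
    }

    /// Walks at most `name.len()` levels, keeping the deepest route seen.
    /// The number of routes elsewhere in the trie does not affect the walk.
    pub fn lpm(&self, name: &[&str]) -> Option<u32> {
        let mut node = self;
        let mut best = node.nexthop;
        for comp in name {
            match node.children.get(*comp) {
                Some(child) => {
                    node = child;
                    if node.nexthop.is_some() {
                        best = node.nexthop;
                    }
                }
                None => break,
            }
        }
        best
    }
}
```

This mirrors the benchmark's shape: with a route on /a/b, looking up the 4-component name /a/b/c/d matches the 2-component prefix and stops after two levels of real work.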
PIT Match (Data Path)
Group: pit_match
- pit_match/hit: Data arriving that matches an existing PIT entry. Seeds the PIT with a matching Interest, then measures the match and entry extraction.
- pit_match/miss: Data arriving with no matching PIT entry (unsolicited Data, dropped).
CS Insert
Group: cs_insert
- cs_insert/insert_replace: steady-state replacement of an existing CS entry (same name, new Data). Measures the cost when the CS is warm.
- cs_insert/insert_new: inserting a unique name on each iteration. Measures the cold-path cost, including NameTrie node creation.
Validation Stage
Group: validation_stage
- validation_stage/disabled: passthrough when no Validator is configured. Measures the baseline overhead of the stage itself.
- validation_stage/cert_via_anchor: full Ed25519 signature verification using a trust anchor. Includes the schema check, key lookup, and cryptographic verification.
Full Interest Pipeline
Groups: interest_pipeline, interest_pipeline/cs_hit
- interest_pipeline/no_route: decode + CS miss + PIT new entry. Stops before the strategy stage to isolate pure pipeline overhead. Tested with 4- and 8-component names.
- interest_pipeline/cs_hit: decode + CS hit. Measures the fast path where a cached Data satisfies the Interest immediately.
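The two interest_pipeline variants differ in where the traversal stops. That can be sketched as stages running in order until one returns something other than Continue. Ctx, Action, Stage, CsStage, PitStage, and run are hypothetical names for this illustration, and decode is assumed to have already produced the name.

```rust
use std::collections::{HashMap, HashSet};

/// Per-packet context; `trace` records which stages ran (for illustration).
pub struct Ctx {
    pub name: String,
    pub trace: Vec<&'static str>,
}

#[derive(Debug, PartialEq)]
pub enum Action {
    Continue,         // pass the packet to the next stage
    Satisfy(Vec<u8>), // answered locally (e.g. CS hit)
    Drop,             // stop processing
}

pub trait Stage {
    fn name(&self) -> &'static str;
    fn process(&mut self, ctx: &mut Ctx) -> Action;
}

/// Runs stages in order; the first non-Continue action ends the traversal.
pub fn run(stages: &mut [Box<dyn Stage>], ctx: &mut Ctx) -> Action {
    for stage in stages.iter_mut() {
        ctx.trace.push(stage.name());
        match stage.process(ctx) {
            Action::Continue => {}
            decided => return decided,
        }
    }
    Action::Continue
}

pub struct CsStage {
    pub cached: HashMap<String, Vec<u8>>,
}

impl Stage for CsStage {
    fn name(&self) -> &'static str { "cs" }
    fn process(&mut self, ctx: &mut Ctx) -> Action {
        match self.cached.get(&ctx.name) {
            Some(data) => Action::Satisfy(data.clone()), // later stages never run
            None => Action::Continue,
        }
    }
}

pub struct PitStage {
    pub pending: HashSet<String>,
}

impl Stage for PitStage {
    fn name(&self) -> &'static str { "pit" }
    fn process(&mut self, ctx: &mut Ctx) -> Action {
        if self.pending.insert(ctx.name.clone()) {
            Action::Continue // new entry: would go on to FIB and strategy
        } else {
            Action::Drop // aggregated
        }
    }
}
```

With a cached name, only the cs stage runs. Otherwise the packet reaches pit, which is where the no_route benchmark stops.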
Full Data Pipeline
Group: data_pipeline
Decode + PIT match + CS insert. Seeds the PIT with a matching Interest, then runs the full Data path. Tested with 4 and 8 component names. Throughput is reported in bytes/sec.
Decode Throughput
Group: decode_throughput
Batch decoding of 1000 Interests in a tight loop. Reports throughput in elements/sec rather than latency, giving a peak-rate estimate for the decode stage.
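Each decode_throughput iteration decodes a whole batch, so the CI table lists the time per batch of 1000 rather than a rate. A small conversion turns those rows into per-Interest figures; the helper names are hypothetical. For 4-component names, 442.84 µs per batch is about 443 ns per Interest, or roughly 2.26M Interests/sec. For 8-component names, 525.64 µs per batch is roughly 1.90M Interests/sec.

```rust
/// Items per second from one timed batch (hypothetical helper).
/// `batch_time_us` is the per-iteration time in microseconds.
pub fn per_item_rate(batch_size: u64, batch_time_us: f64) -> f64 {
    batch_size as f64 / (batch_time_us * 1e-6)
}

/// Average nanoseconds per item inside the batch.
pub fn per_item_ns(batch_size: u64, batch_time_us: f64) -> f64 {
    batch_time_us * 1e3 / batch_size as f64
}
```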
Benchmark Design Notes
- All async benchmarks use a current-thread Tokio runtime with no I/O, isolating CPU cost from scheduling jitter.
- Packet wire bytes are built with realistic name lengths (4 and 8 components) and ~100 B Data content.
- The PIT is cleared between iterations where noted to ensure a consistent starting state.
- Each benchmark group uses Criterion’s Throughput annotations so reports show both latency and throughput.
Interpreting Results
Criterion reports median latency by default. Look for:
- Regression alerts: Criterion flags changes >5% from the baseline. CI uses a 10% threshold (see Methodology).
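- Outliers: high outlier percentages suggest interference from the host, such as other processes competing for the CPU or frequency scaling (Rust has no garbage collector, so GC pauses are not a factor). The current-thread runtime minimizes scheduler-induced noise.
- Throughput numbers: useful for capacity planning. If decode_throughput shows 2M Interests/sec, that is an upper bound for the whole pipeline, since every Interest still has to pass through the remaining stages.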
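The CI gate can be pictured as a relative-change check on medians. The sketch below assumes a 10% threshold and compares medians only. The actual CI logic is described under Methodology and may differ, for example in how it accounts for noise. The function names are hypothetical.

```rust
/// Relative change of `current` against `baseline` (positive = slower).
pub fn relative_change(baseline_ns: f64, current_ns: f64) -> f64 {
    (current_ns - baseline_ns) / baseline_ns
}

/// Hypothetical gate: flag a regression when the median slowed down by
/// more than `threshold` (0.10 = 10%). Speed-ups never fail the gate.
pub fn is_regression(baseline_ns: f64, current_ns: f64, threshold: f64) -> bool {
    relative_change(baseline_ns, current_ns) > threshold
}
```

Rows whose spread is large relative to their median, such as cs_insert/insert_new (10.21 µs ± 18.18 µs) in the table below, will trip a fixed-percentage gate more readily than tight rows like cs/miss.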
The HTML report at target/criterion/report/index.html includes violin plots, probability density (PDF) plots, and regression analysis for each benchmark.
SHA-256 vs BLAKE3 in this bench
signing/sha256-digest uses sha2::Sha256 (RustCrypto). On both x86_64 and aarch64, that crate performs runtime CPU feature detection through the cpufeatures crate and uses the x86 SHA extensions (SHA-NI) or the ARMv8 SHA crypto extensions when the CPU exposes them. Most current CI runners and consumer CPUs do, so the absolute SHA-256 numbers in this table are SHA-NI numbers. There is no practical “software SHA” baseline left to compare against.
That makes this a comparison between a hardware-accelerated SHA-256 and an AVX2/NEON-vectorised BLAKE3, and the results reflect it. At the input sizes a typical NDN signed portion has (a few hundred bytes to a few KB), single-threaded BLAKE3 is not faster than SHA-256 on these CPUs. In this table the two roughly tie at 8 KB. Single-threaded BLAKE3 pulls ahead only on the large/ inputs (256 KB to 4 MB), by about 2.6×. The often-quoted “BLAKE3 is 3–8× faster than SHA-256” figure compares BLAKE3 with plain software SHA-256. It holds on chips without SHA extensions, but that is no longer the common case. See Why BLAKE3 for the actual reasons ndn-rs supports BLAKE3: Merkle-tree partial verification of segmented Data, multi-thread hashing, and a single algorithm for hash, MAC, KDF, and XOF. None of them are about raw single-thread throughput.
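The crossover can be read directly from the CI table on this page. The sketch below copies the relevant sha256-digest, blake3-plain, and blake3-single medians into a hypothetical ROWS table. It then computes the SHA-256 / BLAKE3 time ratio, where a value above 1.0 means BLAKE3 was faster. The signing rows time the whole sign call, so they may include small per-call overhead beyond hashing.

```rust
/// (label, SHA-256 time, BLAKE3 time), same unit per row, copied from
/// the signing/ and large/ rows of the CI table on this page.
pub const ROWS: &[(&str, f64, f64)] = &[
    ("100B sign", 101.0, 199.0),   // ns, sha256-digest vs blake3-plain
    ("1KB sign", 664.0, 1210.0),   // ns
    ("4KB sign", 2540.0, 3530.0),  // ns
    ("8KB sign", 5070.0, 4800.0),  // ns
    ("256KB hash", 164.78, 61.68), // µs, sha256 vs blake3-single
    ("4MB hash", 2640.0, 999.28),  // µs
];

/// Ratio of SHA-256 time to BLAKE3 time: above 1.0 means BLAKE3 is faster.
pub fn blake3_speedup(sha256_time: f64, blake3_time: f64) -> f64 {
    sha256_time / blake3_time
}
```

Running the rows through blake3_speedup gives about 0.51 at 100 B, 0.72 at 4 KB, 1.06 at 8 KB, and about 2.64 at 4 MB.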
Latest CI Results
Last updated by CI on 2026-04-15 (ubuntu-latest, stable Rust)
| Benchmark | Median | ± Variance |
|---|---|---|
cs/hit | 762 ns | ±34 ns |
cs/miss | 524 ns | ±2 ns |
cs_insert/insert_new | 10.21 µs | ±18.18 µs |
cs_insert/insert_replace | 943 ns | ±14 ns |
data_pipeline/4 | 1.88 µs | ±66 ns |
data_pipeline/8 | 2.27 µs | ±38 ns |
decode/data/4 | 394 ns | ±26 ns |
decode/data/8 | 464 ns | ±0 ns |
decode/interest/4 | 481 ns | ±0 ns |
decode/interest/8 | 556 ns | ±2 ns |
decode_throughput/4 | 442.84 µs | ±39.54 µs |
decode_throughput/8 | 525.64 µs | ±7.39 µs |
fib/lpm/10 | 35 ns | ±0 ns |
fib/lpm/100 | 96 ns | ±0 ns |
fib/lpm/1000 | 96 ns | ±0 ns |
interest_pipeline/cs_hit | 921 ns | ±1 ns |
interest_pipeline/no_route/4 | 1.40 µs | ±33 ns |
interest_pipeline/no_route/8 | 1.55 µs | ±20 ns |
large/blake3-rayon/hash/1MB | 122.33 µs | ±2.48 µs |
large/blake3-rayon/hash/256KB | 40.89 µs | ±1.36 µs |
large/blake3-rayon/hash/4MB | 439.02 µs | ±2.45 µs |
large/blake3-single/hash/1MB | 252.69 µs | ±923 ns |
large/blake3-single/hash/256KB | 61.68 µs | ±321 ns |
large/blake3-single/hash/4MB | 999.28 µs | ±3.07 µs |
large/sha256/hash/1MB | 659.90 µs | ±893 ns |
large/sha256/hash/256KB | 164.78 µs | ±243 ns |
large/sha256/hash/4MB | 2.64 ms | ±1.82 µs |
lru/evict | 189 ns | ±3 ns |
lru/evict_prefix | 2.00 µs | ±2.06 µs |
lru/get_can_be_prefix | 297 ns | ±0 ns |
lru/get_hit | 213 ns | ±0 ns |
lru/get_miss_empty | 140 ns | ±0 ns |
lru/get_miss_populated | 188 ns | ±0 ns |
lru/insert_new | 1.99 µs | ±1.46 µs |
lru/insert_replace | 376 ns | ±4 ns |
name/display/components/4 | 452 ns | ±1 ns |
name/display/components/8 | 866 ns | ±8 ns |
name/eq/eq_match | 39 ns | ±0 ns |
name/eq/eq_miss_first | 2 ns | ±0 ns |
name/eq/eq_miss_last | 38 ns | ±0 ns |
name/has_prefix/prefix_len/1 | 7 ns | ±0 ns |
name/has_prefix/prefix_len/4 | 24 ns | ±1 ns |
name/has_prefix/prefix_len/8 | 35 ns | ±3 ns |
name/hash/components/4 | 86 ns | ±0 ns |
name/hash/components/8 | 163 ns | ±8 ns |
name/parse/components/12 | 679 ns | ±9 ns |
name/parse/components/4 | 236 ns | ±1 ns |
name/parse/components/8 | 468 ns | ±1 ns |
name/tlv_decode/components/12 | 301 ns | ±1 ns |
name/tlv_decode/components/4 | 140 ns | ±0 ns |
name/tlv_decode/components/8 | 210 ns | ±0 ns |
pit/aggregate | 2.32 µs | ±125 ns |
pit/new_entry | 1.23 µs | ±7 ns |
pit_match/hit | 1.61 µs | ±7 ns |
pit_match/miss | 1.95 µs | ±12 ns |
sharded/get_hit/1 | 229 ns | ±0 ns |
sharded/get_hit/16 | 228 ns | ±2 ns |
sharded/get_hit/4 | 233 ns | ±7 ns |
sharded/get_hit/8 | 229 ns | ±3 ns |
sharded/insert/1 | 2.56 µs | ±1.60 µs |
sharded/insert/16 | 1.91 µs | ±1.59 µs |
sharded/insert/4 | 2.58 µs | ±1.73 µs |
sharded/insert/8 | 2.44 µs | ±1.66 µs |
signing/blake3-keyed/sign_sync/100B | 182 ns | ±0 ns |
signing/blake3-keyed/sign_sync/1KB | 1.20 µs | ±0 ns |
signing/blake3-keyed/sign_sync/2KB | 2.41 µs | ±2 ns |
signing/blake3-keyed/sign_sync/4KB | 3.54 µs | ±2 ns |
signing/blake3-keyed/sign_sync/500B | 618 ns | ±1 ns |
signing/blake3-keyed/sign_sync/8KB | 4.80 µs | ±4 ns |
signing/blake3-plain/sign_sync/100B | 199 ns | ±0 ns |
signing/blake3-plain/sign_sync/1KB | 1.21 µs | ±1 ns |
signing/blake3-plain/sign_sync/2KB | 2.41 µs | ±3 ns |
signing/blake3-plain/sign_sync/4KB | 3.53 µs | ±4 ns |
signing/blake3-plain/sign_sync/500B | 633 ns | ±3 ns |
signing/blake3-plain/sign_sync/8KB | 4.80 µs | ±10 ns |
signing/ed25519/sign_sync/100B | 20.73 µs | ±297 ns |
signing/ed25519/sign_sync/1KB | 24.20 µs | ±97 ns |
signing/ed25519/sign_sync/2KB | 28.03 µs | ±144 ns |
signing/ed25519/sign_sync/4KB | 35.16 µs | ±73 ns |
signing/ed25519/sign_sync/500B | 22.26 µs | ±814 ns |
signing/ed25519/sign_sync/8KB | 50.29 µs | ±91 ns |
signing/hmac/sign_sync/100B | 276 ns | ±4 ns |
signing/hmac/sign_sync/1KB | 836 ns | ±1 ns |
signing/hmac/sign_sync/2KB | 1.49 µs | ±3 ns |
signing/hmac/sign_sync/4KB | 2.74 µs | ±2 ns |
signing/hmac/sign_sync/500B | 518 ns | ±0 ns |
signing/hmac/sign_sync/8KB | 5.27 µs | ±3 ns |
signing/sha256-digest/sign_sync/100B | 101 ns | ±0 ns |
signing/sha256-digest/sign_sync/1KB | 664 ns | ±1 ns |
signing/sha256-digest/sign_sync/2KB | 1.30 µs | ±2 ns |
signing/sha256-digest/sign_sync/4KB | 2.54 µs | ±5 ns |
signing/sha256-digest/sign_sync/500B | 341 ns | ±0 ns |
signing/sha256-digest/sign_sync/8KB | 5.07 µs | ±6 ns |
validation/cert_missing | 192 ns | ±0 ns |
validation/schema_mismatch | 146 ns | ±2 ns |
validation/single_hop | 46.71 µs | ±93 ns |
validation_stage/cert_via_anchor | 48.11 µs | ±134 ns |
validation_stage/disabled | 617 ns | ±2 ns |
verification/blake3-keyed/verify/100B | 304 ns | ±0 ns |
verification/blake3-keyed/verify/1KB | 1.32 µs | ±1 ns |
verification/blake3-keyed/verify/2KB | 2.52 µs | ±67 ns |
verification/blake3-keyed/verify/4KB | 3.65 µs | ±13 ns |
verification/blake3-keyed/verify/500B | 740 ns | ±0 ns |
verification/blake3-keyed/verify/8KB | 4.92 µs | ±6 ns |
verification/blake3-plain/verify/100B | 309 ns | ±0 ns |
verification/blake3-plain/verify/1KB | 1.32 µs | ±1 ns |
verification/blake3-plain/verify/2KB | 2.52 µs | ±6 ns |
verification/blake3-plain/verify/4KB | 3.65 µs | ±6 ns |
verification/blake3-plain/verify/500B | 744 ns | ±1 ns |
verification/blake3-plain/verify/8KB | 4.92 µs | ±10 ns |
verification/ed25519-batch/1 | 54.78 µs | ±410 ns |
verification/ed25519-batch/10 | 248.72 µs | ±606 ns |
verification/ed25519-batch/100 | 2.27 ms | ±7.78 µs |
verification/ed25519-batch/1000 | 18.58 ms | ±156.20 µs |
verification/ed25519-per-sig-loop/1 | 42.34 µs | ±141 ns |
verification/ed25519-per-sig-loop/10 | 421.42 µs | ±2.02 µs |
verification/ed25519-per-sig-loop/100 | 4.29 ms | ±6.06 µs |
verification/ed25519-per-sig-loop/1000 | 43.16 ms | ±68.38 µs |
verification/ed25519/verify/100B | 41.75 µs | ±99 ns |
verification/ed25519/verify/1KB | 43.81 µs | ±88 ns |
verification/ed25519/verify/2KB | 45.57 µs | ±77 ns |
verification/ed25519/verify/4KB | 49.28 µs | ±110 ns |
verification/ed25519/verify/500B | 42.93 µs | ±677 ns |
verification/ed25519/verify/8KB | 57.63 µs | ±106 ns |
verification/sha256-digest/verify/100B | 102 ns | ±0 ns |
verification/sha256-digest/verify/1KB | 662 ns | ±0 ns |
verification/sha256-digest/verify/2KB | 1.30 µs | ±0 ns |
verification/sha256-digest/verify/4KB | 2.55 µs | ±1 ns |
verification/sha256-digest/verify/500B | 341 ns | ±0 ns |
verification/sha256-digest/verify/8KB | 5.08 µs | ±105 ns |