Pipeline Benchmarks
ndn-rs ships a Criterion-based benchmark suite that measures individual pipeline stage costs and end-to-end forwarding latency. The benchmarks live in crates/spec/ndn-engine/benches/pipeline.rs.
Running Benchmarks
# Run the full suite
cargo bench -p ndn-engine
# Run a specific benchmark group
cargo bench -p ndn-engine -- "cs/"
cargo bench -p ndn-engine -- "fib/lpm"
cargo bench -p ndn-engine -- "interest_pipeline"
# View HTML reports after a run
open target/criterion/report/index.html
Criterion generates HTML reports with statistical analysis, throughput charts, and comparison against previous runs in target/criterion/.
Approximate Relative Cost of Pipeline Stages
%%{init: {'theme': 'default'}}%%
pie title Pipeline Stage Cost Breakdown (approximate)
"TLV Decode" : 30
"CS Lookup (miss)" : 10
"PIT Check" : 15
"FIB LPM" : 20
"Strategy" : 10
"Dispatch" : 15
The chart above shows approximate relative costs for a typical Interest pipeline traversal (CS miss path). TLV decode and FIB longest-prefix match dominate because they involve parsing variable-length names and traversing trie nodes. CS lookup on a miss and strategy execution are comparatively cheap. Actual proportions depend on name length, table sizes, and cache state – run the benchmarks to get precise numbers for your workload.
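To rebuild a breakdown like the one above from your own run, divide each stage's measured latency by the total. The sketch below is a minimal illustration; `stage_shares` and its input layout are hypothetical helpers, not part of ndn-rs or Criterion.

```rust
/// Convert per-stage latencies (in ns) into percentage shares of the total.
/// Returns an empty vector when the total is zero to avoid dividing by zero.
/// Hypothetical helper for illustration only.
pub fn stage_shares(latencies_ns: &[(&str, f64)]) -> Vec<(String, f64)> {
    let total: f64 = latencies_ns.iter().map(|(_, ns)| ns).sum();
    if total <= 0.0 {
        return Vec::new();
    }
    latencies_ns
        .iter()
        .map(|(name, ns)| (name.to_string(), 100.0 * ns / total))
        .collect()
}
```

Feeding it the medians of the individual stage groups from a local run gives the proportions for your own name lengths and table sizes.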
Benchmark Harness Architecture
graph LR
subgraph "Setup (per iteration)"
PB["Pre-built wire packets<br/>(realistic names, ~100 B content)"]
end
subgraph "Benchmark Loop (Criterion)"
PB --> S1["Stage under test<br/>(e.g. TlvDecodeStage)"]
S1 --> M["Measure:<br/>latency (ns/op)<br/>throughput (ops/sec, bytes/sec)"]
end
subgraph "Full Pipeline Benchmarks"
PB --> FP["All stages in sequence<br/>(decode -> CS -> PIT -> FIB -> strategy -> dispatch)"]
FP --> M2["End-to-end latency"]
end
RT["Tokio current-thread runtime<br/>(no I/O, no scheduling jitter)"] -.->|"runs"| S1
RT -.->|"runs"| FP
style PB fill:#e8f4fd,stroke:#2196F3
style M fill:#c8e6c9,stroke:#4CAF50
style M2 fill:#c8e6c9,stroke:#4CAF50
style RT fill:#fff3e0,stroke:#FF9800
What Is Benchmarked
TLV Decode
Groups: decode/interest, decode/data
Measures the cost of TlvDecodeStage – parsing raw wire bytes into a decoded Interest or Data struct and setting ctx.name. Tested with 4-component and 8-component names to show scaling with name length.
Throughput is reported in bytes/sec to make comparisons across packet sizes meaningful.
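To see why decode cost grows with the number of name components, here is a minimal sketch of NDN TLV name parsing. It is not the TlvDecodeStage implementation: the function names are hypothetical, and only the wire rules (variable-size numbers, Name type 7) come from the NDN packet format.

```rust
use std::convert::TryFrom;

/// Read one NDN variable-size number (used for both TLV type and TLV length).
/// Returns the value and the number of bytes consumed, or None if truncated.
pub fn read_var_number(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    let width: usize = match first {
        0..=252 => return Some((u64::from(first), 1)),
        253 => 2,
        254 => 4,
        255 => 8,
    };
    let bytes = buf.get(1..1 + width)?;
    // Multi-byte numbers are big-endian.
    let value = bytes.iter().fold(0u64, |acc, b| (acc << 8) | u64::from(*b));
    Some((value, 1 + width))
}

/// Read one TLV header and return (type, value slice, total bytes consumed).
fn read_tlv(buf: &[u8]) -> Option<(u64, &[u8], usize)> {
    let (typ, t_len) = read_var_number(buf)?;
    let (len, l_len) = read_var_number(&buf[t_len..])?;
    let start = t_len + l_len;
    let end = start.checked_add(usize::try_from(len).ok()?)?;
    Some((typ, buf.get(start..end)?, end))
}

/// Split a Name TLV (type 7) into its component values.
/// One loop iteration per component, so work scales with name length.
pub fn decode_name(wire: &[u8]) -> Option<Vec<&[u8]>> {
    let (typ, mut rest, _) = read_tlv(wire)?;
    if typ != 7 {
        return None;
    }
    let mut components = Vec::new();
    while !rest.is_empty() {
        let (_component_type, value, consumed) = read_tlv(rest)?;
        components.push(value);
        rest = &rest[consumed..];
    }
    Some(components)
}
```

Each component adds one more header parse and bounds check, which is the per-component cost the 4-component and 8-component variants are meant to expose.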
Content Store Lookup
Group: cs
- cs/hit: lookup of a name that exists in the CS. Measures the fast path where a cached Data is returned and the Interest pipeline short-circuits (no PIT or strategy involved).
- cs/miss: lookup of a name not in the CS. Measures the overhead added to every Interest that proceeds past the CS stage.
Uses a 64 MiB LruCs with a pre-populated entry for the hit case.
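The hit and miss paths can be pictured with a byte-budgeted LRU cache. The sketch below is a deliberately simple stand-in, not the LruCs implementation: `TinyLruCs` is a hypothetical type, names are plain strings, and recency tracking is O(n) for brevity.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical byte-budgeted LRU store, for illustration only.
pub struct TinyLruCs {
    capacity_bytes: usize,
    used_bytes: usize,
    entries: HashMap<String, Vec<u8>>,
    order: VecDeque<String>, // front = least recently used
}

impl TinyLruCs {
    pub fn new(capacity_bytes: usize) -> Self {
        TinyLruCs {
            capacity_bytes,
            used_bytes: 0,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Insert or replace an entry, then evict from the LRU end until under budget.
    pub fn insert(&mut self, name: &str, data: Vec<u8>) {
        if let Some(old) = self.entries.remove(name) {
            self.used_bytes -= old.len();
            self.order.retain(|n| n.as_str() != name);
        }
        self.used_bytes += data.len();
        self.entries.insert(name.to_string(), data);
        self.order.push_back(name.to_string());
        while self.used_bytes > self.capacity_bytes {
            match self.order.pop_front() {
                Some(victim) => {
                    if let Some(d) = self.entries.remove(&victim) {
                        self.used_bytes -= d.len();
                    }
                }
                None => break,
            }
        }
    }

    /// Hit: return the cached bytes and mark the entry most recently used.
    /// Miss: return None after a single map probe.
    pub fn get(&mut self, name: &str) -> Option<&[u8]> {
        if !self.entries.contains_key(name) {
            return None;
        }
        self.order.retain(|n| n.as_str() != name);
        self.order.push_back(name.to_string());
        self.entries.get(name).map(|d| d.as_slice())
    }
}
```

In this sketch a miss costs one probe while a hit also updates recency. With a 64 MiB budget and a single pre-populated entry, as in the benchmark, eviction never triggers.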
PIT Check
Group: pit
- pit/new_entry: inserting a new PIT entry for a never-seen name. Uses a fresh PIT per iteration to isolate insert cost.
- pit/aggregate: second Interest with a different nonce hitting an existing PIT entry. This is the aggregation path where the Interest is suppressed (returned as Action::Drop).
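The two PIT paths can be sketched as follows. This is a simplified model, not the ndn-rs PIT: `TinyPit` is hypothetical, the `Forward` variant is an assumption (the source only names `Action::Drop`), and duplicate-nonce loop detection is left out.

```rust
use std::collections::HashMap;

/// Hypothetical pipeline action; only `Drop` is named in the text above.
#[derive(Debug, PartialEq)]
pub enum Action {
    Forward,
    Drop,
}

/// Hypothetical PIT keyed by name, recording the nonces seen per entry.
#[derive(Default)]
pub struct TinyPit {
    entries: HashMap<String, Vec<u32>>,
}

impl TinyPit {
    pub fn on_interest(&mut self, name: &str, nonce: u32) -> Action {
        // Aggregation path: an entry already exists, so record the nonce
        // and suppress the Interest.
        if let Some(nonces) = self.entries.get_mut(name) {
            if !nonces.contains(&nonce) {
                nonces.push(nonce);
            }
            return Action::Drop;
        }
        // New-entry path: allocate the entry and let the Interest continue.
        self.entries.insert(name.to_string(), vec![nonce]);
        Action::Forward
    }
}
```

The new-entry path pays for key allocation and insertion, which is why the benchmark uses a fresh PIT per iteration to measure it in isolation.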
FIB Longest-Prefix Match
Group: fib/lpm
Measures LPM lookup time with 10, 100, and 1000 routes in the FIB. Routes have 2-component prefixes; the lookup name has 4 components (2 matching + 2 extra). This isolates trie traversal cost from name parsing.
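A component-wise trie makes the shape of this lookup concrete. The sketch below is an assumption-laden illustration, not the ndn-rs FIB: `FibNode`, string components, and `u32` face ids are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical FIB trie node: one level per name component.
#[derive(Default)]
pub struct FibNode {
    next_hop: Option<u32>, // face id if a route ends at this node
    children: HashMap<String, FibNode>,
}

impl FibNode {
    /// Register a route for `prefix` pointing at `face`.
    pub fn insert(&mut self, prefix: &[&str], face: u32) {
        let mut node = self;
        for comp in prefix {
            node = node.children.entry(comp.to_string()).or_default();
        }
        node.next_hop = Some(face);
    }

    /// Longest-prefix match: walk down while components match and
    /// remember the deepest node that carries a route.
    pub fn lpm(&self, name: &[&str]) -> Option<u32> {
        let mut node = self;
        let mut best = node.next_hop;
        for comp in name {
            match node.children.get(*comp) {
                Some(child) => {
                    node = child;
                    if node.next_hop.is_some() {
                        best = node.next_hop;
                    }
                }
                None => break,
            }
        }
        best
    }
}
```

In a structure like this the walk is bounded by the depth of the matching prefix rather than the number of routes, which would be consistent with fib/lpm/100 and fib/lpm/1000 reporting nearly the same median.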
PIT Match (Data Path)
Group: pit_match
- pit_match/hit: Data arriving that matches an existing PIT entry. Seeds the PIT with a matching Interest, then measures the match and entry extraction.
- pit_match/miss: Data arriving with no matching PIT entry (unsolicited Data, dropped).
CS Insert
Group: cs_insert
- cs_insert/insert_replace: steady-state replacement of an existing CS entry (same name, new Data). Measures the cost when the CS is warm.
- cs_insert/insert_new: inserting a unique name on each iteration. Measures cold-path cost including NameTrie node creation.
Validation Stage
Group: validation_stage
- validation_stage/disabled: passthrough when no Validator is configured. Measures the baseline overhead of the stage itself.
- validation_stage/cert_via_anchor: full Ed25519 signature verification using a trust anchor. Includes schema check, key lookup, and cryptographic verify.
Full Interest Pipeline
Groups: interest_pipeline, interest_pipeline/cs_hit
- interest_pipeline/no_route: decode + CS miss + PIT new entry. Stops before the strategy stage to isolate pure pipeline overhead. Tested with 4 and 8 component names.
- interest_pipeline/cs_hit: decode + CS hit. Measures the fast path where a cached Data satisfies the Interest immediately.
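The difference between the two groups is where the stage sequence stops. A minimal sketch of that short-circuit, with hypothetical `Outcome`, `Ctx`, and stage functions standing in for the real pipeline types:

```rust
/// Hypothetical stage result: keep going, answer from cache, or discard.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Continue,
    Satisfy,
    Drop,
}

/// Hypothetical per-packet context; `trace` records which stages ran.
#[derive(Default)]
pub struct Ctx {
    pub cached: bool,
    pub trace: Vec<&'static str>,
}

pub type Stage = fn(&mut Ctx) -> Outcome;

/// Run stages in order and stop at the first one that does not return Continue.
pub fn run_pipeline(stages: &[Stage], ctx: &mut Ctx) -> Outcome {
    for &stage in stages {
        match stage(ctx) {
            Outcome::Continue => continue,
            other => return other,
        }
    }
    Outcome::Continue
}

pub fn decode(ctx: &mut Ctx) -> Outcome {
    ctx.trace.push("decode");
    Outcome::Continue
}

pub fn cs_lookup(ctx: &mut Ctx) -> Outcome {
    ctx.trace.push("cs");
    if ctx.cached {
        Outcome::Satisfy
    } else {
        Outcome::Continue
    }
}

pub fn pit_check(ctx: &mut Ctx) -> Outcome {
    ctx.trace.push("pit");
    Outcome::Continue
}
```

With `cached` set, the run ends after the CS stage and the PIT is never touched, mirroring the cs_hit group. With it unset, all three stages run, as in no_route.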
Full Data Pipeline
Group: data_pipeline
Decode + PIT match + CS insert. Seeds the PIT with a matching Interest, then runs the full Data path. Tested with 4 and 8 component names. Throughput is reported in bytes/sec.
Decode Throughput
Group: decode_throughput
Batch decoding of 1000 Interests in a tight loop. Reports throughput in elements/sec rather than latency, giving a peak-rate estimate for the decode stage.
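Because each iteration of this group decodes a whole batch, the reported time is per batch, not per Interest. Converting it to a rate is simple arithmetic; `elements_per_sec` below is a hypothetical helper, and the inputs in the usage note are the decode_throughput medians from the Latest CI Results table.

```rust
/// Elements per second, given the batch size and the time for one batch in µs.
/// Hypothetical helper for reading batch-style benchmark results.
pub fn elements_per_sec(batch_len: u64, batch_time_us: f64) -> f64 {
    batch_len as f64 / (batch_time_us * 1e-6)
}
```

For example, `elements_per_sec(1000, 970.03)` is about 1.03 million Interests/sec for 4-component names, and `elements_per_sec(1000, 1230.0)` is about 0.81 million for 8-component names.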
Benchmark Design Notes
- All async benchmarks use a current-thread Tokio runtime with no I/O, isolating CPU cost from scheduling jitter.
- Packet wire bytes are built with realistic name lengths (4 and 8 components) and ~100 B Data content.
- The PIT is cleared between iterations where noted to ensure consistent starting state.
- Each benchmark group uses Criterion's Throughput annotations so reports show both latency and throughput.
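To illustrate how latency and throughput relate in these reports, here is a std-only timing sketch. It is not Criterion's API and not the harness in pipeline.rs: `measure` and `Sample` are hypothetical, and Criterion additionally performs warm-up, sampling, and statistical analysis that this loop omits.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical result type: latency per iteration and derived byte throughput.
pub struct Sample {
    pub ns_per_iter: f64,
    pub bytes_per_sec: f64,
}

/// Time `iters` calls of `f` and derive throughput from the bytes handled per call.
pub fn measure<R>(iters: u32, bytes_per_iter: u64, mut f: impl FnMut() -> R) -> Sample {
    let iters = iters.max(1); // avoid dividing by zero
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    let per_iter_secs = start.elapsed().as_secs_f64() / f64::from(iters);
    Sample {
        ns_per_iter: per_iter_secs * 1e9,
        bytes_per_sec: bytes_per_iter as f64 / per_iter_secs,
    }
}
```

Throughput here is the bytes per call divided by the time per call, which is the same relationship a bytes-based throughput annotation expresses in the report.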
Interpreting Results
Criterion reports median latency by default. Look for:
- Regression alerts: Criterion flags changes >5% from the baseline. CI uses a 10% threshold (see Methodology).
- Outliers: high outlier percentages suggest contention or other system noise, such as a busy shared CI runner. The current-thread runtime minimizes this.
- Throughput numbers: useful for capacity planning. If decode_throughput shows 2M Interests/sec, that is the ceiling before other stages are considered.
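The threshold logic behind a regression alert can be sketched as a percentage change against the baseline median. These helpers are hypothetical and do not reproduce Criterion's statistical test or the CI script. The 5% and 10% thresholds are the ones quoted above.

```rust
/// Percentage change of `current_ns` relative to `baseline_ns`.
/// Positive values mean the benchmark got slower.
pub fn percent_change(baseline_ns: f64, current_ns: f64) -> f64 {
    100.0 * (current_ns - baseline_ns) / baseline_ns
}

/// True when the slowdown exceeds `threshold_pct`.
pub fn is_regression(baseline_ns: f64, current_ns: f64, threshold_pct: f64) -> bool {
    percent_change(baseline_ns, current_ns) > threshold_pct
}
```

For instance, a cs/hit median moving from 883 ns to 930 ns is a change of about 5.3%, which crosses a 5% threshold but stays under the 10% CI threshold.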
The HTML report at target/criterion/report/index.html includes violin plots, probability density function (PDF) plots, and regression analysis for each benchmark.
SHA-256 vs BLAKE3 in this bench
signing/sha256-digest uses sha2::Sha256 (RustCrypto), which on both x86_64 and aarch64 ships runtime CPU-feature dispatch through the cpufeatures crate and uses Intel SHA-NI / ARMv8 SHA crypto extensions when the CPU exposes them. Effectively every modern CI runner and consumer CPU does, so the absolute SHA-256 numbers in the results table are hardware-accelerated numbers; there is no practical “software SHA” baseline left to compare against.
That makes the BLAKE3 rows a comparison between hardware-accelerated SHA-256 and AVX2/NEON-vectorised BLAKE3, and it shows: BLAKE3 is not faster than SHA-256 on a single thread on these CPUs at the input sizes a typical NDN signed portion has (a few hundred bytes to a few KB). The “BLAKE3 is 3–8× faster than SHA-256” claim refers to BLAKE3 versus plain software SHA-256, which holds on chips without SHA extensions but is no longer the common case. See Why BLAKE3 for the actual reasons ndn-rs supports BLAKE3 (Merkle-tree partial verification of segmented Data, multi-thread hashing, and a single algorithm for hash + MAC + KDF + XOF), none of which are about raw single-thread throughput.
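The claim can be checked against the sign_sync medians in the Latest CI Results table by taking the ratio of BLAKE3 time to SHA-256 time at each input size. The helper and constant names below are hypothetical. The numbers are copied from the signing/sha256-digest and signing/blake3-plain rows.

```rust
/// (input size, SHA-256 median ns, BLAKE3-plain median ns), from the sign_sync rows.
pub const SIGN_ROWS: [(&str, f64, f64); 6] = [
    ("100B", 101.0, 221.0),
    ("500B", 339.0, 716.0),
    ("1KB", 661.0, 1340.0),
    ("2KB", 1580.0, 2590.0),
    ("4KB", 2900.0, 4080.0),
    ("8KB", 5350.0, 5370.0),
];

/// Ratio of BLAKE3 time to SHA-256 time per input size.
/// A ratio above 1.0 means SHA-256 was faster in that row.
pub fn blake3_over_sha256(rows: &[(&str, f64, f64)]) -> Vec<(String, f64)> {
    rows.iter()
        .map(|(size, sha_ns, blake3_ns)| (size.to_string(), blake3_ns / sha_ns))
        .collect()
}
```

On this run the ratio falls from roughly 2.2 at 100 B to about 1.0 at 8 KB, so SHA-256 is ahead at small inputs and the two converge as the input grows.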
Latest CI Results
Last updated by CI on 2026-05-13 (ubuntu-latest, stable Rust)
| Benchmark | Median | ± Variance |
|---|---|---|
| cs/hit | 883 ns | ±47 ns |
| cs/miss | 598 ns | ±27 ns |
| cs_insert/insert_new | 1.48 µs | ±91 ns |
| cs_insert/insert_replace | 823 ns | ±58 ns |
| data_pipeline/4 | 2.44 µs | ±157 ns |
| data_pipeline/8 | 2.53 µs | ±122 ns |
| decode/data/4 | 945 ns | ±63 ns |
| decode/data/8 | 1.05 µs | ±55 ns |
| decode/interest/4 | 1.17 µs | ±47 ns |
| decode/interest/8 | 1.52 µs | ±80 ns |
| decode_throughput/4 | 970.03 µs | ±42.54 µs |
| decode_throughput/8 | 1.23 ms | ±71.89 µs |
| fib/lpm/10 | 43 ns | ±2 ns |
| fib/lpm/100 | 95 ns | ±3 ns |
| fib/lpm/1000 | 93 ns | ±3 ns |
| interest_pipeline/cs_hit | 1.57 µs | ±59 ns |
| interest_pipeline/no_route/4 | 2.35 µs | ±105 ns |
| interest_pipeline/no_route/8 | 2.74 µs | ±89 ns |
| lru/evict | 195 ns | ±7 ns |
| lru/evict_prefix | 2.29 µs | ±2.22 µs |
| lru/get_can_be_prefix | 290 ns | ±5 ns |
| lru/get_hit | 206 ns | ±6 ns |
| lru/get_miss_empty | 138 ns | ±4 ns |
| lru/get_miss_populated | 183 ns | ±9 ns |
| lru/insert_new | 2.26 µs | ±1.60 µs |
| lru/insert_replace | 377 ns | ±15 ns |
| name/display/components/4 | 458 ns | ±29 ns |
| name/display/components/8 | 953 ns | ±51 ns |
| name/eq/eq_match | 29 ns | ±2 ns |
| name/eq/eq_miss_first | 1 ns | ±0 ns |
| name/eq/eq_miss_last | 28 ns | ±1 ns |
| name/has_prefix/prefix_len/1 | 6 ns | ±0 ns |
| name/has_prefix/prefix_len/4 | 15 ns | ±1 ns |
| name/has_prefix/prefix_len/8 | 29 ns | ±1 ns |
| name/hash/components/4 | 86 ns | ±5 ns |
| name/hash/components/8 | 180 ns | ±11 ns |
| name/parse/components/12 | 979 ns | ±94 ns |
| name/parse/components/4 | 365 ns | ±26 ns |
| name/parse/components/8 | 626 ns | ±37 ns |
| name/tlv_decode/components/12 | 311 ns | ±23 ns |
| name/tlv_decode/components/4 | 151 ns | ±10 ns |
| name/tlv_decode/components/8 | 242 ns | ±17 ns |
| pit/aggregate | 2.83 µs | ±136 ns |
| pit/new_entry | 1.82 µs | ±58 ns |
| pit_match/hit | 2.13 µs | ±68 ns |
| pit_match/miss | 1.15 µs | ±88 ns |
| sharded/get_hit/1 | 227 ns | ±6 ns |
| sharded/get_hit/16 | 274 ns | ±13 ns |
| sharded/get_hit/4 | 234 ns | ±16 ns |
| sharded/get_hit/8 | 226 ns | ±5 ns |
| sharded/insert/1 | 2.74 µs | ±1.30 µs |
| sharded/insert/16 | 2.54 µs | ±1.97 µs |
| sharded/insert/4 | 3.02 µs | ±1.34 µs |
| sharded/insert/8 | 3.38 µs | ±2.24 µs |
| signing/blake3-keyed/sign_sync/100B | 213 ns | ±10 ns |
| signing/blake3-keyed/sign_sync/1KB | 1.23 µs | ±58 ns |
| signing/blake3-keyed/sign_sync/2KB | 2.40 µs | ±63 ns |
| signing/blake3-keyed/sign_sync/4KB | 3.79 µs | ±193 ns |
| signing/blake3-keyed/sign_sync/500B | 661 ns | ±32 ns |
| signing/blake3-keyed/sign_sync/8KB | 5.15 µs | ±415 ns |
| signing/blake3-plain/sign_sync/100B | 221 ns | ±11 ns |
| signing/blake3-plain/sign_sync/1KB | 1.34 µs | ±49 ns |
| signing/blake3-plain/sign_sync/2KB | 2.59 µs | ±112 ns |
| signing/blake3-plain/sign_sync/4KB | 4.08 µs | ±200 ns |
| signing/blake3-plain/sign_sync/500B | 716 ns | ±32 ns |
| signing/blake3-plain/sign_sync/8KB | 5.37 µs | ±231 ns |
| signing/ed25519/sign_sync/100B | 25.54 µs | ±1.65 µs |
| signing/ed25519/sign_sync/1KB | 24.75 µs | ±977 ns |
| signing/ed25519/sign_sync/2KB | 28.47 µs | ±2.21 µs |
| signing/ed25519/sign_sync/4KB | 36.99 µs | ±2.39 µs |
| signing/ed25519/sign_sync/500B | 23.35 µs | ±1.18 µs |
| signing/ed25519/sign_sync/8KB | 52.40 µs | ±2.16 µs |
| signing/hmac/sign_sync/100B | 288 ns | ±15 ns |
| signing/hmac/sign_sync/1KB | 863 ns | ±31 ns |
| signing/hmac/sign_sync/2KB | 1.55 µs | ±73 ns |
| signing/hmac/sign_sync/4KB | 2.88 µs | ±120 ns |
| signing/hmac/sign_sync/500B | 528 ns | ±15 ns |
| signing/hmac/sign_sync/8KB | 5.30 µs | ±310 ns |
| signing/sha256-digest/sign_sync/100B | 101 ns | ±4 ns |
| signing/sha256-digest/sign_sync/1KB | 661 ns | ±20 ns |
| signing/sha256-digest/sign_sync/2KB | 1.58 µs | ±62 ns |
| signing/sha256-digest/sign_sync/4KB | 2.90 µs | ±265 ns |
| signing/sha256-digest/sign_sync/500B | 339 ns | ±7 ns |
| signing/sha256-digest/sign_sync/8KB | 5.35 µs | ±313 ns |
| spawn_overhead/runtime_trait_boxed | 47.60 µs | ±1.60 µs |
| spawn_overhead/spawn_boxed | 35.17 µs | ±2.18 µs |
| spawn_overhead/spawn_concrete | 29.56 µs | ±798 ns |
| validation_stage/cert_via_anchor | 47.32 µs | ±3.37 µs |
| validation_stage/disabled | 1.17 µs | ±87 ns |
| verification/blake3-keyed/verify/100B | 350 ns | ±19 ns |
| verification/blake3-keyed/verify/1KB | 1.44 µs | ±79 ns |
| verification/blake3-keyed/verify/2KB | 2.67 µs | ±137 ns |
| verification/blake3-keyed/verify/4KB | 4.22 µs | ±171 ns |
| verification/blake3-keyed/verify/500B | 769 ns | ±25 ns |
| verification/blake3-keyed/verify/8KB | 5.86 µs | ±305 ns |
| verification/blake3-plain/verify/100B | 369 ns | ±20 ns |
| verification/blake3-plain/verify/1KB | 1.40 µs | ±58 ns |
| verification/blake3-plain/verify/2KB | 2.96 µs | ±135 ns |
| verification/blake3-plain/verify/4KB | 3.71 µs | ±237 ns |
| verification/blake3-plain/verify/500B | 857 ns | ±30 ns |
| verification/blake3-plain/verify/8KB | 6.10 µs | ±257 ns |
| verification/ed25519/verify/100B | 45.99 µs | ±2.63 µs |
| verification/ed25519/verify/1KB | 45.10 µs | ±1.60 µs |
| verification/ed25519/verify/2KB | 49.97 µs | ±2.73 µs |
| verification/ed25519/verify/4KB | 53.38 µs | ±2.73 µs |
| verification/ed25519/verify/500B | 49.45 µs | ±3.48 µs |
| verification/ed25519/verify/8KB | 58.02 µs | ±1.36 µs |
| verification/sha256-digest/verify/100B | 101 ns | ±4 ns |
| verification/sha256-digest/verify/1KB | 662 ns | ±22 ns |
| verification/sha256-digest/verify/2KB | 1.32 µs | ±51 ns |
| verification/sha256-digest/verify/4KB | 2.56 µs | ±153 ns |
| verification/sha256-digest/verify/500B | 358 ns | ±19 ns |
| verification/sha256-digest/verify/8KB | 5.08 µs | ±140 ns |