Pipeline Benchmarks

ndn-rs ships a Criterion-based benchmark suite that measures individual pipeline stage costs and end-to-end forwarding latency. The benchmarks live in crates/spec/ndn-engine/benches/pipeline.rs.

Running Benchmarks

# Run the full suite
cargo bench -p ndn-engine

# Run a specific benchmark group
cargo bench -p ndn-engine -- "cs/"
cargo bench -p ndn-engine -- "fib/lpm"
cargo bench -p ndn-engine -- "interest_pipeline"

# View HTML reports after a run
open target/criterion/report/index.html

Criterion writes its HTML reports to target/criterion/. They include statistical analysis, throughput charts, and a comparison against previous runs.

Approximate Relative Cost of Pipeline Stages

%%{init: {'theme': 'default'}}%%
pie title Pipeline Stage Cost Breakdown (approximate)
    "TLV Decode" : 30
    "CS Lookup (miss)" : 10
    "PIT Check" : 15
    "FIB LPM" : 20
    "Strategy" : 10
    "Dispatch" : 15

The chart above shows approximate relative costs for a typical Interest pipeline traversal (CS miss path). TLV decode and FIB longest-prefix match dominate because they involve parsing variable-length names and traversing trie nodes. CS lookup on a miss and strategy execution are comparatively cheap. Actual proportions depend on name length, table sizes, and cache state – run the benchmarks to get precise numbers for your workload.

Benchmark Harness Architecture

graph LR
    subgraph "Setup (per iteration)"
        PB["Pre-built wire packets<br/>(realistic names, ~100 B content)"]
    end

    subgraph "Benchmark Loop (Criterion)"
        PB --> S1["Stage under test<br/>(e.g. TlvDecodeStage)"]
        S1 --> M["Measure:<br/>latency (ns/op)<br/>throughput (ops/sec, bytes/sec)"]
    end

    subgraph "Full Pipeline Benchmarks"
        PB --> FP["All stages in sequence<br/>(decode -> CS -> PIT -> FIB -> strategy -> dispatch)"]
        FP --> M2["End-to-end latency"]
    end

    RT["Tokio current-thread runtime<br/>(no I/O, no scheduling jitter)"] -.->|"runs"| S1
    RT -.->|"runs"| FP

    style PB fill:#e8f4fd,stroke:#2196F3
    style M fill:#c8e6c9,stroke:#4CAF50
    style M2 fill:#c8e6c9,stroke:#4CAF50
    style RT fill:#fff3e0,stroke:#FF9800

What Is Benchmarked

TLV Decode

Groups: decode/interest, decode/data

Measures the cost of TlvDecodeStage – parsing raw wire bytes into a decoded Interest or Data struct and setting ctx.name. Tested with 4-component and 8-component names to show scaling with name length.

Throughput is reported in bytes/sec to make comparisons across packet sizes meaningful.
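To make concrete why decode cost scales with the number of name components, here is a minimal sketch of NDN TLV parsing. It is not the ndn-rs implementation: the function names (read_var_number, read_tlv, decode_name) are hypothetical, and only the variable-length number encoding and the Name type code 7 are taken from the NDN packet format. Each component costs one more type/length read, which is the per-component work the 4- vs 8-component benchmarks expose.

```rust
/// Reads one NDN variable-length number (used for both TLV-TYPE and TLV-LENGTH).
/// Returns the value and the number of bytes consumed, or None if the input is truncated.
fn read_var_number(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    match first {
        // Values below 253 are encoded in a single octet.
        0..=252 => Some((first as u64, 1)),
        // 253 is followed by a 2-octet big-endian value.
        253 => {
            let b = buf.get(1..3)?;
            Some((u16::from_be_bytes([b[0], b[1]]) as u64, 3))
        }
        // 254 is followed by a 4-octet big-endian value.
        254 => {
            let b = buf.get(1..5)?;
            Some((u32::from_be_bytes([b[0], b[1], b[2], b[3]]) as u64, 5))
        }
        // 255 is followed by an 8-octet big-endian value.
        255 => {
            let mut a = [0u8; 8];
            a.copy_from_slice(buf.get(1..9)?);
            Some((u64::from_be_bytes(a), 9))
        }
    }
}

/// Reads one TLV element; returns (type, value slice, total bytes consumed).
fn read_tlv(buf: &[u8]) -> Option<(u64, &[u8], usize)> {
    let (typ, n1) = read_var_number(buf)?;
    let (len, n2) = read_var_number(&buf[n1..])?;
    let start = n1 + n2;
    let end = start.checked_add(usize::try_from(len).ok()?)?;
    let value = buf.get(start..end)?;
    Some((typ, value, end))
}

/// TLV-TYPE of a Name element in the NDN packet format.
const TLV_NAME: u64 = 7;

/// Splits a Name TLV into its component value slices without copying.
/// Component types are ignored in this sketch.
fn decode_name(wire: &[u8]) -> Option<Vec<&[u8]>> {
    let (typ, mut rest, _) = read_tlv(wire)?;
    if typ != TLV_NAME {
        return None;
    }
    let mut components = Vec::new();
    while !rest.is_empty() {
        let (_component_type, value, used) = read_tlv(rest)?;
        components.push(value);
        rest = &rest[used..];
    }
    Some(components)
}
```

For example, the 9-byte wire encoding of the name /a/bc (07 07 08 01 61 08 02 62 63) yields the two component slices "a" and "bc"; a truncated buffer yields None rather than panicking.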

Content Store Lookup

Group: cs

cs/hit: lookup of a name that exists in the CS. Measures the fast path where a cached Data is returned and the Interest pipeline short-circuits (no PIT or strategy involved).
cs/miss: lookup of a name not in the CS. Measures the overhead added to every Interest that proceeds past the CS stage.

Uses a 64 MiB LruCs with a pre-populated entry for the hit case.

PIT Check

Group: pit

pit/new_entry: inserting a new PIT entry for a never-seen name. Uses a fresh PIT per iteration to isolate insert cost.
pit/aggregate: second Interest with a different nonce hitting an existing PIT entry. This is the aggregation path where the Interest is suppressed (returned as Action::Drop).
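The two PIT paths above can be illustrated with a small sketch. This is an assumption-laden model, not the ndn-rs PIT: the Pit struct, its check method, the use of a String as the key, and the Action::Continue variant are all hypothetical; only Action::Drop for the aggregation path comes from the description above.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Outcome of the PIT stage. `Continue` is a hypothetical name for
/// "proceed to the next stage"; `Drop` mirrors the aggregation result.
#[derive(Debug, PartialEq)]
enum Action {
    Continue,
    Drop,
}

/// Toy PIT keyed by the name's URI string, storing the nonces seen so far.
#[derive(Default)]
struct Pit {
    entries: HashMap<String, Vec<u32>>,
}

impl Pit {
    fn check(&mut self, name: &str, nonce: u32) -> Action {
        match self.entries.entry(name.to_owned()) {
            // Existing entry: record the new nonce and suppress the Interest.
            Entry::Occupied(mut entry) => {
                let nonces = entry.get_mut();
                if !nonces.contains(&nonce) {
                    nonces.push(nonce);
                }
                Action::Drop
            }
            // Never-seen name: create the entry and let the Interest proceed.
            Entry::Vacant(slot) => {
                slot.insert(vec![nonce]);
                Action::Continue
            }
        }
    }
}
```

In this model the first Interest for a name takes the vacant branch (the pit/new_entry case) and a second Interest with a different nonce takes the occupied branch (the pit/aggregate case).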

FIB Longest-Prefix Match

Group: fib/lpm

Measures LPM lookup time with 10, 100, and 1000 routes in the FIB. Routes have 2-component prefixes; the lookup name has 4 components (2 matching + 2 extra). This isolates trie traversal cost from name parsing.
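As a rough illustration of what this group exercises, the following sketch implements longest-prefix match over a component trie. It assumes string components and a u32 face id; the Fib and TrieNode types here are hypothetical and much simpler than the real FIB.

```rust
use std::collections::HashMap;

/// One trie level per name component.
#[derive(Default)]
struct TrieNode {
    children: HashMap<String, TrieNode>,
    /// Hypothetical face id for a route ending at this node.
    next_hop: Option<u32>,
}

#[derive(Default)]
struct Fib {
    root: TrieNode,
}

impl Fib {
    /// Registers a route for `prefix`, creating trie nodes as needed.
    fn insert(&mut self, prefix: &[&str], face: u32) {
        let mut node = &mut self.root;
        for component in prefix {
            node = node.children.entry((*component).to_owned()).or_default();
        }
        node.next_hop = Some(face);
    }

    /// Walks the trie along `name` and returns the next hop of the
    /// deepest node that carries a route.
    fn lpm(&self, name: &[&str]) -> Option<u32> {
        let mut node = &self.root;
        let mut best = node.next_hop;
        for component in name {
            match node.children.get(*component) {
                Some(child) => {
                    node = child;
                    if node.next_hop.is_some() {
                        best = node.next_hop;
                    }
                }
                // No deeper match exists; stop at the longest prefix found.
                None => break,
            }
        }
        best
    }
}
```

With a route on the 2-component prefix /a/b, a lookup for the 4-component name /a/b/c/d visits two child nodes and stops at the third component. In a trie like this the walk is bounded by name depth rather than by the number of routes.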

PIT Match (Data Path)

Group: pit_match

pit_match/hit: Data arriving that matches an existing PIT entry. Seeds the PIT with a matching Interest, then measures the match and entry extraction.
pit_match/miss: Data arriving with no matching PIT entry (unsolicited Data, dropped).

CS Insert

Group: cs_insert

cs_insert/insert_replace: steady-state replacement of an existing CS entry (same name, new Data). Measures the cost when the CS is warm.
cs_insert/insert_new: inserting a unique name on each iteration. Measures cold-path cost including NameTrie node creation.

Validation Stage

Group: validation_stage

validation_stage/disabled: passthrough when no Validator is configured. Measures the baseline overhead of the stage itself.
validation_stage/cert_via_anchor: full Ed25519 signature verification using a trust anchor. Includes schema check, key lookup, and cryptographic verify.

Full Interest Pipeline

Groups: interest_pipeline, interest_pipeline/cs_hit

interest_pipeline/no_route: decode + CS miss + PIT new entry. Stops before the strategy stage to isolate pure pipeline overhead. Tested with 4- and 8-component names.
interest_pipeline/cs_hit: decode + CS hit. Measures the fast path where a cached Data satisfies the Interest immediately.

Full Data Pipeline

Group: data_pipeline

Decode + PIT match + CS insert. Seeds the PIT with a matching Interest, then runs the full Data path. Tested with 4- and 8-component names. Throughput is reported in bytes/sec.

Decode Throughput

Group: decode_throughput

Batch decoding of 1000 Interests in a tight loop. Reports throughput in elements/sec rather than latency, giving a peak-rate estimate for the decode stage.
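Because the reported time for this group covers a whole batch, it has to be divided by the batch size to recover a rate. A small sketch of that arithmetic, with hypothetical helper names:

```rust
/// Converts the measured time for one batch into elements per second.
fn elements_per_sec(batch_size: u64, batch_time_ns: f64) -> f64 {
    batch_size as f64 / (batch_time_ns * 1e-9)
}

/// Average time spent on a single element of the batch, in nanoseconds.
fn ns_per_element(batch_size: u64, batch_time_ns: f64) -> f64 {
    batch_time_ns / batch_size as f64
}
```

Applied to the decode_throughput/4 row in the CI results (970.03 µs for a batch of 1000), this gives roughly 1.03 million Interests per second, or about 970 ns per Interest.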

Benchmark Design Notes

All async benchmarks use a current-thread Tokio runtime with no I/O, isolating CPU cost from scheduling jitter.
Packet wire bytes are built with realistic name lengths (4 and 8 components) and ~100 B Data content.
The PIT is cleared between iterations where noted to ensure consistent starting state.
Each benchmark group uses Criterion’s Throughput annotations so reports show both latency and throughput.

Interpreting Results

Criterion reports median latency by default. Look for:

Regression alerts: Criterion flags changes >5% from the baseline. CI uses a 10% threshold (see Methodology).
Outliers: high outlier percentages suggest contention with other processes or OS scheduling noise on the host. The current-thread runtime minimizes this.
Throughput numbers: useful for capacity planning. If decode_throughput shows 2M Interests/sec, that is the ceiling before the cost of any other stage is considered.
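The threshold comparison behind a regression alert can be sketched as follows. This is a deliberate simplification: it looks only at the relative change between two medians, whereas Criterion also applies a statistical significance test before reporting a change. The Verdict enum and compare function are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    Regressed,
    Improved,
    WithinNoise,
}

/// Classifies a run by relative change against the baseline.
/// `threshold` is a fraction, e.g. 0.05 for 5% or 0.10 for 10%.
fn compare(baseline_ns: f64, current_ns: f64, threshold: f64) -> Verdict {
    let change = (current_ns - baseline_ns) / baseline_ns;
    if change > threshold {
        Verdict::Regressed
    } else if change < -threshold {
        Verdict::Improved
    } else {
        Verdict::WithinNoise
    }
}
```

An 8% slowdown, for instance, would be flagged under a 5% threshold but would pass a 10% threshold, which is why a local run and CI can disagree about the same change.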

The HTML report at target/criterion/report/index.html includes violin plots, probability density function (PDF) plots, and regression analysis for each benchmark.

SHA-256 vs BLAKE3 in this bench

signing/sha256-digest uses sha2::Sha256 (RustCrypto), which on both x86_64 and aarch64 performs runtime CPU feature detection through the cpufeatures crate and uses Intel SHA-NI / ARMv8 SHA crypto extensions when the CPU exposes them. Effectively every modern CI runner and consumer CPU does, so the absolute SHA-256 numbers in the results table below are hardware-accelerated numbers; there is no practical “software SHA” baseline left to compare against.

That makes the BLAKE3 comparison one between a hardware-accelerated SHA-256 and an AVX2/NEON-vectorised BLAKE3, and it shows: BLAKE3 is not faster than SHA-256 on a single thread on these CPUs at the input sizes a typical NDN signed portion has (a few hundred bytes to a few KB). The “BLAKE3 is 3–8× faster than SHA-256” claim refers to BLAKE3 vs plain software SHA-256, which is true on chips without SHA extensions but no longer the common case. See Why BLAKE3 for the actual reasons ndn-rs supports BLAKE3 (Merkle-tree partial verification of segmented Data, multi-thread hashing, single algorithm for hash + MAC + KDF + XOF), none of which are about raw single-thread throughput.
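The size of the gap can be read off the CI results directly. The sketch below hard-codes the signing/sha256-digest/sign_sync and signing/blake3-plain/sign_sync medians from the results table (in nanoseconds) and computes the BLAKE3-to-SHA-256 time ratio per input size; the constant and function names are hypothetical.

```rust
/// (input size label, sha256-digest median ns, blake3-plain median ns),
/// copied from the sign_sync rows of the CI results table.
const SIGN_SYNC_NS: [(&str, f64, f64); 6] = [
    ("100B", 101.0, 221.0),
    ("500B", 339.0, 716.0),
    ("1KB", 661.0, 1340.0),
    ("2KB", 1580.0, 2590.0),
    ("4KB", 2900.0, 4080.0),
    ("8KB", 5350.0, 5370.0),
];

/// Ratio of BLAKE3 time to SHA-256 time per input size.
/// A value above 1.0 means BLAKE3 was slower for that size.
fn blake3_over_sha256() -> Vec<(&'static str, f64)> {
    SIGN_SYNC_NS
        .iter()
        .map(|&(label, sha_ns, blake3_ns)| (label, blake3_ns / sha_ns))
        .collect()
}
```

On this run the ratio stays at or above 1.0 for every size: a little over 2× at 100 B, narrowing to near parity at 8 KB. These figures describe one CI run on one runner type and will differ on other hardware.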

Latest CI Results

Last updated by CI on 2026-05-13 (ubuntu-latest, stable Rust)

Benchmark	Median	± Variance
`cs/hit`	883 ns	±47 ns
`cs/miss`	598 ns	±27 ns

`cs_insert/insert_new`	1.48 µs	±91 ns
`cs_insert/insert_replace`	823 ns	±58 ns

`data_pipeline/4`	2.44 µs	±157 ns
`data_pipeline/8`	2.53 µs	±122 ns

`decode/data/4`	945 ns	±63 ns
`decode/data/8`	1.05 µs	±55 ns
`decode/interest/4`	1.17 µs	±47 ns
`decode/interest/8`	1.52 µs	±80 ns

`decode_throughput/4`	970.03 µs	±42.54 µs
`decode_throughput/8`	1.23 ms	±71.89 µs

`fib/lpm/10`	43 ns	±2 ns
`fib/lpm/100`	95 ns	±3 ns
`fib/lpm/1000`	93 ns	±3 ns

`interest_pipeline/cs_hit`	1.57 µs	±59 ns
`interest_pipeline/no_route/4`	2.35 µs	±105 ns
`interest_pipeline/no_route/8`	2.74 µs	±89 ns

`lru/evict`	195 ns	±7 ns
`lru/evict_prefix`	2.29 µs	±2.22 µs
`lru/get_can_be_prefix`	290 ns	±5 ns
`lru/get_hit`	206 ns	±6 ns
`lru/get_miss_empty`	138 ns	±4 ns
`lru/get_miss_populated`	183 ns	±9 ns
`lru/insert_new`	2.26 µs	±1.60 µs
`lru/insert_replace`	377 ns	±15 ns

`name/display/components/4`	458 ns	±29 ns
`name/display/components/8`	953 ns	±51 ns
`name/eq/eq_match`	29 ns	±2 ns
`name/eq/eq_miss_first`	1 ns	±0 ns
`name/eq/eq_miss_last`	28 ns	±1 ns
`name/has_prefix/prefix_len/1`	6 ns	±0 ns
`name/has_prefix/prefix_len/4`	15 ns	±1 ns
`name/has_prefix/prefix_len/8`	29 ns	±1 ns
`name/hash/components/4`	86 ns	±5 ns
`name/hash/components/8`	180 ns	±11 ns
`name/parse/components/12`	979 ns	±94 ns
`name/parse/components/4`	365 ns	±26 ns
`name/parse/components/8`	626 ns	±37 ns
`name/tlv_decode/components/12`	311 ns	±23 ns
`name/tlv_decode/components/4`	151 ns	±10 ns
`name/tlv_decode/components/8`	242 ns	±17 ns

`pit/aggregate`	2.83 µs	±136 ns
`pit/new_entry`	1.82 µs	±58 ns

`pit_match/hit`	2.13 µs	±68 ns
`pit_match/miss`	1.15 µs	±88 ns

`sharded/get_hit/1`	227 ns	±6 ns
`sharded/get_hit/16`	274 ns	±13 ns
`sharded/get_hit/4`	234 ns	±16 ns
`sharded/get_hit/8`	226 ns	±5 ns
`sharded/insert/1`	2.74 µs	±1.30 µs
`sharded/insert/16`	2.54 µs	±1.97 µs
`sharded/insert/4`	3.02 µs	±1.34 µs
`sharded/insert/8`	3.38 µs	±2.24 µs

`signing/blake3-keyed/sign_sync/100B`	213 ns	±10 ns
`signing/blake3-keyed/sign_sync/1KB`	1.23 µs	±58 ns
`signing/blake3-keyed/sign_sync/2KB`	2.40 µs	±63 ns
`signing/blake3-keyed/sign_sync/4KB`	3.79 µs	±193 ns
`signing/blake3-keyed/sign_sync/500B`	661 ns	±32 ns
`signing/blake3-keyed/sign_sync/8KB`	5.15 µs	±415 ns
`signing/blake3-plain/sign_sync/100B`	221 ns	±11 ns
`signing/blake3-plain/sign_sync/1KB`	1.34 µs	±49 ns
`signing/blake3-plain/sign_sync/2KB`	2.59 µs	±112 ns
`signing/blake3-plain/sign_sync/4KB`	4.08 µs	±200 ns
`signing/blake3-plain/sign_sync/500B`	716 ns	±32 ns
`signing/blake3-plain/sign_sync/8KB`	5.37 µs	±231 ns
`signing/ed25519/sign_sync/100B`	25.54 µs	±1.65 µs
`signing/ed25519/sign_sync/1KB`	24.75 µs	±977 ns
`signing/ed25519/sign_sync/2KB`	28.47 µs	±2.21 µs
`signing/ed25519/sign_sync/4KB`	36.99 µs	±2.39 µs
`signing/ed25519/sign_sync/500B`	23.35 µs	±1.18 µs
`signing/ed25519/sign_sync/8KB`	52.40 µs	±2.16 µs
`signing/hmac/sign_sync/100B`	288 ns	±15 ns
`signing/hmac/sign_sync/1KB`	863 ns	±31 ns
`signing/hmac/sign_sync/2KB`	1.55 µs	±73 ns
`signing/hmac/sign_sync/4KB`	2.88 µs	±120 ns
`signing/hmac/sign_sync/500B`	528 ns	±15 ns
`signing/hmac/sign_sync/8KB`	5.30 µs	±310 ns
`signing/sha256-digest/sign_sync/100B`	101 ns	±4 ns
`signing/sha256-digest/sign_sync/1KB`	661 ns	±20 ns
`signing/sha256-digest/sign_sync/2KB`	1.58 µs	±62 ns
`signing/sha256-digest/sign_sync/4KB`	2.90 µs	±265 ns
`signing/sha256-digest/sign_sync/500B`	339 ns	±7 ns
`signing/sha256-digest/sign_sync/8KB`	5.35 µs	±313 ns

`spawn_overhead/runtime_trait_boxed`	47.60 µs	±1.60 µs
`spawn_overhead/spawn_boxed`	35.17 µs	±2.18 µs
`spawn_overhead/spawn_concrete`	29.56 µs	±798 ns

`validation_stage/cert_via_anchor`	47.32 µs	±3.37 µs
`validation_stage/disabled`	1.17 µs	±87 ns

`verification/blake3-keyed/verify/100B`	350 ns	±19 ns
`verification/blake3-keyed/verify/1KB`	1.44 µs	±79 ns
`verification/blake3-keyed/verify/2KB`	2.67 µs	±137 ns
`verification/blake3-keyed/verify/4KB`	4.22 µs	±171 ns
`verification/blake3-keyed/verify/500B`	769 ns	±25 ns
`verification/blake3-keyed/verify/8KB`	5.86 µs	±305 ns
`verification/blake3-plain/verify/100B`	369 ns	±20 ns
`verification/blake3-plain/verify/1KB`	1.40 µs	±58 ns
`verification/blake3-plain/verify/2KB`	2.96 µs	±135 ns
`verification/blake3-plain/verify/4KB`	3.71 µs	±237 ns
`verification/blake3-plain/verify/500B`	857 ns	±30 ns
`verification/blake3-plain/verify/8KB`	6.10 µs	±257 ns
`verification/ed25519/verify/100B`	45.99 µs	±2.63 µs
`verification/ed25519/verify/1KB`	45.10 µs	±1.60 µs
`verification/ed25519/verify/2KB`	49.97 µs	±2.73 µs
`verification/ed25519/verify/4KB`	53.38 µs	±2.73 µs
`verification/ed25519/verify/500B`	49.45 µs	±3.48 µs
`verification/ed25519/verify/8KB`	58.02 µs	±1.36 µs
`verification/sha256-digest/verify/100B`	101 ns	±4 ns
`verification/sha256-digest/verify/1KB`	662 ns	±22 ns
`verification/sha256-digest/verify/2KB`	1.32 µs	±51 ns
`verification/sha256-digest/verify/4KB`	2.56 µs	±153 ns
`verification/sha256-digest/verify/500B`	358 ns	±19 ns
`verification/sha256-digest/verify/8KB`	5.08 µs	±140 ns
