Pipeline Benchmarks

ndn-rs ships a Criterion-based benchmark suite that measures individual pipeline stage costs and end-to-end forwarding latency. The benchmarks live in crates/spec/ndn-engine/benches/pipeline.rs.

Running Benchmarks

# Run the full suite
cargo bench -p ndn-engine

# Run a specific benchmark group
cargo bench -p ndn-engine -- "cs/"
cargo bench -p ndn-engine -- "fib/lpm"
cargo bench -p ndn-engine -- "interest_pipeline"

# View HTML reports after a run (open is macOS; use xdg-open on Linux)
open target/criterion/report/index.html

Criterion generates HTML reports with statistical analysis, throughput charts, and comparison against previous runs in target/criterion/.

Approximate Relative Cost of Pipeline Stages

%%{init: {'theme': 'default'}}%%
pie title Pipeline Stage Cost Breakdown (approximate)
    "TLV Decode" : 30
    "CS Lookup (miss)" : 10
    "PIT Check" : 15
    "FIB LPM" : 20
    "Strategy" : 10
    "Dispatch" : 15

The chart above shows approximate relative costs for a typical Interest pipeline traversal (CS miss path). TLV decode and FIB longest-prefix match dominate because they involve parsing variable-length names and traversing trie nodes. CS lookup on a miss and strategy execution are comparatively cheap. Actual proportions depend on name length, table sizes, and cache state – run the benchmarks to get precise numbers for your workload.

Benchmark Harness Architecture

graph LR
    subgraph "Setup (per iteration)"
        PB["Pre-built wire packets<br/>(realistic names, ~100 B content)"]
    end

    subgraph "Benchmark Loop (Criterion)"
        PB --> S1["Stage under test<br/>(e.g. TlvDecodeStage)"]
        S1 --> M["Measure:<br/>latency (ns/op)<br/>throughput (ops/sec, bytes/sec)"]
    end

    subgraph "Full Pipeline Benchmarks"
        PB --> FP["All stages in sequence<br/>(decode -> CS -> PIT -> FIB -> strategy -> dispatch)"]
        FP --> M2["End-to-end latency"]
    end

    RT["Tokio current-thread runtime<br/>(no I/O, no scheduling jitter)"] -.->|"runs"| S1
    RT -.->|"runs"| FP

    style PB fill:#e8f4fd,stroke:#2196F3
    style M fill:#c8e6c9,stroke:#4CAF50
    style M2 fill:#c8e6c9,stroke:#4CAF50
    style RT fill:#fff3e0,stroke:#FF9800
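The per-stage loop in the diagram can be approximated with std alone, as a rough mental model of what the harness measures. `stage_under_test` is a hypothetical placeholder for a real stage such as TlvDecodeStage. Criterion does considerably more (warm-up, automatic iteration counts per sample, statistical analysis), so treat this as a sketch rather than a replacement:

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical stand-in for a stage under test (e.g. a decode stage):
/// here it only sums the bytes of a pre-built wire packet.
fn stage_under_test(wire: &[u8]) -> u64 {
    wire.iter().map(|&b| u64::from(b)).sum()
}

/// Runs `samples` batches of `iters_per_sample` calls against an input that
/// was built once, outside the timed region, and returns the median ns/op.
fn median_ns_per_op(wire: &[u8], samples: usize, iters_per_sample: u32) -> f64 {
    assert!(samples > 0 && iters_per_sample > 0);
    let mut per_op = Vec::with_capacity(samples);
    for _ in 0..samples {
        let start = Instant::now();
        for _ in 0..iters_per_sample {
            // black_box keeps the optimiser from deleting or hoisting the work.
            black_box(stage_under_test(black_box(wire)));
        }
        let elapsed_ns = start.elapsed().as_nanos() as f64;
        per_op.push(elapsed_ns / f64::from(iters_per_sample));
    }
    per_op.sort_by(f64::total_cmp);
    let mid = per_op.len() / 2;
    if per_op.len() % 2 == 0 {
        (per_op[mid - 1] + per_op[mid]) / 2.0
    } else {
        per_op[mid]
    }
}
```

Building the packet once and passing it through `black_box` mirrors the "pre-built wire packets" box: packet construction stays out of the measured time.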

What Is Benchmarked

TLV Decode

Groups: decode/interest, decode/data

Measures the cost of TlvDecodeStage – parsing raw wire bytes into a decoded Interest or Data struct and setting ctx.name. Tested with 4-component and 8-component names to show scaling with name length.

Throughput is reported in bytes/sec to make comparisons across packet sizes meaningful.
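To see why decode cost grows with name length, here is a sketch of decoding just the Name element. It follows the NDN Packet Format v0.3 TLV rules (VAR-NUMBER type and length fields; Name is TLV-TYPE 7). It is a hypothetical illustration, not ndn-rs's TlvDecodeStage: it skips the enclosing Interest/Data TLV, accepts any component type, and does not check that VAR-NUMBERs use their shortest encoding.

```rust
const TLV_NAME: u64 = 0x07;

/// Reads an NDN VAR-NUMBER from the front of `buf`: values below 253 take one
/// octet; 253/254/255 prefix a 2/4/8-octet big-endian value.
/// Returns (value, octets consumed).
fn read_var_number(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    let width = match first {
        0..=252 => return Some((u64::from(first), 1)),
        253 => 2,
        254 => 4,
        255 => 8,
    };
    let bytes = buf.get(1..1 + width)?;
    let value = bytes.iter().fold(0u64, |acc, &b| (acc << 8) | u64::from(b));
    Some((value, 1 + width))
}

/// Reads one TLV element: (TLV-TYPE, TLV-VALUE slice, total octets consumed).
fn read_tlv(buf: &[u8]) -> Option<(u64, &[u8], usize)> {
    let (typ, t_len) = read_var_number(buf)?;
    let (len, l_len) = read_var_number(&buf[t_len..])?;
    let start = t_len + l_len;
    let end = start.checked_add(usize::try_from(len).ok()?)?;
    let value = buf.get(start..end)?;
    Some((typ, value, end))
}

/// Decodes a Name TLV into borrowed component values. The loop body runs once
/// per component, so the work scales with the number of components.
fn decode_name(buf: &[u8]) -> Option<Vec<&[u8]>> {
    let (typ, mut rest, _) = read_tlv(buf)?;
    if typ != TLV_NAME {
        return None;
    }
    let mut components = Vec::new();
    while !rest.is_empty() {
        let (_component_type, value, used) = read_tlv(rest)?;
        components.push(value);
        rest = &rest[used..];
    }
    Some(components)
}
```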

Content Store Lookup

Group: cs

  • cs/hit: lookup of a name that exists in the CS. Measures the fast path where a cached Data is returned and the Interest pipeline short-circuits (no PIT or strategy involved).
  • cs/miss: lookup of a name not in the CS. Measures the overhead added to every Interest that proceeds past the CS stage.

Uses a 64 MiB LruCs with a pre-populated entry for the hit case.
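The lookup and insert paths can be illustrated with a toy exact-match store. This is a hypothetical sketch, not ndn-rs's LruCs: `ToyLruCs`, its byte budget, and its tick-based recency ordering are assumptions, and it ignores CanBePrefix matching and freshness. In this toy a miss is a single hash lookup, while a hit also updates the recency order. `insert` reports whether it replaced an existing name, mirroring the insert_replace/insert_new split in the cs_insert group.

```rust
use std::collections::{BTreeMap, HashMap};

/// Toy content store (hypothetical): exact-name lookup with
/// least-recently-used eviction under a byte budget.
struct ToyLruCs {
    capacity_bytes: usize,
    used_bytes: usize,
    tick: u64,
    entries: HashMap<String, (Vec<u8>, u64)>, // name -> (Data wire bytes, last-use tick)
    by_age: BTreeMap<u64, String>,            // last-use tick -> name (oldest first)
}

impl ToyLruCs {
    fn new(capacity_bytes: usize) -> Self {
        Self {
            capacity_bytes,
            used_bytes: 0,
            tick: 0,
            entries: HashMap::new(),
            by_age: BTreeMap::new(),
        }
    }

    /// A miss costs one hash lookup; a hit also refreshes the entry's recency.
    fn get(&mut self, name: &str) -> Option<&[u8]> {
        let entry = self.entries.get_mut(name)?;
        self.tick += 1;
        self.by_age.remove(&entry.1);
        entry.1 = self.tick;
        self.by_age.insert(self.tick, name.to_string());
        Some(entry.0.as_slice())
    }

    /// Returns true when an existing entry with the same name was replaced.
    fn insert(&mut self, name: &str, data: Vec<u8>) -> bool {
        self.tick += 1;
        let replaced = match self.entries.remove(name) {
            Some((old, old_tick)) => {
                self.used_bytes -= old.len();
                self.by_age.remove(&old_tick);
                true
            }
            None => false,
        };
        self.used_bytes += data.len();
        self.entries.insert(name.to_string(), (data, self.tick));
        self.by_age.insert(self.tick, name.to_string());
        // Evict least-recently-used entries until back within budget.
        while self.used_bytes > self.capacity_bytes {
            let Some((_, victim)) = self.by_age.pop_first() else { break };
            if let Some((old, _)) = self.entries.remove(&victim) {
                self.used_bytes -= old.len();
            }
        }
        replaced
    }
}
```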

PIT Check

Group: pit

  • pit/new_entry: inserting a new PIT entry for a never-seen name. Uses a fresh PIT per iteration to isolate insert cost.
  • pit/aggregate: second Interest with a different nonce hitting an existing PIT entry. This is the aggregation path where the Interest is suppressed (returned as Action::Drop).
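A toy model of the two PIT paths, plus the Data-side match measured by the pit_match group further down. `ToyPit` and `PitOutcome` are hypothetical; matching is by exact name only (no CanBePrefix or Interest lifetimes), and ndn-rs expresses the aggregation outcome as Action::Drop rather than through a dedicated type.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Outcome of an Interest reaching the toy PIT (hypothetical type).
#[derive(Debug, PartialEq)]
enum PitOutcome {
    NewEntry,       // pit/new_entry: first Interest for this name, forward it
    Aggregated,     // pit/aggregate: new nonce, existing entry, suppress it
    DuplicateNonce, // nonce already recorded: treated as a possible loop
}

struct PitEntry {
    nonces: Vec<u32>,
    in_faces: Vec<u32>, // downstream faces waiting for the Data
}

#[derive(Default)]
struct ToyPit {
    entries: HashMap<String, PitEntry>,
}

impl ToyPit {
    fn on_interest(&mut self, name: &str, nonce: u32, in_face: u32) -> PitOutcome {
        match self.entries.entry(name.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert(PitEntry { nonces: vec![nonce], in_faces: vec![in_face] });
                PitOutcome::NewEntry
            }
            Entry::Occupied(mut slot) => {
                let e = slot.get_mut();
                if e.nonces.contains(&nonce) {
                    return PitOutcome::DuplicateNonce;
                }
                e.nonces.push(nonce);
                if !e.in_faces.contains(&in_face) {
                    e.in_faces.push(in_face);
                }
                PitOutcome::Aggregated
            }
        }
    }

    /// Data path: an exact-name match consumes the entry and returns the
    /// faces to send the Data to; None means unsolicited Data.
    fn on_data(&mut self, name: &str) -> Option<Vec<u32>> {
        self.entries.remove(name).map(|e| e.in_faces)
    }
}
```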

FIB Longest-Prefix Match

Group: fib/lpm

Measures LPM lookup time with 10, 100, and 1000 routes in the FIB. Routes have 2-component prefixes; the lookup name has 4 components (2 matching + 2 extra). This isolates trie traversal cost from name parsing.
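Here is a minimal name-trie sketch of the lookup this group measures. It assumes a HashMap of children per node; `FibNode` and its methods are hypothetical and are not ndn-rs's NameTrie. The walk does one child lookup per matched component, so the number of steps depends on name depth rather than route count. Constant factors such as cache misses can still grow with table size.

```rust
use std::collections::HashMap;

/// Toy FIB node: children keyed by name component, optional next-hop face.
#[derive(Default)]
struct FibNode {
    children: HashMap<String, FibNode>,
    next_hop: Option<u32>,
}

impl FibNode {
    /// Adds a route for `prefix`, creating trie nodes as needed.
    fn insert(&mut self, prefix: &[&str], face: u32) {
        let mut node = self;
        for comp in prefix {
            node = node.children.entry(comp.to_string()).or_default();
        }
        node.next_hop = Some(face);
    }

    /// Longest-prefix match: descend while components match, remembering the
    /// deepest node with a next hop. Returns (matched length, face).
    fn lpm(&self, name: &[&str]) -> Option<(usize, u32)> {
        let mut node = self;
        let mut best = node.next_hop.map(|f| (0, f));
        for (depth, comp) in name.iter().enumerate() {
            match node.children.get(*comp) {
                Some(child) => {
                    node = child;
                    if let Some(f) = node.next_hop {
                        best = Some((depth + 1, f));
                    }
                }
                None => break,
            }
        }
        best
    }
}
```

With 2-component routes and a 4-component lookup name, as in this group, the walk stops after the two matching components and never visits the two extras.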

PIT Match (Data Path)

Group: pit_match

  • pit_match/hit: Data arriving that matches an existing PIT entry. Seeds the PIT with a matching Interest, then measures the match and entry extraction.
  • pit_match/miss: Data arriving with no matching PIT entry (unsolicited Data, dropped).

CS Insert

Group: cs_insert

  • cs_insert/insert_replace: steady-state replacement of an existing CS entry (same name, new Data). Measures the cost when the CS is warm.
  • cs_insert/insert_new: inserting a unique name on each iteration. Measures cold-path cost including NameTrie node creation.

Validation Stage

Group: validation_stage

  • validation_stage/disabled: passthrough when no Validator is configured. Measures the baseline overhead of the stage itself.
  • validation_stage/cert_via_anchor: full Ed25519 signature verification using a trust anchor. Includes schema check, key lookup, and cryptographic verify.

Full Interest Pipeline

Groups: interest_pipeline, interest_pipeline/cs_hit

  • interest_pipeline/no_route: decode + CS miss + PIT new entry. Stops before the strategy stage to isolate pure pipeline overhead. Tested with 4- and 8-component names.
  • interest_pipeline/cs_hit: decode + CS hit. Measures the fast path where a cached Data satisfies the Interest immediately.
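The difference between these two benchmarks comes from short-circuiting: on a CS hit the packet never reaches the PIT. Here is a minimal sketch of that control flow. It assumes stages are plain functions returning a hypothetical `Step` value; ndn-rs's real stage trait and Action type are not reproduced here.

```rust
/// Hypothetical stage result for this sketch.
#[derive(Debug, PartialEq)]
enum Step {
    Continue,        // hand the packet to the next stage
    SatisfiedFromCs, // cached Data found: reply and stop
}

/// Minimal per-packet context: whether the CS holds the name, plus a trace
/// of which stages ran.
struct Ctx {
    in_cs: bool,
    trace: Vec<&'static str>,
}

type Stage = fn(&mut Ctx) -> Step;

fn decode(_: &mut Ctx) -> Step {
    Step::Continue
}

fn cs_lookup(ctx: &mut Ctx) -> Step {
    if ctx.in_cs { Step::SatisfiedFromCs } else { Step::Continue }
}

fn pit_insert(_: &mut Ctx) -> Step {
    Step::Continue
}

/// Stage order of the no_route variant (strategy deliberately omitted).
fn interest_stages() -> [(&'static str, Stage); 3] {
    [
        ("decode", decode as Stage),
        ("cs", cs_lookup as Stage),
        ("pit", pit_insert as Stage),
    ]
}

/// Runs stages in order; the first non-Continue result short-circuits.
fn run_pipeline(ctx: &mut Ctx, stages: &[(&'static str, Stage)]) -> Step {
    for &(label, stage) in stages {
        ctx.trace.push(label);
        match stage(ctx) {
            Step::Continue => {}
            done => return done,
        }
    }
    Step::Continue
}
```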

Full Data Pipeline

Group: data_pipeline

Decode + PIT match + CS insert. Seeds the PIT with a matching Interest, then runs the full Data path. Tested with 4- and 8-component names. Throughput is reported in bytes/sec.

Decode Throughput

Group: decode_throughput

Batch decoding of 1000 Interests in a tight loop. Throughput is reported in elements/sec rather than as per-Interest latency, giving a peak-rate estimate for the decode stage. The CI table lists the underlying median time per 1000-Interest batch.
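Criterion's `Throughput::Elements` and `Throughput::Bytes` annotations turn a per-iteration time into a rate by dividing the declared element or byte count by the measured time. Here is the same arithmetic applied by hand to the decode_throughput/4 and decode_throughput/8 rows of the CI table (median 970.03 µs and 1.23 ms per 1000-Interest batch). It gives roughly 1.03M and 0.81M Interests/sec on that runner. `per_second` is a hypothetical helper, not part of ndn-rs:

```rust
/// Rate implied by `per_iter` elements (or bytes) processed in one iteration
/// that took `iter_time_ns` nanoseconds.
fn per_second(per_iter: u64, iter_time_ns: f64) -> f64 {
    per_iter as f64 / (iter_time_ns * 1e-9)
}
```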

Benchmark Design Notes

  • All async benchmarks use a current-thread Tokio runtime with no I/O, isolating CPU cost from scheduling jitter.
  • Packet wire bytes are built with realistic name lengths (4 and 8 components) and ~100 B Data content.
  • The PIT is cleared between iterations where noted to ensure consistent starting state.
  • Each benchmark group uses Criterion’s Throughput annotations so reports show both latency and throughput.

Interpreting Results

Criterion reports median latency by default. Look for:

  • Regression alerts: Criterion flags changes >5% from the baseline. CI uses a 10% threshold (see Methodology).
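  • Outliers: a high outlier percentage usually points to interference from the environment, such as other processes on the runner, CPU frequency scaling, or allocator and page-fault effects (Rust has no garbage collector, so GC pauses are not a factor). The current-thread runtime removes async task scheduling as a source of jitter.
  • Throughput numbers: useful for capacity planning. If decode_throughput shows 2M Interests/sec, that is a ceiling on forwarding rate before the cost of any other stage is counted.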
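The outlier and regression checks in the list above can each be expressed in a few lines. This sketch assumes that Criterion classifies outliers with a variant of Tukey's fences (mild beyond 1.5×IQR, severe beyond 3×IQR). It models the CI gate as a plain relative-change comparison against a 10% threshold. `percentile`, `count_outliers`, and `is_regression` are hypothetical helpers; Criterion's percentile method and the real CI gate may differ.

```rust
/// Percentile of sorted data by linear interpolation (one common definition).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Tukey's fences: returns (mild, severe) outlier counts.
fn count_outliers(samples: &[f64]) -> (usize, usize) {
    assert!(!samples.is_empty());
    let mut s = samples.to_vec();
    s.sort_by(f64::total_cmp);
    let (q1, q3) = (percentile(&s, 0.25), percentile(&s, 0.75));
    let iqr = q3 - q1;
    let (mut mild, mut severe) = (0, 0);
    for &x in &s {
        if x < q1 - 3.0 * iqr || x > q3 + 3.0 * iqr {
            severe += 1;
        } else if x < q1 - 1.5 * iqr || x > q3 + 1.5 * iqr {
            mild += 1;
        }
    }
    (mild, severe)
}

/// True if `new_ns` is slower than `baseline_ns` by more than `threshold`
/// (a fraction: 0.10 means 10%).
fn is_regression(baseline_ns: f64, new_ns: f64, threshold: f64) -> bool {
    (new_ns - baseline_ns) / baseline_ns > threshold
}
```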

The HTML report at target/criterion/report/index.html includes violin plots, probability density (PDF) plots, and regression analysis for each benchmark.

SHA-256 vs BLAKE3 in this bench

signing/sha256-digest uses sha2::Sha256 (RustCrypto). On both x86_64 and aarch64, it performs runtime CPU feature detection through the cpufeatures crate and uses the Intel SHA extensions (SHA-NI) or the ARMv8 SHA-2 crypto instructions when the CPU exposes them. Effectively every modern CI runner and consumer CPU does, so the absolute SHA-256 numbers in the CI results below are hardware-accelerated (SHA-NI) numbers. There is no practical “software SHA” baseline left to compare against.
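To check whether your own numbers are hardware-accelerated, std's runtime feature-detection macros can report whether the CPU exposes SHA-256 instructions. This is a hypothetical check written for illustration. It approximates, but is not identical to, the cpufeatures-based dispatch inside sha2, which may also check additional SIMD features.

```rust
/// Whether this CPU exposes SHA-256 instructions (std runtime detection).
#[cfg(target_arch = "x86_64")]
fn sha256_hw_available() -> bool {
    std::arch::is_x86_feature_detected!("sha") // Intel/AMD SHA extensions (SHA-NI)
}

#[cfg(target_arch = "aarch64")]
fn sha256_hw_available() -> bool {
    std::arch::is_aarch64_feature_detected!("sha2") // ARMv8 SHA-2 crypto instructions
}

#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
fn sha256_hw_available() -> bool {
    false // no hardware path assumed on other architectures
}

/// Human-readable hint for interpreting signing/sha256-digest results.
fn sha256_backend_hint() -> &'static str {
    if sha256_hw_available() {
        "hardware SHA-256 instructions available"
    } else {
        "no SHA-256 instructions detected: expect software-speed numbers"
    }
}
```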

That makes the BLAKE3 rows a comparison between hardware-accelerated SHA-256 and AVX2/NEON-vectorised BLAKE3, and the results reflect it. On these CPUs, BLAKE3 is not faster than SHA-256 on a single thread at the input sizes a typical NDN signed portion has (a few hundred bytes to a few KB). BLAKE3's SIMD implementations get much of their speed from hashing several 1 KiB chunks in parallel, which inputs this small barely exercise. The “BLAKE3 is 3–8× faster than SHA-256” claim refers to BLAKE3 vs plain software SHA-256. That holds on chips without SHA extensions, but those are no longer the common case. See Why BLAKE3 for the actual reasons ndn-rs supports BLAKE3: Merkle-tree partial verification of segmented Data, multi-thread hashing, and a single algorithm for hash + MAC + KDF + XOF. None of these are about raw single-thread throughput.
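The crossover is visible directly in the signing medians from the CI results. This small sketch computes the BLAKE3-to-SHA-256 time ratio per input size. `blake3_over_sha256` is a hypothetical helper, and its values are copied by hand from the signing/sha256-digest and signing/blake3-plain sign_sync rows. The ratio is about 2.2× at 100 B and falls to roughly parity at 8 KB on that runner.

```rust
/// signing/*/sign_sync medians in ns, copied from the Latest CI Results table.
const SIZES: [&str; 6] = ["100B", "500B", "1KB", "2KB", "4KB", "8KB"];
const SHA256_NS: [f64; 6] = [101.0, 339.0, 661.0, 1580.0, 2900.0, 5350.0];
const BLAKE3_PLAIN_NS: [f64; 6] = [221.0, 716.0, 1340.0, 2590.0, 4080.0, 5370.0];

/// Per-size ratio of BLAKE3 time to SHA-256 time; above 1.0 means BLAKE3 was slower.
fn blake3_over_sha256() -> Vec<(&'static str, f64)> {
    SIZES
        .iter()
        .zip(SHA256_NS.iter().zip(BLAKE3_PLAIN_NS.iter()))
        .map(|(size, (sha, b3))| (*size, b3 / sha))
        .collect()
}
```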

Latest CI Results

Last updated by CI on 2026-05-13 (ubuntu-latest, stable Rust)

| Benchmark | Median | ± Variance |
|---|---:|---:|
| cs/hit | 883 ns | ±47 ns |
| cs/miss | 598 ns | ±27 ns |
| cs_insert/insert_new | 1.48 µs | ±91 ns |
| cs_insert/insert_replace | 823 ns | ±58 ns |
| data_pipeline/4 | 2.44 µs | ±157 ns |
| data_pipeline/8 | 2.53 µs | ±122 ns |
| decode/data/4 | 945 ns | ±63 ns |
| decode/data/8 | 1.05 µs | ±55 ns |
| decode/interest/4 | 1.17 µs | ±47 ns |
| decode/interest/8 | 1.52 µs | ±80 ns |
| decode_throughput/4 | 970.03 µs | ±42.54 µs |
| decode_throughput/8 | 1.23 ms | ±71.89 µs |
| fib/lpm/10 | 43 ns | ±2 ns |
| fib/lpm/100 | 95 ns | ±3 ns |
| fib/lpm/1000 | 93 ns | ±3 ns |
| interest_pipeline/cs_hit | 1.57 µs | ±59 ns |
| interest_pipeline/no_route/4 | 2.35 µs | ±105 ns |
| interest_pipeline/no_route/8 | 2.74 µs | ±89 ns |
| lru/evict | 195 ns | ±7 ns |
| lru/evict_prefix | 2.29 µs | ±2.22 µs |
| lru/get_can_be_prefix | 290 ns | ±5 ns |
| lru/get_hit | 206 ns | ±6 ns |
| lru/get_miss_empty | 138 ns | ±4 ns |
| lru/get_miss_populated | 183 ns | ±9 ns |
| lru/insert_new | 2.26 µs | ±1.60 µs |
| lru/insert_replace | 377 ns | ±15 ns |
| name/display/components/4 | 458 ns | ±29 ns |
| name/display/components/8 | 953 ns | ±51 ns |
| name/eq/eq_match | 29 ns | ±2 ns |
| name/eq/eq_miss_first | 1 ns | ±0 ns |
| name/eq/eq_miss_last | 28 ns | ±1 ns |
| name/has_prefix/prefix_len/1 | 6 ns | ±0 ns |
| name/has_prefix/prefix_len/4 | 15 ns | ±1 ns |
| name/has_prefix/prefix_len/8 | 29 ns | ±1 ns |
| name/hash/components/4 | 86 ns | ±5 ns |
| name/hash/components/8 | 180 ns | ±11 ns |
| name/parse/components/12 | 979 ns | ±94 ns |
| name/parse/components/4 | 365 ns | ±26 ns |
| name/parse/components/8 | 626 ns | ±37 ns |
| name/tlv_decode/components/12 | 311 ns | ±23 ns |
| name/tlv_decode/components/4 | 151 ns | ±10 ns |
| name/tlv_decode/components/8 | 242 ns | ±17 ns |
| pit/aggregate | 2.83 µs | ±136 ns |
| pit/new_entry | 1.82 µs | ±58 ns |
| pit_match/hit | 2.13 µs | ±68 ns |
| pit_match/miss | 1.15 µs | ±88 ns |
| sharded/get_hit/1 | 227 ns | ±6 ns |
| sharded/get_hit/16 | 274 ns | ±13 ns |
| sharded/get_hit/4 | 234 ns | ±16 ns |
| sharded/get_hit/8 | 226 ns | ±5 ns |
| sharded/insert/1 | 2.74 µs | ±1.30 µs |
| sharded/insert/16 | 2.54 µs | ±1.97 µs |
| sharded/insert/4 | 3.02 µs | ±1.34 µs |
| sharded/insert/8 | 3.38 µs | ±2.24 µs |
| signing/blake3-keyed/sign_sync/100B | 213 ns | ±10 ns |
| signing/blake3-keyed/sign_sync/1KB | 1.23 µs | ±58 ns |
| signing/blake3-keyed/sign_sync/2KB | 2.40 µs | ±63 ns |
| signing/blake3-keyed/sign_sync/4KB | 3.79 µs | ±193 ns |
| signing/blake3-keyed/sign_sync/500B | 661 ns | ±32 ns |
| signing/blake3-keyed/sign_sync/8KB | 5.15 µs | ±415 ns |
| signing/blake3-plain/sign_sync/100B | 221 ns | ±11 ns |
| signing/blake3-plain/sign_sync/1KB | 1.34 µs | ±49 ns |
| signing/blake3-plain/sign_sync/2KB | 2.59 µs | ±112 ns |
| signing/blake3-plain/sign_sync/4KB | 4.08 µs | ±200 ns |
| signing/blake3-plain/sign_sync/500B | 716 ns | ±32 ns |
| signing/blake3-plain/sign_sync/8KB | 5.37 µs | ±231 ns |
| signing/ed25519/sign_sync/100B | 25.54 µs | ±1.65 µs |
| signing/ed25519/sign_sync/1KB | 24.75 µs | ±977 ns |
| signing/ed25519/sign_sync/2KB | 28.47 µs | ±2.21 µs |
| signing/ed25519/sign_sync/4KB | 36.99 µs | ±2.39 µs |
| signing/ed25519/sign_sync/500B | 23.35 µs | ±1.18 µs |
| signing/ed25519/sign_sync/8KB | 52.40 µs | ±2.16 µs |
| signing/hmac/sign_sync/100B | 288 ns | ±15 ns |
| signing/hmac/sign_sync/1KB | 863 ns | ±31 ns |
| signing/hmac/sign_sync/2KB | 1.55 µs | ±73 ns |
| signing/hmac/sign_sync/4KB | 2.88 µs | ±120 ns |
| signing/hmac/sign_sync/500B | 528 ns | ±15 ns |
| signing/hmac/sign_sync/8KB | 5.30 µs | ±310 ns |
| signing/sha256-digest/sign_sync/100B | 101 ns | ±4 ns |
| signing/sha256-digest/sign_sync/1KB | 661 ns | ±20 ns |
| signing/sha256-digest/sign_sync/2KB | 1.58 µs | ±62 ns |
| signing/sha256-digest/sign_sync/4KB | 2.90 µs | ±265 ns |
| signing/sha256-digest/sign_sync/500B | 339 ns | ±7 ns |
| signing/sha256-digest/sign_sync/8KB | 5.35 µs | ±313 ns |
| spawn_overhead/runtime_trait_boxed | 47.60 µs | ±1.60 µs |
| spawn_overhead/spawn_boxed | 35.17 µs | ±2.18 µs |
| spawn_overhead/spawn_concrete | 29.56 µs | ±798 ns |
| validation_stage/cert_via_anchor | 47.32 µs | ±3.37 µs |
| validation_stage/disabled | 1.17 µs | ±87 ns |
| verification/blake3-keyed/verify/100B | 350 ns | ±19 ns |
| verification/blake3-keyed/verify/1KB | 1.44 µs | ±79 ns |
| verification/blake3-keyed/verify/2KB | 2.67 µs | ±137 ns |
| verification/blake3-keyed/verify/4KB | 4.22 µs | ±171 ns |
| verification/blake3-keyed/verify/500B | 769 ns | ±25 ns |
| verification/blake3-keyed/verify/8KB | 5.86 µs | ±305 ns |
| verification/blake3-plain/verify/100B | 369 ns | ±20 ns |
| verification/blake3-plain/verify/1KB | 1.40 µs | ±58 ns |
| verification/blake3-plain/verify/2KB | 2.96 µs | ±135 ns |
| verification/blake3-plain/verify/4KB | 3.71 µs | ±237 ns |
| verification/blake3-plain/verify/500B | 857 ns | ±30 ns |
| verification/blake3-plain/verify/8KB | 6.10 µs | ±257 ns |
| verification/ed25519/verify/100B | 45.99 µs | ±2.63 µs |
| verification/ed25519/verify/1KB | 45.10 µs | ±1.60 µs |
| verification/ed25519/verify/2KB | 49.97 µs | ±2.73 µs |
| verification/ed25519/verify/4KB | 53.38 µs | ±2.73 µs |
| verification/ed25519/verify/500B | 49.45 µs | ±3.48 µs |
| verification/ed25519/verify/8KB | 58.02 µs | ±1.36 µs |
| verification/sha256-digest/verify/100B | 101 ns | ±4 ns |
| verification/sha256-digest/verify/1KB | 662 ns | ±22 ns |
| verification/sha256-digest/verify/2KB | 1.32 µs | ±51 ns |
| verification/sha256-digest/verify/4KB | 2.56 µs | ±153 ns |
| verification/sha256-digest/verify/500B | 358 ns | ±19 ns |
| verification/sha256-digest/verify/8KB | 5.08 µs | ±140 ns |