Pipeline Benchmarks

ndn-rs ships a Criterion-based benchmark suite that measures individual pipeline stage costs and end-to-end forwarding latency. The benchmarks live in crates/engine/ndn-engine/benches/pipeline.rs.

Running Benchmarks

# Run the full suite
cargo bench -p ndn-engine

# Run a specific benchmark group
cargo bench -p ndn-engine -- "cs/"
cargo bench -p ndn-engine -- "fib/lpm"
cargo bench -p ndn-engine -- "interest_pipeline"

# View HTML reports after a run (open is the macOS command; use xdg-open on Linux)
open target/criterion/report/index.html

Criterion writes its output to target/criterion/, including HTML reports with statistical analysis, throughput charts, and comparisons against previous runs.

Approximate Relative Cost of Pipeline Stages

%%{init: {'theme': 'default'}}%%
pie title Pipeline Stage Cost Breakdown (approximate)
    "TLV Decode" : 30
    "CS Lookup (miss)" : 10
    "PIT Check" : 15
    "FIB LPM" : 20
    "Strategy" : 10
    "Dispatch" : 15

The chart above shows approximate relative costs for a typical Interest pipeline traversal (CS miss path). TLV decode and FIB longest-prefix match dominate because they involve parsing variable-length names and traversing trie nodes. CS lookup on a miss and strategy execution are comparatively cheap. Actual proportions depend on name length, table sizes, and cache state – run the benchmarks to get precise numbers for your workload.
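To turn per-stage medians from your own run into a breakdown like the chart above, a few lines of arithmetic are enough. The following is a minimal sketch: `stage_shares` and the stage labels are hypothetical names, not part of ndn-rs, and it assumes stage costs are roughly additive, which isolated benchmarks only approximate.

```rust
/// Convert per-stage latencies (nanoseconds) into percentage shares of their sum.
/// Hypothetical helper for illustration; returns an empty vector if the total is not positive.
pub fn stage_shares(stages: &[(&str, f64)]) -> Vec<(String, f64)> {
    let total: f64 = stages.iter().map(|(_, ns)| ns).sum();
    if total <= 0.0 {
        return Vec::new();
    }
    stages
        .iter()
        .map(|(name, ns)| (name.to_string(), 100.0 * ns / total))
        .collect()
}
```

Feeding it the medians of the individual stage benchmarks (for example decode, CS miss, PIT new entry, and FIB LPM) gives a workload-specific version of the chart.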

Benchmark Harness Architecture

graph LR
    subgraph "Setup (per iteration)"
        PB["Pre-built wire packets<br/>(realistic names, ~100 B content)"]
    end

    subgraph "Benchmark Loop (Criterion)"
        PB --> S1["Stage under test<br/>(e.g. TlvDecodeStage)"]
        S1 --> M["Measure:<br/>latency (ns/op)<br/>throughput (ops/sec, bytes/sec)"]
    end

    subgraph "Full Pipeline Benchmarks"
        PB --> FP["All stages in sequence<br/>(decode -> CS -> PIT -> FIB -> strategy -> dispatch)"]
        FP --> M2["End-to-end latency"]
    end

    RT["Tokio current-thread runtime<br/>(no I/O, no scheduling jitter)"] -.->|"runs"| S1
    RT -.->|"runs"| FP

    style PB fill:#e8f4fd,stroke:#2196F3
    style M fill:#c8e6c9,stroke:#4CAF50
    style M2 fill:#c8e6c9,stroke:#4CAF50
    style RT fill:#fff3e0,stroke:#FF9800

What Is Benchmarked

TLV Decode

Groups: decode/interest, decode/data

Measures the cost of TlvDecodeStage – parsing raw wire bytes into a decoded Interest or Data struct and setting ctx.name. Tested with 4-component and 8-component names to show scaling with name length.

Throughput is reported in bytes/sec to make comparisons across packet sizes meaningful.

Content Store Lookup

Group: cs

  • cs/hit: lookup of a name that exists in the CS. Measures the fast path where a cached Data is returned and the Interest pipeline short-circuits (no PIT or strategy involved).
  • cs/miss: lookup of a name not in the CS. Measures the overhead added to every Interest that proceeds past the CS stage.

Uses a 64 MiB LruCs with a pre-populated entry for the hit case.

PIT Check

Group: pit

  • pit/new_entry: inserting a new PIT entry for a never-seen name. Uses a fresh PIT per iteration to isolate insert cost.
  • pit/aggregate: second Interest with a different nonce hitting an existing PIT entry. This is the aggregation path where the Interest is suppressed (returned as Action::Drop).
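The two PIT paths measured above can be sketched as follows. This is a toy model under stated assumptions, not the ndn-rs PIT: `ToyPit`, `PitOutcome`, string-keyed names, and `u32` nonces are all hypothetical, and the duplicate-nonce case is included only to make the model complete.

```rust
use std::collections::hash_map::Entry;
use std::collections::{HashMap, HashSet};

/// Outcome of a PIT check in this sketch (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum PitOutcome {
    /// First Interest for this name: an entry was created (the pit/new_entry path).
    NewEntry,
    /// Same name, different nonce: merged into the existing entry and suppressed
    /// (the pit/aggregate path).
    Aggregated,
    /// Same name and same nonce: a duplicate or looping Interest.
    DuplicateNonce,
}

/// Toy PIT keyed by the name as a URI string, storing the nonces seen per name.
#[derive(Default)]
pub struct ToyPit {
    entries: HashMap<String, HashSet<u32>>,
}

impl ToyPit {
    pub fn check(&mut self, name: &str, nonce: u32) -> PitOutcome {
        match self.entries.entry(name.to_owned()) {
            Entry::Vacant(slot) => {
                slot.insert(HashSet::from([nonce]));
                PitOutcome::NewEntry
            }
            Entry::Occupied(mut slot) => {
                // HashSet::insert returns true if the nonce was not present yet.
                if slot.get_mut().insert(nonce) {
                    PitOutcome::Aggregated
                } else {
                    PitOutcome::DuplicateNonce
                }
            }
        }
    }
}
```

Using a fresh `ToyPit` per call measures only the insert path, which mirrors why pit/new_entry uses a fresh PIT per iteration.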

FIB Longest-Prefix Match

Group: fib/lpm

Measures LPM lookup time with 10, 100, and 1000 routes in the FIB. Routes have 2-component prefixes; the lookup name has 4 components (2 matching + 2 extra). This isolates trie traversal cost from name parsing.
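For readers unfamiliar with longest-prefix match over name components, here is a minimal component-trie sketch. It is an illustration only: `ToyFib`, its methods, and the `u32` face id are hypothetical and do not reflect the actual ndn-rs FIB layout.

```rust
use std::collections::HashMap;

/// One trie node per name component; `next_hop` is set where a route ends.
#[derive(Default)]
struct Node {
    children: HashMap<String, Node>,
    next_hop: Option<u32>, // hypothetical face id
}

/// Toy FIB implemented as a component trie.
#[derive(Default)]
pub struct ToyFib {
    root: Node,
}

impl ToyFib {
    /// Register `face` as the next hop for `prefix`.
    pub fn add_route(&mut self, prefix: &[&str], face: u32) {
        let mut node = &mut self.root;
        for comp in prefix {
            node = node.children.entry(comp.to_string()).or_default();
        }
        node.next_hop = Some(face);
    }

    /// Walk the trie along `name` and return (matched prefix length, face)
    /// for the deepest node that carries a route.
    pub fn lpm(&self, name: &[&str]) -> Option<(usize, u32)> {
        let mut node = &self.root;
        let mut best = node.next_hop.map(|face| (0, face));
        for (depth, comp) in name.iter().enumerate() {
            match node.children.get(*comp) {
                Some(child) => {
                    node = child;
                    if let Some(face) = node.next_hop {
                        best = Some((depth + 1, face));
                    }
                }
                None => break, // no deeper match possible
            }
        }
        best
    }
}
```

With 2-component routes and a 4-component lookup name, the walk visits two nodes and stops at the first missing child, so in this model the cost depends on matched depth and per-node map lookups rather than on the total number of routes.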

PIT Match (Data Path)

Group: pit_match

  • pit_match/hit: Data arriving that matches an existing PIT entry. Seeds the PIT with a matching Interest, then measures the match and entry extraction.
  • pit_match/miss: Data arriving with no matching PIT entry (unsolicited Data, dropped).

CS Insert

Group: cs_insert

  • cs_insert/insert_replace: steady-state replacement of an existing CS entry (same name, new Data). Measures the cost when the CS is warm.
  • cs_insert/insert_new: inserting a unique name on each iteration. Measures cold-path cost including NameTrie node creation.

Validation Stage

Group: validation_stage

  • validation_stage/disabled: passthrough when no Validator is configured. Measures the baseline overhead of the stage itself.
  • validation_stage/cert_via_anchor: full Ed25519 signature verification using a trust anchor. Includes schema check, key lookup, and cryptographic verify.

Full Interest Pipeline

Groups: interest_pipeline, interest_pipeline/cs_hit

  • interest_pipeline/no_route: decode + CS miss + PIT new entry. Stops before the strategy stage to isolate pure pipeline overhead. Tested with 4 and 8 component names.
  • interest_pipeline/cs_hit: decode + CS hit. Measures the fast path where a cached Data satisfies the Interest immediately.
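The difference between the two cases is where the pipeline stops. A rough sketch of that control flow, assuming a stage returns an action and the runner stops at the first non-continue result, might look like this. `Ctx`, `Stage`, `run`, and the `Continue`/`Satisfy` variants are hypothetical; only `Action::Drop` is named in this chapter.

```rust
/// What a stage asks the runner to do next (hypothetical variants apart from Drop).
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Continue,
    Satisfy,
    Drop,
}

/// Minimal per-packet context: whether the CS holds the name, plus a trace of stages run.
#[derive(Default)]
pub struct Ctx {
    pub cs_has_name: bool,
    pub trace: Vec<&'static str>,
}

pub type Stage = fn(&mut Ctx) -> Action;

fn decode(ctx: &mut Ctx) -> Action {
    ctx.trace.push("decode");
    Action::Continue
}

fn cs_lookup(ctx: &mut Ctx) -> Action {
    ctx.trace.push("cs");
    // A hit short-circuits the pipeline; a miss lets the Interest proceed.
    if ctx.cs_has_name { Action::Satisfy } else { Action::Continue }
}

fn pit_check(ctx: &mut Ctx) -> Action {
    ctx.trace.push("pit");
    Action::Continue
}

/// Stage order used by the no_route case in this sketch.
pub fn interest_stages() -> [Stage; 3] {
    [decode, cs_lookup, pit_check]
}

/// Run stages in order and stop at the first one that does not return Continue.
pub fn run(stages: &[Stage], ctx: &mut Ctx) -> Action {
    for &stage in stages {
        match stage(&mut *ctx) {
            Action::Continue => continue,
            other => return other,
        }
    }
    Action::Continue
}
```

In this model a CS hit executes two stages and a miss executes three, which is the shape of the gap between interest_pipeline/cs_hit and interest_pipeline/no_route.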

Full Data Pipeline

Group: data_pipeline

Decode + PIT match + CS insert. Seeds the PIT with a matching Interest, then runs the full Data path. Tested with 4 and 8 component names. Throughput is reported in bytes/sec.

Decode Throughput

Group: decode_throughput

Batch decoding of 1000 Interests in a tight loop. Reports throughput in elements/sec rather than latency, giving a peak-rate estimate for the decode stage.
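Converting a batch time into a rate is simple arithmetic; the sketch below spells it out. The function names are hypothetical helpers, not part of the benchmark suite.

```rust
/// Elements per second for a batch of `batch_len` items that took `batch_ns` nanoseconds.
pub fn elements_per_sec(batch_len: u64, batch_ns: f64) -> f64 {
    batch_len as f64 / (batch_ns * 1e-9)
}

/// Bytes per second for one packet of `packet_len` bytes processed in `op_ns` nanoseconds.
pub fn bytes_per_sec(packet_len: u64, op_ns: f64) -> f64 {
    packet_len as f64 / (op_ns * 1e-9)
}
```

As a worked example, a batch of 1000 Interests with the decode_throughput/4 median of 442.84 µs listed in the CI results corresponds to roughly 2.26 million Interests per second.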

Benchmark Design Notes

  • All async benchmarks use a current-thread Tokio runtime with no I/O, isolating CPU cost from scheduling jitter.
  • Packet wire bytes are built with realistic name lengths (4 and 8 components) and ~100 B Data content.
  • The PIT is cleared between iterations where noted to ensure consistent starting state.
  • Each benchmark group uses Criterion’s Throughput annotations so reports show both latency and throughput.
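The core idea behind these notes (build the input once, then time only the stage) can be illustrated without Criterion. The sketch below is a hand-rolled timing loop using only the standard library; it is not how the suite is implemented, and it omits the warm-up, sampling, and outlier analysis that Criterion provides. `toy_stage` and `time_stage` are hypothetical names.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Stand-in for a stage under test: sums the bytes of a pre-built packet.
pub fn toy_stage(wire: &[u8]) -> u64 {
    wire.iter().map(|&b| b as u64).sum()
}

/// Run `stage` over the same pre-built `wire` bytes `iters` times and
/// return the mean cost in nanoseconds per call.
pub fn time_stage(wire: &[u8], iters: u32, stage: impl Fn(&[u8]) -> u64) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from removing or hoisting the call.
        black_box(stage(black_box(wire)));
    }
    // Guard against iters == 0 to avoid dividing by zero.
    start.elapsed().as_nanos() as f64 / iters.max(1) as f64
}
```

A call such as `time_stage(&wire, 10_000, toy_stage)` with a 100-byte buffer gives a rough ns/op figure; treat it as a sanity check rather than a substitute for the Criterion reports.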

Interpreting Results

Criterion reports median latency by default. Look for:

  • Regression alerts: Criterion flags changes >5% from the baseline. CI uses a 10% threshold (see Methodology).
  • Outliers: high outlier percentages suggest contention or background noise on the machine (other processes, CPU frequency scaling); Rust has no garbage collector, so GC pauses are not a factor. The current-thread runtime minimizes this.
  • Throughput numbers: useful for capacity planning. If decode_throughput shows 2M Interests/sec, that is the ceiling before other stages are considered.
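The threshold idea in the first bullet can be sketched as a relative-change check between two medians. This is a simplification: Criterion's own comparison is statistical and works on confidence intervals, and `classify` and `Verdict` are hypothetical names.

```rust
/// Result of comparing a new median against a baseline (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    Regressed,
    Improved,
    Unchanged,
}

/// Classify the relative change between two medians.
/// `threshold` is a fraction, e.g. 0.10 for the 10% CI threshold.
/// `baseline_ns` must be positive.
pub fn classify(baseline_ns: f64, current_ns: f64, threshold: f64) -> Verdict {
    let change = (current_ns - baseline_ns) / baseline_ns;
    if change > threshold {
        Verdict::Regressed
    } else if change < -threshold {
        Verdict::Improved
    } else {
        Verdict::Unchanged
    }
}
```

For example, a move from 1000 ns to 1150 ns is a 15% slowdown and would be flagged at a 10% threshold, while a move to 1050 ns would not.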

The HTML report at target/criterion/report/index.html includes violin plots, probability density function (PDF) plots, and regression analysis for each benchmark.

SHA-256 vs BLAKE3 in this bench

signing/sha256-digest uses sha2::Sha256 (RustCrypto), which on both x86_64 and aarch64 performs runtime CPU feature detection through the cpufeatures crate and uses the Intel SHA-NI / ARMv8 SHA crypto instructions when the CPU exposes them. Effectively every modern CI runner and consumer CPU does, so the absolute SHA-256 numbers in this table are hardware-accelerated numbers; there is no practical “software SHA” baseline left to compare against.

That makes the comparison one between a hardware-accelerated SHA-256 and an AVX2/NEON-vectorised BLAKE3, and it shows: at the input sizes a typical NDN signed portion has (a few hundred bytes to a few KB), BLAKE3 is not faster than SHA-256 on a single thread on these CPUs. The “BLAKE3 is 3–8× faster than SHA-256” claim refers to BLAKE3 versus plain software SHA-256, which holds on chips without SHA extensions but is no longer the common case. See Why BLAKE3 for the actual reasons ndn-rs supports BLAKE3 (Merkle-tree partial verification of segmented Data, multi-thread hashing, single algorithm for hash + MAC + KDF + XOF), none of which are about raw single-thread throughput.

Latest CI Results

Last updated by CI on 2026-04-15 (ubuntu-latest, stable Rust)

| Benchmark | Median | ± Variance |
|---|---|---|
| cs/hit | 762 ns | ±34 ns |
| cs/miss | 524 ns | ±2 ns |
| cs_insert/insert_new | 10.21 µs | ±18.18 µs |
| cs_insert/insert_replace | 943 ns | ±14 ns |
| data_pipeline/4 | 1.88 µs | ±66 ns |
| data_pipeline/8 | 2.27 µs | ±38 ns |
| decode/data/4 | 394 ns | ±26 ns |
| decode/data/8 | 464 ns | ±0 ns |
| decode/interest/4 | 481 ns | ±0 ns |
| decode/interest/8 | 556 ns | ±2 ns |
| decode_throughput/4 | 442.84 µs | ±39.54 µs |
| decode_throughput/8 | 525.64 µs | ±7.39 µs |
| fib/lpm/10 | 35 ns | ±0 ns |
| fib/lpm/100 | 96 ns | ±0 ns |
| fib/lpm/1000 | 96 ns | ±0 ns |
| interest_pipeline/cs_hit | 921 ns | ±1 ns |
| interest_pipeline/no_route/4 | 1.40 µs | ±33 ns |
| interest_pipeline/no_route/8 | 1.55 µs | ±20 ns |
| large/blake3-rayon/hash/1MB | 122.33 µs | ±2.48 µs |
| large/blake3-rayon/hash/256KB | 40.89 µs | ±1.36 µs |
| large/blake3-rayon/hash/4MB | 439.02 µs | ±2.45 µs |
| large/blake3-single/hash/1MB | 252.69 µs | ±923 ns |
| large/blake3-single/hash/256KB | 61.68 µs | ±321 ns |
| large/blake3-single/hash/4MB | 999.28 µs | ±3.07 µs |
| large/sha256/hash/1MB | 659.90 µs | ±893 ns |
| large/sha256/hash/256KB | 164.78 µs | ±243 ns |
| large/sha256/hash/4MB | 2.64 ms | ±1.82 µs |
| lru/evict | 189 ns | ±3 ns |
| lru/evict_prefix | 2.00 µs | ±2.06 µs |
| lru/get_can_be_prefix | 297 ns | ±0 ns |
| lru/get_hit | 213 ns | ±0 ns |
| lru/get_miss_empty | 140 ns | ±0 ns |
| lru/get_miss_populated | 188 ns | ±0 ns |
| lru/insert_new | 1.99 µs | ±1.46 µs |
| lru/insert_replace | 376 ns | ±4 ns |
| name/display/components/4 | 452 ns | ±1 ns |
| name/display/components/8 | 866 ns | ±8 ns |
| name/eq/eq_match | 39 ns | ±0 ns |
| name/eq/eq_miss_first | 2 ns | ±0 ns |
| name/eq/eq_miss_last | 38 ns | ±0 ns |
| name/has_prefix/prefix_len/1 | 7 ns | ±0 ns |
| name/has_prefix/prefix_len/4 | 24 ns | ±1 ns |
| name/has_prefix/prefix_len/8 | 35 ns | ±3 ns |
| name/hash/components/4 | 86 ns | ±0 ns |
| name/hash/components/8 | 163 ns | ±8 ns |
| name/parse/components/12 | 679 ns | ±9 ns |
| name/parse/components/4 | 236 ns | ±1 ns |
| name/parse/components/8 | 468 ns | ±1 ns |
| name/tlv_decode/components/12 | 301 ns | ±1 ns |
| name/tlv_decode/components/4 | 140 ns | ±0 ns |
| name/tlv_decode/components/8 | 210 ns | ±0 ns |
| pit/aggregate | 2.32 µs | ±125 ns |
| pit/new_entry | 1.23 µs | ±7 ns |
| pit_match/hit | 1.61 µs | ±7 ns |
| pit_match/miss | 1.95 µs | ±12 ns |
| sharded/get_hit/1 | 229 ns | ±0 ns |
| sharded/get_hit/16 | 228 ns | ±2 ns |
| sharded/get_hit/4 | 233 ns | ±7 ns |
| sharded/get_hit/8 | 229 ns | ±3 ns |
| sharded/insert/1 | 2.56 µs | ±1.60 µs |
| sharded/insert/16 | 1.91 µs | ±1.59 µs |
| sharded/insert/4 | 2.58 µs | ±1.73 µs |
| sharded/insert/8 | 2.44 µs | ±1.66 µs |
| signing/blake3-keyed/sign_sync/100B | 182 ns | ±0 ns |
| signing/blake3-keyed/sign_sync/1KB | 1.20 µs | ±0 ns |
| signing/blake3-keyed/sign_sync/2KB | 2.41 µs | ±2 ns |
| signing/blake3-keyed/sign_sync/4KB | 3.54 µs | ±2 ns |
| signing/blake3-keyed/sign_sync/500B | 618 ns | ±1 ns |
| signing/blake3-keyed/sign_sync/8KB | 4.80 µs | ±4 ns |
| signing/blake3-plain/sign_sync/100B | 199 ns | ±0 ns |
| signing/blake3-plain/sign_sync/1KB | 1.21 µs | ±1 ns |
| signing/blake3-plain/sign_sync/2KB | 2.41 µs | ±3 ns |
| signing/blake3-plain/sign_sync/4KB | 3.53 µs | ±4 ns |
| signing/blake3-plain/sign_sync/500B | 633 ns | ±3 ns |
| signing/blake3-plain/sign_sync/8KB | 4.80 µs | ±10 ns |
| signing/ed25519/sign_sync/100B | 20.73 µs | ±297 ns |
| signing/ed25519/sign_sync/1KB | 24.20 µs | ±97 ns |
| signing/ed25519/sign_sync/2KB | 28.03 µs | ±144 ns |
| signing/ed25519/sign_sync/4KB | 35.16 µs | ±73 ns |
| signing/ed25519/sign_sync/500B | 22.26 µs | ±814 ns |
| signing/ed25519/sign_sync/8KB | 50.29 µs | ±91 ns |
| signing/hmac/sign_sync/100B | 276 ns | ±4 ns |
| signing/hmac/sign_sync/1KB | 836 ns | ±1 ns |
| signing/hmac/sign_sync/2KB | 1.49 µs | ±3 ns |
| signing/hmac/sign_sync/4KB | 2.74 µs | ±2 ns |
| signing/hmac/sign_sync/500B | 518 ns | ±0 ns |
| signing/hmac/sign_sync/8KB | 5.27 µs | ±3 ns |
| signing/sha256-digest/sign_sync/100B | 101 ns | ±0 ns |
| signing/sha256-digest/sign_sync/1KB | 664 ns | ±1 ns |
| signing/sha256-digest/sign_sync/2KB | 1.30 µs | ±2 ns |
| signing/sha256-digest/sign_sync/4KB | 2.54 µs | ±5 ns |
| signing/sha256-digest/sign_sync/500B | 341 ns | ±0 ns |
| signing/sha256-digest/sign_sync/8KB | 5.07 µs | ±6 ns |
| validation/cert_missing | 192 ns | ±0 ns |
| validation/schema_mismatch | 146 ns | ±2 ns |
| validation/single_hop | 46.71 µs | ±93 ns |
| validation_stage/cert_via_anchor | 48.11 µs | ±134 ns |
| validation_stage/disabled | 617 ns | ±2 ns |
| verification/blake3-keyed/verify/100B | 304 ns | ±0 ns |
| verification/blake3-keyed/verify/1KB | 1.32 µs | ±1 ns |
| verification/blake3-keyed/verify/2KB | 2.52 µs | ±67 ns |
| verification/blake3-keyed/verify/4KB | 3.65 µs | ±13 ns |
| verification/blake3-keyed/verify/500B | 740 ns | ±0 ns |
| verification/blake3-keyed/verify/8KB | 4.92 µs | ±6 ns |
| verification/blake3-plain/verify/100B | 309 ns | ±0 ns |
| verification/blake3-plain/verify/1KB | 1.32 µs | ±1 ns |
| verification/blake3-plain/verify/2KB | 2.52 µs | ±6 ns |
| verification/blake3-plain/verify/4KB | 3.65 µs | ±6 ns |
| verification/blake3-plain/verify/500B | 744 ns | ±1 ns |
| verification/blake3-plain/verify/8KB | 4.92 µs | ±10 ns |
| verification/ed25519-batch/1 | 54.78 µs | ±410 ns |
| verification/ed25519-batch/10 | 248.72 µs | ±606 ns |
| verification/ed25519-batch/100 | 2.27 ms | ±7.78 µs |
| verification/ed25519-batch/1000 | 18.58 ms | ±156.20 µs |
| verification/ed25519-per-sig-loop/1 | 42.34 µs | ±141 ns |
| verification/ed25519-per-sig-loop/10 | 421.42 µs | ±2.02 µs |
| verification/ed25519-per-sig-loop/100 | 4.29 ms | ±6.06 µs |
| verification/ed25519-per-sig-loop/1000 | 43.16 ms | ±68.38 µs |
| verification/ed25519/verify/100B | 41.75 µs | ±99 ns |
| verification/ed25519/verify/1KB | 43.81 µs | ±88 ns |
| verification/ed25519/verify/2KB | 45.57 µs | ±77 ns |
| verification/ed25519/verify/4KB | 49.28 µs | ±110 ns |
| verification/ed25519/verify/500B | 42.93 µs | ±677 ns |
| verification/ed25519/verify/8KB | 57.63 µs | ±106 ns |
| verification/sha256-digest/verify/100B | 102 ns | ±0 ns |
| verification/sha256-digest/verify/1KB | 662 ns | ±0 ns |
| verification/sha256-digest/verify/2KB | 1.30 µs | ±0 ns |
| verification/sha256-digest/verify/4KB | 2.55 µs | ±1 ns |
| verification/sha256-digest/verify/500B | 341 ns | ±0 ns |
| verification/sha256-digest/verify/8KB | 5.08 µs | ±105 ns |