Build with DocBundle
Compile vector PDFs into forensic-grade document bundles with semantic maps, Merkle integrity trees, and clause-level tracking. Open source. Deterministic. Court-ready.
Quick Start
# Download the CLI
curl -L https://manus-storage.s3.us-east-1.amazonaws.com/artifacts/docbundle-cli-linux-x86_64.tar.gz -o docbundle.tar.gz
# Extract
tar xzf docbundle.tar.gz
# Verify
./docbundle --version

# Compile a vector PDF into a .docbundle.zip
./docbundle compile contract.pdf -o contract.docbundle.zip
# Output:
# ✓ Extracted 14 elements (3 headings, 8 clauses, 2 definitions, 1 signature)
# ✓ Built semantic map with element IDs
# ✓ Generated 14 graph edges (CONTAINS, NEXT, DEFINES, MATERIAL)
# ✓ Constructed Merkle tree (root: a7f3e2d1...)
# ✓ Bundle written: contract.docbundle.zip

# Verify bundle integrity
./docbundle verify contract.docbundle.zip
# Output:
# ✓ Manifest valid
# ✓ Merkle root matches: a7f3e2d1...
# ✓ All 14 element hashes verified
# ✓ Bundle root: b8c4d1e2...
# ✓ INTEGRITY PASS
Download DocBundle CLI
# Windows x86_64 build coming soon!
# For now, build from source (requires a Rust toolchain; run from the repository root):
cargo install --path tools/docbundle
What's Inside a .docbundle.zip
Semantic Map
Every text block, heading, clause, definition, and signature field extracted with exact bounding boxes, font info, and content-addressable element IDs.
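"Content-addressable" means an element's ID is derived from the element's own content, so identical input always yields the same ID and any change yields a different one. The sketch below is minimal and rests on assumptions. It uses a hypothetical element record with the fields `kind`, `page`, `bbox`, and `text`, serializes it as canonical JSON (sorted keys, no extra whitespace), and hashes the result with SHA-256. DocBundle's real schema and canonicalization rules are not documented here. Hashing the bounding box as well as the text is also an illustrative choice: identical wording at two different positions then receives two different IDs.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Element:
    # Hypothetical fields; the real semantic-map schema may differ.
    kind: str     # e.g. "heading", "clause", "definition", "signature"
    page: int     # 1-based page number
    bbox: tuple   # (x0, y0, x1, y1) in PDF user-space units
    text: str


def canonical_bytes(el: Element) -> bytes:
    # Sorted keys and compact separators make the serialization byte-stable.
    return json.dumps(asdict(el), sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def element_id(el: Element) -> str:
    # Content-addressable ID: hex SHA-256 of the canonical serialization.
    return hashlib.sha256(canonical_bytes(el)).hexdigest()
```

Calling `element_id` twice on equal `Element` values returns the same 64-character hex string. Swapping a single word in `text` produces an unrelated ID.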
Document Graph
JSONL edges connecting elements: CONTAINS, NEXT, REFERS_TO, DEFINES, MENTIONS, MATERIAL. The full relationship structure of the document.
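JSONL stores one JSON object per line, so a consumer can stream the graph one edge at a time. The sketch below loads such a file into per-type adjacency lists and rejects edge types outside the six listed above. It is a minimal illustration: the key names `src`, `dst`, and `type` and the sample element IDs are hypothetical, not DocBundle's documented format.

```python
import json
from collections import defaultdict

# Edge types named in the DocBundle description.
EDGE_TYPES = {"CONTAINS", "NEXT", "REFERS_TO", "DEFINES", "MENTIONS", "MATERIAL"}


def load_edges(jsonl_text: str) -> dict:
    """Parse one JSON edge per line into {edge_type: {src: [dst, ...]}}."""
    graph: dict = defaultdict(lambda: defaultdict(list))
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        edge = json.loads(line)
        # "src", "dst", and "type" are hypothetical key names.
        if edge["type"] not in EDGE_TYPES:
            raise ValueError(f"line {lineno}: unknown edge type {edge['type']!r}")
        graph[edge["type"]][edge["src"]].append(edge["dst"])
    return graph


sample = "\n".join([
    '{"src": "sec_1", "dst": "cl_1", "type": "CONTAINS"}',
    '{"src": "sec_1", "dst": "cl_2", "type": "CONTAINS"}',
    '{"src": "cl_1", "dst": "cl_2", "type": "NEXT"}',
    '{"src": "def_1", "dst": "cl_2", "type": "DEFINES"}',
])
graph = load_edges(sample)
children = graph["CONTAINS"]["sec_1"]  # ["cl_1", "cl_2"]
```

Keeping edges grouped by type makes common questions cheap. For example, "which clauses does this section contain?" reads directly from `CONTAINS`, and "which definition governs this clause?" reads from `DEFINES`.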
Merkle Integrity Tree
SHA-256 Merkle tree over all semantic map leaves. Prove any individual clause hasn't been modified without revealing the entire document.
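To prove that one clause is unmodified, you hand over that clause together with its inclusion proof, which is the list of sibling hashes along the path to the root. The verifier then recomputes the path and compares the result with the published Merkle root. The remaining clauses are represented only by hashes. The sketch below is minimal and rests on two assumptions that DocBundle does not specify here. First, it uses RFC 6962-style domain separation, with a `0x00` prefix for leaves and `0x01` for internal nodes. Second, an unpaired node at any level is promoted unchanged rather than duplicated. Because of these choices, roots from this code will not match a real bundle's root.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(data: bytes) -> bytes:
    return _h(b"\x00" + data)  # leaf domain separator


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)  # internal-node domain separator


def build_levels(leaves):
    """Return every tree level, from leaf hashes up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [leaf_hash(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is promoted unchanged
        level = nxt
        levels.append(level)
    return levels


def merkle_root(leaves) -> bytes:
    return build_levels(leaves)[-1][0]


def inclusion_proof(leaves, index):
    """Sibling hashes from leaf `index` up to the root, each with a side flag."""
    proof = []
    for level in build_levels(leaves)[:-1]:
        sibling = index ^ 1
        if sibling < len(level):
            # Flag is True when the sibling sits to the left of our node.
            proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf: bytes, proof, root: bytes) -> bool:
    acc = leaf_hash(leaf)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == root
```

The proof has roughly log2(n) entries. For a 14-element bundle, proving one clause therefore needs at most four sibling hashes.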
Manifest
Two-pass bundle_root computation. The artifact_id is deterministic: the same PDF always produces the same ID. Includes version, timestamps, and element counts.
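A hash cannot contain itself, so a manifest that records its own root needs two passes. One plausible reading of the "two-pass" step works like this. Pass 1 hashes the manifest with `bundle_root` blanked out. Pass 2 writes that hash into the field. A verifier then repeats pass 1 and compares. This is an assumed interpretation, not DocBundle's documented algorithm. The same is true of deriving `artifact_id` from the raw PDF bytes and of every field name except `artifact_id` and `bundle_root`.

```python
import hashlib
import json


def artifact_id(pdf_bytes: bytes) -> str:
    # Assumption: the ID depends only on the input PDF, so it is deterministic.
    return hashlib.sha256(pdf_bytes).hexdigest()


def _canonical(manifest: dict) -> bytes:
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal_manifest(manifest: dict) -> dict:
    # Pass 1: hash the manifest with bundle_root blanked out.
    draft = dict(manifest, bundle_root="")
    root = hashlib.sha256(_canonical(draft)).hexdigest()
    # Pass 2: write the computed root into the final manifest.
    return dict(manifest, bundle_root=root)


def check_manifest(manifest: dict) -> bool:
    # Repeat pass 1 on the received manifest and compare with the stored root.
    draft = dict(manifest, bundle_root="")
    return hashlib.sha256(_canonical(draft)).hexdigest() == manifest.get("bundle_root")
```

Under this scheme, including timestamps changes `bundle_root` from one compile to the next, while `artifact_id` stays fixed for a given PDF.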
Retrieval Index
BM25 + keyword hybrid index for clause-level search. Ask 'what's the indemnification clause?' and get the exact element with coordinates.
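The BM25 half of that index can be sketched in a few lines. The sketch below is the standard Okapi BM25 formula, with the common defaults `k1 = 1.5` and `b = 0.75` and the `ln(1 + (N - n + 0.5) / (n + 0.5))` IDF. It ranks clause texts keyed by element ID. It makes no attempt at DocBundle's keyword component or its score fusion, and the tokenizer and parameters are illustrative assumptions. A real index would also map the winning ID back to its bounding box through the semantic map.

```python
import math
import re
from collections import Counter


def tokenize(text: str):
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    def __init__(self, docs: dict, k1: float = 1.5, b: float = 0.75):
        # docs maps element_id -> clause text
        self.k1, self.b = k1, b
        self.tf = {eid: Counter(tokenize(t)) for eid, t in docs.items()}
        self.dl = {eid: sum(c.values()) for eid, c in self.tf.items()}
        self.avgdl = sum(self.dl.values()) / max(len(self.dl), 1)
        df = Counter()
        for counts in self.tf.values():
            df.update(counts.keys())  # document frequency per term
        n = len(self.tf)
        self.idf = {t: math.log((n - d + 0.5) / (d + 0.5) + 1) for t, d in df.items()}

    def score(self, query: str, eid: str) -> float:
        tf, dl = self.tf[eid], self.dl[eid]
        s = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                s += self.idf[term] * f * (self.k1 + 1) / norm
        return s

    def search(self, query: str, k: int = 3):
        return sorted(self.tf, key=lambda eid: self.score(query, eid), reverse=True)[:k]
```

For example, indexing three clauses and asking `"what's the indemnification clause?"` ranks the clause containing "indemnification" first, because that rare term carries most of the IDF weight.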
Policy Pack
Deterministic gating rules for informed consent: material clause markers, minimum dwell times, acknowledgment requirements.
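"Deterministic" here means the same policy and the same signer activity always produce the same decision. The minimal sketch below checks a hypothetical policy shape: a set of material clause IDs, one minimum dwell time, and an acknowledgment flag. It returns the reasons for any block in a stable, sorted order. The `Policy` and `SignerState` structures are illustrative assumptions, not the Policy Pack's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    # Hypothetical policy-pack shape.
    material_clauses: set
    min_dwell_seconds: float = 10.0
    require_ack: bool = True


@dataclass
class SignerState:
    dwell: dict = field(default_factory=dict)  # element_id -> seconds viewed
    acked: set = field(default_factory=set)    # element_ids acknowledged


def gate(policy: Policy, state: SignerState):
    """Return (allowed, reasons); sorted iteration keeps the output deterministic."""
    reasons = []
    for eid in sorted(policy.material_clauses):
        if state.dwell.get(eid, 0.0) < policy.min_dwell_seconds:
            reasons.append(f"{eid}: dwell below {policy.min_dwell_seconds}s")
        if policy.require_ack and eid not in state.acked:
            reasons.append(f"{eid}: not acknowledged")
    return (not reasons, reasons)
```

Returning the reasons as well as the yes/no decision lets the signing UI tell the signer exactly which clause still needs attention.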
API Reference
Compile an uploaded PDF into a DocBundle. The PDF must already be uploaded and associated with a signed document record.
Request body:
{
  "documentId": 42
}
Response:
{
  "success": true,
  "artifactId": "a7f3e2d1c4b5...",
  "bundleRoot": "b8c4d1e2f5a6...",
  "merkleRoot": "c9d5e2f6a7b8...",
  "elementCount": 14,
  "materialClauseCount": 3
}
Architecture
1. Compile
The Rust CLI parses the PDF content stream, extracts vector objects, builds the semantic map, constructs the Merkle tree, and outputs a .docbundle.zip.
2. Serve
The server loads the semantic map into FoxSpaces. Live cursors snap to clause-level elements. Dwell time is tracked per material clause. The AI guide answers questions about specific sections.
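Per-clause dwell time can be derived from the stream of cursor "snap" events. Each event marks the moment the cursor landed on an element, and that element is credited with the time until the next event. The minimal sketch below makes two choices of its own, and neither is stated to be FoxSpaces' behavior. It uses a hypothetical `(timestamp_seconds, element_id)` event shape. It also caps each interval at `max_gap` seconds so that an idle tab does not count as reading. The last event has no end time, so it is not credited.

```python
from collections import defaultdict


def dwell_times(events, material, max_gap=30.0):
    """Sum seconds spent on each material clause.

    events: time-ordered list of (timestamp_seconds, element_id or None),
    each meaning "the cursor snapped to this element at this time".
    Intervals longer than max_gap are capped to avoid counting idle time.
    """
    totals = defaultdict(float)
    for (t0, eid), (t1, _) in zip(events, events[1:]):
        if t1 < t0:
            raise ValueError("events must be sorted by timestamp")
        if eid in material:
            totals[eid] += min(t1 - t0, max_gap)
    return dict(totals)


events = [(0.0, "cl_1"), (4.0, "cl_2"), (9.5, "cl_1"), (100.0, None)]
print(dwell_times(events, {"cl_1", "cl_2"}))  # {'cl_1': 34.0, 'cl_2': 5.5}
```

Here `cl_1` receives 4.0 s from the first interval plus a capped 30.0 s from the third, which is 90.5 s uncapped.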
3. Prove
Every interaction is recorded in an append-only audit trail. The Merkle tree proves per-clause integrity. The forensic record shows exactly what each signer saw, read, and signed.
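A common way to make an append-only trail tamper-evident is a hash chain, in which every entry commits to the hash of the entry before it. Editing or deleting any past record then breaks every later link. The page does not say how DocBundle implements its audit trail. The sketch below is therefore a minimal illustration of the hash-chain technique, with a hypothetical event shape, and not the project's own code.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, event: dict) -> str:
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        event = dict(event)  # shallow copy so later caller mutations don't leak in
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = _digest(prev, event)
        self.entries.append({"prev": prev, "event": event, "hash": digest})
        return digest

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or _digest(prev, entry["event"]) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

A hash chain makes edits detectable, but anyone holding the whole log could still rewrite it from start to finish. For that reason, the latest hash is usually also published or stored somewhere the log's operator cannot change.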
