From the Humble Scanner to the Intelligent Document — PDF/R Is the Foundation
PDF/R (PDF/Raster) was co-developed by the TWAIN Working Group and the PDF Association to solve a problem that has existed since the beginning of document scanning: scanners produce raster images, but the world runs on documents. TIFF and JPEG are image formats — they carry pixels, not documents. PDF/R changes that.
Published in 2020 as ISO 23504-1, PDF/R is a strictly defined subset of the PDF specification (ISO 32000) built for scanned raster-image documents. It is compact enough to be generated directly by scanner firmware on resource-constrained embedded systems, yet its output is fully valid PDF that any conforming PDF reader can open. It is the native output format of TWAIN Direct, the cloud-native, driverless scanning protocol, and can also be adopted as a standalone format for any scanning or imaging workflow.
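To make "a strictly defined subset of PDF for raster images" concrete, here is a minimal sketch of the object structure a raster-only PDF needs: a document catalog, a one-page page tree, a Flate-compressed image XObject, a content stream that paints the image across the page, and the cross-reference table. It uses only the Python standard library and assumes 8-bit grayscale input. The function name `build_raster_pdf` is hypothetical, and the output has not been checked against the conformance requirements of ISO 23504-1. It illustrates the general PDF structure, not a validated PDF/R writer.

```python
import zlib


def build_raster_pdf(pixels: bytes, width: int, height: int, dpi: int = 300) -> bytes:
    """Wrap one 8-bit grayscale raster in a minimal single-page PDF.

    Illustrative sketch only (hypothetical helper): shows catalog, page tree,
    image XObject and xref structure; not a validated PDF/R (ISO 23504-1) writer.
    """
    if len(pixels) != width * height:
        raise ValueError("expected width * height bytes of 8-bit gray samples")

    # Page size in PDF points (1/72 inch), derived from the scan resolution.
    w_pt = width * 72 / dpi
    h_pt = height * 72 / dpi

    image_data = zlib.compress(pixels)
    # Scale the unit-square image space so the image fills the whole page.
    content = f"q {w_pt:.2f} 0 0 {h_pt:.2f} 0 0 cm /Im0 Do Q".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",                  # 1: document catalog
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",          # 2: page tree root
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {w_pt:.2f} {h_pt:.2f}] "
         "/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>").encode("ascii"),
        (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
         f"/ColorSpace /DeviceGray /BitsPerComponent 8 /Filter /FlateDecode "
         f"/Length {len(image_data)} >>\nstream\n").encode("ascii")
        + image_data + b"\nendstream",                         # 4: the scanned image
        f"<< /Length {len(content)} >>\nstream\n".encode("ascii")
        + content + b"\nendstream",                            # 5: page content stream
    ]

    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj", recorded for the xref table
        out += f"{num} 0 obj\n".encode("ascii") + body + b"\nendobj\n"

    # Cross-reference table: every entry is exactly 20 bytes long.
    xref_pos = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"
    for off in offsets:
        out += f"{off:010d} 00000 n \n".encode("ascii")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_pos}\n%%EOF\n").encode("ascii")
    return bytes(out)
```

For example, `build_raster_pdf(gray, 2550, 3300)` would wrap a 300 dpi US Letter scan in an 8.5 x 11 inch page. A firmware implementation would typically stream the compressed image rather than hold it in memory, but the object layout stays the same.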
Now, as the document capture ecosystem evolves toward AI pipelines, content authenticity standards, business-process automation, and next-generation compression, the TWAIN Innovation Cloud (TIC) is convening developers and adopters to explore how PDF/R can be extended to meet these emerging requirements and to build those capabilities together.
Scanned output as TIFF or JPEG image files:
- Image formats: pixels only, no document structure
- No embedded provenance or authenticity data
- No business-process metadata survives transmission
- TIFF: no PDF compatibility without conversion
- No AI/LLM semantic structure out of the box
- JPEG: lossy in practice, no widely supported lossless option
- Fragmented metadata across sidecar files

Scanned output as PDF/R:
- Document format: metadata, signatures, encryption
- C2PA provenance manifest embedded at capture
- XMP business-process metadata travels with the file
- 100% PDF compatible: opens in any conforming PDF reader
- AI/LLM-ready: OCR + bounding boxes + structured metadata
- JPEG-XL option: superior lossless and lossy compression
- Standardized as ISO 23504: royalty-free, open
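The "XMP business-process metadata" point can be illustrated with a short sketch. Custom properties are serialized as an XMP packet under an organization-defined namespace and stored in a PDF metadata stream (`/Type /Metadata /Subtype /XML`), which the document catalog references through its `/Metadata` entry. The namespace URI, the `cap` prefix, the property names, and the helper functions are hypothetical placeholders. A production schema would follow the XMP specification's rules for custom schemas and property value types.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical namespace for an organization's capture metadata; a real
# deployment would define and publish its own schema URI and prefix.
CAPTURE_NS = "http://ns.example.com/capture/1.0/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def build_xmp_packet(fields: dict) -> bytes:
    """Serialize simple string properties into a UTF-8 XMP packet."""
    props = []
    for name, value in fields.items():
        # Keep property names to plain ASCII alphanumerics so they are valid XML names.
        if not (name.isascii() and name.isalnum() and name[0].isalpha()):
            raise ValueError(f"unsupported property name: {name!r}")
        props.append(f"   <cap:{name}>{escape(str(value))}</cap:{name}>")
    packet = (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        f' <rdf:RDF xmlns:rdf="{RDF_NS}">\n'
        f'  <rdf:Description rdf:about="" xmlns:cap="{CAPTURE_NS}">\n'
        + "\n".join(props) + "\n"
        "  </rdf:Description>\n"
        " </rdf:RDF>\n"
        "</x:xmpmeta>\n"
        '<?xpacket end="w"?>'
    )
    return packet.encode("utf-8")


def metadata_stream(xmp: bytes) -> bytes:
    """Body of a PDF metadata stream object (to be referenced via /Metadata in the catalog)."""
    header = f"<< /Type /Metadata /Subtype /XML /Length {len(xmp)} >>\nstream\n"
    return header.encode("ascii") + xmp + b"\nendstream"


def read_capture_fields(xmp: bytes) -> dict:
    """Recover the custom properties, e.g. in a downstream routing step."""
    root = ET.fromstring(xmp)
    desc = root.find(f".//{{{RDF_NS}}}Description")
    prefix = f"{{{CAPTURE_NS}}}"
    return {child.tag[len(prefix):]: child.text or ""
            for child in desc if child.tag.startswith(prefix)}
```

Because the packet travels inside the PDF rather than in a sidecar file, a document-management system can call something like `read_capture_fields` on the extracted stream to route or retain the document without re-classifying the image.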
- ✓ PDF/R specification and technical resources: full access to the PDF/R specification (ISO 23504-1, PDF/R-1), the sample code repository on GitHub, and the TWAIN Working Group's PDF/R technical documentation, covering what is needed to implement PDF/R output in a scanner, application, or cloud platform.
- ✓ C2PA integration technical consultation: a working session with TWG technical experts on embedding C2PA Content Credentials manifests in PDF/R output, covering manifest structure, Trust List certificate requirements, the C2PA Generator Product conformance pathway, and verification via contentcredentials.org.
- ✓ XMP metadata schema design session: a consultation on designing custom XMP metadata schemas for your business-process requirements (document type classification, routing codes, retention policies, compliance tags, and any other domain-specific metadata your workflow needs) and on embedding them in PDF/R output at scan time.
- ✓ JPEG-XL feasibility evaluation: a technical review of adding JPEG-XL as an additional compression option in your PDF/R implementation, covering the ISO/IEC 18181 specification, available open-source encoders and decoders (libjxl), and the performance and quality trade-offs for your document types and target file sizes.
- ✓ AI/LLM-ready PDF/R architecture review: a technical session on structuring PDF/R output for direct consumption by AI document-intelligence pipelines, including OCR text with spatial bounding boxes, XMP classification metadata, C2PA provenance for training-data integrity, and AI-generated augmentation embedded within the PDF/R file structure.
- ✓ Properly structured PDF output review: a review of your current or planned PDF/R implementation against PDF document-structure best practices (page tree, document catalog, metadata streams, encryption, digital signatures, and the invisible OCR text layer) so that your output works fully with downstream PDF-consuming systems.
- ✓ TWAIN Direct integration pathway: guidance on adopting PDF/R as the native output format of a TWAIN Direct driverless scanning workflow, covering the TWAIN Direct protocol's PDF/R output specification, cloud delivery, and integration with downstream document-management and AI systems.
- ✓ Direct access to TWG technical experts: TIC programme participants work directly with TWAIN Working Group engineers and PDF/R experts, including members of the original TWG/PDF Association development team, on implementation questions, architecture reviews, and the development of emerging capabilities.
- ✓ TIC ecosystem connection: an introduction to TIC sponsor organizations whose products and work are directly relevant to your PDF/R implementation, including ExactCODE (RISC-V/open-source), C2PA (content provenance), Verve Capture (enterprise capture), JSE Imaging (TWAIN software), Dynamsoft (scanning SDK), Thin Scanner (cloud capture), and others in the TIC ecosystem.
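The OCR text layer mentioned in the architecture and structure reviews is commonly built in PDF by drawing recognized words in text rendering mode 3 (`3 Tr`). This mode makes the glyphs invisible while keeping them searchable and selectable over the scanned image. The sketch converts word boxes from image pixels (top-left origin, as many OCR engines report them) into PDF points (bottom-left origin) and emits the matching content-stream operators. Several details are assumptions: the word-box tuple format, the hypothetical helper names, the font resource name `/F1` (which the page's resource dictionary would have to declare), and the Latin-1 encoding. Glyph widths are ignored, so text is not stretched to the box width. How such a layer fits PDF/R's constraints should be checked against ISO 23504-1.

```python
from typing import Iterable, Tuple

# Hypothetical word box: (text, left, top, width, height) in image pixels,
# origin at the top-left corner of the scanned image.
Word = Tuple[str, int, int, int, int]


def pdf_literal(text: str) -> str:
    """Escape a string for use as a PDF literal string (...)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def ocr_text_layer(words: Iterable[Word], page_height_px: int, dpi: int) -> bytes:
    """Build content-stream operators that place invisible OCR text over a page image."""
    scale = 72 / dpi  # image pixels -> PDF points
    ops = ["BT", "3 Tr"]  # text rendering mode 3: neither fill nor stroke (invisible)
    for text, left, top, _width, height in words:
        x = left * scale
        # PDF y grows upward from the bottom edge; put the baseline at the box bottom.
        y = (page_height_px - top - height) * scale
        size = height * scale  # crude font size: box height in points
        ops.append(f"/F1 {size:.2f} Tf")
        ops.append(f"1 0 0 1 {x:.2f} {y:.2f} Tm")
        ops.append(f"{pdf_literal(text)} Tj")
    ops.append("ET")
    # Latin-1 is a simplifying assumption; real text needs a proper font encoding.
    return "\n".join(ops).encode("latin-1")
```

For a 300 dpi US Letter page (3300 px tall), a word box at (300, 600) measuring 150 x 50 px becomes a 12 pt invisible string positioned at (72, 636) points. These operators would be appended after the image-drawing operators in the page's content stream.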