TWAIN Innovation Cloud · PDF/R Adoption & Development Programme

ISO 23504

PDF/R

PDF/Raster — Open ISO Standard

TWAIN WG · PDF Association · Royalty-Free

The Open Scanning Format
Evolving for the AI-Ready Document Era.

Adopt or build with PDF/R (PDF/Raster) — the ISO-standardized, royalty-free document format for scanned content. Now expanding with next-generation capabilities: C2PA content provenance, XMP business-process metadata, JPEG-XL compression, properly structured PDF output, and AI/LLM-ready document files.

📄 PDF/R Foundation (ISO 23504) 🏷 C2PA Content Provenance 📋 XMP Business-Process Metadata 🖼 JPEG-XL Next-Gen Compression 🧠 AI / LLM-Ready PDF/R Files ⚡ Properly Structured PDF Output

ISO 23504 Standardized (2020) · Royalty-Free · Open Download · TWAIN Direct Native Format · PDF Association Co-Developed

Available to registered TIC participants · Adopters and Developers welcome

ISO 23504 · Standard (2020)

Royalty-Free · Open Download

TIFF → PDF/R · Replaces TIFF & JPEG

TWAIN Direct · Native Format

5 New Capabilities · Emerging

What is PDF/R & Why It Matters

From the Humble Scanner to the Intelligent Document — PDF/R Is the Foundation

PDF/R (PDF/Raster) was co-developed by the TWAIN Working Group and the PDF Association to solve a problem that has existed since the beginning of document scanning: scanners produce raster images, but the world runs on documents. TIFF and JPEG are image formats — they carry pixels, not documents. PDF/R changes that.

Published as ISO 23504 in 2020, PDF/R is a strictly defined subset of the PDF specification, purpose-built for scanned raster image documents. It is compact enough to be generated directly by scanner firmware on resource-constrained embedded systems, yet its output is ordinary, valid PDF that any conforming PDF application can open. It is the native output format of TWAIN Direct, the cloud-native, driverless scanning protocol, and it can also be adopted as a standalone format for any scanning or imaging workflow.

Now, as the document capture ecosystem evolves toward AI pipelines, content authenticity standards, business-process automation, and next-generation compression, the TWAIN Innovation Cloud is convening developers and adopters to explore how PDF/R extends to meet these emerging requirements — and to build the capabilities together.

PDF/R Foundation Capabilities (ISO 23504)

What PDF/R Delivers Today — The Open Baseline

Image Support

Bitonal (1-bit)

Grayscale (8-bit)

RGB Color (24-bit)

JPEG compression

CCITT Group 4 Fax (lossless)

Uncompressed

Multi-page documents

PDF/R Document Features

100% PDF compatible

Metadata support

Digital signature support

Encryption support

Fixed-buffer generation (firmware)

No full PDF parser required to generate

Portable across all PDF consumers
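
The point that generating a raster PDF needs no PDF parser can be illustrated with a minimal sketch. The function below writes a one-page PDF containing a single uncompressed 8-bit grayscale image by emitting each object once, in order, and recording byte offsets for the cross-reference table. It is a generic PDF illustration, not a conformant PDF/R writer: ISO 23504 adds its own requirements on top of this, the function name `build_single_image_pdf` is hypothetical, and unlike firmware with a fixed buffer this version holds the whole image in memory.

```python
import io


def build_single_image_pdf(width: int, height: int, gray_pixels: bytes, dpi: int = 300) -> bytes:
    """Serialize a one-page PDF holding one uncompressed 8-bit grayscale image."""
    if len(gray_pixels) != width * height:
        raise ValueError("expected width*height bytes of 8-bit grayscale data")

    # Page size in PDF points (1/72 inch), derived from pixel size and resolution.
    page_w = width * 72.0 / dpi
    page_h = height * 72.0 / dpi

    # Content stream: scale the unit image square to the full page and paint it.
    content = f"q {page_w:.2f} 0 0 {page_h:.2f} 0 0 cm /Im0 Do Q".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {page_w:.2f} {page_h:.2f}] "
         f"/Resources << /XObject << /Im0 4 0 R >> >> /Contents 5 0 R >>").encode("ascii"),
        (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
         f"/ColorSpace /DeviceGray /BitsPerComponent 8 /Length {len(gray_pixels)} >>\n"
         ).encode("ascii") + b"stream\n" + gray_pixels + b"\nendstream",
        f"<< /Length {len(content)} >>\n".encode("ascii") + b"stream\n" + content + b"\nendstream",
    ]

    out = io.BytesIO()
    out.write(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(out.tell())  # byte offset of this object, needed by the xref table
        out.write(f"{number} 0 obj\n".encode("ascii") + body + b"\nendobj\n")

    # Cross-reference table: one fixed-width 20-byte entry per object.
    xref_pos = out.tell()
    out.write(f"xref\n0 {len(objects) + 1}\n".encode("ascii"))
    out.write(b"0000000000 65535 f \n")
    for off in offsets:
        out.write(f"{off:010d} 00000 n \n".encode("ascii"))
    out.write(f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n".encode("ascii"))
    out.write(f"startxref\n{xref_pos}\n%%EOF\n".encode("ascii"))
    return out.getvalue()
```

Because every object is written once and only its starting offset is remembered, the same pattern can stream to a file or socket as data arrives, which is what makes this style of output practical on small devices.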

Emerging Capabilities — Where PDF/R Is Going

Content Provenance · EMERGING

C2PA Content Credentials — Tamper-Evident Provenance in Every Scan

By embedding a C2PA Content Credentials manifest within the PDF/R document structure, every scanned file can carry a cryptographically signed, tamper-evident record of its origin: the scanning device, the operator identity, the capture timestamp, and whether AI was involved in any processing step. This manifest travels with the PDF/R file to every downstream recipient. Any C2PA-conformant validator — including Adobe, Microsoft, and contentcredentials.org — can verify the document's full provenance chain and detect any post-capture modification. C2PA-enabled PDF/R turns a scanned document into an authenticated document — one that proves where it came from and that it hasn't been altered.

// C2PA manifest embedded in PDF/R XMP metadata stream or as PDF attachment.
// Signed using C2PA Trust List certificate. Verifiable by any C2PA Validator.
// Aligns with C2PA Specification v2.x Generator Product conformance requirements.

C2PA Spec v2.x · Cryptographic Signing · Tamper Detection · AI Disclosure · Ecosystem Interoperable
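
The tamper-evidence idea behind a provenance manifest can be sketched in a few lines. The example below is not the C2PA manifest format and does not use a C2PA library: real Content Credentials rely on certificate-based signatures, whereas this stand-in uses an HMAC with a shared key purely to show how a content hash is bound to a signed claim. The function names and claim fields are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(claim: dict) -> bytes:
    # Stable serialization so the same claim always produces the same bytes.
    return json.dumps(claim, sort_keys=True, separators=(",", ":")).encode("utf-8")


def make_provenance_record(pdf_bytes: bytes, device: str, captured_at: str, key: bytes) -> dict:
    """Bind capture details to the file content and authenticate the result."""
    claim = {
        "device": device,
        "captured_at": captured_at,
        "ai_processing": False,
        "content_sha256": hashlib.sha256(pdf_bytes).hexdigest(),
    }
    # Stand-in for a real signature: an HMAC tag over the serialized claim.
    tag = hmac.new(key, _canonical(claim), hashlib.sha256).hexdigest()
    return {"claim": claim, "tag": tag}


def verify_provenance_record(pdf_bytes: bytes, record: dict, key: bytes) -> bool:
    """Return True only if the claim is authentic and the file is unchanged."""
    expected = hmac.new(key, _canonical(record["claim"]), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, record["tag"]):
        return False  # the claim itself was altered or signed with another key
    return hashlib.sha256(pdf_bytes).hexdigest() == record["claim"]["content_sha256"]
```

Verification fails both when the scanned bytes change after capture and when someone edits the claim, which is the property the manifest is meant to give downstream recipients.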

Business-Process Metadata · EXPANDING

XMP Metadata for Business Process — Routing, Workflow, and Compliance in the File

Extensible Metadata Platform (XMP) is an ISO-standardized metadata framework (ISO 16684-1) that allows structured, machine-readable metadata to be embedded directly within PDF/R files. It extends well beyond basic document information to full business-process metadata: document type, routing destination, approval state, retention schedule, compliance tags, department codes, custodian identity, and any custom workflow attributes your organisation requires. When a PDF/R file arrives at an ECM, ERP, or AI pipeline with XMP business-process metadata embedded, the receiving system can determine what the document is, where it belongs, what rules apply to it, and how to process it, without human routing, interpretation, or re-keying. Because the XMP packet is stored inside the file, it survives file copy and email transmission, and XMP-aware tools can carry it through format conversion, which makes it well suited to encoding business intent at the scanner.

// XMP packet embedded in PDF/R document metadata stream.
// Custom schemas for workflow routing, retention policy, document classification.
// ISO 16684-1 (XMP) compliant. Readable by any XMP-aware application or AI pipeline.

ISO 16684-1 (XMP) · Workflow Routing · Retention Policy · Custom Schemas · Machine-Readable
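
As a rough illustration of a custom business-process schema, the sketch below builds an XMP packet with Python's standard library. The `wf` namespace URI and the property names are hypothetical placeholders, not part of any published schema, and the packet would still need to be written into the PDF metadata stream by whatever produces the file.

```python
import xml.etree.ElementTree as ET

NS_X = "adobe:ns:meta/"
NS_RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS_WF = "http://example.com/ns/scan-workflow/1.0/"  # hypothetical custom schema


def build_xmp_packet(fields: dict[str, str]) -> bytes:
    """Wrap simple workflow properties in an XMP packet (UTF-8 bytes)."""
    ET.register_namespace("x", NS_X)
    ET.register_namespace("rdf", NS_RDF)
    ET.register_namespace("wf", NS_WF)

    xmpmeta = ET.Element(f"{{{NS_X}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{NS_RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{NS_RDF}}}Description", {f"{{{NS_RDF}}}about": ""})

    # Each field becomes one simple-valued property in the custom namespace.
    for name, value in fields.items():
        ET.SubElement(desc, f"{{{NS_WF}}}{name}").text = value

    body = ET.tostring(xmpmeta, encoding="unicode")
    # Standard xpacket wrapper; end="w" marks the packet as writable in place.
    packet = ('<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
              + body + '\n<?xpacket end="w"?>')
    return packet.encode("utf-8")
```

A call such as `build_xmp_packet({"DocumentType": "Invoice", "RetentionYears": "7"})` yields a packet that a receiving system can parse with any XML or XMP-aware library to drive routing decisions.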

Document Structure · ENHANCED

Properly Structured PDF Output — Document Semantics Beyond Raster Images

PDF/R's strict subset definition keeps implementation simple for embedded systems, but properly structured PDF/R output goes further by adding correct PDF document structure that enables richer integration with downstream PDF-consuming systems. This includes a proper page tree, document-level metadata in the PDF document catalog, correctly formed XMP metadata streams, optional OCR text layer embedding (invisible text over images), proper PDF encryption and permission flags, and digital signature fields. The result is a PDF/R file that not only passes PDF validation but also participates fully in PDF-based workflow ecosystems: archival (with PDF/A compatibility as a design consideration), accessibility pipelines, and enterprise content management systems that expect document-grade PDF, not just image-wrapped PDF.

// Proper PDF document catalog and page tree structure.
// XMP metadata stream at document level. Optional invisible OCR text overlay.
// PDF encryption flags, permission controls, digital signature fields.
// Validated against PDF 1.7 / ISO 32000 conformance requirements.

Document Catalog · OCR Text Layer · Digital Signatures · Encryption Flags · PDF/A Alignment

Next-Generation Compression · EMERGING

JPEG-XL — Superior Quality at Lower File Sizes for Scanned Documents

JPEG-XL (JXL) is the next-generation image compression standard (ISO/IEC 18181) that delivers dramatically superior quality-to-filesize ratios compared to legacy JPEG — particularly for document scanning use cases. For scanned text and mixed-content documents, JPEG-XL's modular architecture enables lossless compression that beats PNG by ~35%, and lossy compression that beats JPEG at equivalent visual quality by 60% or better. JPEG-XL also supports progressive decoding, HDR, wide color gamut, animation, and lossless re-encoding of existing JPEG files without quality loss (via its JPEG bitstream recompression capability). Incorporating JPEG-XL as an additional compression option within the PDF/R framework would allow scanners and capture devices to produce significantly smaller files with higher fidelity — critical for mobile capture, cloud transmission, and long-term archival workflows.

// JPEG-XL (ISO/IEC 18181) as an additional PDF/R compression stream type.
// Lossless mode: ~35% smaller than PNG, well suited to bitonal / text documents.
// Lossy mode: 60%+ smaller than JPEG at equivalent SSIM score.
// JPEG recompression: lossless transcoding of existing JPEG without quality loss.

ISO/IEC 18181 (JPEG-XL) · Lossless + Lossy · Progressive Decoding · Wide Color Gamut · JPEG Recompression
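
To put any compression ratio in context, it helps to know the uncompressed size of a scanned page, which follows directly from page dimensions, resolution, and bit depth. The helper below is a small arithmetic sketch (the name `raw_scan_bytes` is hypothetical) that assumes each image row is padded to a whole number of bytes, as raster rows in PDF image data are.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page, rows padded to whole bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8  # integer ceiling of bits / 8
    return row_bytes * height_px


# US Letter (8.5 x 11 in) at 300 dpi is 2550 x 3300 pixels.
bitonal = raw_scan_bytes(8.5, 11, 300, 1)    # 1,052,700 bytes
gray = raw_scan_bytes(8.5, 11, 300, 8)       # 8,415,000 bytes
color = raw_scan_bytes(8.5, 11, 300, 24)     # 25,245,000 bytes
```

A single colour page at 300 dpi is roughly 25 MB before compression, which is why the choice of codec matters so much for mobile capture and cloud transmission.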

AI-Ready Architecture · EMERGING

AI / LLM-Ready PDF/R — Structured for Machine Consumption from First Scan

As AI and large language model pipelines increasingly process scanned documents, the gap between what scanners produce and what AI systems need has become a critical inefficiency. AI/LLM-ready PDF/R addresses this by encoding the document's structure and semantics into the file at capture time — so that no pre-processing transformation is needed before the file enters an AI pipeline. This includes: embedded OCR text with spatial coordinates (bounding boxes) aligned to the raster image for visual grounding; structured XMP metadata describing document type, classification, and extraction hints; C2PA provenance manifest confirming the document's authentic origin (critical for AI training data integrity and RAG pipeline trustworthiness); and optional AI-generated document summaries or entity extraction results embedded as PDF/R annotations or XMP custom metadata — so that AI augmentation travels with the document rather than being stored separately in a sidecar system.

// OCR text with bounding boxes: spatially-aligned text for visual grounding.
// XMP: document classification, entity extraction hints, confidence scores.
// C2PA: provenance manifest for AI training data integrity and RAG trustworthiness.
// Embedded AI augmentation: summaries, entities, classifications as XMP / annotations.

OCR + Bounding Boxes · Visual Grounding · RAG Pipeline Ready · AI Training Integrity · Embedded Augmentation · LLM Context Structured
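
Aligning OCR text to the raster image comes down to a coordinate conversion: OCR engines typically report boxes in pixels measured from the top-left corner, while PDF page space is measured in points (1/72 inch) from the bottom-left. The sketch below shows that conversion under those assumptions; `OcrWord` and `to_pdf_rect` are illustrative names, not part of any PDF/R or TWAIN API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrWord:
    text: str
    left: int    # pixels from the left edge of the image
    top: int     # pixels from the top edge of the image
    width: int   # box width in pixels
    height: int  # box height in pixels


def to_pdf_rect(word: OcrWord, image_height_px: int, dpi: int) -> tuple[float, float, float, float]:
    """Convert a pixel bounding box to (x0, y0, x1, y1) in PDF points, origin bottom-left."""
    x0 = word.left * 72.0 / dpi
    x1 = (word.left + word.width) * 72.0 / dpi
    # Flip the vertical axis: image rows count down, PDF y coordinates count up.
    y1 = (image_height_px - word.top) * 72.0 / dpi
    y0 = (image_height_px - word.top - word.height) * 72.0 / dpi
    return (x0, y0, x1, y1)
```

For a word one inch from the top-left of a 300 dpi Letter page, `to_pdf_rect(OcrWord("Invoice", 300, 300, 600, 150), 3300, 300)` places the box at x from 72 to 216 points and y from 684 to 720 points, which is where invisible text or a grounding annotation would be positioned.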

PDF/R vs Legacy Formats

✗ TIFF / JPEG — The Legacy Approach

✗ Image formats: pixels only, no document structure
✗ No embedded provenance or authenticity data
✗ No business-process metadata survives transmission
✗ TIFF: no PDF compatibility without conversion
✗ No AI/LLM semantic structure out of the box
✗ JPEG: lossy only, no lossless document option
✗ Fragmented metadata across sidecar files

✓ PDF/R — The Document Format

✓ Document format: metadata, signatures, encryption
✓ C2PA provenance manifest embedded at capture
✓ XMP business-process metadata travels with file
✓ 100% PDF compatible: reads everywhere
✓ AI/LLM-ready: OCR + bounding boxes + structured metadata
✓ JPEG-XL option: superior lossless + lossy compression
✓ ISO 23504 standardized: royalty-free, open

Who Should Use This TIC Offer

🖨

Scanner & Device Manufacturers

Firmware developers implementing PDF/R output natively in scanner hardware — evaluating JPEG-XL support, C2PA signing at capture, and XMP metadata embedding from device firmware.

💻

Document Capture ISVs (TIC)

Software developers building TWAIN Direct or TWAIN Classic capture applications — adding PDF/R output with full metadata, provenance, and AI-ready structure to their scan workflows.

🤖

AI / LLM Platform Developers

Teams building document intelligence pipelines who want a standardized, provenance-verified, semantically structured input format from scanners — replacing ad-hoc TIFF/JPEG + OCR sidecar approaches.

☁️

Cloud & ECM Platform Builders

Cloud document management, ECM, and content services platforms adding native PDF/R ingestion — leveraging XMP business-process metadata for automatic routing, classification, and compliance tagging.

🔒

Compliance & Records Platforms

Healthcare, legal, government, and financial services platforms requiring tamper-evident, provenance-verified document input — where C2PA-embedded PDF/R satisfies audit and legal defensibility requirements.

🔬

Standards Researchers & Pilots

Academic researchers, standards body participants, and innovation labs evaluating JPEG-XL, AI/LLM document structuring, and C2PA integration within a royalty-free, open ISO scanning standard.

🔓

Royalty-Free · Open Download · ISO Standardized — Yours to Use and Build On

PDF/R is completely free to use, implement, and build on. The specification is available for royalty-free download from the TWAIN Working Group and the PDF Association. Sample code is available on GitHub. The standard is ISO-published (ISO 23504) and co-developed with the PDF Association — one of the world's leading open document standards organizations. There is no licence fee, no certification cost, and no vendor lock-in. Build it into your scanner, your application, your cloud platform, or your AI pipeline — it belongs to the ecosystem.

↓ Download PDF/R Spec · GitHub Sample Code · PDF Association Resource · twain.org

What's Included in the TIC PDF/R Programme

✓ PDF/R specification and technical resources — full access to the PDF/R specification (ISO 23504 / PDF/R-1), sample code repository on GitHub, and the TWAIN Working Group's PDF/R technical documentation — everything needed to implement PDF/R output in a scanner, application, or cloud platform.
✓ C2PA integration technical consultation — a working session with TWG technical experts on embedding C2PA Content Credentials manifests within PDF/R output — covering manifest structure, Trust List certificate requirements, C2PA Generator Product conformance pathway, and verification via contentcredentials.org.
✓ XMP metadata schema design session — a consultation on designing XMP custom metadata schemas for your specific business-process requirements — document type classification, routing codes, retention policies, compliance tags, and any domain-specific metadata your workflow demands — and how to embed them in PDF/R output at scan time.
✓ JPEG-XL feasibility evaluation — a technical review of incorporating JPEG-XL as an additional compression option in your PDF/R implementation — covering the ISO/IEC 18181 specification, available open-source encoders/decoders (libjxl), and the performance/quality tradeoffs for your specific document types and target file sizes.
✓ AI/LLM-ready PDF/R architecture review — a technical session on structuring PDF/R output for direct consumption by AI document intelligence pipelines — OCR text with spatial bounding boxes, XMP classification metadata, C2PA provenance for training data integrity, and embedding AI-generated augmentation within the PDF/R file structure.
✓ Properly structured PDF output review — a review of your current or planned PDF/R implementation against PDF document structure best practices — page tree, document catalog, metadata streams, encryption, digital signatures, and OCR invisible text layer — ensuring your output participates fully in PDF-consuming downstream ecosystems.
✓ TWAIN Direct integration pathway — guidance on incorporating PDF/R as the native output format within a TWAIN Direct driverless scanning workflow — covering the TWAIN Direct protocol's PDF/R output specification, cloud delivery, and integration with downstream document management and AI systems.
✓ Direct access to TWG technical experts — TIC programme participants engage directly with TWAIN Working Group engineers and PDF/R experts — including members of the original TWG/PDF Association development team — for implementation questions, architecture review, and emerging capability development.
✓ TIC ecosystem connection — introduction to TIC sponsor companies whose products are directly relevant to your PDF/R implementation: ExactCODE (RISC-V/open-source), C2PA (content provenance), Verve Capture (enterprise capture), JSE Imaging (TWAIN software), Dynamsoft (scanning SDK), Thin Scanner (cloud capture), and others in the TIC ecosystem.

TIC PDF/R Adoption & Development Programme

The TWAIN Working Group will be in touch within 3 business days to connect you with the right technical resources and experts.

* Required fields

First Name *

Last Name *

Business / Developer Email *

Company / Organisation *

Job Title *

Your Role *

Adoption or Development Intent *

New Capabilities of Primary Interest *

Technical Stack / Environment

Describe Your PDF/R Use Case or Development Goal *

I am a registered TWAIN Innovation Cloud (TIC) participant or am applying through the TIC programme. *

I consent to being contacted by the TWAIN Working Group regarding PDF/R technical resources, the TIC programme, and related PDF/R development opportunities. See twain.org for privacy information.

Resources & Contact

🌐

PDF/R Site

pdfraster.org

💻

GitHub Sample Code

github.com/twain/pdfraster

📋

ISO Standard

ISO 23504 (PDF/R-1) — pdfa.org resource

🤝

PDF Association

pdfa.org

🖨

TWAIN Direct

twain.org/twain-direct

✉️

TWG Contact

info@twain.org

🌐

TWAIN Working Group

twain.org

🏷

C2PA Standard

c2pa.org · Content Credentials

🖼

JPEG-XL

ISO/IEC 18181 · jpeg.org/jpegxl

📋

XMP Metadata

ISO 16684-1 · Adobe XMP

💡

TIC Programme

twaininnovationcloud.org

⏱

Response Time

Within 3 business days for PDF/R enquiries

Important Notice

This programme is offered by the TWAIN Working Group as part of the TWAIN Innovation Cloud. PDF/R (PDF/Raster) is an open, royalty-free standard published as ISO 23504 and co-developed with the PDF Association; it is freely available for download at pdfraster.org and pdfa.org. The "emerging capabilities" described on this page (C2PA integration, XMP business-process metadata, JPEG-XL compression, AI/LLM-ready structure) represent the TWAIN Working Group's active areas of exploration and community development. They are not all currently incorporated in the ISO 23504 specification, and their development timelines and final form are subject to the standards development process and community participation. C2PA is governed by the Joint Development Foundation; JPEG-XL is governed by JPEG/ISO/IEC; XMP is governed by ISO/Adobe. Participation in this TIC programme is non-binding and exploratory. Information submitted through this form will be shared with the TWAIN Working Group solely for the purpose of supporting PDF/R adoption and development. © 2026 TWAIN Working Group. All rights reserved.
