No ads. No tracking. No data sold. Ever.
🛡️ Security Guide

PDF Malware Scanner
47 Forensic Engines. Free. Online.

Malware & exploits · document-integrity tampering · content-integrity & AI-ingestion attacks — graded across four forensic axes.

PDF files are one of the most common malware delivery vectors — used in phishing campaigns, APT attacks, and exploit kits for decades. Most scanners can only find threats they have already seen. PQ PDF's 47 forensic engines detect both known and unknown threats: behavioral sandbox execution catches what a PDF does regardless of whether it has a signature, ML anomaly detection flags structurally abnormal files even with no prior example, and differential parsing exposes hidden objects whether or not they match any known exploit pattern.

But this is full document forensics, not just a malware scanner. It also detects integrity tampering — signature forgery, shadow documents, post-signing injection — and content-integrity / semantic-determinism attacks, where a file deliberately shows one thing to a human and a different thing to a parser or AI/RAG pipeline: value-vs-appearance (V/AP) divergence, font glyph remapping, OCR text-layer poisoning, and /Alt & /ActualText prompt injection. The verdict is graded across all four forensic axes — so a document-integrity or AI-ingestion attack that carries no malware at all is still surfaced. All free, with zero data retention.

47 forensic engines
6.4M+ offline threat indicators
6 sandbox renderers
24 report tabs
0 data retained
🔬 Scan a PDF Now — Free

No account. No upload limit. File deleted immediately after analysis.

PDF Threats You Can't See by Opening the File

A PDF that opens and looks normal can still be malicious. The PDF specification is complex enough that attack vectors are buried in layers of structure that no viewer surfaces to the reader. Emotet used password-protected PDF lures to deliver macro-laced Word droppers. MuddyWater (Iranian APT) relied on PDF first-stage attachments throughout 2022–2024 campaigns against government targets. APT28 (Fancy Bear) distributed CVE-2015-2545 EPS-exploit PDFs in spear-phishing operations against NATO targets. More recently, QakBot and IcedID campaigns shifted heavily toward PDF delivery after Microsoft disabled Office macros by default. And beyond classic malware, a newer class of attack exploits the fact that a PDF can present different content to a human than to a parser or AI pipeline, with no exploit code required. We measured how widespread this is across nearly 8,000 PDFs in our PDF security research (parser disagreement, V/AP divergence, reality drift, and AI-ingestion poisoning). These are the threat categories PQ PDF detects in malicious and deceptive PDFs:

Malicious JavaScript
PDF supports a full JavaScript engine. Attackers embed eval(unescape(...)) shellcode loaders, heap spray sequences, and multi-layer obfuscated scripts that execute silently when the file opens in a vulnerable viewer.
💥
CVE Exploit Patterns
Specific byte sequences trigger bugs in PDF renderers. CVE-2009-0658 (JBIG2 heap overflow), CVE-2024-41869 (use-after-free in Adobe Reader), CVE-2024-45112 (type confusion) — the scanner has 24 YARA rules targeting these and more.
📬
AcroForm Exfiltration
Interactive form fields can silently POST all field data — including typed passwords — to an attacker-controlled server via /SubmitForm actions. Hidden fields collect data without user knowledge.
🖼️
OCG Layer Cloaking
Optional Content Groups (layers) can hide malicious content — JavaScript, phishing text, embedded payloads — in layers that are invisible in normal view but present in the file structure and executed by the parser.
🔏
Signature Forgery (Shadow Attack)
A PDF can display a valid digital signature while containing content the signer never approved. The Shadow Attack exploits ByteRange gaps in the signature specification to hide malicious objects outside the signed byte range.
📦
Embedded Executables
PDF allows embedding arbitrary file attachments. Malicious PDFs routinely embed PE (.exe), ELF, OLE compound documents, VBA macro files, ZIP archives, and nested PDFs — activated by /Launch actions or /EmbeddedFile streams.
🎣
Phishing & Brand Impersonation
PDF is a common phishing delivery format. Fake login pages, QR codes pointing to credential-harvesting sites, and brand impersonation (Microsoft, DocuSign, Adobe) are embedded as interactive forms or URI actions.
👻
Invisible & Unicode-Obfuscated Text
Text with rendering mode 3 (invisible) or 7 (clip only) is present in the PDF but never drawn. Combined with RTL override characters (U+202E), attackers reverse filenames and URLs in a way that looks legitimate to a casual reader.
🧩
XFA FormCalc Exploits
XFA (XML Forms Architecture) is a complex XML-based alternative form system supported by Adobe Reader. It contains its own scripting language (FormCalc) and has been the vehicle for multiple critical RCE vulnerabilities rarely analysed by general-purpose scanners.
🎭
Value/Appearance (V/AP) Divergence
A form field can display one value while storing a different one in the signed and extracted data — “I agree to $10” on screen, “$10,000” in the data. Checkbox/radio /V-vs-/AS mismatch, blank or image-based appearance streams, and /NeedAppearances staleness all let the visible document disagree with its machine-readable value. No exploit code — pure document fraud.
🤖
AI-Ingestion & Semantic Poisoning
A document can show one thing to a human and another to an AI/RAG pipeline: font glyph remapping (rendered “$1,200”, extracted “$12,000” via a poisoned /ToUnicode map), hidden OCR/text-layer poisoning on scanned pages, and /Alt & /ActualText prompt injection — invisible to readers but consumed verbatim by LLMs. PQ PDF measures cross-extractor semantic determinism to catch what no malware scanner looks for.

Who Should Scan a PDF Before Opening It?

PDF is the most common malware delivery format in targeted attacks. According to Verizon's DBIR, email attachments account for over 90% of malware delivery — and PDF is consistently in the top two formats alongside Office macros. These are the people who scan before they open:

🛡️
SOC Analysts
Triaging email attachments from phishing alerts. Need MITRE ATT&CK mapping, IOC extraction, and a structured verdict they can attach to a ticket — not just a pass/fail AV result.
💼
IT & Security Administrators
Checking vendor-supplied PDFs, software documentation, or procurement contracts before distributing inside the organisation. One malicious PDF forwarded internally becomes a lateral movement risk.
⚖️
Legal & Compliance Teams
Law firms and compliance officers routinely receive PDFs from opposing parties, regulators, and clients — including adversarial actors who know the recipient will open the file. Privileged documents cannot be uploaded to VirusTotal.
🏥
Healthcare & Finance
Insurance claims, billing statements, and financial reports in PDF format are a known targeting vector for ransomware groups (including LockBit and Cl0p campaigns). Regulations like HIPAA prohibit sharing patient data with cloud services — offline scanning is required.
🔬
Malware Researchers
Analysts studying Emotet, MuddyWater, APT28, and other threat actors that use PDF as a first-stage delivery mechanism. The full 47-engine output and AI forensic report provide the depth needed to document a campaign technically.
🏠
Remote Workers
Receiving an unsolicited PDF on a work laptop — a courier notification, a contract revision, an invoice from an unknown sender. The HR and finance departments are the most-targeted recipients of spear-phishing PDFs.

PQ PDF vs. VirusTotal, Hybrid Analysis, Adobe & MetaDefender

Antivirus engines answer one question: "Have we seen this before?" If a threat has been catalogued, they find it. If it hasn't — a zero-day exploit, a freshly obfuscated payload, a novel XFA FormCalc attack, a new JS shellcode loader — it passes straight through. PQ PDF answers a different question: "What does this PDF actually do?" Behavioral execution, ML anomaly detection, structural differential analysis, entropy profiling, and AI ingestion integrity checks find dangerous files whether or not any signature for them exists anywhere. Here is how the tools compare:

Capability | PQ PDF: Free · No account | VirusTotal: Free (account) · Online | Hybrid Analysis: Free (limited) · CrowdStrike | Adobe Acrobat Pro: ~$23/month | MetaDefender: OPSWAT · Paid
AV signature scanning | ✓ ClamAV 700k+ sigs | ✓ 70+ AV engines | ✓ CrowdStrike + partners | ✗ No AV scanning | ✓ 30+ AV engines
YARA rules (PDF-specific) | ✓ 24 custom PDF YARA rules | ⚠ Community rules, generic | ⚠ Generic YARA rules | ✗ No | ⚠ Limited, generic
Behavioral sandbox execution | ✓ 6 PDF renderers, isolated namespaces, strace | ⚠ General sandbox, not PDF-specific renderers | ✓ Good dynamic analysis, general sandbox | ✗ No sandbox | ⚠ Basic sandbox, limited PDF renderer coverage
PDF structural analysis (XRef, objects, streams) | ✓ 15 static engines built for PDF structure | ✗ AV engines scan bytes, not PDF structure | ✗ No structural PDF analysis | ✗ No structural analysis | ✗ No structural PDF analysis
JavaScript AST deobfuscation | ✓ Full AST deobfuscator + Acrobat API emulation | ✗ No | ⚠ Runtime observation only | ✗ No | ✗ No
XFA FormCalc parsing | ✓ Dedicated XFA parser engine | ✗ No | ✗ No | ✗ No | ✗ No
Signature forgery / Shadow Attack detection | ✓ ByteRange forensics engine | ✗ No | ✗ No | ✗ No | ✗ No
AcroForm exfiltration / hidden field analysis | ✓ Full field tree, SubmitForm targets, JS triggers | ✗ No | ✗ No | ✗ No | ✗ No
Six-parser differential comparison | ✓ MuPDF, Poppler, GS, qpdf, pdfminer, pdf.js | ✗ No | ✗ No | ✗ No | ✗ No
Machine learning anomaly detection | ✓ IsolationForest + RandomForest + LightGBM + SHAP | ✗ No | ✗ No | ✗ No | ✗ No
OCR vs. text layer divergence (hidden text poisoning) | ✓ Tesseract OCR vs. embedded text layer, Jaccard similarity per page | ✗ No | ✗ No | ✗ No | ✗ No
Reading order & spatial ambiguity (AI ingestion) | ✓ Multi-column layout detection, parser extraction order conflicts | ✗ No | ✗ No | ✗ No | ✗ No
Accessibility tree injection (/Alt, /ActualText) | ✓ /StructTreeRoot forensics: prompt injection in semantic layer | ✗ No | ✗ No | ✗ No | ✗ No
MITRE ATT&CK technique mapping | ✓ Every indicator mapped to technique IDs | ⚠ Some detections, not systematic | ✓ Good ATT&CK coverage | ✗ No | ⚠ Limited mapping
AI forensic narrative report | ✓ Self-hosted Qwen 2.5: structured verdict + findings | ✗ No | ✗ No | ✗ No | ✗ No
File privacy / zero data retention | ✓ Deleted immediately, no external calls, no hashes shared | ✗ Files stored; hashes and reports are community-shared | ✗ Files stored; can be set private (paid only) | ✓ Local processing, file stays on your machine | ⚠ Enterprise tier offers private scanning
Offline threat intelligence | ✓ 6.4M+ indicators in local databases, zero external calls | ⚠ All queries sent to external services | ⚠ Online lookups | ✗ No threat intel | ⚠ Cloud-based lookups
Sanitize / clean the PDF | ✓ 9 methods: flatten-to-images, strip JS, remove XFA, PDF/A… | ✗ No | ✗ No | ✓ "Sanitize Document" removes active content | ⚠ Basic sanitization in some tiers
Cost | ✓ Free, no account required | ✓ Free with account (rate limited) | ✓ Free tier (limited submissions/day) | ✗ ~$23/month subscription | ✗ Paid, enterprise pricing

The honest assessment: VirusTotal's 70+ AV engines are the best tool in existence for one specific question — "has this exact file been seen and named by the antivirus industry?" If you need community reputation across 70 vendors, use it. For everything else — detecting what a PDF does, finding zero-days, structural forensics, AI ingestion integrity, sanitization, MITRE ATT&CK mapping, and keeping your file private — PQ PDF does all of it, free, with no account required.

All 47 Forensic Engines Explained

Every uploaded PDF passes through 47 independent analysis engines in a single request. Each engine is orthogonal — designed to catch a different class of threat that the others might miss. Results are correlated by the Correlation Engine (Engine 47) that maps compound indicators to MITRE ATT&CK techniques.

🔍
Static Analysis — Structure & Byte Level
Engines 1–15
ENGINE 1
Structure Validator
Validates the PDF header, version declaration, cross-reference table, trailer dictionary, and byte offsets. Malformed structures are a hallmark of exploit kits that deliberately break parsers to hide objects. Also detects linearized first-page object overrides: incremental updates that re-define an existing Page 1 object to inject JavaScript or actions that renderers fast-pathing the linearization hint table will not see on initial render.
ENGINE 2
Pattern Scanner
Byte-level search for dangerous PDF keywords: /JavaScript, /JS, /Launch, /OpenAction, /AA, /EmbeddedFile, /RichMedia, /XFA, /AcroForm, heap spray constants, and shellcode sequences.
ENGINE 3
Stream Inspector
Decompresses and inspects every stream object in the PDF. Computes per-stream entropy — high-entropy streams hidden inside otherwise clean documents are a strong indicator of encrypted payloads or steganographic content.
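As a rough illustration of the per-stream entropy measurement described above, the following minimal sketch computes Shannon entropy in bits per byte. The function name is hypothetical and this is not the scanner's actual implementation; it only shows the standard formula applied to an already-decompressed stream.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Plain text and simple markup usually land well below 8 bits per byte, while encrypted or well-compressed data sits close to it, which is why a high value inside an otherwise ordinary document is worth a second look.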
ENGINE 4
Object Analyzer
Traverses the full PDF object tree. Maps parent-child relationships, counts suspicious object types, identifies cross-reference anomalies (duplicate object numbers, phantom free entries), and enumerates all dictionary keys.
ENGINE 5
URL Extractor
Extracts all URIs from the PDF including hex-encoded, percent-encoded, and split/obfuscated variants. Flags javascript:, data:, file://, and vbscript: schemes. All URLs are passed to the Threat Intelligence engine.
ENGINE 6
Metadata Analyzer
Examines XMP and Info dictionary metadata: Creator, Producer, Author, creation date, modification date, and custom metadata keys. Detects exploit-kit fingerprints (Metasploit, msfvenom, Canvas, Core Impact) in tool identifiers.
ENGINE 7
Font Analyzer
Inspects every font object for non-standard encoding, oversized /Widths arrays (historic heap-overflow vector), non-embedded fonts that trigger external font lookups, and suspicious glyph name mappings. Analyses ToUnicode CMap tables — the mapping from glyph IDs to Unicode codepoints — detecting remaps where a visually rendered ASCII character (e.g. A) resolves to a non-ASCII Unicode codepoint in the extracted text layer. These remaps make visible text differ from extracted text, corrupting entity extraction, compliance scanning, and AI embeddings without any visible change to the rendered document.
ENGINE 8
CVE Pattern Matcher
Checks for /JBIG2Decode (CVE-2009-0658), /JBIG2Globals exploit parameters, oversized /Widths arrays, and codec parameter combinations associated with heap-overflow and memory corruption CVEs in Adobe Reader and Foxit.
ENGINE 9
Structural Statistics
Computes structural ratios: JavaScript-to-page ratio, stream-to-object ratio, compression diversity index, average object size, and entropy distribution. Statistically anomalous documents are flagged even without specific rule matches.
ENGINE 10
ExifTool Metadata Forensics
Runs ExifTool for deep metadata extraction and cross-source reconciliation across all metadata channels: /Info dictionary, XMP metadata, embedded XML, and attachment timestamps. Detects desynchronization where these sources report conflicting dates — e.g. /Info creation date 2024 while XMP reports 2019 — a strong indicator of document manipulation, backdating, or incremental-update tampering. Also independently confirms XFA form presence, surfaces embedded attachment flags, and detects creator/producer strings from known exploit-generation toolkits.
ENGINE 11
qpdf Structural Integrity
Runs qpdf --check to validate cross-reference tables and trailer dictionaries from a second, independent parser. Intentionally malformed XRef tables are a hallmark of exploit kits designed to hide objects from basic parsers.
ENGINE 12
YARA Rule Engine
Applies 24 custom YARA rules: heap-spray patterns (%u9090, 0x0c0c), CVE-specific byte sequences (CVE-2009-0658, CVE-2024-41869, CVE-2024-45112), obfuscated JS loaders, XFA+script combos, Cobalt Strike beacon signatures, PowerShell encoded commands, and multi-layer encoder chains.
ENGINE 13
PeePDF Deep Analysis
Independent analysis using the PeePDF framework — a completely separate parser that builds its own object tree independently of PyMuPDF. Provides a full second-opinion parse, locating vulnerability patterns with exact object IDs and identifying suspicious elements invisible to the primary parser.
ENGINE 14
ClamAV Signature Scanner
Runs the local ClamAV daemon against the file: 700,000+ signatures including the Pdf.Exploit.* family covering CVE-2009-0927, CVE-2009-4324, and the Exploit.PDF-JS category. A ClamAV match means the file matches the signature of a known threat.
ENGINE 15
Polyglot Detection
Detects files that are simultaneously valid in two or more formats. Two detection layers: (1) File-level polyglot — checks whether a recognised format signature (JPEG FF D8 FF, ZIP PK\x03\x04, PNG, GIF, Gzip, OLE, RIFF) appears in the bytes before the %PDF- header. ISO 32000 §7.5.2 permits arbitrary bytes before %PDF-; attackers exploit this to create JPEG+PDF or ZIP+PDF polyglots that bypass format-based content filters — email gateways see a JPEG, the PDF payload executes. (2) Stream-level polyglot — scans every PDF stream (raw and decompressed) for embedded executable magic: ZIP, Windows PE, Linux ELF, Mach-O, Java class, OLE/CFBF, RAR, 7-Zip, WebAssembly, HTML, and PostScript. Polyglot files smuggle dropper payloads past content-type security controls.
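The file-level check (layer 1) can be sketched as follows. This is a simplified illustration, not the engine's code: the function name, the 1 KiB search window, and the choice to test only whether the prefix starts with a known signature are assumptions made here, and the signature table is a subset of the formats listed above.

```python
from typing import Optional

# Subset of the signatures named in the description above.
MAGIC = {
    b"\xff\xd8\xff": "JPEG",
    b"PK\x03\x04": "ZIP",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF8": "GIF",
    b"\x1f\x8b": "Gzip",
    b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1": "OLE",
    b"RIFF": "RIFF",
}


def leading_polyglot(data: bytes, window: int = 1024) -> Optional[str]:
    """Name the foreign format that starts the file before %PDF-, if any."""
    offset = data[:window].find(b"%PDF-")
    if offset <= 0:  # header at byte 0, or not found inside the window
        return None
    prefix = data[:offset]
    for magic, name in MAGIC.items():
        if prefix.startswith(magic):
            return name
    return None
```

For example, `leading_polyglot(open("sample.pdf", "rb").read())` would return `"JPEG"` for a file that begins with a JPEG header and carries its `%PDF-` marker a few bytes later (`sample.pdf` is a placeholder path).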
🔥
Dynamic Behavioral Analysis
Engine 16
ENGINE 16
Dynamic Behavioral Sandbox
The only engine that actually executes the PDF. Renders through six independent engines — Ghostscript (PostScript + JS interpreter), MuPDF, Poppler, LibreOffice Draw, Chromium PDFium (Chrome's engine — the most common modern viewer), and pdf.js/Node (Firefox engine) — each inside isolated Linux namespaces with its own network stack, PID space, and mount point. All syscalls captured by strace. Detects: network beaconing, anonymous executable memory (shellcode), shell spawning, filesystem escape attempts, and process bombs. Static analysis sees structure; this engine sees what the PDF does.
🧠
Machine Learning & Differential Parsing
Engines 17–18
ENGINE 17
ML Intelligence Engine
Extracts a 38-feature vector from all preceding engine outputs and applies three layers: Bayesian contextual scoring (dampens known-benign creator tools, amplifies exploit-kit fingerprints), IsolationForest anomaly detection (unsupervised, active from the first scan), and RandomForest + LightGBM classifiers with SHAP explainability. Reports top contributing features for each scan so analysts understand the ML verdict, not just the score.
ENGINE 18
Differential Parsing
Runs six independent PDF parsers — MuPDF (mutool), Poppler, Ghostscript, qpdf, pdfminer, and pdf.js — and cross-compares eight structural dimensions: page count, object count, PDF version, JavaScript presence, encryption status, AcroForm presence, embedded file count, and OpenAction. Discrepancies mean the file exploits parser differences to hide objects — the signature of broken-xref exploit staging and incremental-update attacks. See the empirical parser disagreement tests for 11 reproducible examples with live scanner output.
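The cross-comparison step can be pictured with a small sketch. It assumes each parser's findings have already been collected into a dictionary keyed by dimension name; the function name and the dictionary layout are hypothetical, and the real engine's data model is not described in this guide.

```python
from typing import Any


def parser_disagreements(results: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return, per dimension, every parser's value when they do not all agree."""
    dimensions = set()
    for report in results.values():
        dimensions.update(report)
    conflicts = {}
    for dim in sorted(dimensions):
        # A parser that did not report a dimension contributes None.
        values = {parser: report.get(dim) for parser, report in results.items()}
        if len(set(values.values())) > 1:
            conflicts[dim] = values
    return conflicts
```

Given reports where MuPDF and qpdf say `javascript: False` and Poppler says `javascript: True`, the function returns only the `javascript` dimension with all three values, which is the kind of discrepancy the engine treats as a signal.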
🌐
Threat Intelligence, JavaScript & Campaign Attribution
Engines 19–24
ENGINE 19
JS AST Deobfuscation
Parses embedded JavaScript to its abstract syntax tree, then applies symbolic simplification to undo eval/unescape layers, string-split obfuscation, hexadecimal encoding, and multi-pass encoder chains. Surfaces the final deobfuscated payload for manual review.
ENGINE 20
Threat Intelligence
Queries fully offline local databases built from URLhaus, MalwareBazaar, ThreatFox, FeodoTracker, and OpenPhish: 6.4M+ indicators including URLs, IPs, domains, file hashes, and botnet C2 addresses. Zero external API calls. All extracted URLs and IPs from the PDF are cross-referenced.
ENGINE 21
Signature Forensics
Deep forensics on PDF digital signatures across six dimensions:
  • ByteRange coverage integrity: per ISO 32000 §12.8.1, offsets are from the %PDF- header; o1 must be 0, both segments must lie within file bounds, the inner gap must contain only the /Contents blob, and o2+l2 must reach at least %%EOF.
  • Shadow document detection: unsigned bytes beyond the signed region that contain execution vectors (CVE-2019-14980 class).
  • Full-save rewrite detection: when o2+l2 < %%EOF and the unsigned trailing region contains xref/trailer structure without execution vectors, a PDF viewer performed a complete file rewrite. This invalidates the cryptographic signature while the visual signature appearance remains, a pattern used by DocuSign and similar tools.
  • /Contents blob structural validation: all-zero placeholders, sub-32-byte blobs, missing DER SEQUENCE header.
  • SubFilter deprecation: SHA-1 collision risk, legacy no-chain variants, unknown formats.
  • Weak digest algorithm detection: MD5/SHA-1 vulnerable to collision-assisted forgery.
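The ByteRange coverage rules can be expressed as a few arithmetic checks. The sketch below is illustrative only: the function name and messages are hypothetical, and it assumes `file_size` is measured from the %PDF- header to the end of the final %%EOF marker. It does not inspect what the gap or the trailing bytes contain.

```python
def check_byte_range(byte_range: list[int], file_size: int) -> list[str]:
    """Return coverage problems for a /ByteRange array [o1 l1 o2 l2]."""
    if len(byte_range) != 4:
        return ["ByteRange must have exactly four integers"]
    o1, l1, o2, l2 = byte_range
    issues = []
    if o1 != 0:
        issues.append("first segment does not start at offset 0")
    if min(o1, l1, o2, l2) < 0 or o1 + l1 > file_size or o2 + l2 > file_size:
        issues.append("segment outside file bounds")
    if o2 < o1 + l1:
        issues.append("segments overlap")
    if o2 + l2 < file_size:
        # Bytes after the signed region are not covered by the signature.
        issues.append(f"{file_size - (o2 + l2)} unsigned trailing bytes")
    return issues
```

A non-empty result is only a starting point: whether the unsigned trailing bytes are a benign full-save rewrite or a shadow document depends on their content, as described above.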
ENGINE 22
Phishing Detection
Combines regex heuristics with NLP analysis to detect credential-harvesting forms, QR codes pointing to phishing domains, brand impersonation (Microsoft, Adobe, DocuSign, DHL, PayPal), urgency language patterns, and deceptive URI display vs. actual destination mismatches.
ENGINE 23
Embedded File Analysis
Enumerates all embedded attachments. Identifies PE executables, ELF binaries, OLE compound documents, VBA macro files, ZIP archives, nested PDFs, and JavaScript files. Flags dangerous /Launch actions that auto-execute embedded files on viewer interaction.
ENGINE 24
Campaign Attribution
Computes a TLSH fuzzy hash of the PDF and compares it against previously scanned samples. Clusters similar files into malware families and named campaigns, reporting a similarity score and any known cluster associations. Reveals whether a file is a variant of a known threat.
📄
PDF-Specific Deep Forensics
Engines 25–43
ENGINE 25
AcroForm Field Forensics
Enumerates every form field and analyses: JavaScript on /A and /AA field events (focus, blur, keystroke, validate), hidden NoExport fields, password-type fields (credential harvesting), /SubmitForm exfiltration targets, and calculation-order chain exploitation. Also performs Value / Appearance Stream (V/AP) divergence detection, which flags:
  • /NeedAppearances true (stale AP, critical when signed).
  • Checkbox/radio /V vs /AS key mismatch (rendering-independent).
  • Text/listbox/combobox field AP stream text extraction with font encoding remap: /Encoding /Differences tables are resolved so a font mapping byte 0x31 to glyph /nine is decoded correctly before comparison to /V, which catches the custom-font evasion path.
  • Image-based AP streams: the AP renders via a Do image XObject with no text operators, so /V is not visually verifiable without image recognition (flagged high severity).
  • Blank AP streams that hide a signed value from the viewer.
ENGINE 26
Document Revision History
Splits the PDF at each %%EOF boundary and extracts per-revision metadata: author, producer, modification date, and changed/new/deleted object counts per revision. Detects author identity changes, execution vectors injected after original creation, and automated exploit staging via large final-revision object injections.
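A minimal sketch of the revision split, assuming a naive byte search for the marker. The function name is hypothetical, and a literal `%%EOF` inside a stream would be miscounted by this simplified version; a real implementation would need to validate each boundary.

```python
def split_revisions(data: bytes) -> list[bytes]:
    """Return cumulative file snapshots, one per %%EOF marker found."""
    marker = b"%%EOF"
    revisions = []
    pos = 0
    while True:
        idx = data.find(marker, pos)
        if idx == -1:
            break
        end = idx + len(marker)
        # Each incremental update appends to the file, so revision N is
        # everything from the start of the file up to the Nth marker.
        revisions.append(data[:end])
        pos = end
    return revisions
```

Comparing consecutive snapshots then shows which bytes each incremental update added, which is where per-revision metadata and object counts would be extracted.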
ENGINE 27
Annotation Forensics
Examines every annotation object for dangerous action dictionaries: javascript: URI schemes, JavaScript triggers on click/hover, /Launch actions that spawn programs, /GoToR remote links, and /SubmitForm in annotation actions — attack vectors completely invisible to byte-level scanners.
ENGINE 28
Named Tree Analysis
Catalogues the full PDF action infrastructure: the Named JavaScript Registry (/Names /JavaScript), /AA additional actions, /OpenAction type classification, and /Perms and /UR3 permission restriction exploitation. Deep DocMDP forensics: parses the /P permission level (1 = no changes, 2 = form fill-ins, 3 = annotations, the most exploitable), validates /TransformParams and /Reference structure, checks the /SigFlags AppendOnly bit, detects incremental updates violating MDP constraints, and flags multiple /DocMDP entries (validator confusion attack). FieldMDP per-signature field lock (ISO 32000 §12.8.2.4, "FieldMDP"): distinct from DocMDP, FieldMDP locks specific named form fields per approval signature and can be selectively permissive. The engine detects Action=Include with empty /Fields (locks nothing despite appearing to certify), Action=Exclude with named fields (those fields are explicitly unlocked), and incremental updates that modify form fields after a FieldMDP signature is in place.
ENGINE 29
Content Stream Forensics
Inspects decompressed content streams for dangerous PostScript operators: exec (dynamic execution), run (file execution), token (string-to-code eval), setpagedevice (PostScript-to-system bridge). Also detects malformed /ICCBased color profiles of anomalous size — the CVE-2021-21017 class of heap buffer overflows.
ENGINE 30
Object Stream Analysis
PDF 1.5+ allows objects to be compressed into /ObjStm containers — invisible to byte scanners. This engine decompresses every object stream and re-scans the content for JavaScript, Launch actions, EmbeddedFile references, and high-entropy payloads (entropy >7.5 bits) suggesting hidden encrypted content.
ENGINE 31
PDF Token Obfuscation Decoder
Decodes hex-escaped PDF name tokens (/J#61vaScript → /JavaScript), whitespace-split token injection, and null-byte injection in name objects. These bypass simple pattern matchers while remaining valid to the PDF renderer, a classic evasion technique found in real-world exploit kits.
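The hex-escape part of this decoding is small enough to show directly. In PDF name syntax, `#` followed by two hexadecimal digits stands for the byte with that value. The sketch below covers only that case, with a hypothetical function name; whitespace-split and null-byte handling are not shown.

```python
import re

_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_pdf_name(name: bytes) -> bytes:
    """Expand #xx hex escapes in a PDF name token."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)
```

Normalising names this way before keyword matching means `/J#61vaScript` and `/JavaScript` are treated as the same token.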
ENGINE 32
XFA FormCalc Parser
Extracts and parses XFA (XML Forms Architecture) streams including embedded FormCalc scripts. XFA-based attacks (CVE-2021 XFA class) are rarely covered by general-purpose scanners. Detects dangerous FormCalc function calls, script injection in XFA event handlers, and XFA-activated JavaScript triggers.
ENGINE 33
Action Dependency Graph
Maps the full chain of PDF actions: /OpenAction → /AA → field actions → annotation triggers → named actions. Visualises multi-hop execution chains where a seemingly innocent trigger leads through a chain of named actions to a final exploit, invisible when examining any single action in isolation.
ENGINE 34
OCG Layer Cloaking
Analyses Optional Content Groups (PDF layers). Malicious content — JavaScript, phishing text, embedded payloads, deceptive instructions — can be placed in a layer set to invisible by default, present in the file but never rendered to the user. This engine enumerates all layers and their visibility states, flagging hidden active content.
ENGINE 35
Unicode & Invisible Text Forensics
Detects text with rendering mode 3 (invisible — clips nothing, draws nothing) and mode 7 (clip only — used to silently position clickable areas). Flags RTL override characters (U+202E) that reverse displayed filenames and URLs, and zero-width joiners used to split and reassemble malicious keywords.
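The Unicode half of this check can be sketched on already-extracted text. The table below includes the two code points named above (U+202E and the zero-width joiner); the zero-width space and non-joiner are added here as an assumption, and the function name is hypothetical.

```python
# U+202E and U+200D come from the description above; U+200C and U+200B
# are closely related characters included as an assumption of this sketch.
SUSPICIOUS = {
    "\u202e": "RIGHT-TO-LEFT OVERRIDE",
    "\u200d": "ZERO WIDTH JOINER",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200b": "ZERO WIDTH SPACE",
}


def find_hidden_controls(text: str) -> list[tuple[int, str]]:
    """Return (index, name) for each suspicious control character in text."""
    return [(i, SUSPICIOUS[ch]) for i, ch in enumerate(text) if ch in SUSPICIOUS]
```

A filename such as `"invoice\u202efdp.exe"` displays with its tail reversed, so a reader sees something ending in "pdf" while the underlying string ends in ".exe"; the function reports the override at index 7.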
ENGINE 36
Trailer Chain Forensics
Analyses the chain of PDF trailer dictionaries across all incremental updates. Detects Shadow Attack variants where a second document is hidden in the byte gap between the end of the signed region and the actual EOF — allowing content replacement while preserving a valid digital signature.
ENGINE 37
Codec Parameter Validation
Validates stream filter parameters for exploit-relevant codecs: /JBIG2Decode + /JBIG2Globals combinations (CVE-2009-0658 class), abnormally large /Columns and /Rows values in CCITT streams, and unusual parameter combinations in /CCITTFaxDecode and /DCTDecode filters associated with historic heap overflow exploits.
ENGINE 38
Physical Entropy Topology
Maps byte-level entropy across the physical file in sliding windows, producing an entropy profile of the entire document. Locates encrypted or compressed regions at unexpected byte offsets, encrypted blobs appended after the nominal EOF, and entropy spikes that indicate hidden payload injection invisible to object-based analysis.
ENGINE 39
Image Steganography & Tracking Beacons
Runs LSB chi-square statistical analysis on raster images embedded in the PDF — elevated chi-square scores indicate LSB steganographic payload injection. Also detects tracking beacons: 1×1 pixel images with external URI references that phone home on document open, allowing attackers to confirm successful delivery without any explicit JavaScript.
ENGINE 40
PDF/A Compliance Fraud Detection
Checks whether a PDF claiming PDF/A or PDF/UA conformance (typically to pass corporate compliance filters) actually meets the standard. Documents falsely claiming archival conformance to bypass security gateways that whitelist "archival" formats are a known evasion technique.
ENGINE 41
JavaScript Behavioral Emulation
Executes embedded JavaScript in a sandboxed Acrobat API stub environment. Simulates the Acrobat object model (app, this, util) to reveal what JavaScript does without a real viewer — catching payload assembly that requires runtime evaluation to surface. doc.getField() returns the actual /V field values from the PDF (collected by Engine 25) so conditional exploitation chains are correctly evaluated and SUBMIT_FORM events carry real field content.
ENGINE 42
Font CharString Emulator
Emulates Type 1 and Type 2 font CharString programs (the bytecode embedded in font outlines) to detect seac operator abuse (out-of-bounds glyph lookup), stack exhaustion via deeply nested subroutine calls, and arithmetic overflow patterns in CharString arithmetic — a class of font-engine exploits affecting all major PDF viewers.
ENGINE 43
XRef Integrity Graph
Builds a complete cross-reference integrity graph and identifies: phantom objects (objects referenced in the XRef but absent from the file body), orphan sleepers (objects in the file body unreferenced by any XRef entry — hidden until a parser recovers them), and free-entry exploitation (objects with generation numbers manipulated to survive deletion).
🤖
AI Ingestion Integrity
Engines 44–46
ENGINE 44
Reading Order & Spatial Ambiguity
Analyses spatial text positioning to detect multi-column and non-linear layouts where extraction order is ambiguous. PDF has no native concept of paragraphs, reading order, or semantic structure — text is positioned drawing operations. When columns, tables, or complex spatial clusters are present, parsers reconstruct reading order heuristically and disagree. Flags pages where linear text extraction produces semantically incorrect content, creating hallucinated relationships, inverted meanings, and corrupted table structures in LLM and RAG ingestion pipelines that treat parser output as canonical truth.
ENGINE 45
OCR Text Layer Integrity
Compares the embedded PDF text extraction layer against Tesseract OCR output rendered from the page raster image. A significant mismatch between the two is a strong indicator of a hidden text layer attack: malicious instructions, prompt injection, or sensitive data placed in the invisible text stream — invisible to human readers looking at the rendered image, but fully ingested by LLMs and RAG pipelines that consume the text layer. Also detects the blank-image-over-text-layer variant where a white raster overlay covers visible content while a poisoned text layer remains accessible to all text extractors. Empty OCR output combined with substantial embedded text on the same page is scored as maximum divergence (Jaccard = 0.0 / CRITICAL).
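The Jaccard comparison itself is simple set arithmetic. The following sketch assumes both texts are already available as strings and tokenises on word characters after lowercasing; the tokenisation rule and the function names are assumptions, not the engine's documented behaviour.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased word tokens of a text, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(text_layer: str, ocr_text: str) -> float:
    """Jaccard similarity of the token sets: |A ∩ B| / |A ∪ B|."""
    a, b = _tokens(text_layer), _tokens(ocr_text)
    if not a and not b:
        return 1.0  # both empty: nothing to disagree about
    return len(a & b) / len(a | b)
```

With this definition, a page whose embedded text is substantial but whose OCR output is empty scores 0.0, matching the maximum-divergence case described above.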
ENGINE 46
Accessibility Tree Forensics
Parses the /StructTreeRoot accessibility structure and inspects all semantic elements: /Alt image descriptions, /ActualText character-level overrides, heading hierarchy, logical structure labels, and figure captions. These channels are increasingly preferred by AI document processors because they improve chunking quality — making them high-value injection targets. Detects prompt injection in /Alt and /ActualText attributes: semantic content that exists only in the accessibility tree, fully invisible in rendering but completely visible to LLM ingestion pipelines, tagged-PDF extractors, and screen-reader-style processors. Flags payloads containing instruction-override patterns ("ignore prior instructions", "system:", "INST") that would execute in downstream AI processing.
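Pattern matching over the extracted attributes might look like the sketch below. It assumes the /Alt and /ActualText strings have already been pulled out of the structure tree into a dictionary; the regular expressions are illustrative renderings of the three example patterns quoted above, not the scanner's actual rule set.

```python
import re

# Illustrative patterns based on the examples quoted in the description.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?(prior|previous)\s+instructions", re.I),
    re.compile(r"\bsystem\s*:", re.I),
    re.compile(r"\bINST\b"),
]


def flag_injection(attributes: dict[str, str]) -> dict[str, list[str]]:
    """Map each attribute key to the patterns its value matched, if any."""
    hits = {}
    for key, value in attributes.items():
        matched = [p.pattern for p in INJECTION_PATTERNS if p.search(value)]
        if matched:
            hits[key] = matched
    return hits
```

Fixed patterns like these are easy to evade with rephrasing, so a check of this kind is best read as a cheap first filter rather than a complete detector.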
⚙️
Synthesis & AI Report
Engine 47 + AI
ENGINE 47
Correlation Engine
Evaluates 60+ compound patterns across all preceding engine outputs. Individual indicators may be low-risk in isolation — but /OpenAction + embedded JavaScript + obfuscated URL + non-embedded font is a dangerous combination. The Correlation Engine awards bonus risk points (35–100) for such combinations and maps each compound pattern to MITRE ATT&CK technique IDs.
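One way to picture compound-pattern matching is as subset tests over the set of indicators raised by earlier engines. The sketch below is illustrative only: the indicator names, the two patterns, their bonus values, and the pairing with ATT&CK IDs are assumptions, although T1059.007 (JavaScript) and T1204.002 (User Execution: Malicious File) are real technique IDs.

```python
# (required indicators, bonus points, ATT&CK technique ID), all hypothetical.
COMPOUND_PATTERNS = [
    (
        frozenset({"open_action", "embedded_javascript",
                   "obfuscated_url", "non_embedded_font"}),
        100,
        "T1059.007",
    ),
    (
        frozenset({"launch_action", "embedded_file"}),
        35,
        "T1204.002",
    ),
]


def correlate(indicators):
    """Return (total_bonus, technique_ids) for every fully matched pattern."""
    present = set(indicators)
    matched = [
        (bonus, technique)
        for required, bonus, technique in COMPOUND_PATTERNS
        if required <= present  # every required indicator is present
    ]
    return sum(b for b, _ in matched), [t for _, t in matched]


print(correlate(["open_action"]))
print(correlate(["open_action", "embedded_javascript",
                 "obfuscated_url", "non_embedded_font"]))
```

A lone /OpenAction matches nothing and earns no bonus, while the full combination does, which is the behaviour the description above relies on.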
AI REPORT
🤖 AI Forensic Report — Qwen 2.5
A self-hosted Qwen 2.5 model (running on a private GPU server — no third-party AI API) synthesises all 47 engine outputs into a structured verdict: threat classification, confidence level, key findings, MITRE ATT&CK technique grid, and recommended actions. Zero data leaves pqpdf.com infrastructure.

How the Risk Score Works

Every finding is classified onto one of four forensic axes, and the headline verdict is graded by what each finding actually means — not by a single undifferentiated count. This is what lets a feature-rich but legitimate document (a government form with field JavaScript, an academic paper with hundreds of embedded-font objects) read as clean while a genuine attack still scores.

  • Exploit — code execution, memory corruption, malware/dropper delivery, confirmed-malicious (AV/threat-intel) hits.
  • Tampering — integrity/authenticity violations: signature forgery, shadow documents, post-signing injection.
  • Deception — content/semantic-determinism manipulation: value-vs-appearance (V/AP) divergence, font glyph remapping, OCR text-layer poisoning, /Alt & /ActualText prompt injection, homoglyphs.
  • Structural / informational — neutral modern-PDF capability & structure (object streams, incremental updates, capability presence). Reported for context; never counted as a threat.

The headline Threat Score is the sum of the Exploit and Tampering axes, and it drives the verdict band below. Because this is a full forensics tool and not just a malware scanner, a confirmed deception finding (e.g. the displayed value ≠ the stored/extracted value) grades the verdict on its own axis even when the malware threat score is zero: a document that shows one thing to a human and a different thing to a parser or LLM is a first-class finding. Deception and structural scores are reported separately and never inflate the malware threat score.

Severity tiers: Critical (+50 pts) · High (+25 pts) · Medium (+10 pts) · Low (+3 pts) — capped at 3 occurrences per indicator. The Correlation Engine adds +35 to +100 bonus points for dangerous indicator combinations — a single /OpenAction is low-risk, but /OpenAction + obfuscated JavaScript + a known-malicious URL is definitively dangerous.
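The arithmetic can be sketched as follows. The severity points and the three-occurrence cap come from the text above; the finding tuples, indicator names, and the choice to add the correlation bonus directly to the headline score are assumptions made for illustration.

```python
from collections import Counter, defaultdict

SEVERITY_POINTS = {"critical": 50, "high": 25, "medium": 10, "low": 3}
MAX_OCCURRENCES = 3  # each indicator is counted at most three times


def score_findings(findings, correlation_bonus=0):
    """Return (headline_threat_score, per_axis_totals).

    findings: iterable of (indicator, severity, axis) tuples, where axis is
    one of "exploit", "tampering", "deception", "structural".
    """
    seen = Counter()
    axis_totals = defaultdict(int)
    for indicator, severity, axis in findings:
        seen[indicator] += 1
        if seen[indicator] <= MAX_OCCURRENCES:
            axis_totals[axis] += SEVERITY_POINTS[severity]
    # Only the exploit and tampering axes feed the headline score.
    # Assumption: the correlation bonus is added to the headline as well.
    threat = axis_totals["exploit"] + axis_totals["tampering"] + correlation_bonus
    return threat, dict(axis_totals)


findings = [("js_eval", "high", "exploit")] * 5 + [
    ("vap_divergence", "critical", "deception"),
]
print(score_findings(findings))
print(score_findings(findings, correlation_bonus=35))
```

Five occurrences of the same High indicator contribute 75 points, not 125, and the Critical deception finding is totalled on its own axis without touching the headline number.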

Score · Verdict · Meaning
0 · Clean · No exploit, tampering or content-deception findings.
1–29 · Low · Minor findings, or active content present with no confirmed exploit. Review before use.
30–149 · Suspicious · A real semantic/integrity finding or several indicators. Manual review recommended.
150–349 · High Risk · Confirmed exploit, tampering or semantic-determinism attack. Sanitize before opening.
350+ · Dangerous · Confirmed execution chain. Do not open in any reader. Sanitize or discard.
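The verdict bands above amount to a range lookup on the headline Threat Score. A minimal sketch, with the function and constant names chosen for illustration; it maps the numeric score only and does not model how a deception finding grades the verdict on its own axis.

```python
import bisect

# Lower bound of each band above "Clean", taken from the ranges listed above.
BAND_LOWER_BOUNDS = [1, 30, 150, 350]
BAND_NAMES = ["Clean", "Low", "Suspicious", "High Risk", "Dangerous"]


def verdict_band(threat_score):
    """Map a non-negative headline Threat Score to its verdict band."""
    return BAND_NAMES[bisect.bisect_right(BAND_LOWER_BOUNDS, threat_score)]


for score in (0, 29, 30, 149, 150, 350):
    print(score, verdict_band(score))
```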

Your File Never Leaves Our Server

Uploading a potentially malicious PDF to an online scanner is only sensible if the scanner's security model is trustworthy. PQ PDF is designed around the principle that the scanner must be as safe to use as the file is dangerous.

🗑️
Zero retention
The file is deleted from the server immediately after analysis completes. No copy, hash, or metadata is retained, and no database entry is created for your file.
🔒
Four-layer isolation
Every analysis runs inside prlimit resource limits + AppArmor MAC policy + Linux user/mount/network/PID namespaces + private tmpfs. The file cannot escape its container.
📡
Offline threat intelligence
All 6.4M+ threat indicators are stored in local PostgreSQL databases. No hash, URL, or byte from your file is transmitted to URLhaus, VirusTotal, or any external service.
🤖
Self-hosted AI
The AI report uses a Qwen 2.5 model hosted on a private GPU server. No content is sent to OpenAI, Anthropic, Google, or any third-party AI API.
👤
No account required
No login, no email, no registration. There is no way to link a scan to a user identity because no identity is collected.
📊
No tracking
No Google Analytics, no ad pixels, no third-party scripts. The Content Security Policy (CSP) explicitly blocks all external script sources and third-party connections.
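To make the four-layer isolation described above more concrete, the sketch below assembles a command line that stacks three of the layers using standard Linux tools: `prlimit` for resource limits, `aa-exec` for an AppArmor profile, and `unshare` for user/mount/network/PID namespaces. It only builds the argument list and runs nothing. The profile name, limit values, paths, and the ordering of the wrappers are assumptions, not PQ PDF's actual configuration, and the private tmpfs mount (which would be set up inside the mount namespace) is not shown.

```python
def sandbox_argv(analyzer, pdf_path, profile="pdf-analyzer"):
    """Build an argv that layers resource limits, AppArmor, and namespaces.

    All values are illustrative: 1 GiB address space, 60 s CPU time,
    64 open files, and a hypothetical AppArmor profile name.
    """
    limits = ["prlimit", "--as=1073741824", "--cpu=60", "--nofile=64"]
    confinement = ["aa-exec", "-p", profile, "--"]
    namespaces = ["unshare", "--user", "--mount", "--net", "--pid", "--fork"]
    return limits + confinement + namespaces + [analyzer, pdf_path]


# Hypothetical paths, for illustration only.
argv = sandbox_argv("/opt/analyzer/run", "/scratch/upload.pdf")
print(" ".join(argv))
```

Whether a given AppArmor profile permits namespace creation depends on the host, so a real deployment would need to validate this ordering on its own kernel and policy.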


Scan Your PDF Now — Free
No account. No file size limit. Results in under a minute.
File deleted immediately. Zero data retained.
🔬 Open PDF Forensics Scanner