About OCR PDF
This tool runs Tesseract 5 LSTM neural network OCR on your PDF. Each page is rendered to a high-resolution image, then OCR'd to extract recognised text. Best for scanned documents, photographed pages, and image-based PDFs. For PDFs that already have a selectable text layer, use Extract Text instead — it is faster and more accurate for those files.
Processing Time
OCR is compute-intensive. Expect 2–8 seconds per page depending on DPI and page complexity. A 10-page document at 200 DPI typically completes in under 60 seconds. Processing runs entirely server-side, and your files are deleted as soon as the download begins.
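As a rough illustration of the figures above, the per-page range can be turned into a best-case and worst-case estimate for a whole job. This is only a sketch: the function name `estimate_seconds` is hypothetical, and the 2 and 8 second bounds are simply the numbers quoted in the text, not measured values.

```python
def estimate_seconds(
    pages: int, per_page_low: float = 2.0, per_page_high: float = 8.0
) -> tuple[float, float]:
    """Return (best_case, worst_case) total seconds for a job.

    The default bounds are the 2-8 seconds per page quoted above.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * per_page_low, pages * per_page_high


low, high = estimate_seconds(10)
print(f"10 pages: {low:.0f}-{high:.0f} s")  # prints "10 pages: 20-80 s"
```

The upper bound assumes every page hits the slowest per-page figure, which is presumably why a typical 10-page job at 200 DPI finishes sooner than the worst case.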
Accepted input: scanned PDFs, image-based PDFs, and photographed documents, up to 50 MB.
Tesseract 5 LSTM Engine
State-of-the-art neural network OCR — trained on millions of document samples for high character recognition accuracy.
Searchable PDF Output
Adds an invisible text layer to your scanned images — the original appearance is preserved while text becomes copyable and searchable.
150 / 200 / 300 DPI Control
Match rendering resolution to your scan quality. 200 DPI is the recommended balance; 300 DPI maximises accuracy for small or faded text.
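To see what the DPI setting means in practice, the sketch below computes the pixel dimensions of a rendered page. It assumes a US Letter page (8.5 × 11 inches) purely for illustration; the helper name `rendered_size` is hypothetical and the tool's actual rendering code is not shown here.

```python
def rendered_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


LETTER_INCHES = (8.5, 11.0)  # assumed page size, for illustration only

for dpi in (150, 200, 300):
    width, height = rendered_size(*LETTER_INCHES, dpi)
    megapixels = width * height / 1_000_000
    print(f"{dpi} DPI: {width} x {height} px ({megapixels:.1f} MP)")
```

Doubling the DPI from 150 to 300 quadruples the number of pixels per page, which is one reason higher settings take longer to process.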
4 Page Segmentation Modes
Auto, single column, single block, and sparse text — choose how Tesseract reads your page layout for better results on forms, receipts, and columns.
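For readers who know the Tesseract command line, the four modes plausibly correspond to its `--psm` values. The mapping below is an assumption, not something this page confirms: the PSM numbers and `--oem 1` (LSTM engine only) are real Tesseract options, but the mode keys and the `tesseract_config` helper are hypothetical names.

```python
# Assumed mapping from the tool's mode names to Tesseract --psm values.
PSM_BY_MODE = {
    "auto": 3,            # fully automatic page segmentation, no OSD
    "single_column": 4,   # a single column of text of variable sizes
    "single_block": 6,    # a single uniform block of text
    "sparse_text": 11,    # find as much text as possible, in no particular order
}


def tesseract_config(mode: str = "auto") -> str:
    """Build a Tesseract option string for the chosen layout mode."""
    try:
        psm = PSM_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown segmentation mode: {mode!r}") from None
    return f"--oem 1 --psm {psm}"  # --oem 1 selects the LSTM engine only


print(tesseract_config("sparse_text"))  # prints "--oem 1 --psm 11"
```

Sparse text tends to suit receipts and forms with scattered fields, while single column suits ordinary running text.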
In-Browser Text Preview
Read the extracted text directly in the results panel without downloading — see immediately whether OCR succeeded before saving the file.
Confidence Score & Word Count
Every job returns a confidence score, the average of Tesseract's per-word confidence values across all processed pages, plus word count and character count, so you know how well OCR performed.
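One way such a summary could be computed is sketched below. It assumes each page yields (word, confidence) pairs on Tesseract's 0 to 100 scale, that rows with a confidence of -1 (which Tesseract uses for non-word entries) are skipped, that the average is taken over all words rather than over per-page averages, and that the character count excludes whitespace. The `summarise` function and its output keys are hypothetical.

```python
def summarise(pages: list[list[tuple[str, float]]]) -> dict[str, float]:
    """Aggregate per-word OCR results from every page into one summary."""
    # Keep only real words: non-empty text with a non-negative confidence.
    words = [
        (text, conf)
        for page in pages
        for text, conf in page
        if text.strip() and conf >= 0
    ]
    if not words:
        return {"mean_confidence": 0.0, "word_count": 0, "char_count": 0}
    return {
        "mean_confidence": sum(conf for _, conf in words) / len(words),
        "word_count": len(words),
        "char_count": sum(len(text) for text, _ in words),
    }


sample = [
    [("Invoice", 96.0), ("", -1.0), ("2024", 88.0)],  # page 1
    [("Total", 92.0)],                                 # page 2
]
print(summarise(sample))
# prints {'mean_confidence': 92.0, 'word_count': 3, 'char_count': 16}
```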
Custom Page Ranges
Target specific pages (e.g. 1–3, 5, 8–12) rather than the entire document — saves time on long scanned books where you only need certain pages.
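A page-range string like the example above can be expanded into individual page numbers along these lines. This is a minimal sketch, not the tool's actual parser: the function name `parse_page_ranges` and the choice to reject out-of-bounds ranges (rather than clamp them) are assumptions.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "1–3, 5, 8–12" into sorted 1-based page numbers."""
    pages: set[int] = set()
    # Accept both the en dash and the plain hyphen as range separators.
    for part in spec.replace("–", "-").split(","):
        part = part.strip()
        if not part:
            continue
        start_text, separator, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if separator else start
        if start < 1 or end < start or end > page_count:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_page_ranges("1–3, 5, 8–12", page_count=20))
# prints [1, 2, 3, 5, 8, 9, 10, 11, 12]
```

Collecting pages in a set means overlapping ranges such as "1-3, 2-4" are processed only once.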
Up to 100 Pages Per Job
Processing runs entirely server-side, so browser memory limits do not apply. Pages are handled one at a time to prevent disk exhaustion on large documents.
Zero Retention
Your file and all OCR output are deleted from the server immediately after the download begins — nothing is stored, logged, or retained.