[vc_row][vc_column width=”1/1″][mk_padding_divider size=”20″][mk_dropcaps style=”fancy-style”]D[/mk_dropcaps][vc_column_text disable_pattern=”true” align=”left” margin_bottom=”0″]
OCR (Text recognition)
A scanned document is an assembly of digital “photos” of all the pages. People can read the text just by looking at it, but to a computer each page is only an image: it can display the page on screen, but it cannot search, copy or index the words.
To make use of the actual text, the software must first run the document through a process called OCR: Optical Character Recognition. This technology enables computers to analyze and interpret scanned images, and convert them to real electronic text.
OCR increases the value of your scanned documents by making their content searchable and reusable.
Searchable PDFs
OCR is important when scanning documents to PDF, because it will make your PDFs searchable. This will allow your Document Management System to index the documents, so that you can quickly search for and retrieve them from the database later on.
PixEdit® software stores both the electronic text and the scanned image in the PDF. We call this “hidden” text. The document is fully searchable and its text reusable, while the visual appearance of the original is preserved.
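As a rough illustration of why recognized text matters for indexing, the Python sketch below builds a small word index over OCR output and searches it. The file names, the sample text and the functions `build_index` and `search` are hypothetical. They do not represent PixEdit or any particular Document Management System.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)

# Hypothetical OCR output, keyed by file name.
ocr_text = {
    "invoice_001.pdf": "Invoice 1001 for scanning services",
    "contract_2020.pdf": "Service contract for document scanning",
    "memo.pdf": "Internal memo about holiday schedule",
}

index = build_index(ocr_text)
print(search(index, "scanning"))
print(search(index, "scanning invoice"))
```

Without OCR there would be no text to put into such an index, and the scanned pages could only be found by file name or by metadata typed in by hand.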
Reusing text
OCR is also useful for other purposes:
- Quickly copy text from a scanned document to another application, for example Word, Excel, PowerPoint or Outlook
- Export the text to a file and import it into other applications
- Add PDF bookmarks more quickly
- Process forms and extract metadata
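To give an idea of what extracting metadata from recognized text can look like, here is a minimal Python sketch that pulls a few fields out of OCR output with regular expressions. The field names, the patterns and the sample text are assumptions made for illustration. Real forms need rules tuned to their own layout, and this is not how PixEdit itself processes forms.

```python
import re

# Hypothetical patterns; each captures the value that follows a field label.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\d+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*(\d+(?:\.\d{2})?)", re.IGNORECASE),
}

def extract_metadata(text):
    """Return a dict of the fields found in the text; missing fields are None."""
    metadata = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        metadata[field] = match.group(1) if match else None
    return metadata

# Hypothetical OCR output from a scanned invoice.
sample_text = """ACME Supplies
Invoice No: 1001
Date: 2020-03-15
Total: 249.50
"""

print(extract_metadata(sample_text))
```

The extracted values could then be stored alongside the PDF, for example as index fields in a Document Management System.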
Tutorial Videos
OCR and searchable PDFs (YouTube link)
Creating PDF bookmarks using OCR (YouTube link)
OCR Software
PixEdit Desktop
Image processing with OCR functionality to create searchable PDFs and reuse the text
PixEdit Server
Automatic image processing with OCR functionality to create searchable PDFs
PixEdit Converter Server
Automatic file conversion with OCR functionality to create searchable PDFs
[/vc_column_text][/vc_column][/vc_row]