NO. PDF documents can be created in a variety of ways. PDFs that are
generated from an electronic source, such as a Word document, a
computer-generated report, or spreadsheet data, have an internal structure that
can be read and interpreted. These “generated” PDF documents already store each
character as an electronic character code. Conversion from such a PDF can
therefore rely on these character codes and provide reliable output.
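To illustrate what "an internal structure that can be read" means, here is a minimal Python sketch. It assumes a hypothetical, uncompressed page content stream in which text is drawn with the PDF `Tj` (show text) operator. The sample stream, the function names, and the simplified pattern are illustrative assumptions; real PDFs usually compress their streams and use font encodings that a full PDF library has to resolve.

```python
import re

# Hypothetical, uncompressed content stream from a "generated" PDF page.
# Text is stored as character codes inside parentheses, followed by Tj.
SAMPLE_STREAM = (
    b"BT /F1 12 Tf 72 720 Td (Quarterly Report) Tj "
    b"0 -14 Td (Total: 42) Tj ET"
)

# Hypothetical stream from a scanned page: it only paints an image (Do).
SCANNED_STREAM = b"q 612 0 0 792 0 0 cm /Im0 Do Q"

# Matches a literal string operand followed by the Tj operator.
# Simplification: ignores escaped or nested parentheses and the TJ array form.
TJ_PATTERN = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def extract_text_operands(stream: bytes) -> list[str]:
    """Return the literal strings shown with Tj in an uncompressed stream."""
    return [m.decode("latin-1") for m in TJ_PATTERN.findall(stream)]


def has_text_layer(stream: bytes) -> bool:
    """Rough heuristic: the stream has readable text if any Tj string exists."""
    return bool(extract_text_operands(stream))


if __name__ == "__main__":
    print(extract_text_operands(SAMPLE_STREAM))
    print(has_text_layer(SCANNED_STREAM))
```

In the first stream the characters can be read directly, so no recognition step is needed. The second stream contains only an image, so there are no character codes to extract.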
PDF documents can also be created by scanning a document into electronic
format. A scanned document is really just a “picture” of the words on the
page. To convert a scanned document into an editable format, OCR software must
analyze the “image” of each character and match it to an electronic character.
Because of this, it is much more difficult to ensure that the character
“recognized” by the OCR software is the character that appears on the scanned
document. The quality of OCR output is reduced by factors such as poor image
quality, a mixture of fonts in the scanned document, and italicized or
underlined text, which can blur the shape of individual characters.
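The matching step described above can be sketched with a toy example in Python. This is only an illustration of the general idea of comparing a character image against known shapes; it is not how any commercial OCR engine works. The 3x5 bitmaps, the three-letter alphabet, and the function names are all assumptions made for the example.

```python
# Toy 3x5 glyph templates; "#" is ink and "." is background.
TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def distance(a, b):
    """Count the pixels that differ between two same-sized glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def recognize(glyph):
    """Return (best_char, distance) for the template closest to the glyph."""
    best = min(TEMPLATES, key=lambda ch: distance(glyph, TEMPLATES[ch]))
    return best, distance(glyph, TEMPLATES[best])


# A clean scan of the letter "T".
clean_t = ("###", ".#.", ".#.", ".#.", ".#.")

# The same letter with two stray pixels of ink on the bottom row,
# for example from an underline or a smudge on the page.
smudged_t = ("###", ".#.", ".#.", ".#.", "###")

if __name__ == "__main__":
    print(recognize(clean_t))
    print(recognize(smudged_t))
```

The clean glyph matches the "T" template exactly, while the smudged glyph is now identical to the "I" template and is recognized as the wrong character. This is the kind of error that poor image quality or underlined text can introduce.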
What is OCR (Optical Character Recognition)?
Optical Character Recognition (OCR) is a visual recognition process that
turns printed or written text into electronic, character-based text. When a
document is scanned and converted into a PDF, OCR software can interpret each
character image in the PDF and assign it an electronic character, which can
then be placed into an editable format, such as a Text or Word document.
Given the proliferation of scan-to-PDF technology available today,
Investintech’s OCR solutions focus only on converting scanned PDF documents
that have already been created. The quality of the OCR conversion depends
largely on the quality of the scanned image and the clarity of the characters
in that image.
The OCR technology that drives Investintech’s OCR-enabled products is
licensed from Nuance, Inc. – a global leader in OCR-based technologies.