Scanned PDF documents are among the most problematic types of documents for accessibility. In most cases, whether you scan a document directly to PDF or scan it and then convert it to PDF, the result is a set of large image files: each page is a single image containing all of the text, tables, images, and graphics. As a result, the text on the page is neither searchable nor selectable.
If you want to make a scanned document accessible, you must first convert the image of the document into "real" text, meaning text that is selectable and scalable. You can accomplish this with Optical Character Recognition (OCR) software. If the PDF version is also to be your accessible version, you will need to add accessibility markup. This markup, or “tags”, provides alternative text for images, graphs, and charts, and adds header information to data tables. Tagging matters because, in most cases, the text generated from an untagged scanned image is broken into unexpected segments, which can be out of order relative to the expected reading order of the document. Once your document is converted, you will need to perform several checks to ensure that the correct reading order is established.
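One of the reading-order checks mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Segment structure, the reading_order and out_of_order helpers, and the line_tolerance value are hypothetical and do not come from any particular OCR product, and the approach assumes a simple single-column page. Multi-column layouts and tables need a more careful analysis.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One block of OCR text with its top-left position (hypothetical structure)."""
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge of the page


def reading_order(segments, line_tolerance=10.0):
    """Sort segments top-to-bottom, then left-to-right within each line.

    A segment joins the current line when its y value is within
    line_tolerance of the first segment on that line.
    Assumes a single-column layout.
    """
    lines = []
    for seg in sorted(segments, key=lambda s: s.y):
        if lines and abs(seg.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(seg)
        else:
            lines.append([seg])
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda s: s.x))
    return ordered


def out_of_order(segments, line_tolerance=10.0):
    """Return the positions where the stored order differs from reading order."""
    expected = reading_order(segments, line_tolerance)
    return [i for i, (got, want) in enumerate(zip(segments, expected))
            if got is not want]


# Segments in the order an OCR pass might have stored them (made-up data).
page = [
    Segment("Chapter 1", 50, 40),
    Segment("rain in Spain", 200, 102),
    Segment("The", 50, 100),
    Segment("Page 1", 50, 700),
]

print([s.text for s in reading_order(page)])
print(out_of_order(page))
```

A non-empty result from out_of_order is a hint that the page deserves a manual look in your PDF tool's reading-order view. It is not a substitute for that review.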
To make sure you can properly edit a scanned PDF document:
You have two choices for creating your PDF document from paper. You can scan your document into an image file (typically a TIFF) and then convert the image file into a PDF. Or you can scan directly into PDF using the "Create PDF from Scanner" option in Acrobat, your scanner's PDF conversion option, or commercially available PDF conversion software.
For scanned (image) PDF conversion, one commercial option is Investintech’s Able2Extract PDF Professional, which is known for its reliable OCR capabilities and is a popular choice among businesses and business professionals whose work often involves scanned PDFs.
PDF experts recommend the first of the two scanning options: scanning to a TIFF and then importing it into Acrobat or your PDF creation software. By separating the steps, you can focus first on creating clean, high-quality scans of the document and then worry about converting to an accessible PDF. If you scan directly to PDF, you may need to rescan several times at different DPI values and with different settings before you have a PDF that can be successfully manipulated. However, once your settings have been established, we found little difference between creating a TIFF and scanning directly to PDF.
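When comparing DPI settings before a rescan, it can help to see what each setting means in pixels and raw image size. The short Python sketch below does that arithmetic. The scan_dimensions helper is hypothetical, and the US Letter page size (8.5 x 11 inches), the 8-bit grayscale depth, and the three DPI values are assumptions chosen for illustration, not recommendations from this article.

```python
def scan_dimensions(width_in, height_in, dpi, bits_per_pixel=8):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Raw size before any TIFF or PDF compression is applied.
    raw_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, raw_bytes


# Assumed page: US Letter, 8-bit grayscale.
for dpi in (150, 300, 600):
    w, h, size = scan_dimensions(8.5, 11, dpi)
    print(f"{dpi} DPI: {w} x {h} px, about {size / 1_000_000:.1f} MB uncompressed")
```

Doubling the DPI quadruples the pixel count, so file size grows quickly at higher settings. Actual files will usually be smaller than these figures because scanners and PDF software compress the image data.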
Regardless of how you scan your document, you will need to do some follow-up work after the PDF version has been created to add accessibility features. This can be very simple for simple documents or lengthy and complex for complex ones, and much depends on what software you have available.