How to edit a scanned PDF document

Scanned PDF documents are among the most problematic types of document for users of assistive technology. Why? In most cases, whether you scan a document directly to PDF or scan it and then convert it to PDF, each page is stored as a single large image containing all the text, tables, images, and graphics. As a result, the text on the page is neither searchable nor selectable.

If you want to make a scanned document accessible, you must first convert the image of the document into "real" text. This means that the text must be selectable and scalable. You can accomplish this by using Optical Character Recognition (OCR) software. If the PDF version is also to be your accessible version, you will need to add further accessibility mark-up: "tags", alternative text for images, graphs, and charts, and header information for data tables. In most cases, text generated from a scanned image of a document is broken into unexpected segments. These segments can be out of order relative to the expected read-order of the document. You will need to perform several checks to ensure that the correct read-order is established once your document is converted.
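The read-order check described above can be illustrated with a minimal sketch. It assumes the OCR step yields text segments with page coordinates; the Segment class, its fields, and the line_tolerance value are hypothetical names chosen for illustration and do not come from any particular OCR product. The sketch handles only a single-column page, so multi-column layouts, tables, and sidebars would still need manual review.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A piece of recognized text (hypothetical structure, for illustration)."""
    text: str
    x: float  # left edge, measured from the left of the page
    y: float  # top edge, measured downward from the top of the page


def reading_order(segments, line_tolerance=5.0):
    """Return segments top-to-bottom, then left-to-right within each line.

    Segments whose top edges lie within line_tolerance of the first
    segment of the current line are treated as being on that line.
    """
    lines = []
    for seg in sorted(segments, key=lambda s: s.y):
        if lines and abs(seg.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(seg)   # same visual line
        else:
            lines.append([seg])     # start a new line
    ordered = []
    for line in lines:
        ordered.extend(sorted(line, key=lambda s: s.x))
    return ordered


# Segments in the scrambled order an OCR pass might emit them.
scrambled = [
    Segment("world", 100, 12),
    Segment("Second", 10, 40),
    Segment("Hello", 10, 10),
    Segment("line", 80, 41),
]
print([s.text for s in reading_order(scrambled)])
# prints ['Hello', 'world', 'Second', 'line']
```

Comparing the sorted result with the order in which the segments were originally stored is one quick way to spot pages whose read-order needs attention.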

To make sure you can properly edit a scanned PDF document, start with how the document is scanned.

You have two choices for creating a PDF document from paper. You can scan the document into an image file (typically a TIFF) and then convert the image file into a PDF. Or you can scan directly into PDF using the "Create PDF from Scanner" option in Acrobat, your scanner's PDF conversion option, or commercially available PDF conversion software.

PDF experts tend to suggest the first option: scanning to a TIFF and then importing it into Acrobat or your PDF creation software. By separating the steps, you can focus first on creating clean, high-quality scans of the document and then worry about converting to an accessible PDF. If you scan your documents directly to PDF, you may need to do several rescans at different DPI values and other settings before you have a PDF that can be successfully manipulated. However, once your settings have been established, we found little difference between creating a TIFF and scanning directly to PDF.
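To see why the DPI setting matters when rescanning, it can help to work out how large each page image becomes. The sketch below is only illustrative arithmetic: the page size (US Letter, 8.5 x 11 inches), the DPI values, and the function names are assumptions chosen for the example, not settings recommended by the text above, and real scan files are usually compressed and therefore smaller.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Pixel dimensions of a page of the given size scanned at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Approximate raw image size in bytes, ignoring row padding and headers."""
    return (width_px * height_px * bits_per_pixel + 7) // 8


# Compare a few assumed DPI settings for a US Letter page.
for dpi in (150, 300, 600):
    w, h = scan_dimensions(8.5, 11, dpi)
    mono = uncompressed_bytes(w, h, 1)    # 1-bit black and white
    color = uncompressed_bytes(w, h, 24)  # 24-bit color
    print(f"{dpi} DPI: {w} x {h} px, {mono:,} bytes mono, {color:,} bytes color")
```

Doubling the DPI quadruples the pixel count, which is one reason to settle on scan settings before processing a large batch of pages.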

Regardless of how you scan your document, you will need to do some follow-up work after the PDF version has been created to add accessibility features. This can be quick for simple documents or lengthy and involved for complex ones, and much depends on what software you have available.
