Date: 2008-08-13
Scanned PDF documents are among the most troublesome types of document for users of assistive technology. Why? In most cases, if you scan a document directly to PDF, or scan it and then convert it to PDF, the document is stored as image data. Each page consists of one large image containing all the text, tables, images, and graphics. As a result, the text on the page is neither searchable nor selectable.
If you want to make a scanned document accessible, you must first convert the image of the document into "real" text. This means that the text must be selectable and scalable. You can accomplish this by using Optical Character Recognition (OCR) software. If the PDF version is also to be your accessible version, you will need to add further accessibility mark-up: "tags", alternative text for images, graphs, and charts, and header information for data tables. In most cases, text generated from a scanned image of a document is split into unexpected segments. These segments can be out of order relative to the expected read-order of the document. Once your document is converted, you will need to perform several checks to ensure that the correct read-order is established.
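The read-order check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Segment structure, the reading_order and out_of_order functions, and the line_tolerance parameter are all hypothetical names, not part of any OCR product, and the approach assumes a simple single-column page where the expected order is top to bottom, then left to right. Multi-column layouts would need a more careful method.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A block of OCR text plus the top-left corner of its bounding box.

    Hypothetical structure for illustration; real OCR tools expose
    their own formats.
    """
    text: str
    x: float  # distance from the left edge of the page
    y: float  # distance from the top edge of the page


def reading_order(segments, line_tolerance=10.0):
    """Return segments sorted top to bottom, then left to right.

    A segment joins the current row when its y value is within
    line_tolerance of the first segment in that row.
    """
    rows = []
    for seg in sorted(segments, key=lambda s: s.y):
        if rows and abs(seg.y - rows[-1][0].y) <= line_tolerance:
            rows[-1].append(seg)
        else:
            rows.append([seg])
    return [seg for row in rows for seg in sorted(row, key=lambda s: s.x)]


def out_of_order(segments, line_tolerance=10.0):
    """List (position, segment) pairs whose stored position differs
    from the position expected from the page geometry."""
    expected = reading_order(segments, line_tolerance)
    return [
        (i, seg)
        for i, (seg, exp) in enumerate(zip(segments, expected))
        if seg is not exp
    ]


if __name__ == "__main__":
    # Segments in the order an OCR pass happened to store them.
    page = [
        Segment("Title", 50, 40),
        Segment("right half of line", 300, 102),
        Segment("left half of line", 50, 100),
        Segment("Footer", 50, 700),
    ]
    for position, seg in out_of_order(page):
        print(f"position {position}: {seg.text!r} is out of order")
```

A check like this only flags candidates for review. The final judgment about read-order still has to be made by someone looking at the document.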
To make sure you can properly edit a scanned PDF document, be careful when scanning:
- Place your document on the scanner bed as straight as possible.
- Press the scan button on the front of the scanner, or open your scanner software and choose "acquire image".
- Scanning in black and white is the best option, unless you really need to maintain color. OCR works better on black and white documents because color can shade text and cause read errors.
- Save the scanned document as an image. The best format is TIFF (the files are very large, but TIFF maintains the highest quality graphics).
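To get a feel for how large these image files can be, and how much the black and white setting matters, here is a rough estimate in Python. The page size (US Letter, 8.5 by 11 inches) and the 300 DPI resolution are assumptions chosen for illustration, and the function names are hypothetical. The figures cover uncompressed pixel data only, ignoring file headers and any compression your scanner software applies.

```python
def scan_pixels(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate size of the raw pixel data for one scanned page.

    Each row is padded up to a whole number of bytes, as is usual
    for uncompressed image rows.
    """
    width_px, height_px = scan_pixels(width_in, height_in, dpi)
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # Assumed page: US Letter at 300 DPI.
    modes = [("black and white", 1), ("grayscale", 8), ("color", 24)]
    for label, bits in modes:
        size = uncompressed_bytes(8.5, 11, 300, bits)
        print(f"{label}: {size / 1_000_000:.1f} MB per page")
```

Under these assumptions a black and white page needs roughly 1 MB of pixel data, while the same page in 24-bit color needs about 25 MB, so the black and white setting helps with file size as well as OCR quality.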
You have two basic choices for creating your PDF document from paper. You can scan your document into an image file (typically a TIFF) and then convert the image file into a PDF. Alternatively, you can scan directly into PDF using the "Create PDF from Scanner" option in Acrobat, your scanner's PDF conversion option, or commercially available conversion software.
PDF experts tend to suggest the first option: scanning to a TIFF and then importing it into Acrobat or your PDF creation software. By separating the steps, you can focus first on creating clean, high-quality scans of the document and only then worry about converting to an accessible PDF. If you scan your documents directly to PDF, you may need to rescan several times at different DPI values and other settings before you have a PDF that can be successfully manipulated. However, once your settings have been established, we found little difference between creating a TIFF and scanning directly to PDF.
Regardless of how you scan your document, you will need to do some follow-up work after the PDF version has been created in order to add accessibility features. This can be a very simple process for simple documents or a lengthy and complex one for complex documents, and much depends on what software you have available.