Setting parameters for PDF Conversion (Absolute PDF & Able2Extract Server only)

This section describes the meaning of the parameters shown on the Options dialog (Fig. 9).

Conversion type

Select the conversion type from the list of six possible types:

The default value is MS Word Standard.

Common parameters

This section includes parameters which are applicable to all conversion types.

Auto-Spacing between words
Some PDF documents are created so that their internal structure does not demarcate "spaces" between words, even though the viewable PDF page does contain spaces between words. As such, the Accumax CT conversion engine automatically adds spaces between document patterns (i.e. words) as a default setting.

In certain cases, such as rarefied Justify alignment, the Auto-Spacing Between Words default can insert extra spaces between words and produce poor conversion results. In these cases, conversion results may improve if the Auto-Spacing Between Words setting is deselected.
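
The effect described above can be illustrated with a minimal sketch of gap-based space insertion. This is not the Accumax CT algorithm, which is not documented here; the Glyph tuple, the join_glyphs function and the space_threshold parameter are hypothetical names chosen for illustration only.

```python
from typing import List, Tuple

# Hypothetical glyph record: (character, left edge, right edge) in arbitrary units.
Glyph = Tuple[str, float, float]


def join_glyphs(glyphs: List[Glyph], space_threshold: float,
                auto_spacing: bool = True) -> str:
    """Join glyphs into text, optionally inserting a space at wide gaps."""
    out = []
    prev_right = None
    for ch, left, right in glyphs:
        # Assumed rule: a gap wider than the threshold counts as a word break.
        if auto_spacing and prev_right is not None and left - prev_right > space_threshold:
            out.append(" ")
        out.append(ch)
        prev_right = right
    return "".join(out)


# No space glyph in the PDF: auto-spacing recovers the word break.
plain = [("t", 0, 5), ("o", 5, 10), ("b", 14, 19), ("e", 19, 24)]
print(join_glyphs(plain, 2.0))  # to be

# Justified text with stretched letter gaps and an explicit space glyph:
# auto-spacing adds unwanted spaces, so switching it off reads better.
justified = [("t", 0, 5), ("o", 8, 13), (" ", 13, 18), ("b", 21, 26), ("e", 29, 34)]
print(join_glyphs(justified, 2.0))                      # t o  b e
print(join_glyphs(justified, 2.0, auto_spacing=False))  # to be
```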

Eliminate Repeated Asterisk (*) Characters
Documents occasionally contain a line of repeated asterisk characters, which may interfere with PDF conversion results. The Eliminate Repeated Asterisk (*) Characters setting replaces repeated asterisks (more than two in a row) with the following: "*** ".
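
As a rough illustration of this replacement, the following sketch uses a regular expression. It assumes that "repeated" means a run of three or more consecutive asterisks; the function name collapse_asterisks is hypothetical and the product's own implementation may differ.

```python
import re

# Assumption: a run of three or more asterisks counts as "repeated".
_ASTERISK_RUN = re.compile(r"\*{3,}")


def collapse_asterisks(text: str) -> str:
    """Replace each run of three or more asterisks with '*** '."""
    return _ASTERISK_RUN.sub("*** ", text)


print(collapse_asterisks("Section**********End"))  # Section*** End
print(collapse_asterisks("a ** b"))                # a ** b (left unchanged)
```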

Retain Dollar Sign ($) as Separate Symbol
Use this option to retain separate "$" symbols in the converted document. In certain cases, the dollar sign is located far from the corresponding number, which makes it difficult for the Accumax CT conversion engine (and sometimes even for the human eye) to match the symbol to the appropriate number. Selecting this option ensures that the dollar sign symbol is retained in the converted document.

Horizontal Gap between Patterns during Selection
This option allows you to change the minimum gap between patterns and columns during selection and conversion. Sometimes the distances or gaps between several different patterns in a PDF document are too small, and our conversion engine treats them as one pattern.
This can cause problems, e.g. in PDF to Excel conversion, when two patterns are merged and placed into the same Excel column instead of two different columns.
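
The sketch below shows how a minimum-gap threshold can decide whether neighbouring patterns end up in one cell or in separate cells. It is an illustration under stated assumptions, not the engine's actual logic: the Pattern tuple, the group_patterns function and the units are hypothetical.

```python
from typing import List, Tuple

# Hypothetical pattern record: (left edge, right edge, text) in arbitrary units.
Pattern = Tuple[float, float, str]


def group_patterns(patterns: List[Pattern], min_gap: float) -> List[str]:
    """Merge neighbouring patterns whose horizontal gap is below min_gap."""
    cells: List[str] = []
    prev_right = None
    for left, right, text in sorted(patterns):
        if prev_right is not None and left - prev_right < min_gap:
            cells[-1] += " " + text   # gap too small: same cell
        else:
            cells.append(text)        # gap large enough: new cell
        prev_right = right
    return cells


row = [(0, 50, "Revenue"), (54, 80, "2019"), (90, 120, "4,560")]
print(group_patterns(row, 12))  # ['Revenue 2019 4,560']
print(group_patterns(row, 6))   # ['Revenue 2019', '4,560']
print(group_patterns(row, 2))   # ['Revenue', '2019', '4,560']
```

In this sketch, lowering the minimum gap separates patterns that would otherwise share a cell.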

OCR engine usage
This option allows you to specify the OCR engine implementation.
There are 3 possible variants of the OCR implementation:

Comment
It is recommended to use the Force OCR option only for special Watch folders or Email addresses intended to process scanned documents only. These special Watch folders and Email addresses should supplement the standard ones (i.e. those with the OCR engine usage option set to Smart) and should be used for PDF documents that the Smart option does not process correctly.

MS Word Conversion Parameters
This section includes parameters which are applicable to conversion to MS Word only.

Page Margin Value
Use the Page Margin Value setting to change the size of the printable margins for a Word document converted from a PDF document.
The default Page Margin Value is 0.00 inches. This value is chosen because it provides the best positional output when converting a document from PDF to Word.
Certain office printers cannot print the whole page area, so a page with 0.00 inch margins will not print. If this is the case, the Page Margin Value allows you to set printable margins appropriate for your printer.
A Page Margin Value of 0.2-0.5 inches will generally work best on most printers.

Column (Newspaper) Paragraph Minimum Width (applicable only for MS Word Standard)
Many PDF documents are formatted with column paragraphs, or newspaper-style paragraphs. To assist in the recognition and conversion from PDF to Word for these types of paragraphs, the user can designate the minimum width for column/newspaper paragraphs.

The Accumax CT conversion engine contains complex algorithms for differentiating between table columns and paragraph columns. However, in some cases, it is very difficult for the engine to distinguish between these two types of column. The Column (Newspaper) Paragraph Minimum Width setting allows users to improve conversion results by providing input regarding a PDF document’s structure.

For example, if it is known that a given document does not have any column/newspaper paragraphs with column widths of less than 2.00 inches, changing the Column (Newspaper) Paragraph Minimum Width to 2.00 inches will prevent the Accumax CT engine from treating some table columns as column/newspaper paragraphs.
The default for this setting is 1.00 inch.

Place all Images in Background (not applicable for MS Word Simple)
This setting will place all images identified within a PDF document onto the background, as a background image, in the converted Word document. By default, Accumax CT adds images (such as JPG or BMP files) as MS Word pictures, so that you can format each image separately or change/move their position within the document.

In certain cases involving masked images, the default setting may not render the images properly in the converted Word document. In certain other cases, an inappropriate Z-order allocation within the PDF source may cause images to be displayed incorrectly, for example with image border problems or disappearing images. In both of these cases, selecting this option to add all images to the background image may avoid image display problems in the converted document.

Vector Graphics as Background Image (not applicable for MS Word Simple)
This option converts all PDF vector graphics objects within a page into a background image. What are vector graphics? Generally, there are two kinds of graphical objects in PDF documents – pixel-based images and vector graphic images (consisting of lines and shapes).
Pixel-based images viewed in Word may vary in apparent resolution: if the image is resized or redrawn, MS Word attempts to add or remove pixels based on an algorithm. In most cases, the result is a loss in image quality; for instance, thin lines might disappear at low resolution.

Vector graphics in PDF, on the other hand, may be drawn at any resolution, which may give better results upon conversion into Word. However, in some cases the number of vector objects comprising an image is high, which may compromise the conversion for a particular Word document.

In cases where the rendering of vector graphics poses problems in the display of the conversion output, this option allows the user to display all vector graphics as a background image. By doing so, it ensures the integrity of the vector graphic image – although the drawback is that the vector image is not easily moved within the Word document.

Use Black/White for unsupported Color Schemes
Certain color schemes used in PDF documents are not currently supported by our PDF viewer. By default, the conversion is conducted with a random color scheme in such cases. This option allows you to select a Black/White color palette for situations where the color scheme is not supported.

Excel Output Options
This section includes parameters which are applicable to conversion for Excel Auto only.

Trailing Minus Sign (“ – ”) Treatment
In certain financial documents or reports, the minus sign symbol trails to the right of the number that it is associated with (e.g. “4,560-” instead of “-4,560”). Converting negative numbers where the minus sign trails to the right of the number may cause the resulting conversion to Excel to place these numbers as text items, rather than number items.

Use this option to move trailing minus signs from the end of the number to the beginning of the number, to prevent such instances from being converted to textual items in Excel.
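
A minimal sketch of this transformation follows, assuming each cell value is available as a string. The function name move_trailing_minus and the matching rule are hypothetical and only illustrate the idea; they are not the product's implementation.

```python
import re

# Assumed rule: digits with optional "," or "." separators, followed by a minus sign.
_TRAILING_MINUS = re.compile(r"^\s*([\d.,]*\d)\s*-\s*$")


def move_trailing_minus(cell: str) -> str:
    """Move a trailing minus sign to the front; leave other cells unchanged."""
    match = _TRAILING_MINUS.match(cell)
    return "-" + match.group(1) if match else cell


print(move_trailing_minus("4,560-"))  # -4,560
print(move_trailing_minus("-4,560"))  # -4,560 (already leading, unchanged)
print(move_trailing_minus("A-"))      # A- (not a number, unchanged)
```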

European Continental Settings (1.234.567,89 = 1 234 567.89)
In North America, the decimal point is a period – which separates the integer portion of a number from the fractional portion – and the thousand separator is a comma. In certain other countries, the reverse is true: a decimal comma is used to separate the integer portion of a number from the fractional portion, and a period is used as the thousand separator.

This option allows you to convert documents that adopt the decimal comma and “period” thousand separator (referred to here as European continental settings) to Excel formatted numbers correctly.
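
To make the number format concrete, the following sketch parses a value written with a decimal comma and period thousand separators. The function name parse_european_number is hypothetical, and the sketch assumes the input really uses the European continental convention; it does not try to detect the convention automatically.

```python
from decimal import Decimal


def parse_european_number(text: str) -> Decimal:
    """Parse '1.234.567,89' style text into a Decimal value."""
    # Drop period thousand separators, then turn the decimal comma into a point.
    normalized = text.strip().replace(".", "").replace(",", ".")
    return Decimal(normalized)


print(parse_european_number("1.234.567,89"))  # 1234567.89
print(parse_european_number("12,5"))          # 12.5
```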