Converting Scanned AutoCAD PDFs With OCR

As 2008 rolls on, so does the work and, no doubt, the PDF conversions as well. Don’t worry, we’re at it too. And every now and again, amidst troubleshooting and developing, we get an email from a client having difficulties with AutoCAD PDFs:

“I downloaded and installed your Pro version as a trial.  When I tried to convert a PDF file which was an AutoCAD drawing scanned and saved as such, it seems as if it was working but it opens Excel and nothing is converted in?”

If you’re experiencing or have experienced the same problem without any luck, don’t give up yet. Here’s a conversion tip: try resizing the image-based/scanned PDF.

This is because AutoCAD drawings are usually created with huge page dimensions that can measure up to 30″ by 40″. In addition, it is difficult for the OCR engine to determine the size (in points) of any letter on a scanned page. As a result, the OCR engine is often unable to extract legible text from AutoCAD documents because of the small text size (hence the empty Excel output).

The only way it can determine the size of the text is by comparing it to the stated size of the PDF page, which has to be a size the OCR engine can read and support. The OCR engine in Able2Extract Professional only supports page dimensions of up to 22″ by 22″.
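To put numbers on that, here is a minimal Python sketch of the arithmetic behind the resize. It only assumes the 22″ limit quoted above and the fact that PDF page sizes are measured in points (1/72 of an inch). The function names are made up for illustration and are not part of Able2Extract or any other product.

```python
POINTS_PER_INCH = 72  # PDF page sizes are expressed in points (1/72 inch)
MAX_SIDE_IN = 22.0    # OCR page-size limit quoted in this post


def scale_to_fit(width_in, height_in, max_side_in=MAX_SIDE_IN):
    """Return the scale factor (<= 1.0) that makes both sides fit the limit."""
    longest = max(width_in, height_in)
    if longest <= max_side_in:
        return 1.0  # already within the limit; no resize needed
    return max_side_in / longest


def resized_page(width_in, height_in, max_side_in=MAX_SIDE_IN):
    """Return the resized page as (width_in, height_in, width_pt, height_pt)."""
    factor = scale_to_fit(width_in, height_in, max_side_in)
    new_w, new_h = width_in * factor, height_in * factor
    return new_w, new_h, new_w * POINTS_PER_INCH, new_h * POINTS_PER_INCH


if __name__ == "__main__":
    # A typical large AutoCAD sheet, as described above
    factor = scale_to_fit(30, 40)
    w, h, w_pt, h_pt = resized_page(30, 40)
    print(f"scale factor: {factor:.2f}")
    print(f"new page: {w:.1f} x {h:.1f} in ({w_pt:.0f} x {h_pt:.0f} pt)")
```

For a 30″ by 40″ drawing, the longest side has to shrink from 40″ to 22″, which works out to a scale factor of 0.55 and a page of roughly 16.5″ by 22″.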

To resize the PDF:

1) Open the PDF in either Adobe Reader or Acrobat

2) Select File > Print

3) Change the Printer Name to ‘Adobe PDF’ in the drop box

4) Under the Page Scaling section ensure that ‘Choose Paper Source by PDF page size’ is deselected

5) Click OK to print a new PDF

You can also resize the PDF with our trial version of Sonic PDF Creator 2.0. After installing Sonic, select ‘Sonic PDF’ as the printer (instead of ‘Adobe PDF’ in step 3).

After you’ve resized the PDF, try the conversion again.

Hope this tip helps!

Why Performing OCR On Handwriting Doesn’t Work

Unsurprisingly, OCR is consistently a hot topic in the PDF world and among PDF users in general. In paper-intensive work environments, PDF conversion and OCR engines have proven to be a successful workaround for transferring paper files into word processing applications. Thus, with the help of scanners and the PDF format, any and all types of paperwork can be done electronically and efficiently. Or can they?

As we try to turn every non-digital working habit into an electronic equivalent, there are still some things that just can’t be done with ease using the same everyday tools. For instance, what about converting hand-printed or handwritten documents?

Three Flavours Of OCR

Many of you have probably wondered why such a thing can’t be done with the OCR technology in PDF conversion products. Well, this is because standard OCR technology is only capable of recognizing machine-printed characters and fonts. And since most of the documents being scanned in are typewritten, OCR is what gets employed in almost all cases.

In other cases, there are documents that contain handwritten sections and/or fields used for collecting data, something that is slowly being superseded by the fillable PDF form. You can create a digital copy of such a document simply by scanning it in, right? Yes. However, recognizing its contents requires a different recognition technology altogether. Using OCR, you can perhaps get one letter to “OCR” into ASCII, if it’s printed clearly and written in ink that’s thick enough to be read. But that’s about it. This is where another flavor of OCR comes in: Intelligent Character Recognition.

ICR is a more advanced form of OCR that translates hand-printed letters into digital ASCII equivalents. This version of OCR is primarily used for processing applications and forms on which you “print clearly” and place individual letters in boxes. This structured method of reading a hand-printed document is one of the major limitations of the technology, but it also controls and reduces the human errors that cause misinterpretations.

In addition, there are documents that contain handwriting (aka cursive writing). Can recognition be performed on such documents? The answer: Yes. The third flavor of OCR is IR (Intelligent Recognition), the latest generation of OCR technology to date. It is used to read unconstrained writing (text not contained in boxes) and uses the same methods to translate the characters into ASCII text. From my online searching, a good number of companies provide full-fledged OCR/ICR/IR solutions that can be integrated with digital workflows.

Thus, if you’re looking to OCR handwritten PDFs with a regular PDF conversion product, you’ll be sorely disappointed. The ability to do everything and anything with technology is perhaps the ultimate goal for developers and users. Practicing it, on the other hand, is perhaps the ideal goal for every worker bee out there. It’s sad to say, but there are some cases in which you can only do so much.

Able2Extract v.5.0 Is Here!

We are proud to announce that we have officially launched the upgraded versions of our flagship products, Able2Extract and Able2Doc. It’s a whole new version on a whole new level with a whole new look!

New Able2Extract 5.0 Features

This latest 5.0 version is sporting newer, more advanced features that let you convert your PDF into more formats than ever before. We’ve managed to pack this upgrade with a lot more conversion options. Like what, you ask? Read on.

First on the list, Able2Extract v.5.0 now offers PDF to Image conversions. Our new PDF to Image converter can generate popular image file formats, such as JPEG, BMP, GIF, PNG, and TIFF. You can designate the output directory, set image DPI and perform black and white conversions.
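If you’re wondering what the DPI setting means for the images you get out, here is a small Python sketch of the general relationship: pixels equal inches multiplied by DPI. It illustrates the arithmetic only and is not a description of how Able2Extract works internally. The function name is hypothetical.

```python
def pixel_size(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page rendered at the given DPI."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    # DPI is dots (pixels) per inch, so each side is simply inches * dpi
    return round(width_in * dpi), round(height_in * dpi)


if __name__ == "__main__":
    # A US Letter page (8.5 x 11 in) at a few common DPI settings
    for dpi in (72, 150, 300):
        w, h = pixel_size(8.5, 11, dpi)
        print(f"{dpi} dpi -> {w} x {h} px")
```

A higher DPI gives a sharper image but also a larger file, so it’s worth matching the setting to how the image will be used.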

Second, with Able2Extract v.5.0, you can now view and convert Microsoft’s new XPS document format. Convert XPS with all the same output features and conversion settings by simply opening and converting the format as you would a regular PDF file.

Third, this latest upgrade can support PDF Forms conversion. You can convert interactive PDF forms to editable Word Documents which you can fill out, save and modify later on. This conversion feature has the ability to retain form elements, such as text fields, radio buttons, and checkboxes.

Our Able2Doc v.4.0 can perform the same PDF Forms to Word conversion, and also supports XPS to Word conversion. It’s ideal for those who are only looking to convert to Word and TXT file formats.

Go ahead and sample these new features for yourself. You can download the free trial, in either the Standard or Professional versions, and take it for a test run. For ordering, product, and pricing details, check out our site. It, too, has undergone a bit of remodeling.

Top 10 Reasons To Buy A PDF Converter

Top reasons to buy desktop PDF converter software

When you’re considering whether or not to buy a PDF converter program, making the right choice is difficult, especially if you’re on the fence about whether you should purchase a conversion product to begin with.

You may think that since you’re not a heavy PDF user you don’t really need one, or that you’ll pay a bundle for one then rarely use the program, or that you can get along fine without one since there are other methods that can save you the money.

Well, if you need a little encouragement to justify the purchase of a PDF converter (and do away with the lingering doubt), here it is—ten reasons for you to buy a PDF converter (in no specific order):

1) PDFs Aren’t Editable

PDF converters are primarily used for making PDF content accessible. Content that is transmitted as a PDF often needs major editing or analysis once it arrives. PDF converters can save you all the retyping and data input. You can extract PDF content into other editable formats where you can perform the needed analysis easily.

2) Access, Generate And Work In Different Formats

Freeing up locked-down PDF content leads to another benefit that PDF converters provide: choice of format. There are many diverse formats to which the PDF format can now be converted. Word, Excel, PowerPoint, RTF and HTML are just a short list of the common ones which you can generate. It’s ultimately up to you and your work.

3) Going Paperless With Your Files

PDF converters are a simple solution for creating a personal e-filing system. With a PDF converter, you can manage PDF files and document information more effectively. A PDF converter is a good way to keep down the paper consumption and keep your edited work in digital files with the least amount of hassle.

4) The PDF Is A De Facto Standard

What does that mean? By common and popular usage, the PDF is the format professionals turn to when data needs to be kept intact while being transmitted for review. The PDF is being used across industries, and converting PDF content is inevitably part of that usage. Having a PDF converter will allow you to integrate into such workflows effortlessly.

5) PDF Popularity

Take into consideration that PDFs are now created not just by professionals, but by ordinary end users for ordinary purposes. PDFs are being used on personal webpages for posting documents and miscellaneous content that is impractical as HTML pages. And at some point, you might need to convert those documents in order to use them.

6) Repurposing That PDF Data Completely

By opening PDFs in Adobe Acrobat Professional, you can perform minor editing. However, doing that won’t give you the ability to completely repurpose the content; PDF converters will. You can eliminate those makeshift extractions that constantly leave you frustrated in the end.

7) PDF Converters As A Long Term Solution

Admittedly, free online converters are great for quick, one-time conversions. Free trials are also great for trying out products. Yet neither is a great long-term solution. These converters are oftentimes limited, or will restrict your PDF conversions to being done online. With a proper PDF converter, you’ll have unlimited access and the ability to work offline whenever you choose.

8) An Investment That’s Worth The Time And Money

Time matters. The money you spend matters. Yet, if you don’t have a PDF converter, you’ll find yourself spending a lot of both looking for other alternatives, alternatives that are perhaps not the best choice. Buying a good PDF converter is a worthwhile investment. Even if you occasionally use PDFs for research or collaboration, it makes working with those PDFs a lot easier.

9) PDF Converters As Learning Tools

It’s general knowledge that you can benefit from everything you do. Expand on what you know about the PDF by learning how to convert one. You’ll learn more about the ins and outs of the PDF than you normally would without a proper PDF converter.

10) PDF Converter Features

PDF conversion features in most applications go beyond the basic one-time quick conversion, and even increase the quality of your conversions. Batch conversions, OCR technology, page extractions, conversion settings—customize your PDF extractions with versatile features and get more out of the conversions you need.

So if you’re now convinced and ready to buy a PDF converter, start looking!

The ABCs Of The PDF: J-L

First off, my apologies. It has been a while since the last posting in this series, and there is still more to uncover and discover. So this posting is about the figures behind the PDF world, the background behind PDF links, and some tips. If you’re stocking up on some background bits and PDF facts in preparation for the 2007 Conference, then by all means, read on.

John E. Warnock

The name, the man, the father behind the PDF: Dr. John E. Warnock. Along with Charles Geschke, he co-founded Adobe Systems, Inc. in 1982. Before that, he joined Xerox Palo Alto Research Center (PARC) as a computer scientist in 1978, where he worked on Interpress, the forerunner of PostScript, the page description language that would become the building block of the PDF (a format that began life as Adobe’s “Camelot” project).

For the first two years of the company, Warnock served as president of Adobe Systems Inc. and then was CEO until his retirement from that position in 2000. Warnock has led an active career of achievements in computer technology, being highly distinguished in numerous associations and winner of countless awards for his innovation and influence. He is now co-chairman of Adobe with Charles Geschke.

Also interesting to note is that he has an Adobe typeface named after him—Warnock Pro. Here is a quoted description: “Warnock Pro’s structure is both rational and dynamic, striking a balance between innovation and restraint.” I wonder if the look of the typeface design reflects his personality. . . .

Kurt Foss

As a regular day-to-day PDF user, you hardly notice how quickly names can stick in your head, Kurt Foss being one of them. I first came upon Kurt Foss at the beginning of my PDF research, when I read many blogs and searched many articles (I still do). And his were near the top of the research pile.

Do a bit of research on Foss himself, and you’ll find that he has a major presence in the PDF world too. He is a longtime veteran of the PDF industry whose attraction to the PDF world grew out of a passion for the format and a curiosity about its potential uses and users: “I became increasingly aware of looming technological changes that seemed poised to change the way my profession worked. So I immersed myself in learning about it.” One of his most “notable notables,” in fact, was being one of the first to globally publish newspaper pages in the PDF format, experimenting with how the digital format could work for the printed word.

Foss started out as an Adobe evangelist, working with the company from 1993 to 2003, beginning when Acrobat was still in its first version. Since then, he has been web editor of both PDFZone and Planet PDF, and has written articles on issues surrounding the uses of the PDF format. Currently, Foss is the online editor of the Acrobat User Community, a site you may have heard of or been to at one point or another for resources and tips. He has written numerous posts, commenting, reviewing and reporting on anything and everything having to do with PDF.

Links

Linking within a PDF file is a great way to include more background content within your file. Yet the hypertext link has a background history of its own, one associated with Vannevar Bush’s influential work, “As We May Think.”

The paper described the Memex, a conceptual machine for storing and linking information, an idea which inspired the creation of hypertext as we know it today. The term “hypertext” was first coined by Ted Nelson in 1965, and its invention is usually credited to him and American scientist Douglas Engelbart. In 1968, with Engelbart’s historic “Mother of All Demos,” the first hypertext interface was demoed. And in 1980, Tim Berners-Lee, another famous name, created a hypertext database system, born of the same motivation that later became the driving force behind the World Wide Web: to meet the demand for automatic information sharing. The implementation of such hypertext link databases in the late ’80s eventually led to the first stages of the World Wide Web.

Needless to say, when you add links within your PDF, you create the same mini-network of information resources and sites. However, behind the convenience of endless information resources is the frustration of usability. Web usability guru Jakob Nielsen has a few words on the use of links and the PDF, which might come in handy in making that PDF user-friendly and informative.

Hope this helps out with that small talk. ’Til next time!