Why Performing OCR On Handwriting Doesn’t Work

Unsurprisingly, OCR is consistently a hot topic in the PDF world and among PDF users in general. In paper-intensive work environments, PDF conversion and OCR engines have proven to be a successful workaround for transferring paper files into word processing applications. Thus, with the help of scanners and the PDF format, any and all types of paperwork can be done electronically and efficiently. Or can they?

In the effort to turn every non-digital working habit into an electronic equivalent, there are still some things that just can’t be done with ease using the same everyday tools. For instance, what about converting hand-printed or handwritten documents?

Three Flavours Of OCR

Many of you have probably wondered why such a thing can’t be done with the OCR technology in PDF conversion products. Well, this is because standard OCR technology and devices are only capable of recognizing machine-printed characters and fonts. And since most of the documents being scanned in are typewritten, OCR is sufficient in almost all cases.

In other cases, there are documents that contain handwritten sections and/or fields that are used for collecting data, something that is slowly being superseded by the fillable PDF form. You can create a digital copy from such a document simply by scanning it in, right? Yes. However, recognizing the text requires a different technology altogether. Using OCR, you might get one letter to “OCR” into ASCII, if it’s printed clearly and written in ink that’s thick enough to be read. But that’s about it. This is where another flavor of OCR comes in: Intelligent Character Recognition.

ICR is a more advanced form of OCR that translates hand-printed letters into digital ASCII equivalents. This version of OCR is primarily used for processing applications and forms on which you “print clearly” and place individual letters in boxes. This structured method of reading a hand-printed document is one of the major limitations of the technology, but it controls and reduces the number of human errors that cause misinterpretations.

In addition, there are documents that contain handwriting, aka cursive writing. Can recognition be performed on such documents? The answer: Yes. The third flavor of OCR is IR (Intelligent Recognition), the latest generation of OCR technology to date. This is used to read unconstrained writing (text not contained in boxes) and uses the same methods to translate the characters into ASCII text. From my online searching, there are a good number of companies that provide full-fledged OCR/ICR/IR solutions, which can be integrated with digital workflows.

Thus, if you’re looking to run standard OCR on handwritten PDFs, you’ll be sorely disappointed. The ability to do everything and anything with technology is perhaps the ultimate goal for developers and users. Putting it into practice, on the other hand, is the everyday goal of every worker bee out there. It’s sad to say, but there are some cases in which you can only do so much.

Able2Extract v.5.0 Is Here!

We are proud to announce that we have officially launched the upgraded version of our flagship products, Able2Extract and Able2Doc. It’s a whole new version on a whole new level with a whole new look!

New Able2Extract 5.0 Features

This latest 5.0 version is sporting newer, more advanced features that let you convert your PDF into more formats than ever before. We’ve managed to pack this upgrade with a lot more conversion options. Like what, you ask? Read on.

First off the list, Able2Extract v.5.0 now offers PDF to Image conversions. Our new PDF to Image converter can generate popular image file formats, such as JPEG, BMP, GIF, PNG, and TIFF. You can designate the output directory, set image DPI and perform black and white conversions.

Second, with Able2Extract v.5.0, you can now view and convert Microsoft’s new XPS document format. Convert XPS with all the same output features and conversion settings by simply opening and converting the format as you would a regular PDF file.

Third, this latest upgrade can support PDF Forms conversion. You can convert interactive PDF forms to editable Word Documents which you can fill out, save and modify later on. This conversion feature has the ability to retain form elements, such as text fields, radio buttons, and checkboxes.

Our Able2Doc v.4.0 can perform the same PDF Forms to Word conversion, and can also support XPS to Word conversion capabilities. It is ideal for those who are only looking to convert to Word and TXT file formats.
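
For a sense of what the image DPI setting in PDF to Image conversion means in practice, here is a minimal sketch of the usual relationship between page size, DPI, and pixel dimensions. It is general arithmetic rather than a description of how Able2Extract works internally, and the function name and the US Letter page size (8.5 x 11 inches) are assumptions chosen for illustration.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in pixels for a page rendered at the given DPI."""
    # Pixels = inches * dots per inch, rounded to whole pixels.
    return round(width_in * dpi), round(height_in * dpi)


# Assumed example page: US Letter, 8.5 x 11 inches.
for dpi in (72, 150, 300):
    width_px, height_px = pixel_dimensions(8.5, 11, dpi)
    print(f"{dpi} DPI -> {width_px} x {height_px} pixels")
```

Doubling the DPI doubles each dimension and roughly quadruples the number of pixels, which is why higher DPI settings tend to produce sharper but larger image files.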

Go ahead and sample these new features for yourself. You can download the free trial, in either the Standard or Professional versions, and take it for a test run. For ordering, product, and pricing details, check out our site. It, too, has undergone a bit of remodeling.

Top 10 Reasons To Buy A PDF Converter

When you’re considering whether to buy a PDF converter program, making the right choice is difficult, especially if you’re sitting on the fence about whether you need a conversion product to begin with.

You may think that since you’re not a heavy PDF user you don’t really need one, or that you’ll pay a bundle for one then rarely use the program, or that you can get along fine without one since there are other methods that can save you the money.

Well, if you need a little encouragement to justify the purchase of a PDF converter (and do away with the lingering doubt), here it is—ten reasons for you to buy a PDF converter (in no specific order):

1) PDFs Aren’t Editable

PDF converters are primarily used for making PDF content accessible. When the format is used for transmission, most PDF content ends up needing major editing or analysis. PDF converters can save you all the retyping and data input. You can extract PDF content into other editable formats where you can perform the needed analysis easily.

2) Access, Generate And Work In Different Formats

Freeing up the locked down PDF content leads into another benefit that PDF converters provide: choice of format. There are many diverse formats to which the PDF format can now be converted. Word, Excel, PowerPoint, RTF and HTML are just a short list of the common ones which you can generate. It’s ultimately up to you and your work.

3) Going Paperless With Your Files

PDF converters are a simple solution for creating a personal e-filing system. With a PDF converter, you can manage PDF files and document information more effectively. A PDF converter is a good way to keep down the paper consumption and keep your edited work in digital files with the least amount of hassle.

4) The PDF Is A De Facto Standard

What does that mean? By common and popular usage, the PDF is the format professionals turn to when data needs to be kept intact while being transmitted for review. The PDF is being used across industries, and converting PDF content is inevitably part of that usage. Having a PDF converter will allow you to integrate into such workflows effortlessly.

5) PDF Popularity

Take into consideration that PDFs are now created not just by professionals, but by ordinary end users for ordinary purposes. PDFs are being used on personal webpages for posting documents and miscellaneous content that is impractical as HTML pages. And at some point, you might need to convert those documents in order to use them.

6) Repurposing That PDF Data Completely

Opening PDFs in Adobe Acrobat Professional, you can perform minor editing. However, doing that won’t give you the ability to completely repurpose the content; PDF converters will. You can eliminate those makeshift extractions that constantly leave you frustrated in the end.

7) PDF Converters As A Long Term Solution

Admittedly, free converters online are great for quick, one-time conversions. Free trials are also great for trying out products. Yet, neither is great as a long-term solution. These converters are oftentimes limited, or will restrict your PDF conversions to being done online. With a proper PDF converter, you’ll have unlimited access and the ability to work offline whenever you choose.

8) An Investment That’s Worth The Time And Money

Time matters. The money you spend matters. Yet, if you don’t have a PDF converter, you’ll find yourself spending a lot of both looking for other alternatives, alternatives that are perhaps not the best choice. Buying a good PDF converter is a worthwhile investment. Even if you occasionally use PDFs for research or collaboration, it makes working with those PDFs a lot easier.

9) PDF Converters As Learning Tools

It’s general knowledge that you can benefit from everything you do. Expand on what you know about the PDF by learning how to convert one. You’ll learn more about the ins and outs of the PDF than you normally would without a proper PDF converter.

10) PDF Converter Features

PDF conversion features in most applications go beyond the basic one-time quick conversion, and even increase the quality of your conversions. Batch conversions, OCR technology, page extractions, conversion settings—customize your PDF extractions with versatile features and get more out of the conversions you need.

So if you’re now convinced and ready to buy a PDF converter, start looking!

The ABCs of the PDF: M to O

A lot has happened with the PDF format in the last year—submission for standardization, release of a new specification, software upgrades, and improvements with graphic and dynamic PDFs. In this series posting, you get a look at the PDF’s recent format competition and past legal issues as well as the other uses of PDF related technology. Here it is.

Macromedia

Adobe Systems, Inc. acquired Macromedia Inc. in 2005 and has, since then, injected Macromedia technology into their software. However, Adobe and Macromedia had come into close legal contact even before the acquisition, over patent disputes.

According to articles from 2000 to 2002, the dispute was over a patent, awarded to Adobe, on a tabbed palette interface element. The issue dated back to 1996 and ran right up until 2000, during which time Adobe had confronted Macromedia about the palette’s inclusion in several of the company’s products.

Adobe’s suit was filed in August of 2000, and Macromedia’s argument against it was that the patent was invalid. This escalated to the point where Macromedia countersued Adobe in September 2000 for infringing on three of Macromedia’s own patents. After two years of back-and-forth legal battles, Adobe won its lawsuit and was awarded $2.8 million.

And five years later, Macromedia is now one of Adobe’s acquisitions. . . .

Native PDFs

As you know, native PDFs are ones that are generated from electronically created documents. Yet, while these native PDFs are beneficial when it comes to conversion, they can also produce just as much legal hubbub as patent disputes can. Moving the ability to create PDF files, or PDF-like formats, directly into the authoring application was definitely a complex issue that became a major headliner in PDF news this year.

Back in February, I wrote three postings on factors that made creating digital documents and native PDFs a more significant matter than ever before. There were the legal issues between Adobe and Microsoft; the PDF specification submission to ISO; and then, there was OpenOffice.org, Microsoft’s word processing app rival whose applications sport ODF creation, a format that became a statewide standard in Massachusetts.

Creating native PDFs and PDF-like formats now involves more politics at the authoring application level. Microsoft has the convenience of a widely used platform, Adobe has ubiquity as the de facto standard, and OpenOffice has the state of Massachusetts. Creating a native PDF, or PDF-like format, is now, in one sense, a matter of “moral” choice: are you an Acrobat advocate, a loyal MS Office user, or an open source supporter?

OCR

You know it by its three-letter acronym, and you know what it does when it comes to converting scanned PDF files. Yet, as software that literally recognizes and translates digitally imaged characters into character codes (ASCII or Unicode), OCR isn’t just for converting scanned PDFs.
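
To make “character codes” a little more concrete, here is a minimal sketch of what the end product of recognition looks like once a character has been identified. It does not perform any OCR; the recognized strings are hypothetical stand-ins for an engine’s output, and the function name is made up for illustration.

```python
def to_character_codes(text):
    """Return the numeric code point for each character in the text."""
    return [ord(ch) for ch in text]


# Hypothetical strings standing in for text an OCR engine has recognized.
for recognized in ("PDF", "café"):
    codes = to_character_codes(recognized)
    # str.isascii() reports whether every character fits in 7-bit ASCII.
    charset = "ASCII" if recognized.isascii() else "Unicode (non-ASCII)"
    print(recognized, codes, charset)
```

Plain English letters fall within the ASCII range (0 to 127), while accented letters such as “é” need Unicode code points beyond it.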

OCR has been used for a wide range of data processing systems. It’s been used by the Standard Oil Company of California for credit card imprints for billing purposes. At the Ohio Bell Telephone Company, OCR was used for reading bill stubs. Even the United States Air Force used OCR for reading and transmitting typewritten messages.

Another big use for OCR technology is postal work. The first use of OCR in Europe was by the British General Post Office for automating the mail sorting process. OCR scanners read the routing barcodes marked upon the envelopes that are based on corresponding postal codes, resulting in faster organization and shipment times. In 1965, the United States Postal Service adopted the method, followed by Canada Post in 1971.

Today, OCR is being further enhanced as a data input method ranging from simple text to digital scanning processes to sophisticated ICR (Intelligent Character Recognition), a more advanced version of OCR that recognizes hand printed documents.

Whether the PDF world is buzzing with long-standing issues from the past or just slowly unfolding with new developments, the PDF world can be an interesting place, indeed.

The ABCs Of The PDF: J-L

First off, my apologies. It has been a while since the last posting in this series, and there is still more to uncover and discover. So this posting is about the figures behind the PDF world, the history behind PDF links, and some tips. If you’re stocking up on some background bits and PDF facts in preparation for the 2007 Conference, then by all means, read on.

John E. Warnock

The name, the man, the father behind the PDF, Dr. John E. Warnock. Along with Charles Geschke, he co-founded Adobe Systems, Inc. in 1982. Before that, he worked as a computer scientist at the Xerox Palo Alto Research Center (PARC), which he joined in 1978 and where he worked on Interpress, the forerunner of PostScript, the page description language which would be the building block of the PDF (a format that itself began as his “Camelot” project).

For the first two years of the company, Warnock served as president of Adobe Systems Inc. and then was CEO until his retirement from that position in 2000. Warnock has led an active career of achievements in computer technology, earning distinction in numerous associations and winning many awards for his innovation and influence. He is now co-chairman of Adobe with Charles Geschke.

Also interesting to note is that he has an Adobe typeface named after him—Warnock Pro. Here is a quoted description: “Warnock Pro’s structure is both rational and dynamic, striking a balance between innovation and restraint.” I wonder if the look of the typeface design reflects his personality. . . .

Kurt Foss

As a regular day-to-day PDF user, you hardly notice how quickly names can stick in your head, Kurt Foss being one of them. I first came upon Kurt Foss at the beginning of my PDF research, when I read many blogs and searched many articles (I still do). His were near the top of the research pile.

And doing a bit of research on Foss himself, you’ll find that he has a major presence in the PDF world too. A longtime veteran of the PDF industry, he was drawn to the PDF world by a passion for the format and a curiosity about its potential uses and users: “I became increasingly aware of looming technological changes that seemed poised to change the way my profession worked. So I immersed myself in learning about it.” One of his most “notable notables,” in fact, was being one of the first to globally publish newspaper pages in the PDF format, experimenting with how the digital format could work for the printed word.

Foss started out as an Adobe evangelist in 1993, when Acrobat was still in its first version, and was with the company until 2003. Since then, he has been web editor of both PDFZone and Planet PDF, and has written articles on issues surrounding the uses of the PDF format. Currently, Foss is the online editor of the Acrobat User Community, a site you may have heard of or been to at one point or another for resources and tips. He has written numerous posts commenting on, reviewing, and reporting on anything and everything having to do with PDF.

Links

Linking within a PDF file is a great way to include more background content within your file. Yet the hypertext link has a background history of its own, one associated with Vannevar Bush’s influential work, “As We May Think.”

The essay contained the first rough concept of a machine for storing and linking information, called the Memex, an idea which inspired the creation of hypertext as we know it today. The term “hypertext” was coined by Ted Nelson in 1965, and its invention is usually credited to him and the American scientist Douglas Engelbart. In 1968, in Engelbart’s historic “Mother of All Demos,” the first hypertext interface was demoed. And in 1980, Tim Berners-Lee, another famous name, created a hypertext database system, born of the same motivation that later drove the World Wide Web: to meet the demand for automatic information sharing. The implementation of such hypertext link databases in the late 1980s eventually led to the first stages of the World Wide Web.

Of course, needless to say, when you add links within your PDF, you create the same mini-network of information resources and sites. However, behind the convenience of endless information resources is the frustration of usability. Web usability guru Jakob Nielsen has a few words on the use of links and the PDF, which might come in handy in making that PDF user-friendly and informative.

Hope this helps out with that small talk. Until next time!