Archiving PDF

With the PDF, you've done away with the old filing cabinets and large volumes of paper. Yet now you need a proverbial electronic filing cabinet that can stand the test of time - a format that can ensure the preservation of the contents and their accessibility in the future and, most importantly, the retrieval of intact files. One step in this direction is the PDF/A.

What is PDF/A (Archive)?

PDF/A is the ISO standard for the long-term archiving of electronic documents. The International Organization for Standardization (ISO) approved it as an international standard in 2005. PDF/A is a subset of the PDF standard: in essence, it defines a profile for using the Adobe PDF format for long-term preservation. Because it builds on the existing PDF format, the PDF/A standard applies to documents containing text, bitmap images and vector graphics.

For PDF/A documents to survive long-term preservation and keep their visual appearance over time, they need to be completely self-contained. Unlike a regular PDF, a PDF/A document can't depend on information from external sources, such as font programs installed on the viewer's system or content that sits behind a hyperlink. That doesn't mean the document cannot contain links; it means that nothing needed to display the document may live at the other end of one.

A PDF file that is "PDF/A compliant" contains all the components required to view it, along with metadata (data that describes the file itself). Thus, fonts need to be embedded, metadata has to be standardized, and passwords and encryption can't be used. Even some regular PDF features, such as sound and movie actions, are prohibited in PDF/A.

In general, the standard specifies in detail what kind of content is allowed and what kind is restricted. These rules make archived PDFs more practical to maintain and, when needed, to migrate to later versions of the PDF standard.
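The requirements described above can be read as a checklist. The following is a minimal sketch in Python, not a real validator: the PdfSummary class, its field names and the pdfa_violations function are hypothetical and exist only for illustration, and the list of restricted actions covers only the two named in this article. A real check would rely on a dedicated PDF/A validation tool.

```python
from dataclasses import dataclass, field


@dataclass
class PdfSummary:
    """Hypothetical summary of a PDF, as an inspection tool might report it."""
    fonts: dict[str, bool] = field(default_factory=dict)  # font name -> embedded?
    encrypted: bool = False
    has_metadata: bool = False
    action_types: set[str] = field(default_factory=set)   # e.g. {"Sound"}


# Only the action types named in the text; the standard itself lists more.
RESTRICTED_ACTIONS = {"Sound", "Movie"}


def pdfa_violations(doc: PdfSummary) -> list[str]:
    """Return human-readable reasons why doc fails the checklist."""
    problems = []
    for name, embedded in sorted(doc.fonts.items()):
        if not embedded:
            problems.append(f"font not embedded: {name}")
    if doc.encrypted:
        problems.append("encryption or password protection is present")
    if not doc.has_metadata:
        problems.append("standardized metadata is missing")
    for action in sorted(doc.action_types & RESTRICTED_ACTIONS):
        problems.append(f"restricted action: {action}")
    return problems


report = PdfSummary(
    fonts={"Helvetica": False, "Garamond": True},
    encrypted=True,
    has_metadata=True,
    action_types={"Sound"},
)
for problem in pdfa_violations(report):
    print(problem)
```

An empty result means only that none of these few checks failed. It says nothing about full PDF/A compliance.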

Why the PDF/A?

This need for permanence is essential for sectors that manage large collections of documents, such as newspapers, libraries, government and the legal industry. All of them handle textual records that matter to the general public, and all rely on the long-term availability of information.

The reasons for building an archiving strategy on the PDF format lie in the advantages of the format itself. Recall that PDF is platform-independent, can be viewed on any computing device and is considered a universal format thanks to its accessibility and compression capabilities. The combination of these features means that documents can be archived efficiently and accessed faithfully in the future. For anyone searching for a format that provides consistent and predictable rendering, PDF proves to be the best choice.

Obstacles

Yet PDF/A is only one step in an archiving strategy. The standard is not a permanent and complete solution in itself. For instance, it can't presently guarantee that, ten years from now, archived documents will display their information exactly as they did when the PDF was created.

Another issue is that the PDF/A standard must serve users at an international level, while document requirements are determined individually by each organization's needs. These generally include concerns about authorization, management, preservation policies and the creation of records.

For instance, the standard defines conformance levels with different text extraction requirements. PDF/A-1a requires that text can be extracted and viewed and that the document's logical structure is preserved. PDF/A-1b, on the other hand, ensures that the PDF file can be displayed correctly over time, but doesn't ensure that any extracted text will be legible or comprehensible. These differences can complicate the exchange of PDF/A files between organizations that follow different conformance levels.
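One practical consequence is that an organization receiving a file may want to know which level it claims. A PDF/A file declares this in its embedded XMP metadata through the pdfaid:part and pdfaid:conformance properties. The sketch below reads that declaration from an XMP packet that has already been extracted as text; the pdfa_level function and the SAMPLE packet are illustrative, and how the packet is pulled out of the PDF is left aside.

```python
import xml.etree.ElementTree as ET

PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def pdfa_level(xmp_packet: str):
    """Return (part, conformance), e.g. ("1", "B"), or None if not declared."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # XMP allows simple properties as attributes or as child elements.
        part = (desc.get(f"{{{PDFAID_NS}}}part")
                or desc.findtext(f"{{{PDFAID_NS}}}part"))
        conf = (desc.get(f"{{{PDFAID_NS}}}conformance")
                or desc.findtext(f"{{{PDFAID_NS}}}conformance"))
        if part:
            return part.strip(), (conf or "").strip().upper()
    return None


# Illustrative packet, trimmed to the identification properties only.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>1</pdfaid:part>
      <pdfaid:conformance>B</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(pdfa_level(SAMPLE))  # ('1', 'B')
```

The result only reports what the file claims about itself. Whether the file actually meets that level still has to be verified with a validator.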

The standard has also been updated to follow newer versions of the PDF specification. Each new version brings new and different functionality, which means that the characteristics of the standard will continue to change.

The PDF/A-2 standard was published in 2011, based on PDF version 1.7 (ISO 32000-1), and PDF/A-3 was released in 2012 based on the same PDF version. Although the PDF/A standard has its disadvantages, it makes long-term archiving of PDFs possible.

Undoubtedly, the PDF format is rich in potential. Archiving is just another way of putting the PDF format to the test. With the growing reliance on electronic management systems, PDF/A has become well established as the archival format within the PDF community.