Overview
PDF/A-2 is the second part of PDF/A, a series of ISO standards. The first part, PDF/A-1, was published in October 2005 and is based on PDF 1.4; it can be purchased from ISO at http://www.iso.org.
PDF/A-2, officially “ISO 19005-2: Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A-2)”, is expected to be approved by ISO Technical Committee TC 171 in December 2010. If all goes well, it will be published a couple of months later, once it has gone through the quality assurance and publication process at the ISO Central Secretariat in Geneva.
What is PDF/A?
According to the Library of Congress, PDF/A-1 is “a constrained form of Adobe PDF version 1.4 intended to be suitable for long-term preservation of page-oriented documents for which PDF is already being used in practice.” (Library of Congress http://digitalpreservation.gov/formats/fdd/fdd000125.shtml)
PDF/A attempts to maximize:
- Device independence: The same information and appearance are available on all devices (within the constraints of the device)
- Self-containment: Each file is sufficient in itself to convey the information and appearance; no other resources are required
- Self-documentation: The document contains suitable metadata to identify itself; its history can also be recorded within the document
To achieve these goals, the following constraints are imposed on PDF/A files:
- Based on PDF 1.4: Constrained version ensures stability over time
- Audio and video content are forbidden: Avoids difficulty with external files; it also simplified the development of the standard by focussing on page-based documents
- JavaScript and executable file launches are prohibited: Avoids different appearance under different circumstances
- All fonts and encodings must be embedded, and fonts must be legally embeddable for unlimited, universal rendering: Ensures repeatable appearance of text
- Colour spaces are specified in a device-independent manner: Ensures repeatable appearance of colour
- Encryption is disallowed: No password is required to access anything within the document
- Use of standards-based metadata is mandated: For self-documentation
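The constraints above amount to a checklist. As a rough illustration only, the following Python sketch runs that checklist against a hypothetical summary of a document's features. The DocumentFeatures record and its field names are assumptions made for this example; they are not part of the PDF/A standard or of any real validation tool, and an actual validator inspects the PDF file itself in far more detail.

```python
from dataclasses import dataclass


@dataclass
class DocumentFeatures:
    """Hypothetical summary of a PDF, as an analysis tool might report it."""
    has_audio_or_video: bool = False
    has_javascript: bool = False
    launches_executables: bool = False
    all_fonts_embedded: bool = True
    device_independent_colour: bool = True
    is_encrypted: bool = False
    has_standard_metadata: bool = True


def pdfa1_violations(doc):
    """Return the constraints from the list above that the document breaks."""
    # Each entry pairs "is this constraint violated?" with a description.
    checks = [
        (doc.has_audio_or_video, "audio or video content is present"),
        (doc.has_javascript, "JavaScript is present"),
        (doc.launches_executables, "executable file launch is present"),
        (not doc.all_fonts_embedded, "not all fonts are embedded"),
        (not doc.device_independent_colour, "colour is device-dependent"),
        (doc.is_encrypted, "file is encrypted"),
        (not doc.has_standard_metadata, "standards-based metadata is missing"),
    ]
    return [message for violated, message in checks if violated]


# Example: a document that contains JavaScript and is password-protected.
example = DocumentFeatures(has_javascript=True, is_encrypted=True)
print(pdfa1_violations(example))
# prints: ['JavaScript is present', 'file is encrypted']
```

An empty result here only means that none of the listed constraints is violated according to the summary; it does not establish conformance with the standard.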
Since 2005 PDF/A-1 has become quite successful and is used in a number of industries as well as in the public sector. In some countries it is mandated by law, whereas elsewhere it is the preferred or recommended format for archiving ‘ePaper’ content.
Why a second part?
PDF/A is based on the PDF format. One of the goals of PDF/A is to be able to archive content stored inside PDF files. As the development of the PDF format itself continues, every once in a while a new part may have to be added to PDF/A to support archiving of newer features in the underlying PDF format.
In addition, as of version 1.7 the PDF format is no longer an Adobe-controlled specification but an ISO standard (ISO 32000-1). To many it therefore seemed a good idea to base a second part of PDF/A on the ISO version of PDF 1.7: an ISO standard based on an ISO standard. There are also enough interesting additions to the PDF format between PDF 1.4 and PDF 1.7 to make this worthwhile.
What is new in PDF/A-2?
The guiding principle was that everything in PDF 1.7 that is within the scope of PDF/A is allowed, and that everything outside that scope is disallowed. The standard states its scope as follows:
“This part of ISO 19005 specifies the use of the Portable Document Format (PDF) 1.7, as formalized in ISO 32000-1, for preserving the static visual representation of page based electronic documents over time.”
The focus on static content precludes, for example, archiving any multimedia content inside PDF/A files.
The following paragraphs will explain a number of new features in PDF/A-2 that might have an impact on the use of PDF for archiving.
PDF/A files can be embedded in PDF/A-2 files
PDF/A-2 makes it possible to embed other PDF/A files, whether PDF/A-1 or PDF/A-2, inside a PDF/A-2 file. This can be handy when several independent documents are to be kept together without merging them into a single PDF file. Possible uses include converting an email with attachments to a PDF/A-2 file in which the attachments become embedded PDF/A files; keeping a binder of several PDF/A files that have been digitally signed individually, where merging them into one file would destroy the validity of the signatures; or simply converting a binder or folder with several documents into a PDF/A-2 file with embedded PDF/A files that reflect the original nature of each file.
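The embedding rule can be pictured as a simple filter over the attachments of a container file. The Python sketch below assumes that each attachment's declared PDF/A part is already known; the Attachment record and the function name are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Attachment:
    """Hypothetical description of a file to embed in a PDF/A-2 container."""
    name: str
    pdfa_part: Optional[int]  # declared PDF/A part, or None if not a PDF/A file


def disallowed_attachments(attachments: List[Attachment]) -> List[str]:
    """Return the names of attachments that are neither PDF/A-1 nor PDF/A-2."""
    return [a.name for a in attachments if a.pdfa_part not in (1, 2)]


# Example: an email converted to PDF/A-2 with three attachments.
email_attachments = [
    Attachment("invoice.pdf", pdfa_part=1),
    Attachment("contract.pdf", pdfa_part=2),
    Attachment("photo.jpg", pdfa_part=None),
]
print(disallowed_attachments(email_attachments))
# prints: ['photo.jpg']
```

In this example the image would first have to be converted to a PDF/A file before it could travel inside the PDF/A-2 container.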
Transparency
Transparency has been part of PDF since version 1.4 but was disallowed in PDF/A-1, mostly because it was felt that technology for processing transparency was not yet sufficiently widespread. In addition, a few technical details had not yet been specified at that time, which could have led to inconsistent rendering of transparency under certain circumstances. Half a decade on, the technology has matured and the PDF standard now defines transparency precisely enough for it to be implemented consistently.
Transparency is used in a number of ways: demanding designs can be achieved more easily with transparency effects, slide presentations often make use of effects best encoded by means of transparency, and text highlight annotations, which look like yellow felt markers, are probably the most frequent use of transparency in PDFs. Archiving such content as PDF/A-1 has so far required “flattening” the transparency; this will no longer be required once PDF/A-2 is published.
JPEG2000
JPEG2000, an ISO standard in its own right (ISO/IEC 15444), defines a powerful image compression algorithm. For certain uses it can be more efficient than JPEG. Institutions like museums and galleries value the possibility of using JPEG2000 in a lossless way, which typically produces less data than ZIP compression and, unlike JPEG, does not introduce any losses.
Layers
Layers in PDFs introduce the possibility to turn the visibility of selected content on or off. Technically layers are actually called “optional content” in the PDF syntax but most user interface implementations use the term “layer” or “layers”.
The engineering community makes use of layers in complex technical drawings, where it can be useful to turn some aspects of a drawing off to be able to focus more easily on the remaining information. For example, when looking at the technical drawing for a house it might be helpful at times to turn off plumbing data while looking into aspects of cabling, and vice versa. Layers are also used for multilingual publishing, where only one of two or more languages is displayed at a time. As graphics and images can typically be shared between languages, more compact files can be achieved, and quality assurance of the multilingual content can be carried out more easily.
Conformance level for Unicode support
PDF/A-2 introduces a third conformance level. In PDF/A-1, conformance level “B” indicates that the PDF/A file can be reliably reproduced visually, whereas conformance level “A” indicates that the PDF/A file contains enough information, in the form of both text supporting Unicode and tagged PDF, that its contents can be accessed and retrieved in a structured way. This is an important prerequisite for indexing text properly, as well as for retrieving content for repurposing or for migrating it to some other representation in the future.
While the “A” conformance level is obviously preferable, it is not always feasible. Scanned documents, for example, by definition lack any structural content information. PDFs created from applications that do not support the creation of tagged PDF are also not suitable for archiving according to conformance level “A”. Nevertheless, in many cases such PDFs can still be created in a way that guarantees that all text can be mapped to Unicode, even if the reading order of the text in the PDF is not explicitly defined. Such Unicode support was deemed relevant enough to introduce a separate conformance level “U” for it in PDF/A-2.
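The text above does not say how a file declares its part and conformance level. In practice, PDF/A files state this claim in their XMP metadata through the pdfaid:part and pdfaid:conformance properties (namespace http://www.aiim.org/pdfa/ns/id/). The Python sketch below reads that claim from an XMP packet that has already been extracted from a PDF. The function names and the sample packet are made up for this example, and the code only reports what a file claims; it does not verify that the claim is true.

```python
import xml.etree.ElementTree as ET

# Namespaces used by RDF and by the PDF/A identification schema.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# Plain-language summary of the PDF/A-2 levels as described in the text above.
LEVELS = {
    "B": "reliable visual reproduction",
    "U": "visual reproduction plus text that maps to Unicode",
    "A": "Unicode text plus tagged structure",
}


def _property(description, name):
    """Read an XMP simple property stored as an attribute or a child element."""
    qname = f"{{{PDFAID_NS}}}{name}"
    value = description.get(qname)
    if value is None:
        child = description.find(qname)
        value = child.text if child is not None else None
    return value.strip() if value else None


def read_pdfa_claim(xmp_packet):
    """Return (part, level) as declared in the XMP packet, or None if absent."""
    root = ET.fromstring(xmp_packet.strip())
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        part = _property(description, "part")
        level = _property(description, "conformance")
        if part and level:
            return int(part), level.upper()
    return None


# Illustrative packet (not taken from a real file) claiming PDF/A-2, level U.
SAMPLE_XMP = """
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
"""

part, level = read_pdfa_claim(SAMPLE_XMP)
print(f"PDF/A-{part}{level.lower()}: {LEVELS[level]}")
# prints: PDF/A-2u: visual reproduction plus text that maps to Unicode
```

Checking whether a file actually meets the level it declares requires a dedicated PDF/A validator.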
OpenType fonts
The PDF 1.7 format supports the use and embedding of OpenType fonts, and in PDF/A-2 it is also possible to embed OpenType fonts. The main advantage is that where OpenType fonts are used to create documents, they no longer have to be converted to a different technical representation but can be embedded as is.
New annotation types
Between PDF 1.4 and PDF 1.7 a number of new annotation types were introduced into the PDF format. The following types, all used for applying comments or markup to a PDF, can now also be archived in PDF/A-2: Polygon, PolyLine, Caret, Watermark and Redact annotations.
What is not allowed in PDF/A-2?
As stated above, the scope of PDF/A prohibits the embedding of non-static content in PDF/A files. There are several areas where non-static content plays a big role:
- multimedia features – annotations that make it possible to include movies, audio or 3D models in PDF files
- forms based on XFA, a new architecture introduced by Adobe to support complex interactive forms
The approach to actions and JavaScript is still the same as in PDF/A-1: JavaScript is completely prohibited in PDF/A-2, and only a small set of actions is allowed, namely those for which there is no risk that they could change the content of the archived PDF.
This does not imply though that a PDF/A file can have no interactivity at all – rather the opposite: all links can be actionable, whether within a PDF file or pointing to other PDF files or other files or destinations.
Summary
PDF/A-2 introduces interesting new features that everyone responsible for archiving should be aware of. At the same time it is crucial to understand that PDF/A-2 does not obsolete PDF/A-1. Where currently, or even in the future, PDF/A-1 is successfully used for archiving, there is no reason to move to PDF/A-2 based archiving. Under no circumstances will it make any sense to migrate PDF/A-1 files to PDF/A-2 – nothing can be gained from doing so.
Nevertheless, where PDF/A-1 has not been a perfect option, and where PDF/A-2 offers precisely what has been missing in PDF/A-1, opting for PDF/A-2 based archiving should be considered, rather than forcing PDF files into a PDF/A-1 conforming representation.
Publication of the PDF/A-2 standard is expected in the second quarter of 2011, and as of today the availability of tools and solutions that support PDF/A-2 is limited, although some companies such as Adobe, callas software and Luratech have already announced PDF/A-2 support. Now is therefore a good time to review the implications of PDF/A-2 and to define a strategy for the coming years, whether that means continuing to use PDF/A-1 or starting to use PDF/A-2 in the foreseeable future.
More information about PDF/A
The best resource for information about PDF/A is the PDF/A Competence Center (http://www.pdfa.org), an international association with over 100 members from all over the world, most of them vendors who offer software with support for PDF/A.