When can a PDF reader claim conformance with the PDF standard?


PDF 1.7 became an ISO standard in 2008 under the auspices of ISO TC 171 SC 2. As a consequence, the further development of PDF is no longer controlled by Adobe but by ISO. Unknown to many, ISO is not a standards-producing service organisation but a democratically organized international standards body. ISO standards are not produced by ISO itself but by technical committees coordinated by ISO: each participating country delegates its national experts to do the actual work, and each participating country has one vote when it comes to accepting or rejecting a draft ISO standard. National standards bodies are open to participation from parties affected by such standardization work – usually it is a matter of getting in touch with the national standards body (in some countries an annual fee has to be paid to fund the standardization work) in order to become involved in national and ISO standardization work.

ISO TC 171 SC 2, now in charge of the PDF format, is currently working on the next version, most probably to be called PDF 2.0. A number of additions to the PDF syntax are being discussed. One of the relatively new activities consists of work that defines more explicitly what a conforming reader must support under all circumstances, and where a reader may decide not to support a certain aspect of PDF. In ISO language, a “PDF reader” is an application that reads or consumes a PDF, and then offers some kind of access to the PDF and its contents. Thus an onscreen viewer for PDFs and a printer reproducing a PDF on paper are both “PDF readers”, and so is a tool that extracts metadata from a PDF file.

While working on the definition of what conforming readers are, and when a reader is not to be considered a conforming reader, it turned out that there are so many different types of readers that a single definition may not be feasible. Why should a PDF tool that only extracts text from a PDF be bothered with getting the display of page content of the PDF right? As a consequence it is now most likely that the guiding principle in PDF 2.0, when it comes to defining “reader conformance”, will be: a reader can freely choose which aspects of the PDF 2.0 standard to implement – but for any aspect that is implemented it must completely follow the rules for that aspect.

Some examples:

  • if a reader offers rendering of page content, such rendering must be done in full conformance with PDF 2.0 provisions about rendering of page content – all color spaces, transparency blend modes, optional content and so forth must be supported
  • a reader does not have to offer support for video, Flash or 3D models – but if it does offer such features, they must be implemented in a way that fully conforms to PDF 2.0 (otherwise the reader cannot be considered a PDF 2.0 conforming reader)
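
The principle can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the aspect names, the feature lists and the Reader class are hypothetical and are not taken from any draft of the standard.

```python
from dataclasses import dataclass, field

# Hypothetical feature lists per aspect; the real requirements are
# whatever the standard defines for each aspect.
REQUIRED_FEATURES = {
    "page_rendering": {"color_spaces", "transparency_blend_modes", "optional_content"},
    "video": {"video_playback"},
}


@dataclass
class Reader:
    name: str
    # Maps each aspect the reader chooses to implement to the features it supports.
    implemented: dict = field(default_factory=dict)


def nonconforming_aspects(reader):
    """Return the implemented aspects whose required features are not fully covered."""
    problems = {}
    for aspect, supported in reader.implemented.items():
        missing = REQUIRED_FEATURES.get(aspect, set()) - set(supported)
        if missing:
            problems[aspect] = missing
    return problems


def is_conforming(reader):
    """A reader conforms if every aspect it implements is implemented completely."""
    return not nonconforming_aspects(reader)


# A text extractor implements no rendering at all, which is acceptable.
text_extractor = Reader("text extractor", {})
# A viewer that renders pages but skips transparency is not conforming.
partial_viewer = Reader("partial viewer", {"page_rendering": {"color_spaces"}})
full_viewer = Reader(
    "full viewer", {"page_rendering": set(REQUIRED_FEATURES["page_rendering"])}
)
```

In this sketch an aspect that is left out entirely never counts against the reader; only an aspect that is offered but incomplete does.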

While this sounds easy and straightforward, it should not be forgotten that a number of otherwise good quality PDF readers are far from having implemented all relevant features. A lot of these PDF readers are still catching up with features introduced a number of years back in PDF 1.4 (for example transparency) or PDF 1.5 (for example optional content). Even when it comes only to rendering page content, only products from Adobe or based on Adobe technology seem to achieve good coverage of all applicable features in the currently most recent version of PDF, i.e. PDF 1.7 as defined in ISO 32000-1.

With the more explicit guidance and rules around reader conformance to be introduced by PDF 2.0 the topic of reader conformance will become more obvious and more important, and in order to be successful in the market of PDF tools and solutions, it might help to be able to claim reader conformance for a given PDF product.

It will still take some time before PDF 2.0 is finalized and published (probably some time in 2012) but now is the time to provide input to the standardization process if needed. Talk to the experts in your national standards organisation. Or get in touch with the PDF/A Competence Center’s Technical Working Group – the PDF/A Competence Center has entered into a “Category A” liaison with ISO TC 171 SC 2 and is entitled to provide input to the work on PDF standards, though it does not have any voting rights (this is and will always be a privilege of participating national standards bodies).

Posted in PDF 2.0

Support for layers (optional content) in PDF/X-4 and PDF/A-2


Both the PDF/X-4 ISO standard and the PDF/A-2 ISO standard (yet to be formally approved, publication expected for the second quarter of 2011) contain provisions about layers, or – technically more correct – optional content. Many of those provisions are pretty self-explanatory, but some are not immediately obvious to everybody.

What is optional content used for in PDF?

The main function of optional content is to control whether a given object on a page is displayed or not, without changing the page content itself.

Building blocks of optional content in PDF

The building blocks for optional content are

  • optional content group (OCG): ties content – a portion of a content stream, an XObject or even an annotation – directly to a logical entity named optional content group
  • optional content membership dictionary (OCMD): ties content – a portion of a content stream, an XObject or even an annotation – indirectly to a logical entity named optional content group; the connection is indirect insofar as the state of visibility for content belonging to an OCMD may be determined in relation to one or more OCGs. For example, objects belonging to an OCMD are displayed only if all the OCGs to which the OCMD is tied are displayed as well.
  • optional content configuration dictionary (OCCD): defines for which of the OCGs in the document display is On or Off when the OCCD is in effect (there can be more than one OCCD in a PDF, though once optional content is used in a PDF at least one OCCD – the default OCCD – must be present).
  • Default optional content configuration dictionary (default OCCD): the default OCCD is the OCCD that is the value of the D entry in the OCProperties entry in the root object of a PDF, and it defines the initial visibility of OCGs in the PDF, for example when printing or when opening a PDF in an interactive PDF viewer.
  • Order array in an optional content configuration dictionary: the Order is an optional entry inside an OCCD, and it can contain an (optionally nested) list of OCGs to be displayed in the user interface of a PDF viewer, allowing for switching individual OCGs on or off.
  • RBGroups in an optional content configuration dictionary: the RBGroups – or Radio Button Groups – entry is optional and allows the definition of one or more lists of OCGs where the visibility for the OCGs in each list follows a radio button behavior, that is, at most one of the OCGs within a list can be On. Turning an OCG in such a list Off does not affect the other OCGs in that list, so in that case all OCGs within that list may be Off.
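
How OCMDs and radio button groups behave can be sketched in a few lines of Python. This is a simplified model, not a PDF parser: OCGs are represented by plain names, and the helper names ocmd_visible and set_state are made up for illustration. The four visibility policies (AllOn, AnyOn, AnyOff, AllOff, with AnyOn as the default) are those of the P entry of a membership dictionary in ISO 32000-1.

```python
def ocmd_visible(ocg_names, states, policy="AnyOn"):
    """Decide whether content tied to an OCMD is visible.

    ocg_names: names of the OCGs the OCMD refers to
    states:    dict mapping OCG name to True (On) or False (Off)
    policy:    visibility policy of the OCMD
    """
    if not ocg_names:
        # An OCMD without OCGs has no effect on visibility.
        return True
    flags = [states[name] for name in ocg_names]
    if policy == "AllOn":
        return all(flags)
    if policy == "AnyOn":
        return any(flags)
    if policy == "AnyOff":
        return not all(flags)
    if policy == "AllOff":
        return not any(flags)
    raise ValueError(f"unknown visibility policy: {policy}")


def set_state(states, rb_groups, name, on):
    """Return new states after switching one OCG, honouring radio button groups."""
    new_states = dict(states)
    new_states[name] = on
    if on:
        # Turning an OCG On turns Off every other OCG that shares a group with it.
        for group in rb_groups:
            if name in group:
                for other in group:
                    if other != name:
                        new_states[other] = False
    # Turning an OCG Off leaves the rest of its group untouched.
    return new_states


states = {"English": True, "German": False, "Images": True}
rb_groups = [["English", "German"]]
german_on = set_state(states, rb_groups, "German", True)
none_on = set_state(german_on, rb_groups, "German", False)
```

Here german_on has German On and English Off, while none_on shows the case described above in which all OCGs of a radio button group end up Off.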

Provisions for all conforming PDF/X-4 and PDF/A-2 readers

The purpose of optional content in PDF/X-4 and PDF/A-2 files is “to allow multiple variants of a document to be supplied in a single file”. Optional content configuration dictionaries are used to identify the variants.

Unless explicit instructions are in effect, visibility of objects in a PDF file shall be according to the information in the default OCCD as described in ISO 32000-1:2008, section 8.11.4, “Determining the State of Optional Content Groups”.
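
As a rough sketch of how the initial state can be derived from a configuration dictionary, the following Python function uses the BaseState, ON and OFF entries that ISO 32000-1 defines for an OCCD. The dictionaries are plain Python stand-ins for PDF objects, the function name initial_states is hypothetical, and applying ON before OFF is an assumption of this sketch for the (unusual) case that a group is listed in both arrays.

```python
def initial_states(all_ocgs, config, current=None):
    """Compute the On/Off state of every OCG when a configuration is applied.

    all_ocgs: names of all OCGs in the document
    config:   dict standing in for an OCCD (BaseState, ON and OFF are optional)
    current:  states before applying the configuration, used for BaseState Unchanged
    """
    base = config.get("BaseState", "ON")  # ON is the default base state
    if base == "ON":
        states = {name: True for name in all_ocgs}
    elif base == "OFF":
        states = {name: False for name in all_ocgs}
    elif base == "Unchanged":
        previous = current or {}
        states = {name: previous.get(name, True) for name in all_ocgs}
    else:
        raise ValueError(f"unknown BaseState: {base}")
    # The ON and OFF arrays override the base state for the listed groups.
    for name in config.get("ON", []):
        states[name] = True
    for name in config.get("OFF", []):
        states[name] = False
    return states


all_ocgs = ["English", "German", "Images"]
default_config = {"Name": "English edition", "OFF": ["German"]}
german_config = {"Name": "German edition", "BaseState": "OFF", "ON": ["German", "Images"]}

default_states = initial_states(all_ocgs, default_config)
german_states = initial_states(all_ocgs, german_config)
```

With these example values the default configuration shows the English text and the images, while the second configuration shows the German text and the images.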

Provisions for all conforming interactive PDF/X-4 and PDF/A-2 readers

If an OCCD contains the Order entry, the interactive reader must provide a means to display the contents of the Order entry.

If an OCCD, which is not the default OCCD, does not contain the Order entry, but the default OCCD contains an Order entry, then that Order entry shall be used for display.

If one or more OCCDs are present in addition to the default OCCD, then a conforming interactive reader shall provide a means to display the list of OCCDs from which a user can choose which one to view and print.
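
The fallback rule for the Order entry and the list of selectable configurations can be sketched as follows. This is a minimal illustration with hypothetical function names, again using plain dictionaries in place of PDF objects.

```python
def order_for_display(config, default_config):
    """Pick the Order entry an interactive reader shows for a configuration."""
    if "Order" in config:
        return config["Order"]
    # Fall back to the Order entry of the default OCCD, if there is one.
    return default_config.get("Order")


def selectable_configs(default_config, other_configs):
    """Names to offer to the user when additional OCCDs are present."""
    if not other_configs:
        return []
    return [cfg["Name"] for cfg in [default_config, *other_configs]]


default_config = {"Name": "English edition", "Order": ["Images", "English", "German"]}
german_config = {"Name": "German edition"}

shown_order = order_for_display(german_config, default_config)
choices = selectable_configs(default_config, [german_config])
```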

Provisions for all conforming PDF/X-4 and PDF/A-2 writers

Each optional content configuration dictionary, including the default OCCD, must contain a Name entry. If OCCDs other than the default OCCD are present, each Name entry must be unique among all OCCDs in the PDF file.

For each OCCD, if the Order entry is present, it must contain all OCGs.
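
A writer or preflight tool could check these provisions along the following lines. The sketch is an assumption-laden model rather than a real validator: OCG is a hypothetical stand-in class, configurations are plain dictionaries, and strings inside a nested Order list are treated as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OCG:
    """Stand-in for an optional content group, identified by its name."""
    name: str


def ocgs_in_order(order):
    """Collect all OCGs from a possibly nested Order list, ignoring text labels."""
    found = set()
    for item in order:
        if isinstance(item, list):
            found |= ocgs_in_order(item)
        elif isinstance(item, OCG):
            found.add(item)
    return found


def check_configs(all_ocgs, configs):
    """Return a list of problems; an empty list means the provisions are met."""
    problems = []
    seen_names = set()
    for index, cfg in enumerate(configs):
        name = cfg.get("Name")
        if not name:
            problems.append(f"config {index}: missing Name")
        elif name in seen_names:
            problems.append(f"config {index}: duplicate Name {name!r}")
        else:
            seen_names.add(name)
        if "Order" in cfg:
            missing = set(all_ocgs) - ocgs_in_order(cfg["Order"])
            for ocg in sorted(missing, key=lambda o: o.name):
                problems.append(f"config {index}: Order omits OCG {ocg.name!r}")
    return problems


english, german, images = OCG("English"), OCG("German"), OCG("Images")
all_ocgs = [english, german, images]

good = [
    {"Name": "Default", "Order": [images, ["Languages", english, german]]},
    {"Name": "German edition"},
]
bad = [{"Order": [english]}, {"Name": "X"}, {"Name": "X"}]
```

Calling check_configs(all_ocgs, good) returns an empty list, while the bad example is flagged for a missing Name, an incomplete Order and a duplicate Name.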

Examples of software that create optional content

Simple creation of optional content

Outside the world of PDF, layers are used in a slightly different way than optional content is used in PDFs. In authoring applications in the graphic arts and publishing industries, layers are typically used for logically grouping objects while at the same time implying a stacking order between layers, whereas for optional content in PDF the stacking order or sequence of appearance in the page description is of no relevance. As a consequence, layers in an authoring application can be transcoded into PDF syntax and mostly maintain their original behavior (as PDFs are typically not used for editing anymore, all that remains to be achieved is to be able to toggle the visibility of each layer). The other direction – creating an authoring file with layers where the layers reflect the behavior of optional content in PDF – is typically not feasible, as optional content does not have to be contiguous in the page content.

Among the applications that support layers and can export or convert to PDF maintaining these layers are:

  • Adobe Illustrator
  • Adobe InDesign
  • QuarkXPress
  • Microsoft Visio (through the PDFMaker feature in Acrobat Pro)
  • several 3D formats by means of import into Acrobat 9 Pro Extended

Advanced creation of optional content

For the purpose of this article, creation of optional content is considered ‘advanced’ if it supports creation of OCCDs (beyond the default OCCD). Currently the following tools are known to support creation of OCCDs:

  • Acrobat 9 Pro Preflight (limited support)
  • callas pdfToolbox 4.5
  • callas pdfaPilot 2.3
  • axaio MadeToPrint 2.4 for InDesign (announced for end of 2010)
  • axaio MadeToPrint 1.1 for Illustrator (announced for end of 2010)

Conclusions

While currently there is only limited support for all aspects of optional content as defined in PDF/X-4 and PDF/A-2, it is obvious that all conforming readers have to comprehensively support optional content if they claim full conformance to the respective standards.

As new features in any format or technology sometimes take some time before they are widely adopted, it can be expected that adoption of optional content, and most notably the use of OCCDs, will pick up over time as well, as tools become more readily available. For PDFs that make use of numerous “layers”, or which use “layers” in complex ways, OCCDs make dealing with such PDFs much more efficient.

Bibliography

ISO 32000-1:2008, Document management – Portable document format – Part 1: PDF 1.7, http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51502

ISO 15930-7:2010, Graphic technology – Prepress digital data exchange using PDF – Part 7: Complete exchange of printing data (PDF/X-4) and partial exchange of printing data with external profile reference (PDF/X-4p) using PDF 1.6, http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=55843

ISO 19005-2, Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A), http://www.iso.org (yet to be published)

Posted in PDF/A, PDF/X

What is PDF/A-2?


Overview

PDF/A-2 is the second part of the PDF/A standard. PDF/A is a series of ISO standards, and the first part was published in October 2005, based on PDF 1.4. It can be purchased from ISO at http://www.iso.org.

PDF/A-2 – officially to be referred to as “ISO 19005-2: Document management – Electronic document file format for long-term preservation – Part 2: Use of ISO 32000-1 (PDF/A)” – is expected to be approved by ISO Technical Committee TC 171 in December 2010 and, if all goes well, will be published a couple of months thereafter once it has gone through the quality assurance and publication process in the Central Secretariat of ISO in Geneva.

What is PDF/A?

According to the Library of Congress, PDF/A-1 is “a constrained form of Adobe PDF version 1.4 intended to be suitable for long-term preservation of page-oriented documents for which PDF is already being used in practice.” (Library of Congress http://digitalpreservation.gov/formats/fdd/fdd000125.shtml)

PDF/A attempts to maximize:

  • Device independence: The same information and appearance are available on all devices (within the constraints of the device)
  • Self-containment: Each file is sufficient in itself to convey the information and appearance; no other resources are required
  • Self-documentation: The document contains suitable metadata to identify itself; its history can also be recorded within the document

To achieve these goals, the following constraints are imposed on PDF/A files:

  • Based on PDF 1.4: Constrained version ensures stability over time
  • Audio and video content are forbidden: Avoids difficulty with external files; simplifies the development of the standard by focussing on page-based documents.
  • JavaScript and executable file launches are prohibited: Avoids different appearance under different circumstances
  • All fonts and encodings must be embedded and also must be legally embeddable for unlimited, universal rendering: Ensures repeatable appearance of text
  • Colour spaces are specified in a device-independent manner: Ensures repeatable appearance of colour
  • Encryption is disallowed: No password is required to access anything within the document
  • Use of standards-based metadata is mandated: For self-documentation

Since 2005 PDF/A-1 has become quite successful and is used in a number of industries as well as in the public sector. In some countries it is mandated by law, whereas elsewhere it is the preferred or recommended format for archiving ‘ePaper’ content.

Why a second part?

PDF/A is based on the PDF format. One of the goals of PDF/A is to be able to archive content stored inside PDF files. As the development of the PDF format itself continues, every once in a while a new part may have to be added to PDF/A to support archiving of newer features in the underlying PDF format.

In addition, since the PDF format as of version 1.7 is no longer an Adobe-controlled specification but an ISO standard, it seemed like a good idea to many to come up with a second part of PDF/A based on the ISO version of PDF 1.7 – an ISO standard based on an ISO standard. There are enough interesting additions to the PDF format between PDF 1.4 and PDF 1.7 to make this worthwhile.

What is new in PDF/A-2?

The guiding principle was that everything in PDF 1.7 that is within the scope of PDF/A shall be allowed, and that everything that is outside of that scope will be disallowed:

This part of ISO 19005 specifies the use of the Portable Document Format (PDF) 1.7, as formalized in ISO 32000-1, for preserving the static visual representation of page based electronic documents over time.

The focus on static content precludes for example archiving of any multimedia content inside PDF/A files.

The following paragraphs will explain a number of new features in PDF/A-2 that might have an impact on the use of PDF for archiving.

PDF/A files can be embedded in PDF/A-2 files

PDF/A-2 makes it possible to embed other PDF/A files – whether PDF/A-1 or PDF/A-2 – inside a PDF/A-2 file. This can be handy when several independent documents are to be kept together without merging them into a single PDF file. Possible uses: converting an email with attachments to a PDF/A-2 file where the attachments become embedded PDF/A files; keeping a binder of several PDF/A files which have been digitally signed individually, so that merging them into one file would destroy the validity of the signatures; or simply converting a binder or folder with several documents into a PDF/A-2 file with embedded PDF/A files which reflect the original nature of each of the files.

Transparency

Transparency has been part of PDF since version 1.4 but was disallowed in PDF/A-1, mostly because it was felt that technology for processing transparency was not yet sufficiently widespread. In addition, a few technical details had not yet been specified at that time, which could have led to inconsistent rendering of transparency under certain circumstances. Half a decade onward, technology has matured and the PDF standard now defines transparency precisely enough to be implemented consistently.

Transparency is being used in a number of ways – demanding designs can more easily be achieved with transparency effects, slide presentations often make use of effects best encoded by means of transparency, and text highlight annotations – looking like yellow felt markers – constitute probably the most frequent use of transparency in PDFs. Archiving content with transparency as PDF/A-1 has so far required “flattening” the transparency – this will not be required any longer once PDF/A-2 is published.

JPEG2000

JPEG2000 – an ISO standard in itself (ISO/IEC 15444) – defines a powerful image compression algorithm. For certain uses it can be more efficient than JPEG. Institutions like museums and galleries value the possibility to use JPEG2000 in a lossless way – which typically creates less data than ZIP but does not introduce any losses the way JPEG does.

Layers

Layers in PDFs introduce the possibility to turn the visibility of selected content on or off. Technically layers are actually called “optional content” in the PDF syntax but most user interface implementations use the term “layer” or “layers”.

The engineering community makes use of layers in complex technical drawings where it can be useful to turn some aspects of a drawing off to be able to focus more easily on the remaining information. For example, when looking at the technical drawing for a house it might be helpful at times to turn off plumbing data while looking into aspects of cabling, and vice versa. Layers are also used for multi-lingual publishing where out of two or more languages only one is displayed at a time. As graphics and images can typically be shared between languages, more compact files can be achieved, and quality assurance of the multi-lingual content can more easily be carried out.

Conformance level for Unicode support

PDF/A-2 introduces a third conformance level. In PDF/A-1, conformance level “B” indicates that the PDF/A file can be reliably reproduced visually, whereas conformance level “A” indicates that the PDF/A file contains enough information – in the form of both text supporting Unicode and tagged PDF – that its contents can be accessed and retrieved in a structured way. This is an important prerequisite for indexing text properly, as well as for retrieving content for repurposing or migrating it to some other representation in the future.

While the “A” conformance level obviously is preferable, it is not always feasible. Scanned documents for example by definition lack any structural content information. PDFs created from applications that do not support the creation of tagged PDF are also not suitable for archiving according to conformance level “A”. Nevertheless in many cases such PDFs can still be created in a way that guarantees that all text can be mapped to Unicode, even if the reading order of the text in the PDF is not actually explicitly defined. Such Unicode support was deemed relevant enough to introduce a separate conformance level “U” for it in PDF/A-2.
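
The relationship between the three levels can be summarized in a short Python sketch. It is a deliberate simplification under stated assumptions: the three boolean inputs and the function name achievable_level are hypothetical, and a real validator checks many more requirements for each level.

```python
def achievable_level(visually_reproducible, unicode_mappable, tagged):
    """Return the highest PDF/A-2 conformance level a file could aim for.

    visually_reproducible: the file meets the baseline visual requirements
    unicode_mappable:      all text can be mapped to Unicode
    tagged:                the file is a tagged PDF with structure information
    """
    if not visually_reproducible:
        return None  # not even level B is possible
    if unicode_mappable and tagged:
        return "A"
    if unicode_mappable:
        return "U"
    return "B"


scanned_page = achievable_level(True, False, False)
untagged_export = achievable_level(True, True, False)
tagged_export = achievable_level(True, True, True)
```

A scanned page without text information ends up at level “B”, an untagged export with fully Unicode-mappable text at level “U”, and a tagged export with Unicode-mappable text at level “A”.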

OpenType fonts

The PDF 1.7 format supports the use and embedding of OpenType fonts, and in PDF/A-2 it is also possible to embed OpenType fonts. The main advantage is that where OpenType fonts are used to create documents they no longer have to be re-encoded to a different technical representation but can be embedded as is.

New annotation types

Between PDF 1.4 and PDF 1.7 a number of new annotation types have been introduced into the PDF format. All of them are used for applying comments or markup to a PDF, and they can now also be archived in PDF/A-2: Polygon, PolyLine, Caret, Watermark and Redact annotations.

What is not allowed in PDF/A-2?

As stated above, the scope of PDF/A prohibits embedding of non-static content in PDF/A files. There are several areas where non-static content plays a big role:

  • multimedia features – annotations that make it possible to include movies, audio or 3D models in PDF files
  • forms based on XFA, a new architecture introduced by Adobe to support complex interactive forms

The approach to actions and JavaScript is still the same as in PDF/A-1: JavaScript is completely prohibited in PDF/A-2, and only a small set of actions is allowed for which there is no risk that they could change the content of the archived PDF.

This does not imply though that a PDF/A file can have no interactivity at all – rather the opposite: all links can be actionable, whether within a PDF file or pointing to other PDF files or other files or destinations.

Summary

PDF/A-2 introduces interesting new features that everyone responsible for archiving should be aware of. At the same time it is crucial to understand that PDF/A-2 does not obsolete PDF/A-1. Where currently, or even in the future, PDF/A-1 is successfully used for archiving, there is no reason to move to PDF/A-2 based archiving. Under no circumstances will it make any sense to migrate PDF/A-1 files to PDF/A-2 – nothing can be gained from doing so.

Nevertheless where PDF/A-1 has not been a perfect option yet, and where PDF/A-2 offers precisely what has been missing in PDF/A-1, it should be considered to opt for PDF/A-2 based archiving, rather than to force PDF files to be archived into a PDF/A-1 conforming representation.

With the publication of the PDF/A-2 standard to be expected in the second quarter of 2011, and with, as of today, limited availability of tools and solutions that support PDF/A-2 – some companies like Adobe, callas software or Luratech have already announced PDF/A-2 support – now is a good time to review the implications of PDF/A-2 and to define a strategy for the next years: whether that implies to keep using PDF/A-1, or to start using PDF/A-2 in the foreseeable future.

More information about PDF/A

The best resource for information around PDF/A is the PDF/A Competence Center (http://www.pdfa.org), an international association with over 100 members from all over the world, most of them vendors who offer software with support for PDF/A.

Posted in PDF/A

Safari assumes display profile for images without embedded profile


As I just learnt during a discussion on Apple’s ColorSync mailing list, Safari, when rendering images that do not have an ICC profile embedded, will assume the display profile as the source profile. I had hoped that Safari would rather use sRGB in such a case, especially given that with more and more wide gamut displays the choice of using the display profile will more and more result in incorrect color display. I thought (had hoped) defaulting to display profiles is a thing of the nineties and is gone by now…

With other browsers moving back to color display that is not properly managed (whether by not looking for embedded ICC profiles at all or by using faster but more limited color engines), I have the impression that the world of browsers is moving away again from predictable display of colors (not only in images). What a pain, given how much the relevance of displaying content in a browser is still increasing.

Posted in Color

Disappointing: VoiceOver screen reading for PDFs


Loaded a few PDFs into iBooks and let VoiceOver’s screen reading read them to me – not quite what I was expecting (hoping for). Despite the PDFs being nicely tagged, the order in which the text was read aloud failed to reflect the tagging (and thus the correct semantic) order. What a shame – so much work went into the overall design and implementation of VoiceOver, and reading PDFs still fails miserably.

As it turns out, the PDF reading on the iPad seems to use the same engine as Preview on Mac OS X (not that that would surprise anybody) – and doing text selections in Preview reveals the same heuristics of the PDF engine on Mac when it comes to identifying ‘order’.

Thus having a PDF read aloud on the iPad is not what it should be. Maybe some other PDF readers do a better job? I will try to find them if they exist…

Posted in Accessibility

(Not?) Move stuff between computer and iPad


It is a real déjà vu – I feel like I am back in the eighties, when it was a challenge to move files between the IBM 3080 host computer at university and my brand-new 8MHz 640KB DOS PC at home – it was painful (how do you read mainframe tapes on a home PC? You simply don’t…). Now with my iPad and my MacBook Pro in front of me, I have to launch iTunes each time to transfer a PDF to my iPad. Probably there is a way to achieve this more easily (there must be a reason for the zillion apps), but then I expected the iPad to be easy to use right away…?

Posted in iStuff

iPad – how accessible is it?


Finally bought an iPad – mostly to find out how accessible it is. Looked at the VoiceOver implementation earlier this week and have to admit that I was somewhat impressed – Apple does get a few things right besides design and making tons of money. So the question is: is the iPad a relevant option for handicapped people? Will do some more research and experimentation in the next days and weeks…

Posted in Accessibility