[Athen] Automating accessibility tagging of PDF

N Dogbo ndogbo at gmail.com
Fri Jul 19 15:34:17 PDT 2013

Hey Sean,

Thanks very much. Very much appreciated.


----- Think not with your EYES and you shall have a perfect VISION! ---


From: athen-list-bounces at mailman1.u.washington.edu
[mailto:athen-list-bounces at mailman1.u.washington.edu] On Behalf Of Sean
Sent: Friday, July 19, 2013 10:59 AM
To: Access Technology Higher Education Network
Subject: Re: [Athen] Automating accessibility tagging of PDF

Hi Nicaise,

For automating MS Word to tagged PDF - this is something we have built into
our SCRIBE tool, and it assumes that the document author will include the
appropriate accessibility information in the MS Word file. This includes
using headings, text descriptions for images, using tables appropriately,
etc. Remember, you are limited in how much accessibility information you can
include in an MS Word document compared to the full markup possible in a
tagged PDF. To get the full set of tags, you would need to use Acrobat Pro
or another tool (e.g., NetCentric's CommonLook PDF), but then you are no
longer automating the process.
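
As a rough illustration of checking whether an author has included that
basic information before conversion, here is a minimal Python sketch. It is
not part of SCRIBE; the function name check_docx is hypothetical, and it
assumes a .docx file that uses the default English style IDs ("Heading1",
"Heading2", ...) and stores alt text in the descr attribute of wp:docPr
elements. It only counts things; it cannot judge whether a heading or
description is appropriate.

```python
import re
import zipfile


def check_docx(path):
    """Return rough counts of headings and drawing objects without alt text.

    Assumption: default English heading style IDs and alt text stored in
    the descr attribute of wp:docPr (pictures, charts, shapes).
    """
    with zipfile.ZipFile(path) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")

    # Paragraphs styled as Heading1, Heading2, and so on.
    headings = len(re.findall(r'<w:pStyle w:val="Heading\d+"', xml))

    # Each drawing object carries one wp:docPr element.
    drawings = re.findall(r"<wp:docPr\b[^>]*>", xml)
    missing_alt = sum(
        1 for tag in drawings if not re.search(r'\bdescr="[^"]+"', tag)
    )

    return {
        "headings": headings,
        "drawings": len(drawings),
        "drawings_missing_alt": missing_alt,
    }
```

A report with zero headings or a nonzero drawings_missing_alt count would be
a reason to send the file back to the author before converting it.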

What I am attempting to do within my institution is to provide an automated
tool that will support the basics of converting an MS Office document to
tagged PDF. With this framework, I can then work with document authors to
say "do these five things and the major accessibility issues are resolved."
From there, I can begin to work on more specific cases in which such
automation may not be an option (e.g., PDF forms, math, foreign language
documents, etc.).

The plugins we used to automate this process in the SCRIBE tool were from
Cognidox - http://www.cognidox.com/products/opensource/officetopdf

The Robobraille/Sensus Access converters (online, free) will also support
the automatic conversion of MS Word and PowerPoint to tagged PDF -

I do know that some people have scripted OpenOffice to perform this
functionality as well, since OpenOffice can save out a tagged PDF, but I never
really had any success with that workflow (most likely due to my lack of

For automating PDF to tagged PDF - this one is a bit more problematic as
accessibility is more than just "tagging" a PDF. While the tagging can be
useful for creating that document structure, when it is automated you do not
know with what accuracy the tags have been applied. Further, automated
processes will not be able to add text descriptions to images, appropriately
mark up data tables, and may not be accurate in specifying a heading

In our SCRIBE tool (and also available via the Robobraille/Sensus Access
tools), we do support the automated process of tagging a PDF by default by
using the recognition capabilities of ABBYY FineReader. For the most part,
this functionality has worked well in delivering a tagged PDF in which the
logical reading order of a document is controlled. We are not doing
anything special and are relying on the capabilities of the OCR engine to
recognize a page layout and put the text into the appropriate order. So,
while we can automate the output of a tagged PDF from any PDF document, it
only provides organization to the reading order and no support for image
descriptions, etc. That part has to be completed manually.

To automate at least some of the process, my suggestion would be ABBYY
FineReader Corporate Edition, as this supports a Hot Folder model where you
can dump files, have them processed, and then specify the output location.
You can also go with ABBYY Recognition Server, but this is VERY expensive
and does not do more in terms of automating PDF tagging.
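
To make the Hot Folder idea concrete, here is a minimal Python sketch of the
general pattern: drop files in one folder, process them, and write results
to another. It is not how FineReader implements it. The function name
process_hot_folder and the convert callable are hypothetical; convert stands
in for whatever OCR or tagging engine you actually run.

```python
import shutil
from pathlib import Path


def process_hot_folder(inbox, outbox, done, convert):
    """Process every PDF waiting in inbox and return the names handled.

    convert(src, target) is a placeholder (an assumption) for the real
    OCR/tagging step; it must write its result to target.
    """
    inbox, outbox, done = Path(inbox), Path(outbox), Path(done)
    outbox.mkdir(parents=True, exist_ok=True)
    done.mkdir(parents=True, exist_ok=True)

    processed = []
    for src in sorted(inbox.glob("*.pdf")):
        convert(src, outbox / src.name)
        # Move the original out of the inbox so it is not processed twice.
        shutil.move(str(src), str(done / src.name))
        processed.append(src.name)
    return processed
```

In practice this would be called on a schedule (for example, every minute),
and the output would still need the manual review described above.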

Some may argue that AT applications don't take all the possible PDF tags
into consideration, so it's better to focus on the basic tagging
capabilities. To a certain extent, I think it really depends on the
types of documents you are creating and/or retrofitting and the population
of individuals you are serving. For example, if you are dealing with
documents that are not that complex, then an automated process may give you
exactly what you need. On the other hand, if you are dealing with documents
that include a complex visual layout (e.g., magazine layout, lots of images,
etc.), then you may find an automated process alone does not work all that

Hope this helps.

Take care,

Sean Keegan
Associate Director, Assistive Technology
Office of Accessible Education - Stanford University

On Jul 19, 2013, at 9:28 AM, "N Dogbo" <ndogbo at gmail.com> wrote:

Hi Sean,

Yes both-- MS Word to tagged PDF and PDF to tagged PDF. So any help,
resources and advice you can send out would be greatly appreciated.

Thanks a million!



----- Think not with your EYES and you shall have a perfect VISION! ---
