[Athen] Automating accessibility tagging of PDF
skeegan at stanford.edu
Fri Jul 19 10:59:05 PDT 2013
For automating MS Word to tagged PDF - this is something we have built into our SCRIBE tool, and it assumes that the document author will include the appropriate accessibility information in the MS Word file. This includes using headings, adding text descriptions for images, using tables appropriately, etc. Remember, you can include far less accessibility information in an MS Word document than the full markup possible in a tagged PDF. To get the full set of tags, you would need to use Acrobat Pro or another tool (e.g., NetCentric's CommonLook PDF), but then you are no longer automating the process.
What I am attempting to do within my institution is to provide an automated tool that will support the basics of converting an MS Office document to tagged PDF. With this framework, I can then work with document authors to say "do these five things and the major accessibility issues are resolved". From there, I can begin to work on more specific cases in which such automation may not be an option (e.g., PDF forms, math, foreign language documents, etc.).
The plugins we used to automate this process in the SCRIBE tool were from Cognidox - http://www.cognidox.com/products/opensource/officetopdf
The Robobraille/Sensus Access converters (online, free) will also support the automatic conversion of MS Word and PowerPoint to tagged PDF - http://sensusaccess.com/
I do know that some people have scripted OpenOffice to do this as well, since OpenOffice can save out a tagged PDF, but I never really had any success with that workflow (most likely due to my own lack of ability).
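For anyone who wants to try the scripted route, here is a minimal sketch, not a tested workflow. It assumes a recent LibreOffice (the OpenOffice.org descendant) rather than the OpenOffice builds discussed above, that its soffice binary is on the PATH, and that the build accepts PDF export filter options in JSON form on the command line. The function names and file paths are hypothetical.

```python
import json
import subprocess
from pathlib import Path


def build_tagged_pdf_command(source, out_dir, soffice="soffice"):
    """Build a headless LibreOffice command that requests a tagged PDF."""
    # UseTaggedPDF is the PDF export filter option that turns on structure tags.
    # writer_pdf_Export is the Writer filter; a presentation would need the
    # Impress filter instead (assumption: impress_pdf_Export).
    options = {"UseTaggedPDF": {"type": "boolean", "value": "true"}}
    target = "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))
    return [soffice, "--headless", "--convert-to", target,
            "--outdir", str(out_dir), str(source)]


def convert_to_tagged_pdf(source, out_dir):
    """Run the conversion and return the path where the PDF should appear."""
    subprocess.run(build_tagged_pdf_command(source, out_dir), check=True)
    return Path(out_dir) / (Path(source).stem + ".pdf")
```

As with the SCRIBE approach, the tags that come out can only be as good as the headings, image descriptions, and tables the author put into the source document.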
For automating PDF to tagged PDF - this one is a bit more problematic, as accessibility is more than just "tagging" a PDF. While the tagging can be useful for creating the document structure, when it is automated you do not know how accurately the tags have been applied. Further, automated processes cannot add text descriptions to images or appropriately mark up data tables, and they may not be accurate in specifying a heading structure.
In our SCRIBE tool (and also available via the Robobraille/Sensus Access tools), we do support automated tagging of a PDF by default, using the recognition capabilities of ABBYY FineReader. For the most part, this functionality has worked well in delivering a tagged PDF in which the logical reading order of the document is controlled. We are not doing anything special and are relying on the capabilities of the OCR engine to recognize the page layout and put the text into the appropriate order. So, while we can automate the output of a tagged PDF from any PDF document, this only organizes the reading order and provides no other support, such as image descriptions. That part has to be completed manually.
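If you automate this step, it can help to sanity-check that the output carries tags at all. The sketch below is a crude heuristic and not anything SCRIBE does: it only looks for the /StructTreeRoot key that a tagged PDF's catalog points to. It can miss PDFs whose catalog is stored in a compressed object stream, and a positive result says nothing about whether the tags are accurate. The function name is hypothetical.

```python
def looks_tagged(pdf_bytes):
    """Crude heuristic: does the raw file mention a structure tree root?"""
    # A tagged PDF's document catalog references a /StructTreeRoot.
    # Presence of the key does not prove the tags are correct or complete.
    return b"/StructTreeRoot" in pdf_bytes
```

For example, looks_tagged(open("output.pdf", "rb").read()) could flag files where the conversion silently produced an untagged PDF, leaving the real review of reading order and descriptions to a person.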
To automate at least some of the process, my suggestion would be ABBYY FineReader Corporate Edition, as this supports a Hot Folder model where you can dump files into a folder, have them processed, and specify the output location. You can also go with ABBYY Recognition Server, but this is VERY expensive and does not do more in terms of automating PDF tagging.
Some may argue that AT applications don't take all the possible PDF tags into consideration, so what's the point, and that it's better to focus on the basic tagging capabilities. To a certain extent, I think it really depends on the types of documents you are creating and/or retrofitting and the population of individuals you are serving. For example, if you are dealing with documents that are not that complex, then an automated process may give you exactly what you need. On the other hand, if you are dealing with documents that include a complex visual layout (e.g., magazine layout, lots of images, etc.), then you may find an automated process alone does not work all that well.
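To make the Hot Folder idea concrete, here is a generic sketch of the pattern. It is not FineReader's implementation or API: the convert argument is a hypothetical stand-in for whatever OCR/tagging step you have available, and the folder names and polling interval are assumptions.

```python
import shutil
import time
from pathlib import Path


def process_hot_folder(inbox, outbox, done, convert):
    """Convert every PDF waiting in inbox and return the output paths."""
    inbox, outbox, done = Path(inbox), Path(outbox), Path(done)
    outbox.mkdir(parents=True, exist_ok=True)
    done.mkdir(parents=True, exist_ok=True)
    results = []
    for source in sorted(inbox.glob("*.pdf")):
        target = outbox / source.name
        convert(source, target)  # hypothetical OCR/tagging step
        # Move the original aside so it is not processed again on the next pass.
        shutil.move(str(source), str(done / source.name))
        results.append(target)
    return results


def watch(inbox, outbox, done, convert, interval=10.0):
    """Poll the inbox indefinitely, processing new files as they arrive."""
    while True:
        process_hot_folder(inbox, outbox, done, convert)
        time.sleep(interval)
```

Moving each original into a separate done folder keeps the loop simple; a production setup would also want to handle files that are still being copied in and conversions that fail.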
Hope this helps.
Associate Director, Assistive Technology
Office of Accessible Education - Stanford University
On Jul 19, 2013, at 9:28 AM, "N Dogbo" <ndogbo at gmail.com> wrote:
> Hi Sean,
> Yes both-- MS Word to tagged PDF and PDF to tagged PDF. So any help, resources and advice you can send out would be greatly appreciated.
> Thanks a million!
> ----- Think not with your EYES and you shall have a perfect VISION! ---