[Athen] FW: Structured accessible HTML from PDF

Ron Stewart ron.stewart at dolphinusa.com
Sun Apr 8 06:48:48 PDT 2007

More on the RiverDocs PDF-to-HTML solution that was discussed a while back.


-----Original Message-----
From: blindnews-bounces at blindprogramming.com
[mailto:blindnews-bounces at blindprogramming.com] On Behalf Of BlindNews
Mailing List
Sent: Sunday, April 08, 2007 12:57 AM
To: BlindNews at blindprogramming.com
Subject: Structured accessible HTML from PDF

IT-Director.com (UK)
Thursday, March 29, 2007

Structured accessible HTML from PDF

By Peter Abrahams

Peter Abrahams, Practice Leader, Accessibility and Usability, Bloor Research
Published: 29th March 2007

There are hundreds of millions of PDF files on the web. The two main reasons
for this popularity are:

- The document always looks the same irrespective of the type of printer,
  browser or device. This is important aesthetically but may also have legal
- The document is secure; it cannot be altered.

Unfortunately PDF files on the web can be problematic for people with
disabilities, especially users of screen-readers because:

- Free screen readers, such as Thunder, do not support PDF documents because
  the complexity of the file format has made it too expensive to develop the
  support. Commercially available screen readers, such as JAWS, that do
  support PDF, are too expensive for a large number of people who use
  computers infrequently or access the web via a PC in a library or Internet
- Adobe have defined an extension to the PDF format to provide more
  information to screen-readers such as alternative text for images and
  heading levels to aid navigation around the document. Most existing PDF
  files have not been created as accessible PDF, and the task of converting
  existing documents is complex and not always achievable.
- Creating new documents as accessible PDF is perfectly possible and
  straightforward but requires the use of specific tools and the understanding
  and cooperation of the document originators. So it is inevitable that many
  new documents will be produced that are not accessible.

PDF documents are an ideal format for downloading off the web and printing
out, but because of all the above reasons there is a need to provide these
documents in an alternative format. The obvious alternative is for the
document to be available in HTML that is designed for use by users who are
blind or have a vision-impairment. The user is not interested in the
document looking identical to the original but needs a document that can be
read efficiently using a screen reader; to do this the document must:

- Be linearised; that is, any text in multiple columns or around pictures in
  the original must be presented in the correct order.
- Have alternative text for any images.
- Mark up tables so that information can be accurately and quickly found in
- Include document structure information such as headings, so that the user
  can navigate quickly around and find the relevant information.
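
To illustrate the last requirement, here is a minimal sketch (not part of the
original article, and not how any particular screen reader works) of how
heading markup gives a navigable outline. It uses only Python's standard
html.parser module; the names OutlineBuilder and build_outline are
hypothetical and chosen for this example.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OutlineBuilder(HTMLParser):
    """Collect (level, title) pairs for every heading in a document."""

    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, title) tuples, in document order
        self._level = None   # level of the heading currently being read
        self._text = []      # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open heading.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace so wrapped headings read cleanly.
            title = " ".join("".join(self._text).split())
            self.outline.append((self._level, title))
            self._level = None


def build_outline(html_text):
    """Return the heading outline of an HTML or XHTML string."""
    builder = OutlineBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.outline
```

A document with no heading markup yields an empty outline, which is roughly
the situation a screen-reader user faces with an unstructured conversion:
there is nothing to jump between.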

There are a number of PDF-to-HTML converters available but I believe that
the recently announced RiverDocs Converter is the first aimed specifically
at the creation of structured, accessible HTML documents that are optimized
for screen-reader usage.

The converter will take any PDF document and analyse it to recognise
multi-column pages, headings, tables, images and other formatting and
convert it all into XHTML. Correctly recognising text that wraps around a
picture, or the cells in a table, requires sophisticated artificial
intelligence algorithms.

Having completed the conversion, it checks the output for accessibility
issues that could not be fixed automatically. The most obvious issue is the
lack of descriptions of images using the alt attribute.
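
The kind of check described above can be sketched in a few lines. This is an
illustration only, assuming nothing about how RiverDocs Converter itself
works; it uses Python's standard html.parser module, and the names
MissingAltChecker and find_missing_alt are hypothetical.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collect <img> tags that have no alt text."""

    def __init__(self):
        super().__init__()
        self.issues = []  # list of (line, column, src) tuples

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing XHTML tags such as <img ... />.
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = attr_map.get("alt")
        # Flag a missing or blank alt. A deliberately empty alt can be
        # legitimate for purely decorative images, so a human still has
        # to review each reported issue.
        if alt is None or not alt.strip():
            line, column = self.getpos()
            self.issues.append((line, column, attr_map.get("src") or ""))


def find_missing_alt(html_text):
    """Return a list of images in html_text that lack alt text."""
    checker = MissingAltChecker()
    checker.feed(html_text)
    checker.close()
    return checker.issues
```

A tool can detect that a description is missing, but it cannot write a
meaningful one, which is why this class of issue is left for a person to fix.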

The user interface to the product allows the user to see the list of issues
and at the same time see the relevant sections of the original PDF file, the
generated XHTML and a preview of the document on a browser. Clicking on an
issue will position the preview to the context of the issue and then the
user can fix the problem.

The final output will be a well-structured and annotated document that will
give a blind user an excellent experience whilst reading the document.

The UK Disability Equality Duty, which I discussed in a recent blog, has put
significant pressure on public authorities and their suppliers to ensure all
the content of their web sites is accessible. Providing structured,
accessible XHTML versions of all the PDF files is considered to be the only
way to comply with the Duty.

The volume and size of the files that need to be converted have meant that
the authorities have outsourced this task to specialist web agencies.
RiverDocs Converter automates most of the conversion process and means that
an agency using it will provide a very competitive bid.

RiverDocs Converter should appeal to any organisation that has a large
number of existing documents that need to be made accessible, or that
publishes new documents that are not created to be accessible and will need


