[Athen] Structured accessible HTML from PDF

Terry Thompson tft at u.washington.edu
Wed Apr 11 07:21:43 PDT 2007


I tried the demo version of RiverDocs on the EDUCAUSE survey article:
http://www.educause.edu/ir/library/pdf/ERB0512.pdf

It's tough to make conclusive judgements since the trial version will only
convert two pages, and it replaces about half the text with XXXX's. From
what I saw of the first two pages, I was not impressed with its "artificial
intelligence". The title of the article "Information Technology
Accessibility in Higher Education", which is by far the largest text in the
original PDF, was converted to a <p> tag, whereas the smaller subtitle
"Research and Promising Practices" was (correctly) converted to an <h2>. On
page 2, the section heading "Overview" was converted to <h3>, but
structurally I think that should be <h2>. Other than that it did OK with
paragraphs and unordered lists, but the really annoying thing was that it
captured background images and scattered them as img tags in odd places
throughout the HTML output. I'm sure that's just a case of "garbage in,
garbage out", but for the asking price and the lofty claims I would expect
something a little more reliable.

Terry

Terry Thompson
Technology Specialist, DO-IT
University of Washington
tft at u.washington.edu
206/221-4168
http://www.washington.edu/doit


> -----Original Message-----
> From: athen-bounces at athenpro.org
> [mailto:athen-bounces at athenpro.org] On Behalf Of E.A. Draffan
> Sent: Wednesday, April 11, 2007 2:28 AM
> To: skeegan at htctu.net; 'Access Technologists in Higher
> Education Network'
> Subject: Re: [Athen] Structured accessible HTML from PDF
>
> I have experimented a little bit - the interface is easy to
> use and the output is seen alongside the original.
>
> I have used it for PDF to Word and found that it was saved in
> a RiverDocs format and it was easier for me to copy and paste
> the contents into Word.
> Tables, if accessible, were linearised in a rather odd way, as
> has been said.
>
> If the entire PDF is a picture then obviously there is no
> conversion - it just puts the picture into the conversion
> area and you can re-size it etc!
>
> I had one PDF form with the labels to the left and lines for
> writing on beside the labels - the conversion put all the
> labels in a list and collected all the lines after the list.
> It was in fact easier just to copy and paste the contents of
> that PDF into Word to make necessary changes!
>
> Google and Opera can make better versions at times for free :>))
>
> However, there is a useful 'issues' button that flags up why
> things may not have converted well.
>
> Best Wishes E.A.
>
> Mrs E.A. Draffan
> Assistive Technologist
> Mobile: 07976 289103
> http://www.emptech.info/
>
> -----Original Message-----
> From: athen-bounces at athenpro.org
> [mailto:athen-bounces at athenpro.org] On Behalf Of Sean Keegan
> Sent: Wednesday, April 11, 2007 12:05 AM
> To: 'Access Technologists in Higher Education Network'
> Subject: Re: [Athen] Structured accessible HTML from PDF
>
> > The RiverDocs converter will take any PDF document and analyse it to
> > recognise multi-column pages, headings, tables, images and other
> > formatting and convert it all into XHTML. Correctly recognising text
> > that wraps around a picture, or the cells in a table requires
> > sophisticated artificial intelligence algorithms.
>
> Has anyone actually used the RiverDocs Converter and what
> have you thought of it?
>
> The press release and the information on their website are a bit
> lacking as to exactly how they really are different from the
> other tools that currently exist (you can edit a PDF for
> accessibility in Acrobat and export from PDF to HTML). I
> found their price estimate of creating accessible PDF
> versions with other tools in excess of 50 pounds per page
> somewhat high.
>
> The website URL is: http://www.riverdocs.com
>
> I am planning to get my hands on a demo copy; I just have not
> had a chance to look into the tool much.
>
> Thanks,
> Sean