[Athen] FW: Athen Digest, Vol 25, Issue 38 Processing PDFs into Kurzweil 3000 Files

Londergan, M D londerga at indiana.edu
Wed Feb 27 10:17:21 PST 2008


-------------------------

At the Adaptive Technology Center, among the mix of techniques we use to provide e-text for our students, we too process tons of PDFs through Kurzweil 3000. These are mostly PDFs we have received from publishers, PDFs posted as e-readings for a class by instructors, or PDFs we have generated ourselves.

We have upgraded OCR Rocket to pick up PDFs as well as the TIFF files from scanning. But although Kurzweil 3000's ability to open a PDF directly and convert it to a KES file on the fly is tempting, in practice it only works well on certain types of small PDFs; Kurzweil 3000 has more trouble converting some PDF types than others. Notably, PDFs containing any grayscale images are troublesome when opened in Kurzweil 3000: the resulting KES file will be around 10x larger than the original PDF, and it may take an entire CD to store only 125 pages' worth of the resulting KES files. This is odd, as Kurzweil 3000 does a better job compressing PDFs with color images.

We find that the best way to process most PDF files, especially PDF files composed of grayscale or color images, is to send them to the KESI Virtual Printer. For color or grayscale PDFs whose content calls for a color KES file, we set the KESI Virtual Printer's preferences to a print quality of 300 DPI and true color (24-bit) output. Otherwise, we use the black-and-white KESI Virtual Printer settings. The KESI Virtual Printer works by creating a temporary TIFF file that it feeds to Kurzweil 3000, so unfortunately, when virtual printing in true color at 300 DPI, the TIFF file gets rather large for documents of more than a dozen or so pages! Kurzweil 3000 currently can't process a TIFF file larger than 2GB, and with a color job on the KESI Virtual Printer this limit can be reached within 50-70 pages! If you get the message "There was an error loading the requested item" from Kurzweil 3000 when virtual printing larger PDFs, this is likely the problem!
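As a rough check on why color jobs hit the 2GB wall so quickly, the uncompressed size of a rasterized page can be worked out directly. The sketch below is only an estimate under stated assumptions, not a description of how the KESI Virtual Printer actually writes its TIFF: it assumes uncompressed pixel data (3 bytes per pixel at 24-bit), treats "2GB" as 2^31 bytes, and ignores TIFF header and per-page overhead. The function names are hypothetical.

```python
# Rough estimate of how many pages fit in one virtual-print TIFF.
# Assumptions (not from Kurzweil documentation): uncompressed pixel data,
# "2GB" taken as 2**31 bytes, and no TIFF header or per-page overhead.

TIFF_LIMIT_BYTES = 2**31  # assumed meaning of the 2GB limit


def bytes_per_page(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of one rasterized page, in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


def max_pages(width_in: float, height_in: float, dpi: int = 300, bits_per_pixel: int = 24) -> int:
    """Whole pages that fit under the assumed TIFF size limit."""
    return TIFF_LIMIT_BYTES // bytes_per_page(width_in, height_in, dpi, bits_per_pixel)


if __name__ == "__main__":
    letter = bytes_per_page(8.5, 11, 300, 24)
    print(f"8.5x11 in, 300 DPI, 24-bit: {letter / 1e6:.1f} MB/page, "
          f"about {max_pages(8.5, 11)} pages before the limit")
```

Under these assumptions a letter-size color page is about 25 MB, giving a ceiling of roughly 85 pages, while a 5x8" novel page allows roughly twice as many. Since the 50-70 pages observed in practice is lower still, treat the result as an optimistic ceiling rather than a safe page count. The same arithmetic shows why black-and-white jobs (1 bit per pixel, about 1 MB per letter page) are far less likely to hit the limit.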

Since we often "print" our PDFs to the KESI Virtual Printer from Adobe Reader, we can select the page range to print, which makes the process workable by limiting the number of pages in each virtual print job. For smaller page sizes, like a novel, you can print more pages per job. For larger books, like a standard 8.5x11" textbook, fewer pages can be virtual printed before hitting the wall (50-70 pages). We do our best to chunk the books up by sections or chapters as appropriate, but the chunks always have to fit within these operating constraints.
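The chunking described above amounts to splitting each chapter into print jobs of no more than some page budget. A minimal sketch follows; the function name is hypothetical, and the chapter start pages and per-job page limit are inputs you would choose yourself (for example, something under the 50-70 page range for color letter-size jobs). Each returned range is what you would enter as the page range in Adobe Reader's print dialog for one job.

```python
def plan_print_jobs(chapter_starts: list[int], last_page: int, max_pages: int) -> list[tuple[int, int]]:
    """Split a book into (first, last) page ranges for separate print jobs.

    Each chapter becomes one job if it fits within max_pages; longer chapters
    are split into consecutive jobs of at most max_pages pages. Jobs never
    cross a chapter boundary.
    """
    if max_pages < 1:
        raise ValueError("max_pages must be at least 1")
    starts = sorted(chapter_starts)
    bounds = starts + [last_page + 1]  # sentinel marks the end of the last chapter
    jobs = []
    for start, next_start in zip(bounds, bounds[1:]):
        chapter_end = next_start - 1
        page = start
        while page <= chapter_end:
            job_end = min(page + max_pages - 1, chapter_end)
            jobs.append((page, job_end))
            page = job_end + 1
    return jobs


if __name__ == "__main__":
    # Hypothetical 200-page book with chapters starting on pages 1, 40 and 150.
    for first, last in plan_print_jobs([1, 40, 150], last_page=200, max_pages=60):
        print(f"Print pages {first}-{last}")
```

With a 60-page budget, the 110-page middle chapter is split into two jobs (40-99 and 100-149), while the other chapters each go out as a single job. Keeping jobs inside chapter boundaries trades a few extra print jobs for KES files that line up with the book's structure.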

Adobe Reader's print settings usually include a page scaling option that defaults to "Shrink to Printable Area". We usually turn this off (set it to "None"): the text comes out slightly larger, and since Adobe Reader isn't downscaling the image to fit the page, the OCR results are better.

Occasionally we find it necessary to do some image processing on the PDF before it is virtual printed or otherwise processed. We frequently use ABBYY FineReader to quickly run through the entire PDF to split or crop the pages as needed. An entire PDF of hundreds of pages can be cropped in a couple of minutes and then resaved as a PDF or a multi-page TIFF file for processing in Kurzweil. A side effect of using ABBYY, though, is that it converts every PDF it processes into bitmap images. So if we only need to work on a vector-based PDF, we use Adobe Acrobat Professional for the processing where possible, since the resulting PDF remains vector-based and the original quality is retained.

Kurzweil 3000 Version 10 has an interesting quirk/bug: it frequently fails to find the Intel JPEG Processing Library that it uses internally for certain image conversions. This can result in KES files that are missing thumbnail images. If you are processing PDFs or TIFF files in Kurzweil 3000 and get error messages when closing Kurzweil that warn of document corruption (i.e., "some DocumentManager objects were not released; document files might be corrupted. Objects are: CNode CNode CLeaf...."), then your installation of Kurzweil 3000 is likely experiencing the same problem. Using system monitoring tools, we discovered that when this happens, the Kurzweil 3000 process is looking for the "ijl15.dll" (Intel JPEG Processing Library) DLL but fails to find it. If this is happening to you, it can be fixed simply by copying the "ijl15.dll" file from the "C:\Program Files\Kurzweil Educational Systems\Common Files" folder into the "C:\Program Files\Kurzweil Educational Systems\Kurzweil 3000" folder (be sure the file ends up in both places). After we copied this file to where Kurzweil 3000 actually looks for it, many of our stranger, seemingly random Kurzweil 3000 Version 10 problems went away!
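The fix above is a single file copy. For anyone applying it across several workstations, here is a minimal Python sketch. It assumes the default install location quoted above (pass a different base_dir if Kurzweil 3000 is installed elsewhere), and it needs write access to Program Files, so it would typically be run from an administrator prompt. The function name is hypothetical. It copies rather than moves the DLL, so the file stays in both folders.

```python
import shutil
from pathlib import Path

# Default install location quoted in the message; adjust if yours differs.
DEFAULT_BASE = Path(r"C:\Program Files\Kurzweil Educational Systems")


def ensure_ijl15(base_dir: Path = DEFAULT_BASE) -> bool:
    """Copy ijl15.dll from Common Files into the Kurzweil 3000 folder if missing.

    Returns True if the file was copied, False if it was already in place.
    """
    source = base_dir / "Common Files" / "ijl15.dll"
    target = base_dir / "Kurzweil 3000" / "ijl15.dll"
    if target.exists():
        return False
    if not source.exists():
        raise FileNotFoundError(f"ijl15.dll not found in {source.parent}")
    # copy2 (not move) leaves the original in Common Files, so the DLL
    # ends up in both places as recommended.
    shutil.copy2(source, target)
    return True
```

Calling `ensure_ijl15()` once per machine is enough; because it returns False when the DLL is already present, it is safe to rerun.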

When we have to scan a book, we still prefer to scan it into black-and-white TIFF files using the scanning utility provided with our high-speed duplex scanners, as this is the fastest method and provides the cleanest end results. The scanning utility lets us drop out certain colors (useful for books that have been highlighted, have color backgrounds, etc.) and manually set the brightness, contrast, and noise clean-up settings to get the most out of the scanner. The OCR Rocket program detects the TIFF files as they are created and automatically processes them in Kurzweil 3000.
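OCR Rocket's internals aren't described here, but the general "hot folder" idea (notice new TIFF files as the scanner writes them, then hand each one off for processing) can be sketched with simple polling. This is a generic illustration under stated assumptions, not how OCR Rocket works: the function names, the polling interval, and the `handle` callback are all hypothetical, and a real watcher would also need to wait until the scanner has finished writing each file before processing it.

```python
import time
from pathlib import Path
from typing import Callable


def find_new_tiffs(folder: Path, seen: set[Path]) -> list[Path]:
    """Return TIFF files in folder not seen on earlier calls, and remember them."""
    new_files = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".tif", ".tiff"} and p not in seen
    )
    seen.update(new_files)
    return new_files


def watch(folder: Path, handle: Callable[[Path], None], interval: float = 5.0) -> None:
    """Poll folder forever, passing each newly appeared TIFF to handle()."""
    seen: set[Path] = set()
    while True:
        for tiff in find_new_tiffs(folder, seen):
            handle(tiff)  # hypothetical hand-off, e.g. queueing the file for OCR
        time.sleep(interval)
```

Polling is the simplest portable approach; operating-system change notifications would react faster but add platform-specific code.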

If judgment dictates that black-and-white scanning won't provide the best results, we normally use our DR-9080C with Kurzweil 3000 to color scan the book directly into Kurzweil 3000. We set the brightness as needed, set Kurzweil 3000's OCR to ScanSoft Accurate, and turn the DynamicThresholding option "On" for most books, and we get excellent results. But, as everyone knows, color scanning with Kurzweil 3000 is slow. Since Kurzweil 3000 doesn't work with our older DR-9050C scanners, we can't use them to scan in color with Kurzweil 3000. If we get swamped with texts requiring color scanning, however, we use the DR-9050Cs to scan to color PDFs, which we then KESI Virtual Print to get them into Kurzweil.


Margaret Londergan and Brian Richwine
Manager, Adaptive Technology Centers
Bloomington and Indianapolis
Indiana University
812-856-4112

londerga at indiana.edu
-----Original Message-----
From: athen-bounces at athenpro.org [mailto:athen-bounces at athenpro.org] On Behalf Of athen-request at athenpro.org
Sent: Tuesday, February 26, 2008 2:50 PM
To: athen at athenpro.org
Subject: Athen Digest, Vol 25, Issue 38

Send Athen mailing list submissions to
athen at athenpro.org

To subscribe or unsubscribe via the World Wide Web, visit
http://athenpro.org/mailman/listinfo/athen_athenpro.org
or, via email, send a message with subject or body 'help' to
athen-request at athenpro.org

You can reach the person managing the list at
athen-owner at athenpro.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Athen digest..."


Today's Topics:

1. Re: File Size (Wiersma, Constance A)


----------------------------------------------------------------------

Message: 1
Date: Tue, 26 Feb 2008 13:50:18 -0600
From: "Wiersma, Constance A" <wiersmac at uww.edu>
Subject: Re: [Athen] File Size
To: "Access Technologists in Higher Education Network"
<athen at athenpro.org>
Message-ID:
<6A6DDE8A258FF94D86A95DC1898B909E0165DE45 at facmail3.uww.edu>
Content-Type: text/plain; charset="us-ascii"

You can scan materials into .pdf files and then process them directly
with Kurzweil using the KESI virtual printer. We scan a number of our
books using our Canon photocopier and FTP the files directly to a server
from which we download the files and run them through the Kurzweil
virtual printer.



We use Adobe Acrobat Professional for books that we get from publishers,
as they usually come as a single file, often with 500 or even 1000
pages. We "break" the file into chapters, which we then run through
Kurzweil and save as KES files, as Kurzweil does not work well if the
files are too large. This is a tremendous time saver for our
Alternative Media program, as we get requests to provide hundreds of texts
in an alternate format each semester.





Connie Wiersma, Assistant Director

Center for Students with Disabilities

University of Wisconsin-Whitewater

Whitewater, WI 53190

Ph. 262-472-5244





From: athen-bounces at athenpro.org [mailto:athen-bounces at athenpro.org] On
Behalf Of dann
Sent: Monday, February 25, 2008 2:54 PM
To: Access Technologists in Higher Education Network; Access
Technologists in Higher Education Network
Subject: Re: [Athen] File Size



But Gaeir - in order to use the KESI Virtual Printer don't you need to
have Adobe Acrobat Professional?



This is not something everyone has access to.





---------------

Daniel Berkowitz, CEO

DigiLife Media, LLC

1 Bryant Avenue

Bradford, MA 01835-7424



phone: 617-512-4315

mobile: 978-914-4601

e-mail: dann at digilifemedia.biz

web: www.digilifemedia.biz



________________________________

From: athen-bounces at athenpro.org on behalf of Gaeir Dietrich
Sent: Mon 2/25/2008 3:35 PM
To: 'Access Technologists in Higher Education Network'
Subject: Re: [Athen] File Size

The solution is to scan to TIFF, not to PDF. Kurzweil really does not
like PDF as much as it does TIFF, and your files will be much smaller.



Part of the problem is also that the best way to go from PDF is through
the KESI Virtual Printer, not to load the file directly. Using the
virtual printer will create smaller files.



******************************************************
Gaeir (rhymes with "fire") Dietrich
High Tech Center Training Unit of the
California Community Colleges
De Anza College, Cupertino, CA
www.htctu.net
408-996-6043

________________________________

From: athen-bounces at athenpro.org [mailto:athen-bounces at athenpro.org] On
Behalf Of John Elmer
Sent: Thursday, February 21, 2008 9:51 PM
To: Access Technologists in Higher Education Network
Subject: [Athen] File Size



Speaking of file size........

We have found that scanning documents using Capture Perfect with our
Canon 9080C into PDFs as black and white for conversion to K3000 files
results in gigantic K3000 files. We are better off scanning
B/W documents as 24-bit color, as when we convert them to K3000 the files are
many, many times smaller.

Do others have similar experience? Solutions?




John F. Elmer
Alternate Media Specialist
Ventura College
Educational Assistance Center (DSP&S)
4667 Telegraph Road
Ventura, CA 93003
805.654.6400, x1278



-----athen-bounces at athenpro.org wrote: -----

To: "'Access Technologists in Higher Education Network'"
<athen at athenpro.org>
From: "Ron Stewart" <ron.stewart at dolphinusa.com>
Sent by: athen-bounces at athenpro.org
Date: 02/21/2008 08:03PM
Subject: Re: [Athen] Abbyy FineReader v9?



I think it was on the DSSHE list, but what we are seeing is file sizes
with Abbyy 9 about 10 times larger than with Abbyy 8.



Ron Stewart



From: athen-bounces at athenpro.org [mailto:athen-bounces at athenpro.org] On
Behalf Of dann
Sent: Thursday, February 21, 2008 10:31 PM
To: Access Technologists in Higher Education Network; athen at athenpro.org
Subject: Re: [Athen] Abbyy FineReader v9?



There was an extended conversation about Abbyy Fine Reader version 9 on
this or another mailing list recently. The problem with Abbyy Version 9
is the size of the files produced in the scanning process. The general
consensus is that it is safest to stay with version 8 for the time being
especially when scanning textbooks and other large items.






________________________________

From: athen-bounces at athenpro.org on behalf of normajean.brand
Sent: Thu 2/21/2008 9:43 PM
To: athen at athenpro.org
Subject: [Athen] Abbyy FineReader v9?

Has anyone tried out the new FineReader? I know there were discussions,
pros and cons on FineReader 8.0, and technical issues with regard to 8.0
for some folks. Just curious...



~NJ



-----------------------------------------------------------------------

Personal Mission Statement

Integrity, trust, commitment and talent are the foundation of my
existence.

My dedication to providing the highest level of technical expertise to
help solve clients' problems is why I'm here. I strive to create, innovate,
and research solutions to meet and exceed expectations.

-- NJ Brand

-----------------------------------------------------------------------

NJ Brand

Houston Community College-NW

Technical Support and Innovation Center

Assistive Technology Specialist/Sr. Lab Assistant

Town and Country Square Campus

MC 1379 Room RC13

1010 W. Sam Houston Pkwy N.

Houston TX 77043

VM/Office: 713.718.5604

FAX: 713.718.5430

Email: normajean.brand at hccs.edu

http://nwc.hccs.edu

http://learning.nwc.hccs.edu/members/normajean.brand

-----------------------------------------------------------------------

_______________________________________________
Athen mailing list
Athen at athenpro.org
http://athenpro.org/mailman/listinfo/athen_athenpro.org




------------------------------



End of Athen Digest, Vol 25, Issue 38
*************************************



