[Athen] MathML Scanning and Conversion

Kathleen Cahill kcahill at MIT.EDU
Thu May 30 13:45:55 PDT 2013


You can try Infty Reader, which does OCR of math notation and says it can output MathML: http://www.sciaccess.net/en/InftyReader/

Make sure you read all the caveats at the end of the page - the software is buggy and the process is not straightforward. We did use Infty Reader to scan some physics textbooks into LaTeX, but we made sure we had physics students doing the proofing and editing on the other end. Good luck!

Kathy


Kathleen Cahill
Assistive Technology Specialist
MIT ATIC (Assistive Tech. Info. Center)
77 Mass. Ave. 7-143
Cambridge MA 02139
(617) 253-5111
kcahill at mit.edu

* Features
Here are some features of InftyReader Ver. 2.8:

1. It uses the OCR engines of Toshiba Corporation, "ExpressReaderPro", and of MediaDrive Corporation, "WinReader", simultaneously to improve the recognition results of characters in ordinary text areas. (As for the characters and math symbols in formulae, it uses Infty's OCR).
2. It can recognize tables that include math expressions in their cells (provided the ruled lines are not broken).
3. It can convert PDF files, including their mathematical expressions, into LaTeX or XHTML (MathML), except for PDFs that contain color or grayscale images. (Note that InftyReader can process only black-and-white binary images.)
It recognizes the page images of PDF files by referring to the text information embedded in the PDF.

Attention: The original PDF should be of high resolution, equivalent to 600dpi scanned images. Sometimes PDF files found on the Web are of low resolution, equivalent to 200dpi images, in order to reduce file size. In such cases, the recognition results will be of very low quality, almost useless!
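
As a rough way to check a page image against the 600dpi guidance above, the effective resolution can be estimated from the image width in pixels and the physical page width. This is only a sketch: the function names and the 8.5-inch example page are assumptions for illustration and are not part of InftyReader.

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Estimate dots per inch from image width and physical page width."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def meets_recommended_resolution(
    pixel_width: int,
    page_width_inches: float,
    minimum_dpi: float = 600.0,  # the resolution recommended in the notes above
) -> bool:
    """Return True if the page image is at or above the minimum resolution."""
    return effective_dpi(pixel_width, page_width_inches) >= minimum_dpi


if __name__ == "__main__":
    # Hypothetical example: an 8.5-inch-wide page rendered at two sizes.
    for width in (5100, 1700):
        dpi = effective_dpi(width, 8.5)
        ok = meets_recommended_resolution(width, 8.5)
        print(f"{width} px wide: about {dpi:.0f} dpi, acceptable: {ok}")
```

A page that fails this check would probably need to be rescanned from the printed source rather than taken from the low-resolution PDF.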
* Caution ---- Important!

1. Source documents have to be clearly printed.
2. They should be scanned as "binary" images, at 600dpi (or 400dpi).
3. InftyReader erases small noise, automatically segments page images into picture areas, table areas and text areas, and then recognizes the text and table areas, including mathematical expressions.
However, to get better recognition results, users are <<recommended>> to erase noise and pictures before recognition.
4. In scanning, it is important to adjust the binarization threshold of the scanner so that the number of touching or broken characters is less than 1% of the total number of characters in each scanned page image.
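
To make the binarization and 1% guidance above concrete, here is a minimal sketch. It assumes grayscale pixels in the range 0-255 and assumes that someone has already counted the touching or broken characters on a page by hand. The names are hypothetical, and this is not how InftyReader or any scanner driver works internally.

```python
def binarize(gray_rows, threshold):
    """Turn grayscale pixels (0-255) into 1 (ink) or 0 (paper).

    A pixel counts as ink when it is darker than the threshold. Raising
    the threshold thickens strokes (more touching characters); lowering
    it thins them (more broken characters).
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]


def damaged_fraction(damaged_chars: int, total_chars: int) -> float:
    """Fraction of characters on a page that are touching or broken."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return damaged_chars / total_chars


def page_is_acceptable(damaged_chars: int, total_chars: int,
                       limit: float = 0.01) -> bool:
    """Apply the 'less than 1%' rule from the caution list above."""
    return damaged_fraction(damaged_chars, total_chars) < limit


if __name__ == "__main__":
    sample = [[0, 200], [128, 127]]  # tiny made-up grayscale patch
    print(binarize(sample, 128))
    print(page_is_acceptable(15, 2000))  # 0.75% damaged
    print(page_is_acceptable(20, 2000))  # exactly 1% is not "less than 1%"
```

In practice the counting would be done by eye on a few sample pages while adjusting the scanner threshold, before committing to scanning a whole book.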
* Operating Environment
InftyReader runs on Windows 7, Vista, and XP, on a PC with 500MB or more of free memory.
Note that it does not run on Windows 98, Me, or 2000.


From: athen-list-bounces at mailman1.u.washington.edu [mailto:athen-list-bounces at mailman1.u.washington.edu] On Behalf Of Justin Hicks
Sent: Thursday, May 30, 2013 4:36 PM
To: athen-list at u.washington.edu
Subject: [Athen] MathML Scanning and Conversion

Hello,

Does anybody know of any services that can scan a math-equation-heavy book into MathML? Thanks.

--
Justin Hicks

University of Arkansas at Little Rock | Disability Resource Center
501.569.3143 | jxhicks at ualr.edu | ualr.edu/disability | Provide Feedback on the DRC: http://ualr.edu/disability/feedback