[Athen] Title, tags, lang - where are they in a PDF document? Beginning or end?

Corrine Schoeb kschoeb1 at swarthmore.edu
Wed May 24 08:16:51 PDT 2017


Thank you to everyone who has responded so far.

I think I need to clarify: this is a scan performed by code, not by a
physical scanner. We've developed a scan for our Moodle instance. Right now
it can distinguish real text from an image of text, but we are working on
refining the scan further. Large documents take up a lot of CPU and memory,
so we are thinking we might be able to limit our scan to the first 5-10
pages to see if there is a title, tags, etc. I'm just not sure where that
data is stored: at the beginning or at the end of the PDF.

I know this is a very technical question and a bit obscure, but I figured
this might be the right group.


On Wed, May 24, 2017 at 8:34 AM, Corrine Schoeb <kschoeb1 at swarthmore.edu>
wrote:


> We are working on creating a scan of PDF documents, some of which are 100+
> pages. Rather than scan the full document to find out if it is tagged, has
> a title and language we thought we might be able to do the first 5-10 pages
> but I'm not sure where the title, tag, lang data is stored in a PDF.
>
> So my question is, is title, tag, lang attributes of a PDF stored at the
> beginning of a PDF or at the end?
>
> --
>
> Corrine Schoeb
> Technology Accessibility Coordinator, ITS
> 610-957-6208
>
> *** Swarthmore College ITS will never ask you for your password, including
> by email. Please keep your passwords private to protect yourself and the
> security of our network.
>
> To learn more about web security visit
> http://www.swarthmore.edu/its/security



--

Corrine Schoeb
Technology Accessibility Coordinator, ITS
610-957-6208

*** Swarthmore College ITS will never ask you for your password, including
by email. Please keep your passwords private to protect yourself and the
security of our network.

To learn more about web security visit
http://www.swarthmore.edu/its/security