[Athen] Creating an accessible PDF from a web page

Joseph Polizzotto MA jpolizzotto at berkeley.edu
Thu Jun 24 11:27:56 PDT 2021


Hi Debee:

If you are more comfortable using the command line, you can download the
HTML from the URL (e.g., by typing
google-chrome --headless --dump-dom 'URL' > ./PATH/TO/FILE.html),
convert it to an accessible Markdown (TXT) file, and then use free
utilities to convert that file to PDF.

You could chain these commands together in a script to make the process
simple. Tools the script could use include HTML2Text
<https://pypi.org/project/html2text/>, Pandoc <https://pandoc.org/>, and
OfficeToPDF <https://github.com/cognidox/OfficeToPDF>.
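Here is a minimal sketch of such a script in Python. It assumes google-chrome, html2text, pandoc, and OfficeToPDF.exe are installed and on the PATH. Two parts are assumptions rather than something stated in this thread: Pandoc writes an intermediate Word (DOCX) file, because OfficeToPDF converts Office documents, and the helper names build_commands and run (and the file name page_to_pdf.py) are hypothetical. Executable names also differ by platform (OfficeToPDF needs Windows and Microsoft Office, where Chrome is usually chrome.exe), so adjust them to your setup.

```python
# Hypothetical sketch: web page -> Markdown -> DOCX -> PDF.
# Assumes google-chrome, html2text, pandoc and OfficeToPDF.exe are on PATH;
# the DOCX intermediate step is an assumption, not part of the thread.
import shlex
import subprocess
import sys
from pathlib import Path
from typing import Optional


def build_commands(url: str, stem: Path) -> list[tuple[list[str], Optional[Path]]]:
    """Return (argv, stdout_file) pairs; stdout_file is None when the tool
    writes its own output file."""
    html = stem.with_suffix(".html")
    md = stem.with_suffix(".md")
    docx = stem.with_suffix(".docx")
    pdf = stem.with_suffix(".pdf")
    return [
        # 1. Headless Chrome renders the page and prints the resulting DOM.
        (["google-chrome", "--headless", "--dump-dom", url], html),
        # 2. html2text turns the HTML into Markdown, keeping only image alt text.
        (["html2text", "--images-to-alt", str(html)], md),
        # 3. Pandoc converts Markdown to Word (headings and lists become styles).
        (["pandoc", str(md), "-o", str(docx)], None),
        # 4. OfficeToPDF drives Microsoft Office (Windows only) to export the PDF.
        (["OfficeToPDF.exe", str(docx), str(pdf)], None),
    ]


def run(url: str, stem: Path, dry_run: bool = False) -> None:
    """Print each step and, unless dry_run is set, execute it in order."""
    for argv, out_file in build_commands(url, stem):
        print(shlex.join(argv) + (f" > {out_file}" if out_file else ""))
        if dry_run:
            continue
        if out_file is None:
            subprocess.run(argv, check=True)
        else:
            # Redirect stdout to a file, like '>' in the shell.
            with open(out_file, "wb") as fh:
                subprocess.run(argv, check=True, stdout=fh)


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print("usage: python page_to_pdf.py URL OUTPUT_STEM [--dry-run]")
    else:
        run(sys.argv[1], Path(sys.argv[2]), dry_run="--dry-run" in sys.argv[3:])
```

Calling run("https://example.com/page", Path("page"), dry_run=True) only prints the four commands, which is a cheap way to check them before converting anything. Because each step is a separate external tool, any one of them can be swapped out (for example, a different DOCX-to-PDF converter) without touching the rest of the script.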

Of course, with OfficeToPDF there may still be things you need to check in
Acrobat, but the images with alt text, lists, and headings will all be
there.

HTML2Text is a great tool because it lets you optionally exclude things
you might not need or want from the web page. For instance, if the images
are not important to a user who just needs the alt text, you can use the
--images-to-alt option.
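To make the effect of --images-to-alt concrete, here is a toy, stdlib-only sketch. It is not html2text's code and only understands text and <img> tags; the class ImageAltExtractor and function to_text are hypothetical names. By default it writes an image as Markdown ![alt](src), roughly as html2text does; with images_to_alt=True it keeps only the alt text. With the real tool you would just add the flag, e.g. html2text --images-to-alt page.html > page.md.

```python
from html.parser import HTMLParser


class ImageAltExtractor(HTMLParser):
    """Toy HTML-to-text converter that only knows about text and <img>.

    Roughly mimics html2text's image handling: by default an image becomes
    Markdown ![alt](src); with images_to_alt=True only the alt text is kept.
    """

    def __init__(self, images_to_alt: bool = False) -> None:
        super().__init__()
        self.images_to_alt = images_to_alt
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also arrives here via handle_startendtag.
        if tag != "img":
            return
        attributes = dict(attrs)
        # Attribute values can be None (e.g. a bare "alt"), so fall back to "".
        alt = attributes.get("alt") or ""
        src = attributes.get("src") or ""
        if self.images_to_alt:
            self.parts.append(alt)
        else:
            self.parts.append(f"![{alt}]({src})")

    def handle_data(self, data):
        # Plain text between tags is copied through unchanged.
        self.parts.append(data)


def to_text(html: str, images_to_alt: bool = False) -> str:
    """Convert a small HTML fragment to text using the toy parser above."""
    parser = ImageAltExtractor(images_to_alt)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


page = '<p>See the <img src="map.png" alt="campus map"> below.</p>'
print(to_text(page))
print(to_text(page, images_to_alt=True))
```

The first call keeps a Markdown image reference; the second leaves only "See the campus map below.", which is what a screen reader user who just needs the alt text would hear.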

HTH,

Joseph

On Thu, Jun 24, 2021 at 9:02 AM Karen McCall <K4mccall at outlook.com> wrote:


> Whenever you choose “Print > Adobe PDF”, you will create an inaccessible PDF.
>
> You will need Acrobat to create a tagged PDF from a web page.
>
> Choose File > Create > PDF from Webpage and copy the URL into the dialog.
>
> The dialog has a Settings button where you can check one checkbox to
> create bookmarks and another to add PDF tags.
>
> You can also choose how many pages/layers of the website you want to
> convert to tagged PDF.
>
> Cheers, Karen
>
> *From:* athen-list <athen-list-bounces at mailman12.u.washington.edu>
> *On Behalf Of* Deborah Armstrong
> *Sent:* Thursday, June 24, 2021 11:45 AM
> *To:* Access Technology Higher Education Network <athen-list at u.washington.edu>
> *Subject:* [Athen] Creating an accessible PDF from a web page
>
> When I print a PDF of a web page from Edge or Chrome in Windows 10, I
> always get an inaccessible PDF.
>
> I haven’t needed to do this for a student yet, but I don’t understand why
> this happens. The text of the web page is already available to whatever
> default driver is printing my PDF. I’m using whatever the Windows default
> is, though I’ve tried other solutions.
>
> Before I need to do this for real, does anyone know how to easily print an
> accessible PDF from an accessible web page?
>
> --Debee
>
> _______________________________________________
> athen-list mailing list
> athen-list at mailman12.u.washington.edu
> http://mailman12.u.washington.edu/mailman/listinfo/athen-list

--
*Alternate Media Supervisor*
Disabled Students' Program
University of California, Berkeley
https://dsp.berkeley.edu/
(510) 642-0329