[Athen] Docsoft

Wink Harner wink.harner at mcmail.maricopa.edu
Thu May 8 13:21:55 PDT 2008

Hi All,

Our school has been launching live video feeds/Flash media w/o
captioning and since we now have a CDI in our office, she has
volunteered to 'test' every single one that comes out for captioning
access for the Deaf (I love my staff!). The information on the product
was emailed to someone on our staff and I, ever the skeptic, thought the
advertisement was a bit too good to be true. I think my recommendation
to media services will be to either hire a CART transcriptionist for the
script & schedule captioning work in our Center for Teaching & Learning
lab, or dupe the video & upload to an off-site service.

Your evaluation is really clear and I like that you mentioned what it is
good for as well as where the limitations are.

Thanks, Sean.


Sean Keegan wrote:

> Hi Wink,
>
>> What experience/hands-on knowledge might any of you have with
>> Docsoft software used for captioning?



> First the short version:
> As a tool to generate basic text information that could be used for
> searching an audio file, this is pretty good. As a tool to automatically
> generate captioned text information, I do not believe this tool is
> sufficient by itself (i.e., you *will* need someone to review and edit the
> content).


> Here is a longer version:
> We did some testing with the system on an audio file that was recorded in a
> studio-quality environment with multiple takes to get the audio track "just
> right". When we ran the file through the DocSoft engine, we got about 91-92%
> accuracy. That is roughly one wrong word in every 11 to 12. Others who were
> testing the system (for podcasting-type situations) were able to get similar
> levels of recognition provided that they scripted out what they planned to
> say before recording.


> We then ran some audio clips that were recorded in more of a "classroom"
> type environment (i.e., non-studio, more dynamic interaction, etc.) and were
> able to get about 80-85% accuracy, which was similar to what others were
> getting. In reality, the problem was not so much that the system failed to
> recognize individual words as that it mis-recognized whole
> phrases/sentences. Because it is automated speech recognition, it is not so
> much an issue of misspelling a word, but mis-recognizing the spoken word as
> something altogether different. We ended up with text content that was very
> different from what was spoken.
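
For anyone who wants to put a number on "accuracy" for their own clips, here is a minimal sketch, assuming you have a hand-made reference transcript to compare against. It counts word-level substitutions, insertions, and deletions (a standard edit distance) and reports the share of reference words that survived. The function name and the sample sentences are illustrative only, and this is not necessarily how DocSoft or Dragon compute their own figures.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - (word-level edit distance / reference word count)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # Dynamic-programming edit distance over words, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the recognizer
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr

    errors = prev[-1]
    # Many insertions can push errors past the reference length; clamp at 0.
    return max(0.0, 1.0 - errors / len(ref))


# Hypothetical example: two substitutions plus one inserted word.
spoken = "please open the course syllabus on page ten"
recognized = "please open the coarse silly bus on page ten"
print(f"{word_accuracy(spoken, recognized):.1%}")  # 62.5%
```

Note that a score like this treats every wrong word equally, so it does not capture how far a whole mis-recognized phrase drifts from the meaning of what was said.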


> We found that when the recognition went below 90%, it became much more
> difficult to edit the generated transcript. The generated text content was
> so different from the spoken audio content that it did not make
> sense. This was not an issue of correcting just one or two words, but
> of having to repeatedly review whole sentences/phrases to reconcile the text
> content with the spoken content. For content that was in the 80% region,
> there were significant problems with the content being totally out of
> context. It may be more effective to transcribe/parrot the audio file
> separately as opposed to using an automated solution.


> At the time, DocSoft had an editor tool in beta development that we did
> not get a chance to use, but their developers thought that it would take
> approximately 1-1.5 times the length of the audio clip for a person to
> edit the recognized text (this is after the audio clip has already been
> processed). So, for a 30 minute audio clip, you would be looking at a
> total processing time of 1 hour to 1 hour 15 minutes (30 minutes for audio
> clip processing, and 30-45 minutes for post-production editing of the
> recognized text).
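
The arithmetic above is easy to rerun for other clip lengths. A rough sketch, assuming processing takes about as long as the clip itself (as in the 30 minute example) and editing takes the developers' estimated 1-1.5 times the clip length; the function and parameter names are made up for illustration and are not part of DocSoft:

```python
def total_turnaround_minutes(clip_minutes, processing_ratio=1.0,
                             edit_ratio_range=(1.0, 1.5)):
    """Return (low, high) total minutes to process and then edit one clip."""
    # Assumption: automated processing runs at roughly real time.
    processing = clip_minutes * processing_ratio
    # Assumption: human editing takes 1x to 1.5x the clip length.
    low_edit, high_edit = (clip_minutes * ratio for ratio in edit_ratio_range)
    return processing + low_edit, processing + high_edit


low, high = total_turnaround_minutes(30)
print(f"{low:.0f} to {high:.0f} minutes")  # 60 to 75 minutes
```

The editing estimate came from the developers for a tool that was still in beta, so the upper figure may well be optimistic for audio in the 80% accuracy range.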


> The DocSoft tool is basically running the Dragon NaturallySpeaking
> engine (from Nuance), which is probably one of the better automated
> speech recognition engines commercially available. There is an option that
> you could use to train the engine on the speaker's voice and this may
> result in an improvement. I was unable to test this component.


> So, as a tool to create basic text from an audio file for searching,
> I think DocSoft is a good option. As a tool to automatically
> create transcripts (or captioned files), I think there is a lot more
> work that needs to be done. From what I have seen, you will need someone
> to go back through and proof the generated text AND audio to
> ensure accuracy.


> The question I ask is, what is the intent of using such a system? If
> you have an accurate transcript, then there are various vendor options
> for creating the time-stamped text file (and the level of searchability is
> FAR more granular). If all you are interested in doing is basic audio
> mining to add searchability to the audio content, then I think DocSoft has a
> very useful platform.


> Take care,
> Sean




> _______________________________________________
> Athen mailing list
> Athen at athenpro.org
> http://athenpro.org/mailman/listinfo/athen_athenpro.org

