[MV] Plaintalk accuracy statistics

Chuck Rogers chuck.rogers at macspeech.com
Mon Nov 5 23:39:14 PST 2007


John (and everyone else):

Unfortunately, speech recognition is not something that you can easily
subject to scientific measurement. Why? Because accuracy depends on so
many factors, not the least of which are the speaker's voice, the
microphone being used, and the environment in which speech recognition
is being used. I am not aware of any "white papers" on the subject.

That having been said, I can provide some insight:

Technically, the term "PlainTalk" refers to a legacy speech
recognition system built into Mac OS 9 and earlier versions of the
Macintosh operating system. Apple no longer calls its built-in speech
recognition system, which is command & control only, "PlainTalk" but
rather "English Speech Recognition." To make things a little easier
moving forward, let's call it "ESR."

ESR is optimized to accept speech from any voice through the Mac's  
built-in microphone. Many methods are used to optimize accuracy; one
of them is limiting the vocabulary. While you can add your
own vocabulary to the system, you can't add enough to do dictation,  
only command and control. My personal experience is that ESR starts to  
"get confused" after about 1000 words. We did some extensive tests  
against the system about 4 years ago. To be honest, it may be capable  
of more today, not only because of improvements to the programming,  
but simply due to increased processing power of the computers that are  
shipping today versus those of 4 years ago. What happens at some point  
is that the more commands you add, the greater the likelihood some  
will sound similar to others. Since ESR is a speaker-independent
system (meaning no training is involved for it to learn the user's
voice), it will begin misrecognizing spoken commands sooner than
speaker-dependent systems where the speaker has trained the system.
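
To illustrate the point about similar-sounding commands, here is a
minimal Python sketch that flags command names likely to collide. It
has nothing to do with how ESR works internally: it compares spelling
using the standard library's difflib, which is only a rough stand-in
for how alike two phrases sound. The function name similar_pairs, the
0.75 threshold, and the sample command names are all assumptions made
for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(commands, threshold=0.75):
    """Return (a, b, score) for command names whose spelling is close.

    Spelling similarity is only a crude proxy for acoustic similarity;
    the threshold is an arbitrary starting point, not a measured value.
    """
    pairs = []
    for a, b in combinations(commands, 2):
        score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    # Most similar pairs first, so the worst offenders are easy to spot.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical command names, for illustration only.
commands = ["Open Mail", "Open Main", "Close Window", "Quit Safari"]
for a, b, score in similar_pairs(commands):
    print(f"{a!r} vs {b!r}: {score}")
```

Running a check like this before adding a batch of new commands gives
you a short list of names worth rewording, although the final judge is
still how the commands behave when actually spoken.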

This can be mitigated somewhat by the use of a noise-canceling  
microphone instead of the built-in mic, as that will filter out some  
of the background noise.

The question regarding GUI scripting is irrelevant in terms of ESR.
You can make anything that is scriptable, including GUI Scripting, a
"speakable item." Your restriction is the number of commands you have
asked speakable items to manage, not the nature of the script.
Technically speaking, "GUI Scripting" is simply a series of
AppleScript commands that access a feature of the Mac OS that is
turned on when "Enable access for assistive devices" is checked in the
Universal Access pane of System Preferences. Since any AppleScript can
be made a speakable item, your only restriction is the above-mentioned
"confusion" between command names that sound similar. The more you take
care to make your command names distinct, the more commands you can add,
although there would still be a problem at some point due to the very
nature of ESR.
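
For anyone who has not seen GUI Scripting, the following Python sketch
builds the kind of AppleScript involved: a "System Events" block that
clicks a menu item in a running application. The helper names
menu_click_script and run_script, and the TextEdit / File / New values,
are assumptions chosen for illustration. Actually running the script
requires a Mac with access for assistive devices enabled, so the
example only prints the generated source.

```python
import subprocess


def menu_click_script(process, menu, item):
    """Build AppleScript source that clicks one menu item via System Events."""
    return "\n".join([
        'tell application "System Events"',
        f'    tell process "{process}"',
        f'        click menu item "{item}" of menu "{menu}" of menu bar 1',
        '    end tell',
        'end tell',
    ])


def run_script(source):
    """Run AppleScript source with osascript (Mac only; not called here)."""
    return subprocess.run(
        ["osascript", "-e", source], capture_output=True, text=True
    )


# Hypothetical target: the New item in TextEdit's File menu.
script = menu_click_script("TextEdit", "File", "New")
print(script)
```

The generated script is ordinary AppleScript, which is why the nature
of the script places no extra burden on recognition: ESR only has to
recognize the spoken name attached to it.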



Best Regards,

Chuck Rogers, Chief Evangelist
MacSpeech, Inc.





On Nov 5, 2007, at 5:30 PM, John F. Richardson wrote:

> Hello,
>
> What is the Plaintalk accuracy for speakable items?
>
> What is the Plaintalk accuracy for GUI scripting?
>
> Have there been any Apple technical reports on the subject?
>
> Have there been any open literature studies of Plaintalk?
>
> John F. Richardson
>
>
> _______________________________________________
> MacVoice mailing list
> MacVoice at listserver.themacintoshguy.com
> http://listserver.themacintoshguy.com/mailman/listinfo/macvoice




