[MV] Voice Recognition Issues
Jay A. Kreibich
jay at kreibi.ch
Wed Feb 21 13:17:08 PST 2007
On Wed, Feb 21, 2007 at 08:55:28AM -0800, Regina Sadono scratched on the wall:
> One more thing, while I'm at it.... I have been a writer for all of my life
> and find that it's a very specific process that starts with creating
> words/sounds in the quiet of my mind and then these get recorded through the
> activity of my hands either by writing or typing. Writing is a very
> specific neurological process and I have not been able to access this
> process orally. Speaking is a completely different neurological process and
> puts me in a completely different place where I can't "compose."
This is a very good point that-- I suspect-- surprises a lot of people
who are new to continuous voice recognition systems. I'd even take
it one step further than you have and say that the skills of "speaking"--
such as talking on the phone, giving a presentation, or just general
articulation in a conversation-- are very different from the skill
involved in "dictation", which is really what current continuous voice
recognition systems are all about.
I played around with ViaVoice "back when" and was amazed at how poorly
it fit my writing style. Not accuracy, but things like "You mean I
actually have to SAY 'period' and 'comma'!??" Stuff like that just happens
while typing, without conscious thought. Like Regina, I too "hear
voices" in my head while I write, but those voices don't SAY "comma"--
the subconscious part of my brain that takes a thought and "outputs
typing" just understands where to insert all the non-spoken parts of
the written language as the thoughts I'm trying to express stream
through my head.
While I'm sure I could eventually learn the skill of dictation, it is
definitely a distinct, learned skill. My experiences with ViaVoice gave
me new respect for a few of the old-school lawyers I've worked with
that still do all their formal writing via (human) dictation. Before
playing around with ViaVoice I used to always think "Why don't you just
learn to type?!?" In a few cases, I was really confused because I knew
the person in question could, in fact, type fairly well, as evidenced by
their email and other on-line communication, but still did most formal
writing (such as client letters or legal summaries) via dictation.
After my own attempts at dictation I realized what a unique skill it is,
and have come to appreciate that if that's the way you learned to
"write," changing that method-- even to something as "simple" as a
keyboard-- can be extremely difficult. It screws with the "writing"
(authoring?) process.
While I appreciate the need for accurate and specific dictation (and
think that today's systems do a fair job of delivering that, even if
only in a constructed environment), opening speech systems up to a
more general market-- and challenging the keyboard in a serious way--
will require a lot of language awareness and a great deal of "do what
I mean" analysis. Basically on-the-fly built-in grammar checking and
such that converts "the spoken word" into "the written word" with a
full awareness of the numerous non-spoken elements of the written
language (of course, they aren't truly non-spoken, they just aren't
words). In effect, something you could hook up to your TV to get a
correctly written transcript (accuracy and learning aside).
People just don't deal well with specialized input skills. Consider the
simple keyboard. Despite its ubiquitous placement and fairly simple
operation, less than 20% of IT professionals can actually touch-type
(and that number is even lower for general computer users). People
aren't willing to develop specific skills to improve their usage.
Dictation-style input is a keyboard with no letters on the keys. It
doesn't slow down someone with the skill to use the instrument
"correctly," but that's actually very few people. Printing letters
on the keys of a keyboard provides a bridge for those who can't use
the device correctly, but only need to use it "good enough." A "for
the people" speech system needs to do the same, because right now
developing the skills to efficiently use a speech recognition system--
from microphone placement to how one thinks and constructs thoughts while
dictating-- is highly specialized and represents a huge barrier that
requires a great deal of motivation to overcome; it is not unlike
moving from a QWERTY to a Dvorak keyboard. In the current market, it
seems that the main motivation to use voice software is something of the
form of "because I have no other choice." That's an extremely poor
position from which to motivate your market.
-j
--
Jay A. Kreibich < J A Y @ K R E I B I.C H >
"'People who live in bamboo houses should not throw pandas.' Jesus said that."
- "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"
More information about the MacVoice mailing list