[MV] Voice Recognition Issues

Wed Feb 21 13:17:08 PST 2007

On Wed, Feb 21, 2007 at 08:55:28AM -0800, Regina Sadono scratched on the wall:

> One more thing, while I'm at it.... I have been a writer for all of my life
> and find that it's a very specific process that starts with creating
> words/sounds in the quiet of my mind and then these get recorded through the
> activity of my hands either by writing or typing.  Writing is a very
> specific neurological process and I have not been able to access this
> process orally.  Speaking is a completely different neurological process and
> puts me in a completely different place where I can't "compose."

  This is a very good point that-- I suspect-- surprises a lot of people
  that are new to continuous voice recognition systems.  I'd even take
  it one further than you have and say that the skills of "speaking"--
  such as talking on the phone, giving a presentation, or just general
  articulation in a conversation-- are much, much different from the skill
  involved in "dictation", which is really what current continuous voice
  recognition systems are all about.

  I played around with ViaVoice "back when" and was amazed at how poorly
  it fit my writing style.  Not accuracy, but things like "You mean I
  actually have to SAY 'period' and 'comma'!??"  Stuff like that slips in
  while typing without conscious thought.  Like Regina, I too "hear
  voices" in my head while I write, but those voices don't SAY "comma"--
  the sub-conscious part of my brain that takes a thought and "outputs
  typing" just understands where to insert all the non-spoken parts of
  the written language as the thoughts I'm trying to express stream
  through my head.

  While I'm sure I could eventually learn the skill of dictation, it is
  definitely a distinct, learned skill.  My experiences with ViaVoice gave
  me new respect for a few of the old-school lawyers I've worked with
  that still do all their formal writing via (human) dictation.  Before
  playing around with ViaVoice I used to always think "Why don't you just
  learn to type?!?"  In a few cases, I was really confused because I knew
  the person in question could, in fact, type fairly well, as evidenced by
  their email and other on-line communication, but still did most formal
  writing (such as client letters or legal summaries) via dictation. 
  After my own attempts at dictation I realized what a unique skill it is,
  and have come to appreciate that if that's the way you learned to
  "write," changing that method-- even to something as "simple" as a
  keyboard-- can be extremely difficult.  It screws with the "writing"
  (authoring?) process.

  While I appreciate the need for accurate and specific dictation (and
  think that today's systems do a fair job of delivering that, even if
  only in a constructed environment), opening speech systems up to a
  more general market-- and challenging the keyboard in a serious way--
  will require a lot of language awareness and a great deal of "do what
  I mean" analysis.  Basically on-the-fly built-in grammar checking and
  such that converts "the spoken word" into "the written word" with a
  full awareness of the numerous non-spoken elements of the written
  language (of course, they aren't truly non-spoken, they just aren't
  words).  In effect, something you could hook up to your TV to get a
  correctly written transcript (accuracy and learning aside).

  People just don't deal well with specialized input skills.  Consider the
  simple keyboard.  Despite its ubiquitous placement and fairly simple
  operation, fewer than 20% of IT professionals can actually touch-type
  (and that number is even lower for general computer users).  People
  aren't willing to develop specific skills to improve their usage.
  Dictation style input is a keyboard with no letters on the keys.  It
  doesn't slow down someone with the skill to use the instrument
  "correctly," but that's actually very few people.  Printing letters
  on the keys of a keyboard provides a bridge for those who can't use
  the device correctly, but only need to use it "good enough."  A "for
  the people" speech system needs to do the same, because right now
  the skills needed to use a speech recognition system efficiently--
  from microphone placement to how one thinks and constructs thoughts while
  dictating-- are highly specialized and represent a huge barrier that
  requires a great deal of motivation to overcome; it is not unlike
  moving from a QWERTY to a Dvorak keyboard.  In the current market, it
  seems that the main motivation to use voice software is something along
  the lines of "because I have no other choice."  That's an extremely poor
  position from which to motivate your market. 

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
   - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"