On Thu, Feb 22, 2007 at 08:33:36AM -0600, Chuck Rogers scratched on the wall:
> Rena (and everyone else):
>
> Simply put, no speech recognition can do what you ask, and it will
> probably be many years before computers have the processing power
> necessary to do so.

That's true, but one might make a strong argument that this is true of all of today's dictation-style systems. The need for per-user training and near-perfect microphone placement all points to this being a technology that isn't really ready for the masses. It's interesting research, but for someone like me who is able to use a keyboard quite well, it is nothing more than a toy... and one I gave up on fairly quickly, at that.

Even if the benchmark is a human trained in dictation-- as opposed to someone such as a court recorder trained in stenography (who essentially does on-the-fly transcription)-- modern software still isn't up to the task.

In the case of my post, I'm well aware that what I proposed and envision as the ideal solution is far beyond current systems. Long-term goals usually are.

> The problem will be that the speaker is not
> speaking his or her punctuation,

Yes, they are; they just aren't using words. Oral languages came first. All the "extra" bits in the written language are there to fill the expression gap that the oral language carries in pauses, speed changes, pitch bends, and a number of other nuances. Those "extra marks" wouldn't be in the written language if the concepts they're attempting to express weren't in the oral language first. Saying punctuation isn't spoken is like saying every mark on a musical score that isn't a note isn't "played."

> and not speaking in an environment
> with a consistent noise level, nor will they be using a noise-
> canceling microphone that is in a consistent position in relation to
> their mouth.

Again, true, but the human ear and auditory processing systems are extremely good at dealing with this.
A typical non-technical customer doesn't care that we don't really understand why the human auditory system is so amazingly good at isolating and tracking a single voice in a noisy environment. All they know is that it is really easy for them to do, so that is the expectation.

Once more, I would say this is an example of why the technology is not ready for mainstream mass use. The fact that computers can't overcome these issues means the technology is lacking, not that consumers should readjust their expectations. Nature has shown us all in a very personal way that this is possible. I know it's extremely hard. Good things usually are.

> All of these factors will introduce enough inaccuracy in
> the transcribed text to make it not worth the effort.

Exactly. So unless you're willing to learn the new and non-trivial skill of dictation, and are able and willing to set up an environment in which it works, current voice recognition systems are a bust.

Don't get me wrong-- I love the fact that this technology is on the market and available to those who need it. While I think this type of technology has a long way to go, I also think it is "good enough" to justify being on the market, having people pay money for it, and continued research and development. If you have no other choice, even the existing systems are a godsend. But I'll stand by the idea that, for the general consumer market, voice recognition in its current state is a "no other choice" type of thing. Building a market on "no other choice" is an extremely poor position to work from.

> We have many people using our transcription solution and what they do
> is re-speak the audio in their own voice, inserting punctuation as
> they go. This produces much more reliable transcription and still
> saves about 30% of what it would take to type in the text manually.

You must type very slowly. In my experience this method actually took longer for all but the most informal or short writing.
It is extremely difficult to go back through several pages of text that contain no punctuation whatsoever and figure out something even as simple as where all the periods and commas go. You more or less re-create the whole authoring process to understand the flow and blocking of the thoughts as they were put into words. And once all that is done, all you have is a rough draft that still needs all the editing and revising any other draft would require. This is actually why I first got into speech recognition systems. It wasn't worth it.

I'm glad others have had better luck, as I'm happy to see the technology continue to evolve and improve-- and there is a lot of room for improvement.

-j

-- 
Jay A. Kreibich < J A Y @ K R E I B I.C H >

"'People who live in bamboo houses should not throw pandas.' Jesus said that."
 - "The Ninja", www.AskANinja.com, "Special Delivery 10: Pop!Tech 2006"