On October 17th, 2016, Microsoft announced the
most advanced speech recognition software to date. Behind the breakthrough are deep neural
networks that are trained on large amounts of data to recognize patterns in
their inputs. The processing units behind them are specialized for sound as well
as graphics, allowing computers to run these algorithms and deliver results at
speeds not previously possible.
Having reported a Word Error Rate of 6.3% in September, Microsoft improved on
it by 0.4 percentage points in just a month, lowering the rate to 5.9%.
Although that error rate is still far from perfect, the software reportedly
recognizes speech just as well as humans who were asked to transcribe the same
conversations. The test, created by the National Institute of Standards and
Technology, consists of a set of telephone conversations in English, Spanish
and Mandarin Chinese, and many tech giants have used it as a benchmark for
speech recognition technologies since the 1990s.
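For readers wondering what a Word Error Rate actually measures, here is a
minimal sketch using the standard definition: the number of word substitutions,
deletions and insertions needed to turn the software's transcript into the
reference transcript, divided by the number of words in the reference. The
function name and the sample sentences are made up for illustration, and this
is not Microsoft's or NIST's scoring code, which also normalizes the text
before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper for illustration: it splits on whitespace only and
    does no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Invented example: one missed word ("me") and one wrong word ("warning")
# against a five-word reference.
print(word_error_rate("please call me tomorrow morning",
                      "please call tomorrow warning"))  # prints 0.4
```

On this scale, a rate of 5.9% means roughly six word errors for every hundred
words in the reference transcript.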
Moving forward, the team hopes to transition from recognition
to understanding. The next challenge for Microsoft is to make speech
recognition work even in complex situations, such as heavy background
noise and multi-party conversations.