Google quietly introduced an audio search tool called GAudi this week in Google Labs. For now, Google is using the tool experimentally to index political content in YouTube videos, but chances are the company is exploring this for more than good-citizenship points and will expand it at some point. Arnaud Sahuguet, the product manager for the Google Audio Indexing project, explained to me exactly what this new development means.
Up until now, only a few small multimedia search companies have been able to index audio effectively, including Nexidia and TVEyes, two pioneers in the audio indexing space. In fact, I wrote about these companies in a Streaming Media Magazine article last year called The Search is On. Both index with incredible speed: Nexidia builds a phonetic index, while TVEyes uses a hybrid approach that combines phonetic indexing with a library of known terms.
According to Sahuguet, Google Audio Indexing uses speech technology to transform spoken words into text and uses the Google indexing technology to return the best results to the user. "The returned videos are ranked based -- among other things -- on the spoken content, the metadata, and the freshness of the video. We periodically crawl the YouTube political channels for new content. As soon as a new video is uploaded to YouTube, it is processed by our system and made available in our index for users to search."
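Google hasn't said how those ranking signals are combined, so what follows is only a rough sketch of the general idea, not Google's method. It assumes a simple weighted sum of three scores: how often the query terms appear in the transcript, how often they appear in the title (standing in for metadata), and a freshness score that decays with age. The `Video` class, the weights, and the 30-day half-life are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Video:
    video_id: str
    title: str          # hypothetical stand-in for publisher metadata
    transcript: str     # hypothetical stand-in for speech-to-text output
    uploaded: datetime


def term_frequency(query: str, text: str) -> float:
    """Fraction of words in `text` that match a query term (case-insensitive)."""
    words = text.lower().split()
    if not words:
        return 0.0
    terms = set(query.lower().split())
    return sum(1 for w in words if w in terms) / len(words)


def freshness(uploaded: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """Exponential decay: 1.0 for a brand-new video, 0.5 after one half-life."""
    age_days = max((now - uploaded).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def score(query: str, video: Video, now: datetime,
          w_spoken: float = 0.6, w_meta: float = 0.3, w_fresh: float = 0.1) -> float:
    """Weighted sum of the three signals; the weights are illustrative guesses."""
    return (w_spoken * term_frequency(query, video.transcript)
            + w_meta * term_frequency(query, video.title)
            + w_fresh * freshness(video.uploaded, now))


def rank(query: str, videos: list[Video], now: datetime) -> list[Video]:
    """Return videos sorted from highest to lowest score."""
    return sorted(videos, key=lambda v: score(query, v, now), reverse=True)
```

With weights like these, a video whose speaker actually says the search term outranks an equally recent one that never mentions it, which is the behavior the quote suggests.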
When Google purchased YouTube a couple of years ago, I wondered what it intended to do with it, especially after investing $1.6 billion. With so many people watching YouTube videos, it certainly had the potential to generate substantial advertising revenue, but beyond monetization, Google is first and foremost a search company, so developing some way to search video makes a lot of sense. Sahuguet says that more and more content is being created in online video formats, which could explain why Google has decided to develop this tool now.
"Speech-to-text transcription is a useful tool which enables users to instantly find and consume video content by searching across videos for specific terms, even if the video's publisher hasn't transcribed it themselves." He explains that it's time-consuming for users to watch an entire video when they are only looking for a certain part. "Using speech-to-text technology, we can identify the portions of the video where the relevant content is spoken."
For now, Sahuguet says they are limiting the project to YouTube political content, but it seems unlikely to stop there, even though he would not commit to anything specific. "Speech recognition is a challenging problem and we are constantly working to improve our technology, but we have nothing to announce at this time," he says. I'm betting this is only the beginning.
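The idea Sahuguet describes, identifying the portions of a video where a term is spoken, is easy to picture if you assume the speech-to-text step produces a list of words with timestamps. The sketch below makes that assumption (the `Word` class and `find_mentions` function are hypothetical, not anything Google has published) and returns the times at which a query phrase begins.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the video


def find_mentions(words: list[Word], query: str) -> list[float]:
    """Return the start time of every place the query phrase is spoken."""
    terms = query.lower().split()
    if not terms:
        return []
    # Normalize transcript words: lowercase and drop surrounding punctuation.
    tokens = [w.text.lower().strip(".,!?\"'") for w in words]
    hits = []
    # Slide a window the length of the query across the transcript.
    for i in range(len(tokens) - len(terms) + 1):
        if tokens[i:i + len(terms)] == terms:
            hits.append(words[i].start)
    return hits
```

A player could then jump straight to each returned time instead of making the viewer sit through the whole video.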