I've worked with the Speech API in the past (VB6), and I'm now looking to use it in my project. For those who haven't read previous posts of mine, I'm developing an AI system in C#/.NET which learns from the internet. Well, I'm going to say I told a porky now, as my first step is to actually make it learn from a human oracle (myself).

As a first experiment, I want my system to pluck an image from the net and present it to me. I will then tell it what the image is (e.g. "tree", "plane", "dog"). This is where I'm wondering about the Speech API's capabilities: is it possible to look at some kind of 'confidence' value for each of the possible words? For example, say I said "thicken"; the speech recognition engine might be 84% certain I said "thicken" and 58% certain I said "chicken". Is there any way to get at this information, as opposed to just the word recognised?

Thanks in advance.

Your work sounds really interesting. Unfortunately, I don't have any concrete input for you. I'm familiar with some time-frequency stuff, but applying that here would be a guess on my part. However, a good place to direct your search (in addition to posting here, of course) would be the IEEE Xplore database (http://ieeexplore.ieee.org/search/freesearchresult.jsp?newsearch=true&queryText=speech+recognition&x=0&y=0). Even in the first set of results, there seem to be quite a few methodologies in use. You need a subscription to access the full text (I no longer have one), but you could go through the abstracts and google the terms that sound promising, or see whether the investigators have copies on their websites.
