Hi there! For starters, let me say I am something of a Python newbie at the moment, but I'm learning more every day! Unfortunately, my script is at a standstill, and I can't figure out how to continue. I don't have an error, per se, but rather the inability to create one. :)

You see, due to my semi-lazy nature, I am creating myself a computer assistant in Python. So far, it's going swell! I can have it open programs and files, respond to simple phrases, and stuff of that sort. It can even use the Microsoft Speech API to speak verbally.

Now, my problem: I want to be able to speak as well! At the moment, I'm using raw_input() to state commands. ("Open Firefox, please! Thanks!") That works just fine, but I want to use the Microsoft Speech API not only to synthesize voice, but to recognize speech input as well. I already have a default Speech Recognition profile, et cetera, but I have no way to use it from Python! I just want a function that captures voice input the way raw_input() captures typed input, but at present I have no idea how to do that.

http://surguy.net/articles/speechrecognition.xml

I've already seen the above example, but I haven't been able to modify it to suit my needs. I want to use the built-in Microsoft dictionary, not one I define by myself.

Any help would be greatly appreciated! Thanks a lot! Unfortunately, I don't have any code examples, because I haven't made any headway on voice recognition. Sorry, and thanks!

Search this forum for posts by Seagull One. She has gotten a lot of this code working.

Jeff

Thanks for your response.

Unfortunately, in my searches, I can't find what I'm looking for. I guess I could just use predefined words, if there is no way to access the Microsoft Dictionary, but now Inigo Surguy's code (which was working for me in the past...) doesn't work anymore.

from win32com.client import constants
import win32com.client
import pythoncom

class SpeechRecognition:
    """ Initialize the speech recognition with the passed in list of words """
    def __init__(self, wordsToAdd):
        # For text-to-speech
        self.speaker = win32com.client.Dispatch("SAPI.SpVoice")
        # For speech recognition - first create a listener
        self.listener = win32com.client.Dispatch("SAPI.SpSharedRecognizer")
        # Then a recognition context
        self.context = self.listener.CreateRecoContext()
        # which has an associated grammar
        self.grammar = self.context.CreateGrammar()
        # Do not allow free word recognition - only command and control
        # recognizing the words in the grammar only
        self.grammar.DictationSetState(0)
        # Create a new rule for the grammar, that is top level (so it begins
        # a recognition) and dynamic (ie we can change it at runtime)
        self.wordsRule = self.grammar.Rules.Add("wordsRule",
                        constants.SRATopLevel + constants.SRADynamic, 0)
        # Clear the rule (not necessary first time, but if we're changing it
        # dynamically then it's useful)
        self.wordsRule.Clear()
        # And go through the list of words, adding each to the rule
        [ self.wordsRule.InitialState.AddWordTransition(None, word) for word in wordsToAdd ]
        # Set the wordsRule to be active
        self.grammar.Rules.Commit()
        self.grammar.CmdSetRuleState("wordsRule", 1)
        # Commit the changes to the grammar
        self.grammar.Rules.Commit()
        # And add an event handler that's called back when recognition occurs
        self.eventHandler = ContextEvents(self.context)
        # Announce we've started
        self.say("Started successfully")
    def say(self, phrase):
        self.speaker.Speak(phrase)


"""The callback class that handles the events raised by the speech object.
    See "Automation | SpSharedRecoContext (Events)" in the MS Speech SDK
    online help for documentation of the other events supported. """
class ContextEvents(win32com.client.getevents("SAPI.SpSharedRecoContext")):
    """Called when a word/phrase is successfully recognized  -
        ie it is found in a currently open grammar with a sufficiently high
        confidence"""
    def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
        newResult = win32com.client.Dispatch(Result)
        print "You said: ",newResult.PhraseInfo.GetText()
    
if __name__=='__main__':
    wordsToAdd = [ "One", "Two", "Three", "Four" ]
    speechReco = SpeechRecognition(wordsToAdd)
    while 1:
        pythoncom.PumpWaitingMessages()

I get the following error:

Traceback (most recent call last):
  File "C:\Python25\Projects\SUSIE\VoiceRECO.py", line 54, in <module>
    speechReco = SpeechRecognition(wordsToAdd)
  File "C:\Python25\Projects\SUSIE\VoiceRECO.py", line 13, in __init__
    self.context = self.listener.CreateRecoContext()
  File "C:\Python25\lib\site-packages\win32com\gen_py\C866CA3A-32F7-11D2-9602-00C04F8EE628x0x5x0.py", line 2468, in CreateRecoContext
    ret = self._oleobj_.InvokeTypes(10, LCID, 1, (9, 0), (),)
com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147221164), None)

Do you have any idea what that means?

No, I don't.

Do you know what Python module versions that script requires?

Hi,

I've written a simple speech synthesis and recognition module for Python that should help you out: speech.py. To install it, type
easy_install speech
in the C:\Program Files\Python25\Scripts directory. (If that doesn't work, save and run the script at http://peak.telecommunity.com/dist/ez_setup.py, then try again.)

Then try running the following code:

import speech
import time

def do_stuff(phrase, listener):
    speech.say("You said: %s" % phrase)

speech.listenforanything(do_stuff)
while True:
    time.sleep(.1)

That accesses the Windows dictionary that you were talking about, so that whatever you say, Windows tries its best to hear you, and then runs do_stuff with the phrase that it heard. speech.say will speak your text back to you out loud. You can use speech.listenfor() instead if you have a specific set of phrases in mind.

Be aware that Windows speech recognition sucks for general dictation. You need to train it via the Speech Control Panel entry before it does even a half-good job. And an excellent microphone helps.

Good luck, and if you run into trouble, visit http://pyspeech.googlecode.com to report bugs or ask more questions! I *think* I'll see replies to this thread show up in email, but I'm new to this forum, so I'm not sure...

- Michael

I don't quite understand how to install it. What do you mean to type that in the directory? Is there a specific python script I have to type it in? You just gave me a folder name. O_o

Hi,

What I meant was to type that at the Windows Command Prompt. Click the Start Menu, then click Run, and then type "cmd". That gives you the command prompt.

Then change directories to get to the directory I mentioned: type
cd C:\Program Files\Python25\Scripts
or maybe
cd C:\Python25\Scripts
depending on where you installed Python.

Then in there, type
easy_install speech
and see what happens. If it says that easy_install isn't found, you have to install it, by saving the Python script at the link that I gave you, then running it.

On the other hand, if easy_install runs, then it will have installed speech for you, and you'll be able to type "import speech" in Python and it will work.
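A quick way to confirm the install from Python itself is to try the imports and report which ones fail. This is only a sketch: the helper name can_import is made up for illustration, and the module list is taken from the import lines posted earlier in this thread rather than from any official requirements list.

```python
def can_import(module_name):
    """Return True if the named module can be imported, False if it is missing."""
    try:
        __import__(module_name)
    except ImportError:
        return False
    return True


# Module names as they appear in the code posted in this thread (assumption:
# these are the ones that matter on your machine).
for name in ("speech", "win32com.client", "pythoncom"):
    status = "ok" if can_import(name) else "MISSING"
    print("%-16s %s" % (name, status))
```

Anything reported as MISSING has to be installed before "import speech" can work.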

Hope this makes things clearer,
Michael

Okay... I managed to install it. However, I have a problem.

import speech
import time
def do_stuff(phrase, listener):
    speech.say("You said: %s" % phrase)
speech.listenforanything(do_stuff)

That's my code... but I get an error. :P I'm thinking it's something obvious (do_stuff isn't defined... I don't know how the function works...)

Is it possible to, like raw_input(), use the do_stuff function, and have it put what I said in a string variable?

EX:
I say "hi".
Code puts "hi" in a variable.

Thanks for all of your help!
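For anyone wondering how a callback-style listener can be turned into a blocking, raw_input()-like call, here is a rough sketch. It is not taken from speech.py. The names wait_for_phrase and fake_listener are hypothetical, the fake listener stands in for a real recognizer so the example runs without a microphone, and it assumes the real callback would arrive on another thread.

```python
import threading


def wait_for_phrase(start_listening, timeout=None):
    """Block until the callback-based listener reports a phrase, then return it.

    start_listening is any function that accepts a callback(phrase, listener),
    for example speech.listenforanything (assumption, not tested here).
    Returns None if nothing was heard before the timeout.
    """
    heard = []                      # the callback stores the phrase here
    done = threading.Event()        # set once a phrase has arrived

    def on_phrase(phrase, listener):
        heard.append(phrase)
        done.set()

    start_listening(on_phrase)
    done.wait(timeout)
    return heard[0] if heard else None


def fake_listener(callback):
    """Stand-in recognizer: 'hears' the phrase "hi" from a background thread."""
    timer = threading.Timer(0.1, callback, args=("hi", None))
    timer.start()
    return timer


answer = wait_for_phrase(fake_listener)
print("You said: %s" % answer)
```

The idea is that the callback only stores the phrase and signals an event, while the caller waits on that event and then reads the stored value as an ordinary string.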

Hi,

Your suggestion of a raw_input() like function is fabulous! There wasn't one, and I've been working with the code too long to notice the need for something like that. I just added one and released a new version of speech.py. Thank you!

To get it, type
easy_install speech
as you did before -- this will suck down the latest version (0.5.0).

That makes simple programs much easier. Try typing this in and see if it works:

import speech

# Prints a prompt and returns whatever text it heard you say
answer = speech.input("How do you like your eggs?")
print "Oh, you like them %s" % answer

# Prints a prompt and only returns when it hears one of a few phrases
answer = speech.input("Are you there?", ["Yes", "No", "Shut up"])
print "You said: %s" % answer

# Both the prompt and the list of phrases are optional.
answer = speech.input()
print "You said: %s" % answer

answer = speech.input(phraselist=["Goodbye", "Hello"])
print "You said: %s" % answer

I've given you credit on the pyspeech homepage -- look at http://pyspeech.googlecode.com at the bottom. Thanks again and good luck!

Michael Gundlach

Sorry, I thought that easy_install was smart enough to get the latest version when I updated. Just do:

easy_install speech==0.5.0

And it will get the latest (as of today! :) )

The install worked, but I'm still getting errors. =D

import speech


# Prints a prompt and returns whatever text it heard you say
#answer = speech.input("How do you like your eggs?")
#print "Oh, you like them %s" % answer

# Prints a prompt and only returns when it hears one of a few phrases
#answer = speech.input("Are you there?", ["Yes", "No", "Shut up"])
#print "You said: %s" % answer

# Both the prompt and the list of phrases are optional.
answer = speech.input()
print "You said: %s" % answer

#answer = speech.input(phraselist=["Goodbye", "Hello"])
#print "You said: %s" % answer

ERROR

Traceback (most recent call last):
  File "C:\Python25\Projects\SUSIE\1.0.py", line 19, in <module>
    answer = speech.input()
  File "C:\Python25\lib\site-packages\speech-0.5.0-py2.5.egg\speech.py", line 160, in input
    listener = listenforanything(response)
  File "C:\Python25\lib\site-packages\speech-0.5.0-py2.5.egg\speech.py", line 191, in listenforanything
    return _startlistening(None, callback)
  File "C:\Python25\lib\site-packages\speech-0.5.0-py2.5.egg\speech.py", line 220, in _startlistening
    context = _recognizer.CreateRecoContext()
  File "C:\Python25\lib\site-packages\win32com\gen_py\C866CA3A-32F7-11D2-9602-00C04F8EE628x0x5x0.py", line 2468, in CreateRecoContext
    ret = self._oleobj_.InvokeTypes(10, LCID, 1, (9, 0), (),)
com_error: (-2147352567, 'Exception occurred.', (0, None, None, None, 0, -2147221164), None)

Never mind, I think I got it working... I had to reinstall SAPI 5.1... thanks a lot! This should be exactly what I was looking for.
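For anyone who hits the same traceback: the numbers inside com_error are signed 32-bit views of Windows HRESULT codes, and converting them to hex makes them searchable. The inner code here, -2147221164, is 0x80040154, which Windows defines as REGDB_E_CLASSNOTREG ("class not registered"), and that fits a missing or broken SAPI install. The helper name to_hresult and the small lookup table below are just for illustration.

```python
def to_hresult(code):
    """Convert the signed integer from a com_error into 0x........ hex form."""
    return "0x%08X" % (code & 0xFFFFFFFF)


# Only the two codes that appear in the tracebacks in this thread.
KNOWN_CODES = {
    "0x80020009": "DISP_E_EXCEPTION (generic 'Exception occurred.' wrapper)",
    "0x80040154": "REGDB_E_CLASSNOTREG (class not registered)",
}

for code in (-2147352567, -2147221164):
    hresult = to_hresult(code)
    print("%d -> %s  %s" % (code, hresult, KNOWN_CODES.get(hresult, "unknown")))
```

The outer code is only the generic wrapper; the inner one is the number worth looking up.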

No need to give me credit-- actually making use of my suggestion is helpful enough. :D

Now you just need to have the python developers make "speech" a default module..

Hmm... is it possible to type, as well as speak, using the speech.input function? Meaning, if I want to type, that's fine, but have it also allow speech? Thanks. :)
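One possible approach, sketched under assumptions and not a feature of speech.py as far as this thread shows: start the keyboard prompt and the speech prompt in separate threads and take whichever answers first. The function name first_answer is hypothetical, and the demo uses stand-in functions so it runs without a microphone or a keyboard.

```python
import threading
import time

try:
    import queue              # Python 3
except ImportError:
    import Queue as queue     # Python 2


def first_answer(sources):
    """Run every zero-argument callable in sources (a dict of name -> callable)
    in its own thread and return (name, value) for whichever finishes first."""
    results = queue.Queue()

    def worker(name, func):
        try:
            results.put((name, func(), None))
        except Exception as error:
            results.put((name, None, error))

    for name, func in sources.items():
        thread = threading.Thread(target=worker, args=(name, func))
        thread.daemon = True      # do not keep the program alive for the loser
        thread.start()

    # Skip sources that failed; give up only if all of them failed.
    for _ in range(len(sources)):
        name, value, error = results.get()
        if error is None:
            return name, value
    raise RuntimeError("every input source failed")


def slow_typist():
    """Stand-in for keyboard input that takes a while."""
    time.sleep(0.5)
    return "typed text"


def quick_talker():
    """Stand-in for speech input that answers immediately."""
    return "spoken text"


source, text = first_answer({"keyboard": slow_typist, "voice": quick_talker})
print("Got %r from %s" % (text, source))
```

In the real assistant the call might look like first_answer({"keyboard": raw_input, "voice": speech.input}), but that is untested: the losing thread stays blocked (a pending raw_input() still waits for Enter), and whether speech.input behaves when called off the main thread is an open question.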
