Sorry... this turned out to be such a long post.

Hello. This was originally a problem I was facing with the user interface for my robot, but my latest post was never answered there. Right now I'm experimenting specifically with this problem: verbal raw input.

I've posted this problem in another Python forum, but I don't think anyone there knows the answer... so I was wondering if anyone in this community could help me.

Here's what I'm trying to do. I've modified Inigo Surguy's Windows speech recognition program to create something like a chat bot. I can get this chat bot to respond to my sentences if they're predefined, which took me a while to figure out, but now it's a really easy task.

What I want to do now is get this Python program to understand new words, or "make friends" when it is introduced to other people. For instance, I want to be able to say a new word to my computer while chatting with it and have it ask me what that word means, or have it remember the name of a new person (there are all sorts of names out there). It's sort of an A.I. experiment I'm doing.

To get my computer to understand new words, I'm using a modified Windows speech recognition Python script by Inigo Surguy, which relies on the Windows speech recognition SDK and TTS. What I want to achieve is a verbal raw_input ability. I want to say something new to my computer, have the Python script put that into a string, ask for an explanation, and add it to a list or database for reference.

Here's an example. If I say to my computer, "Will you be my friend?" I want my computer to say "Sure! What's your name?" It then waits for raw input as a string. The big cheese here: I want this raw input to come from my voice through the computer's microphone, with my words dictated into the string. If I say "John", I want my computer to put "John" in the raw input and add that name to a list.

Is there any possible way to do this using Python?

Almost certainly.

BTW, this is an amazingly cool project.

/gush

How does the input take place? For example: I speak a sentence. How is that sentence represented internally in Python? Does the speech recognition module return a list of words, or what?

Jeff

Thanks jrcagle!

Yes, vegaseat, I'm modifying Inigo's Python script from that site. It's not the one shown on the page, though, but the downloadable GUI version for wxPython. I couldn't figure out how to get the former to work for my application.

Here's how it works, jrcagle.

Using Surguy's script to harness the Windows Speech SDK, I list some strings for phrases that the computer listens for, e.g. "Do you love me?" It responds by speaking a corresponding string via TTS, or by randomly choosing a slightly different phrase each time it recognizes the phrase, e.g. "Yes! I love everybody!"

Here's how it looks in actual code (modified from Surguy's Speech Reco GUI script):

class MyApp(wxApp):
    ADD_BUTTON_ID = 10
    DELETE_BUTTON_ID = 20
    LISTBOX_ID = 30
    EDITOR_ID = 40
    TEST_BUTTON_ID = 50
    TURNON_BUTTON_ID = 60 
    TURNOFF_BUTTON_ID = 70
    SAVE_FILENAME = "save.p"
    def setItems(self):
        try:
            self.items = pickle.load(open(self.SAVE_FILENAME))
        except IOError:
            self.items = {"Hello Nina" : "speaker.Speak(random.choice(Greet1))",
                          "I'm fine thank you" : "speaker.Speak(random.choice(Greet3))"}

Earlier on, random is imported and I have several lists of string responses with names such as "Greet3" or "Animals6."
Incidentally, I'm afraid of posting all of Surguy's original code, because it's gargantuan! But it's available for download from the link posted by vegaseat. I haven't taken anything out of that code; I just added my own phrases, added the strings for the responses, and imported random.

So IOW, the dictionary self.items contains a list of recognized phrases ... such as "Hello Nina" ... and when you speak that phrase or something close, then the module looks up the phrase in the dictionary and then calls the corresponding action ... such as "speaker.Speak(random.choice(Greet1))"

Is that the right understanding?

If so, then I think we could hack this by simply changing the action to something like

"Hello Nina":"return 'Hello Nina'"

I don't know if that would work, but it's worth a shot.

Jeff

I'm still no wizard with Python, but that seems right, Jeff. In self.items, you simply type in any word or phrase you want the computer to listen for, and then type a corresponding operation.

I'm not sure what you mean by dictionary. The Windows speech recognition dictionary? The Python plugin dictionary?

Can you please illustrate to me how your hacking plan will help me obtain a verbal raw_input, Jeff? Something like "return 'Hello Nina'" means to jump to a specific line in the python script, is that right?

As you can see I still get lost pretty easily, but I'm making this project a learning process in itself.

Oh. Sorry. self.items in line 14 is a "dictionary" -- a hash table, if you are familiar with C -- that consists of key-value pairs joined by a colon, separated by commas.

So in line 14, self.items is set to be a default dictionary with two items:

key: "Hello Nina"
value: "speaker.Speak(random.choice(Greet1))"

key: "I'm fine thank you"
value: "speaker.Speak(random.choice(Greet3))"

In use, your MyApp object will have some method ... not shown in your code ... that looks up the spoken phrase "Hello Nina" and (apparently) executes the code associated with it.

To complete the hack, you will need to find that method.
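To make the lookup idea concrete, here is a minimal sketch of how such a method might work. This is a guess, not Surguy's actual code: FakeSpeaker, handle_phrase, and the contents of Greet1 and Greet3 are made up for illustration, and I am only assuming that the real script runs the stored string with something like exec.

```python
import random


class FakeSpeaker(object):
    """Hypothetical stand-in for the TTS object: records text instead of speaking."""

    def __init__(self):
        self.spoken = []

    def Speak(self, text):
        self.spoken.append(text)


# Placeholder response lists; the real ones live earlier in your script.
Greet1 = ["Hello!", "Hi there!"]
Greet3 = ["Glad to hear it.", "Good!"]

# Same shape as self.items: recognized phrase -> action stored as a code string.
items = {
    "Hello Nina": "speaker.Speak(random.choice(Greet1))",
    "I'm fine thank you": "speaker.Speak(random.choice(Greet3))",
}


def handle_phrase(phrase, speaker):
    """Look up the phrase and run its action; return False if it is unknown."""
    action = items.get(phrase)
    if action is None:
        return False
    # Assumption: the real script evaluates the string in some similar way.
    namespace = {"speaker": speaker, "random": random,
                 "Greet1": Greet1, "Greet3": Greet3}
    exec(action, namespace)
    return True
```

Calling handle_phrase("Hello Nina", FakeSpeaker()) runs the stored action and the fake speaker records one of the Greet1 strings. The method you are hunting for should do something along these lines with the recognized text.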

Here's how the return statement will help. Hopefully, the calling method will not expect a string return value. So, our new dictionary will (hopefully) cause the program to crash and show you what method is accessing the self.items dictionary, which will then allow you to tweak or override that method. Be sure to try to run the program from an IDLE shell so that you get the traceback.

Jeff

P.S. If for some reason it doesn't cause a crash, we can try to feed it more bizarre commands.

Okay, Jeff. I implemented "return 'Hello Nina'" as you suggested and got lots of good stuff!

pythoncom error: Python error invoking COM method.

Traceback (most recent call last):
  File "C:\Python25\lib\site-packages\win32com\server\policy.py", line 285, in _Invoke_
    return self._invoke_(dispid, lcid, wFlags, args)
  File "C:\Python25\lib\site-packages\win32com\server\policy.py", line 290, in _invoke_
    return S_OK, -1, self._invokeex_(dispid, lcid, wFlags, args, None, None)
  File "C:\Python25\lib\site-packages\win32com\server\policy.py", line 588, in _invokeex_
    return func(*args)
  File "C:\Users\Owner\Desktop\Python Script\Nina Verbal Raw Input.py", line 86, in OnRecognition
    "for text ' "+newResult.PhraseInfo.GetText()+"'")
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'tb_lineno'

It's pretty much Greek to me, but let me see if I can decipher it.

The first part is pretty self-explanatory: it uses pythoncom.
Then it gives me a file path, so it must be using something called "policy.py" as well. It lists that same file three times, each with a different line number.
Then it shows one of the lines I recognize from Inigo's original code, the one with "newResult.PhraseInfo.GetText".

Okay, now I just have to take this stuff and make it work for me somehow. I've never done this before, so could you direct me a little, maybe?

Oh ... never mind. This should be easier. On the website vega linked to, we find this:

...
def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
        newResult = win32com.client.Dispatch(Result)
        print "You said: ",newResult.PhraseInfo.GetText()

I'm not sure yet how the results are gotten ... your code does that already, I presume ... but newResult, the return value from win32com.client.Dispatch(Result), has a GetText() method.

That's your equivalent to raw_input().
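Here is a rough sketch of how that could turn into a "verbal raw_input". It leaves out the speech SDK entirely so it can be run anywhere: Conversation and on_text are names I made up, the two extra replies are placeholders, and I am assuming your OnRecognition handler would pass newResult.PhraseInfo.GetText() to on_text.

```python
class Conversation(object):
    """Treats the next recognized utterance as the answer to a pending question."""

    def __init__(self, say):
        self.say = say               # callable that outputs text (TTS in the real script)
        self.awaiting_name = False   # True right after the program asks for a name
        self.friends = []            # names collected so far

    def on_text(self, text):
        # Assumption: in the real script this is called from OnRecognition
        # with newResult.PhraseInfo.GetText() as the argument.
        if self.awaiting_name:
            # Whatever was said next is taken as the name.
            self.friends.append(text)
            self.awaiting_name = False
            self.say("Nice to meet you, " + text + "!")  # placeholder reply
        elif text == "Will you be my friend?":
            self.awaiting_name = True
            self.say("Sure! What's your name?")
        else:
            self.say("I don't know that phrase yet.")    # placeholder reply


if __name__ == "__main__":
    replies = []
    chat = Conversation(replies.append)
    chat.on_text("Will you be my friend?")
    chat.on_text("John")
    print(chat.friends)
```

The point of the flag is that recognition is event-driven: instead of blocking like raw_input(), the program remembers that it asked a question and handles the next recognized text as the answer.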

Jeff

Sorry I haven't replied in a while: been busy.

Ah, ha! So for verbal raw_input, I would use a GetText() somewhere in my script! Okay, so what I have to do is implement that operation.

But I wonder if I would have to implement a database of names for the computer to recognize, or is there a way to make the computer listen and get text from anything you say? (I'm skeptical of the latter...)

I suspect not, but I could be surprised. Only thing to do is experiment.
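Whichever way the recognition side turns out, the names the program learns could be kept between runs the same way the script already keeps self.items, with pickle. A small sketch under that assumption; the file name "friends.p" and both function names are hypothetical:

```python
import pickle

FRIENDS_FILENAME = "friends.p"  # hypothetical file name, not from the original script


def load_friends(filename=FRIENDS_FILENAME):
    """Return the saved list of names, or an empty list if nothing was saved yet."""
    try:
        with open(filename, "rb") as f:
            return pickle.load(f)
    except IOError:
        return []


def remember_friend(name, filename=FRIENDS_FILENAME):
    """Add a name if it is new, save the list, and return it."""
    friends = load_friends(filename)
    if name not in friends:
        friends.append(name)
        with open(filename, "wb") as f:
            pickle.dump(friends, f)
    return friends
```

This only covers remembering names once the program has them as text; it says nothing about whether the recognizer can hear an arbitrary new name in the first place.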
