There are a few ways to do this. Some online solutions, several of them free, can simply be dropped into your project. For example: http://www.captcha.net/
You can also build one yourself fairly easily. Using your server-side code, generate a random string of characters and store that value in a session variable. Then convert the string into a picture, add some lines and other noise to the picture, and send it to the user, asking them to type the characters they see on the screen. When the user submits, compare what they submitted to what you have stored in the session variable. If they match, the user was able to read the characters in the picture and turn them back into text.
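As a rough illustration of the generate / store / compare steps, here is a minimal Python sketch. It assumes the session behaves like a plain dict; the names `ALPHABET`, `new_challenge` and `check_answer` are made up for this example, and the image rendering step is left out because it depends on whatever imaging library your platform offers.

```python
import hmac
import secrets
import string

# Assumption: leave out characters that are easy to confuse once distorted (0/O, 1/I).
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def new_challenge(session, length=6):
    """Generate a random string and remember it in the session (a dict here)."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    session["captcha"] = text
    return text  # render this into a distorted picture before sending it


def check_answer(session, submitted):
    """Compare the submission to the stored value; each challenge is single-use."""
    expected = session.pop("captcha", None)
    if expected is None:
        return False
    # Normalise case and whitespace, then compare in constant time.
    candidate = submitted.strip().upper()
    return hmac.compare_digest(expected.encode(), candidate.encode())
```

Removing the stored value on every check means a failed guess forces a fresh picture, so the same image cannot be retried over and over.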
Another solution I've seen is to ask a question such as "what is two plus two". You already have the answer server side, so you simply compare what the user submits to what you have stored in the session variable (or database).
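The question approach could look something like the sketch below. Again the session is assumed to be a plain dict, and `WORDS`, `new_question` and `check_question_answer` are hypothetical names chosen for this example, not part of any library.

```python
import secrets

WORDS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]


def new_question(session):
    """Pick two small numbers, store their sum, and return the question text."""
    a = secrets.randbelow(10)
    b = secrets.randbelow(10)
    session["captcha_answer"] = a + b
    return f"What is {WORDS[a]} plus {WORDS[b]}?"


def check_question_answer(session, submitted):
    """Compare the submitted number to the stored sum; single-use."""
    expected = session.pop("captcha_answer", None)
    if expected is None:
        return False
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Anything that is not a whole number counts as a wrong answer.
        return False
```

Spelling the numbers out as words is only a small obstacle for a bot, so treat this as a light deterrent.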
If you don't need to build your own, take a look at the captcha reference I provided above. That free solution includes audio as well. If you need to build your own, how you build it will depend on what server-side scripting language you use. You can build it in a variety of ways. There are many examples on the internet that use audio file clips. Or, if the framework you are using has a speech synthesizer built in, that is an option as well.
import tkinter

root = tkinter.Tk()
buttons = list(range(10))
# If a specific button is pressed, output "YES"
for num in buttons: