It is unclear what exactly you want to do in terms of "speech". Do you just want to speak aloud what the user is typing? For example:
the voice would say: "A for apple"
This wouldn't be too difficult. However, if you want to implement an algorithm that can identify what someone is saying (speech recognition), that is considerably harder, and it is probably best to use a library if you have no background in signal processing (the mathematics is intensive).
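For the simpler case, where the program just speaks what is typed, a rough sketch might look like the following. I am assuming Python since no language was mentioned, and the `WORDS` table and `phrase_for_key` function are made-up names for illustration. The sketch only builds the phrase; the returned string would still need to be passed to whatever text-to-speech engine you end up using.

```python
# Hypothetical letter-to-word table; only a few entries shown for illustration.
WORDS = {"a": "apple", "b": "ball", "c": "cat"}


def phrase_for_key(key: str) -> str:
    """Return the phrase to speak for a single typed character."""
    letter = key.lower()
    word = WORDS.get(letter)
    if word is None:
        # No entry in the table: fall back to saying the character itself.
        return key
    return f"{letter.upper()} for {word}"


if __name__ == "__main__":
    # Typing "a" yields the phrase from the example above.
    print(phrase_for_key("a"))
```

Keeping the phrase lookup separate from the audio output means you can test the mapping on its own and swap the speech engine later.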
Post back a full description (in a lot more detail) of exactly what you want to do.