For a hobby project I'm working on, I'd like to generate some text based on random inputs, and I'd like that text to seem like it was written by a human. I realise this is pretty hard, but I was wondering if anyone could point me to some good books/papers on the subject. My Google searches didn't quite give the results I was hoping for.
So basically, I need sources on how to generate language/written text.
Natural language generation is a rather advanced topic. Do you need the text to just LOOK natural, as in what is generated by http://www.lipsum.com/, or do you want actual English (as an example language)?
You might want to look at the Wisdom of Chopra, a random quote generator built from a database of mumbo-jumbo terms collected from the nonsensical tweets of Deepak Chopra (the metaphysics "guru"). They say you can contact them to get the sources.
I think that generating natural language is a very difficult topic, and how difficult depends on the level of sophistication you're after. It really comes down to how much effort you want to put into forming complex grammatical sentences (as opposed to very simple "subject verb complement" forms), and whether you want the sentences to actually be meaningful (as opposed to random but grammatically correct).

On the grammar side, you want to look into linguistics, especially generative grammars, and there are libraries out there that can help you in this regard, like Boost.Spirit (which provides grammar construction, parsing, and generation in one unified set of DSEL C++ libraries).

As for meaning, that's a whole different ballgame, and I only have a vague idea of where to start. There are, however, ways to mimic meaning by drawing on machine learning concepts such as predictive modeling and expectation-maximization (EM) algorithms: for example, scanning databases of written work (books) and recording how likely words are to be found together in sentences, such that you can generate sentences that kind of make sense (e.g., you are more likely to generate "a tree with green leaves" than "a car with blue leaves").
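To make the statistical idea concrete, here is a minimal sketch in Python of the "scan a corpus, correlate adjacent words" approach: a bigram model that maps each word to the words seen to follow it, then random-walks that table to produce new sentences. The tiny inline corpus is a hypothetical placeholder; a real project would feed in whole books.

```python
import random

# Placeholder corpus; in practice you would scan entire books.
corpus = (
    "a tree with green leaves stood by the road . "
    "the car with blue paint drove down the road . "
    "a tree stood by the car ."
)

def build_bigrams(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    model = {}
    for cur, nxt in zip(words, words[1:]):
        model.setdefault(cur, []).append(nxt)
    return model

def generate(model, start, max_words=20):
    """Random-walk the bigram table until a '.' or the length cap."""
    word = start
    out = [word]
    while word != "." and len(out) < max_words:
        followers = model.get(word)
        if not followers:
            break
        word = random.choice(followers)
        out.append(word)
    return " ".join(out)

model = build_bigrams(corpus)
print(generate(model, "a"))
```

Because the model only ever emits word pairs it has actually seen, the output tends toward "tree with green" rather than "car with blue leaves"; using trigrams or longer contexts (at the cost of a much bigger corpus) makes the text correspondingly more coherent.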
The problem really is that generating natural language is, in general, as hard as understanding natural language, which is definitely a big challenge. IBM got pretty good at it with Watson, but look at the size of that server farm!