
Hello everyone, I have created my own random text generator with a custom method (no Markov chains involved), and now I would like to try it on a text corpus larger than NLTK's. Which data structure should I use to make the code run faster? Adding more text files will surely make the code painfully slow to execute. My algorithm is as follows:

1- Enter the trigger sentence (only once, at the beginning of the program).
2- Get the longest word in the trigger sentence.
3- Find all the sentences of the corpus that contain the word from step 2.
4- Randomly select one of those sentences.
5- Get the sentence that follows the sentence picked in step 4 (call it sentA to avoid ambiguity), provided sentA is longer than 40 characters.
6- Go to step 2; the trigger sentence is now sentA from step 5.
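For reference, here is a minimal sketch of the six steps using a plain list of sentences, roughly the list-based approach described above. It is not the original code, and it rests on a few assumptions: the corpus is already a Python list of sentence strings, words are matched case-insensitively as runs of letters and apostrophes, and in step 5 a match only counts if the sentence after it is longer than 40 characters. The names `words`, `longest_word`, `next_sentence` and `generate` are hypothetical. Note that step 3 re-scans and re-tokenizes the entire corpus on every iteration, which is why this version slows down as the corpus grows.

```python
import random
import re

# Assumption: a "word" is a run of letters/apostrophes, compared in lower case.
WORD_RE = re.compile(r"[A-Za-z']+")


def words(sentence):
    """Return the lower-cased word tokens of a sentence."""
    return [w.lower() for w in WORD_RE.findall(sentence)]


def longest_word(sentence):
    """Step 2: the longest word (first one on ties), or None if there are no words."""
    tokens = words(sentence)
    return max(tokens, key=len) if tokens else None


def next_sentence(sentences, trigger, rng, min_len=40):
    """Steps 2-5 on a plain list; the whole corpus is scanned on every call."""
    target = longest_word(trigger)
    if target is None:
        return None
    # Step 3: linear scan, re-tokenizing every sentence each time.
    # Assumption: a match only counts if the sentence after it is longer than min_len.
    candidates = [
        i for i, s in enumerate(sentences[:-1])
        if target in words(s) and len(sentences[i + 1]) > min_len
    ]
    if not candidates:
        return None
    i = rng.choice(candidates)  # Step 4: pick one matching sentence at random
    return sentences[i + 1]     # Step 5: sentA, the sentence that follows it


def generate(sentences, trigger, steps=5, seed=None):
    """Step 1 happens once (the trigger argument); steps 2-6 repeat up to `steps` times."""
    rng = random.Random(seed)
    output = []
    for _ in range(steps):
        sent_a = next_sentence(sentences, trigger, rng)
        if sent_a is None:
            break  # no usable match: stop generating
        output.append(sent_a)
        trigger = sent_a  # Step 6: sentA becomes the new trigger
    return output
```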

Which data structure would be best for this? (I originally used lists in my code.) Thanks in advance.

Last Post by koveras vehcna

Profile your code with cProfile to see which operations take the most time. I would think that a dictionary mapping each word to a list of the sentences (or their indices) containing it would be helpful.
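A minimal sketch of that dictionary, assuming the corpus is a list of sentence strings and using an assumed tokenization (lower-cased runs of letters and apostrophes): the index is built once, after which step 3 is an average O(1) dictionary lookup instead of a pass over the whole corpus. Folding the 40-character check from step 5 into the index is also an assumption, and `build_index` and `generate` are hypothetical names.

```python
import random
import re
from collections import defaultdict

# Assumed tokenization: lower-cased runs of letters/apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def words(sentence):
    """Return the lower-cased word tokens of a sentence."""
    return [w.lower() for w in WORD_RE.findall(sentence)]


def build_index(sentences, min_len=40):
    """Build {word: [indices of sentences containing it]} in one pass.

    Assumption: an index is stored only if the following sentence is longer
    than min_len, so every stored index is already valid for steps 4-5.
    """
    index = defaultdict(list)
    for i in range(len(sentences) - 1):
        if len(sentences[i + 1]) <= min_len:
            continue
        for w in set(words(sentences[i])):  # set(): store each index once per word
            index[w].append(i)
    return dict(index)


def generate(sentences, index, trigger, steps=5, seed=None):
    """Steps 2-6, with step 3 replaced by a dictionary lookup."""
    rng = random.Random(seed)
    output = []
    for _ in range(steps):
        tokens = words(trigger)
        if not tokens:
            break
        target = max(tokens, key=len)   # Step 2: longest word
        candidates = index.get(target)  # Step 3: average O(1) lookup, no corpus scan
        if not candidates:
            break
        i = rng.choice(candidates)      # Step 4: pick a matching sentence
        trigger = sentences[i + 1]      # Steps 5-6: sentA is the next trigger
        output.append(trigger)
    return output
```

Storing indices rather than the sentences themselves matters here because step 5 needs the position of the match to find the sentence after it. To profile a whole script without changing it, the standard library also supports `python -m cProfile -s cumulative your_script.py` (the script name is a placeholder).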



Thanks for the information.
