Hello everyone, I am trying to process various texts with regular expressions and Python's NLTK (see http://www.nltk.org/book). I am trying to create a random text generator and I am having a slight problem. First, here is my code flow:

1) Enter a sentence as input. This is called the trigger string and is assigned to a variable.

2) Get the longest word in the trigger string.

3) Search the whole Project Gutenberg database for sentences that contain this word, ignoring case.

4) Return the longest sentence that contains the word from step 3.

5) Append the sentences from step 1 and step 4 together.

6) Assign the sentence from step 4 as the new trigger sentence and repeat the process. Note that I then have to get the longest word in this second sentence, and so on.

So far, I have been able to do this only once. When I try to continue the process, the program just keeps printing the first sentence my search yields. It should instead look for the longest word in this new sentence and keep applying the code flow described above. Below is my code along with a sample input/output:

import nltk
from nltk.corpus import gutenberg
triggerSentence = raw_input("Please enter the trigger sentence: ")#get input str
split_str = triggerSentence.split()#split the sentence into words
longestLength = 0
longestString = ""

montyPython = 1

while montyPython:

    #code to find the longest word in the trigger sentence input
    for piece in split_str:
        if len(piece) > longestLength:
            longestString = piece
            longestLength = len(piece)


    listOfSents = gutenberg.sents() #all sentences of gutenberg are assigned -list of list format-
    
    listOfWords = gutenberg.words()# all words in gutenberg books -list format-
    
    lt = longestString.lower() #lowercase the longest word so the comparison below is case-insensitive

    longestSentence = max((listOfWords for listOfWords in listOfSents if any(lt == word.lower() for word in listOfWords)), key = len)
    #get longest sentence -list format with every word of sentence being an actual element-

    longestSent=[longestSentence]

    for word in longestSent:#convert the list longestSentence to an actual string
        sstr = " ".join(word)
    print triggerSentence + " "+ sstr
    triggerSentence = sstr

Sample input: "Thane of code"
Sample output:"Thane of code Norway himselfe , with terrible numbers , Assisted by that most disloyall Traytor , The Thane of Cawdor , began a dismall Conflict , Till that Bellona ' s Bridegroome , lapt in proofe , Confronted him with selfe - comparisons , Point against Point , rebellious Arme ' gainst Arme , Curbing his lauish spirit : and to conclude , The Victorie fell on vs"

Now this should take the sentence that starts with 'Norway himselfe....', look for the longest word in it, repeat the steps above, and so on, but it doesn't. Any suggestions? Thanks.

All 3 Replies

Line 10 (the while montyPython: line) is a problem: how do you ever quit? It should probably look something like while split_str: (or: what is your halt condition?)

Line 33 (triggerSentence = sstr) is also a problem: split_str is created only once, before the while loop starts, so the loop never sees the words of the new sentence. You need to add a line 34: split_str = sstr.split(). I have not paid any attention to whether this code would do what it should, other than spotting the obvious problems, so no guarantees.
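A minimal sketch of how the loop could look with both points addressed, assuming the sentences are passed in as a list of word lists (the shape gutenberg.sents() returns). The names chain_sentences, longest_word, and max_rounds are hypothetical, and the round limit is just one possible halt condition. It also recomputes the longest word from scratch each round, because longestLength in the original is never reset inside the loop.

```python
def longest_word(sentence_words):
    """Return the longest word in a list of words (the first one wins ties)."""
    return max(sentence_words, key=len)


def chain_sentences(trigger, corpus_sents, max_rounds=5):
    """Chain sentences: trigger -> longest word -> longest matching sentence.

    corpus_sents is a list of sentences, each a list of word strings,
    for example list(gutenberg.sents()). max_rounds is an assumed halt
    condition, not something taken from the original code.
    """
    chain = [trigger]
    seen = set()
    split_str = trigger.split()
    for _ in range(max_rounds):
        if not split_str:
            break  # nothing left to search for
        lt = longest_word(split_str).lower()
        matches = [s for s in corpus_sents
                   if any(lt == w.lower() for w in s)]
        if not matches:
            break  # the word does not occur in the corpus
        best = max(matches, key=len)
        sstr = " ".join(best)
        if sstr in seen:
            break  # same sentence again, so further rounds would repeat it
        seen.add(sstr)
        chain.append(sstr)
        split_str = best  # the next round works on the NEW sentence's words
    return " ".join(chain)
```

With NLTK installed and the corpus downloaded, this could be called as chain_sentences(triggerSentence, list(gutenberg.sents())). The seen check matters because the longest sentence containing a word is often also the longest sentence containing its own longest word, in which case the search would return the same sentence forever.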

longestSentence = max((listOfWords for listOfWords in listOfSents if any(lt == word.lower() for word in listOfWords)), key = len)
#get longest sentence -list format with every word of sentence being an actual element-

longestSent=[longestSentence]

for word in longestSent:#convert the list longestSentence to an actual string

Break down the list comprehension into something readable. Perhaps splitting on the sentence break, ". ", and sending each sentence to a function which checks for the trigger word, and then returns the length of the sentence if the trigger is found, or zero if not found.
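One way that breakdown might look, as a rough sketch only. The function names match_length and longest_sentence_with are made up for illustration, and the sketch assumes sentences are already lists of words (as gutenberg.sents() provides) rather than splitting raw text on ". ".

```python
def match_length(sentence_words, trigger_word):
    """Return the sentence's word count if it contains trigger_word, else 0.

    The comparison is case-insensitive, like the original generator.
    """
    target = trigger_word.lower()
    for word in sentence_words:
        if word.lower() == target:
            return len(sentence_words)
    return 0


def longest_sentence_with(trigger_word, corpus_sents):
    """Return the longest sentence containing trigger_word, or None."""
    best, best_len = None, 0
    for sentence_words in corpus_sents:
        length = match_length(sentence_words, trigger_word)
        if length > best_len:
            best, best_len = sentence_words, length
    return best
```

Unlike max() over an empty generator, which raises ValueError, this version returns None when no sentence contains the word, so the caller can use that as a stopping point.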

That, I will work on. Thanks for the tip.
