Hello everyone, I am currently teaching myself language processing by using the book of NLTK -found at http://www.nltk.org/book - and I have a problem.

The following code retrieves every sentence in Shakespeare's Macbeth respectively as a list of list of list -or something like that- format:

from nltk.corpus import gutenberg

mySents = gutenberg.sents('shakespeare-macbeth.txt')

for example, mySents[0:5] results in:

[['[', 'The', 'Tragedie', 'of', 'Macbeth', 'by', 'William', 'Shakespeare', '1603', ']'], ['Actus', 'Primus', '.'], ['Scoena', 'Prima', '.'], ['Thunder', 'and', 'Lightning', '.'], ['Enter', 'three', 'Witches', '.']]

The first 5 sentences of Macbeth are pritten in the stuff up there.

My problem is, I want to turn the sentences of books in project Gutenberg to uppercase so I can perform a search without worrying about case sensitivity. So I should be able to get the output like the following:

[['[', 'THE', 'TRAGEDIE', 'OF', 'MACBETH', 'BY', 'WILLIAM'... and so on

Any help would be appreciated, thanks.

Recommended Answers

All 4 Replies

Like this?

mySents = [['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'], ['VOLUME', 'I']]
print [[sent.upper() for sent in sentences ] for sentences in mySents]

Like this?

mySents = [['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'], ['VOLUME', 'I']]
print [[sent.upper() for sent in sentences ] for sentences in mySents]

Hello, thanks for the reply. I have updated the example I gave since the Jane Austen example was too inadequate. Well, your solution covers some of my problem but I actually want to have a list like the Gutenberg's that has every sentence in uppercase format. -The reason is, in my algorithm, when I enter a sentence, I should be able to find another sentence that has the longest word in the first sentence I entered as an input. And the result will be two sentences together. For example when my trigger input is "Thane of code", Thane will be the longest word and -naturally- the longest sentence in macbeth that has the word "Thane" is, "Norway himselfe , with terrible numbers , Assisted by that most disloyall Traytor , The Thane of Cawdor , began a dismall Conflict , Till that Bellona ' s Bridegroome , lapt in proofe , Confronted him with selfe - comparisons , Point against Point , rebellious Arme ' gainst Arme , Curbing his lauish spirit : and to conclude , The Victorie fell on vs". So in order to have my algorithm in uppercase, I should have the uppercase version of the sentences of Macbeth in the gutenberg's list..

Store list of all words and the length of the sentence (you must decide how to handle punctuation and whitespace). Basically you count sentence length first and change the value of the longest sentence with the word (maybe index number of the sentence) and the length of it, but not if the previous value stored is longer than current sentence. Same process as counting words only instead of increasing counter do update of max containing sentence.

Store list of all words and the length of the sentence (you must decide how to handle punctuation and whitespace). Basically you count sentence length first and change the value of the longest sentence with the word (maybe index number of the sentence) and the length of it, but not if the previous value stored is longer than current sentence. Same process as counting words only instead of increasing counter do update of max containing sentence.

I will, thanks a lot :)

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.