Hello everyone, I am currently teaching myself language processing using the NLTK book (found at http://www.nltk.org/book), and I have a problem.

The following code retrieves every sentence in Shakespeare's Macbeth as a list of lists, where each inner list holds the word tokens of one sentence:

from nltk.corpus import gutenberg

mySents = gutenberg.sents('shakespeare-macbeth.txt')

For example, mySents[0:5] results in:

[['[', 'The', 'Tragedie', 'of', 'Macbeth', 'by', 'William', 'Shakespeare', '1603', ']'], ['Actus', 'Primus', '.'], ['Scoena', 'Prima', '.'], ['Thunder', 'and', 'Lightning', '.'], ['Enter', 'three', 'Witches', '.']]

The output above shows the first 5 sentences of Macbeth.

My problem is that I want to convert the sentences of Project Gutenberg books to uppercase so I can search them without worrying about case sensitivity. I should get output like the following:

[['[', 'THE', 'TRAGEDIE', 'OF', 'MACBETH', 'BY', 'WILLIAM'... and so on

Any help would be appreciated, thanks.


Like this?

mySents = [['[', 'Emma', 'by', 'Jane', 'Austen', '1816', ']'], ['VOLUME', 'I']]
print([[word.upper() for word in sent] for sent in mySents])
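The same comprehension can be wrapped in a small helper so it works on any list of tokenized sentences. This is only a sketch: upper_sents is a made-up name, and the sample list is a stand-in copied from the output in the question. With NLTK and its Gutenberg corpus installed, mySents from the question could presumably be passed in the same way.

```python
def upper_sents(sents):
    """Return a new list of sentences with every token upper-cased.

    The input is left unchanged; each inner list is rebuilt.
    """
    return [[word.upper() for word in sent] for sent in sents]


# Stand-in for gutenberg.sents('shakespeare-macbeth.txt') (first two sentences
# as shown in the question), so the example runs without NLTK.
sample = [
    ['[', 'The', 'Tragedie', 'of', 'Macbeth', 'by', 'William', 'Shakespeare', '1603', ']'],
    ['Actus', 'Primus', '.'],
]

upper = upper_sents(sample)
print(upper[0][:5])  # ['[', 'THE', 'TRAGEDIE', 'OF', 'MACBETH']
```

Because a new list is built, the original mixed-case sentences stay available for display after the search.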

Hello, thanks for the reply. I have updated the example I gave, since the Jane Austen example was inadequate. Your solution covers part of my problem, but what I actually want is a list like Gutenberg's in which every sentence is in uppercase. The reason is that in my algorithm, when I enter a sentence, I should be able to find another sentence that contains the longest word of the sentence I entered. The result will be the two sentences together. For example, when my trigger input is "Thane of code", "Thane" is the longest word, and the longest sentence in Macbeth that contains the word "Thane" is "Norway himselfe , with terrible numbers , Assisted by that most disloyall Traytor , The Thane of Cawdor , began a dismall Conflict , Till that Bellona ' s Bridegroome , lapt in proofe , Confronted him with selfe - comparisons , Point against Point , rebellious Arme ' gainst Arme , Curbing his lauish spirit : and to conclude , The Victorie fell on vs". So for my algorithm to work in uppercase, I need the uppercase version of the Macbeth sentences in the Gutenberg list.


Store a list of all the words together with the length of each sentence (you must decide how to handle punctuation and whitespace). Basically, you compute each sentence's length first, and for every word in it you record that sentence (for example by its index) and its length as the longest sentence containing the word, unless the value already stored belongs to a longer sentence. It is the same process as counting words, except that instead of incrementing a counter you update the longest containing sentence.
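One way to read the suggestion above is as a dictionary from each upper-cased word to the longest sentence that contains it. The sketch below makes several assumptions that are not in the thread: the names build_index and longest_sentence_for are made up, sentence length is counted in tokens with punctuation included, the input is split on whitespace, and the three sample sentences are hand-made stand-ins for the Macbeth list.

```python
def build_index(sents):
    """Map each upper-cased token to (sentence index, sentence length)
    for the longest sentence that contains it."""
    index = {}
    for i, sent in enumerate(sents):
        length = len(sent)  # assumption: length in tokens, punctuation included
        for tok in sent:
            key = tok.upper()
            # Keep the stored entry unless this sentence is strictly longer.
            if key not in index or length > index[key][1]:
                index[key] = (i, length)
    return index


def longest_sentence_for(text, sents, index):
    """Return the longest sentence containing the longest word of `text`,
    or None if that word never occurs. Assumes `text` is not empty."""
    word = max(text.split(), key=len)  # first word wins on a length tie
    hit = index.get(word.upper())
    return None if hit is None else sents[hit[0]]


# Hand-made stand-in sentences, not real corpus output.
sample = [
    ['The', 'Thane', 'of', 'Cawdor', '.'],
    ['Enter', 'three', 'Witches', '.'],
    ['Haile', 'to', 'thee', ',', 'Thane', 'of', 'Glamis', '!'],
]

index = build_index(sample)
result = longest_sentence_for('Thane of code', sample, index)
print(' '.join(result))
```

Only the dictionary keys are upper-cased, so the lookup ignores case while the returned sentence keeps its original spelling.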

I will, thanks a lot :)
