It's been a while since I bugged you guys, but I need the counsel of my wise advisers once more.


I am trying to write a program that reads a .txt file, breaks each line into words, strips the whitespace and punctuation from the words and returns them in lowercase.

My code is far from done, but when I try to run even this, it only returns one word.
What is more puzzling is that the word isn't even the first or the last one. It is one near the end.

The code looks right when I read it. What did I do wrong?

import string

def read_book():
    f = open("alice_in_wonderland.txt", "r")
    for l in f.readlines():
        l.strip().translate(None, string.punctuation)
    for w in l.split(" "):
       if w != "":
        return w

def main():
    """
        main function
    """
    print read_book()
    return 0

if __name__ == "__main__":
    main()

Here is the book if you guys want to run my code:
http://www.gutenberg.org/cache/epub/11/pg11.txt

print read_book() calls the read_book function once. The first for-loop calls strip() and translate() on each line but discards the results, so at the end of that loop the variable l (a bad name for a variable, since it looks like the number 1) contains the last line of the file, unchanged. The second for-loop sits outside the first, so it runs only on that last line: it reads the first non-empty word into variable w and returns w. Returning from a function means exiting the function, which is called only once. Result: the one word returned is the first word in the last line of the file.
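To see this without the book file, here is a minimal sketch that mirrors the structure of the posted function. It assumes Python 3 (the thread's code is Python 2), and the function name and the sample lines are made up for illustration; a list of strings stands in for the open file.

```python
def read_book_as_posted(lines):
    # Same shape as the posted function, minus the file handling.
    for l in lines:
        l.strip()      # strings are immutable; this result is thrown away
    # The first loop is finished here, so l still holds the last line.
    for w in l.split(" "):
        if w != "":
            return w   # leaves the function at the first non-empty word


sample = ["Alice was beginning\n", "to get very tired\n", "  THE END\n"]
print(read_book_as_posted(sample))  # THE
```

Only one word comes back, and it is taken from the last line, because the second loop never sees the earlier lines.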

Well, you can try this. It's GPL, so feel free to use it ;)

from __future__ import print_function
import re

with open("frisky.py", "r") as file:
    # Remove underscores and every non-word character (punctuation and whitespace) from each line.
    real_value = [re.sub(r'[_\W]+', "", value) for value in file]  # could be a one-liner, but this is easier to extend
    print("\n".join(real_value))

This code will strip all marks and punctuation from the data you provide, without a headache ;) If you want the output all together, replace "\n" with "" in the join call.
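One thing to watch with that pattern: \W also matches whitespace, so the words on each line end up run together. A small sketch of the difference, assuming Python 3; the sample line and the second pattern are my own illustration, not part of the post above.

```python
import re

line = "Hello, World! foo_bar\n"

# The pattern from the post, written with + instead of *: \W matches
# whitespace as well as punctuation, so the words are fused together.
fused = re.sub(r"[_\W]+", "", line)

# Hypothetical variant: remove only underscores and characters that are
# neither word characters nor whitespace, so the spaces survive.
spaced = re.sub(r"[^\w\s]|_", "", line).strip()

print(fused)   # HelloWorldfoobar
print(spaced)  # Hello World foobar
```

If the goal is one word per output line, the second form leaves something that can still be split on whitespace.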

Number 2.

from __future__ import print_function
import re, urllib

file = urllib.urlopen("http://daniweb.com")  # scrape daniweb (Python 2 urllib API)
real_value = [re.sub(r'[_\W]+', "", value) for value in file]  # clean stuff
print("\n".join(real_value), end="")  # output; replace "\n" with "" if you want it all together

Ok guys, I've made some progress. It is now printing the first word of the book in lowercase, but it is only printing one word.

I thought my for loops would make the code run through the entire book.
Why is my for loop... well, not looping?

import string

def read_book():
    f = open("alice_in_wonderland.txt", "r")
    for l in f.readlines():
        l.strip().translate(None, string.punctuation)
        for w in l.split(" "):
            if w != "":
                return w.lower()

def main():
    """
        main function
    """
    print read_book()
    return 0

if __name__ == "__main__":
    main()

EUREKA!
Guys, you are awesome. :cool:

Here is my final code:
I am kinda proud lol. It doesn't look nearly as nasty as some of my other code. :)

import string

def read_book():
    f = open("alice_in_wonderland.txt", "r")
    for l in f.readlines():
        book_line = l.strip().translate(None, string.punctuation)
        for w in book_line.split(" "):
            if w != "":
                print w.lower()

def main():
    """
        main function
    """
    print read_book()
    return 0

if __name__ == "__main__":
    main()

Good, except for one detail. Look at the last word at the end of your output when you run your program. Does it say 'None'? Is 'None' the last word in alice_in_wonderland.txt?

Some advice, and no None, as d5ed pointed out.

import string

def read_book(file_in=None):
    '''
    Read the file, strip whitespace and punctuation,
    and print each word in lowercase.
    '''
    f = open(file_in)  # "r" is the default mode
    for l in f:        # no need for readlines(); a file iterates line by line
        book_line = l.strip().translate(None, string.punctuation)
        for w in book_line.split(" "):
            if w != "":
                print w.lower()
    f.close()

def main():
    """main function"""
    file_in = 'alice.txt'
    read_book(file_in)  # the function you are calling already prints
    # return 0          # not needed; nothing uses main's return value

if __name__ == "__main__":
    main()
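For anyone reading this on Python 3: the code above is Python 2, where print is a statement and str.translate accepts None plus a string of characters to delete. Below is a rough Python 3 sketch of the same idea, not a drop-in replacement. The names PUNCT_TABLE and iter_words and the sample lines are hypothetical, and it yields the words instead of printing them so the caller decides what to do with them.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# In Python 3, str.translate takes a table instead of (None, deletechars).
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def iter_words(lines):
    """Yield each word from an iterable of lines, lowercased, without punctuation."""
    for line in lines:
        cleaned = line.translate(PUNCT_TABLE)
        # split() with no argument splits on any run of whitespace and
        # never produces empty strings, so no `if w != ""` check is needed.
        for word in cleaned.split():
            yield word.lower()


sample = ["Alice's Adventures, in Wonderland!\n", "\n", "  THE END  \n"]
words = list(iter_words(sample))
print(words)  # ['alices', 'adventures', 'in', 'wonderland', 'the', 'end']
```

An open file is also an iterable of lines, so it can be passed to iter_words directly, ideally inside a with block so it gets closed.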


snippsat, I did a test run of your code, and it prints "None" at the end too.
Why does it do that?

No, I don't get None at the end.

and
the
happy
summer
days
the
end
>>>

read_book(file_in)        # OK
print read_book(file_in)  # prints None at the end, because read_book has no return statement
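A tiny sketch of the same effect, assuming Python 3; show_words is a made-up stand-in for read_book. A function that only prints, with no return statement, hands back None to its caller, and that None is what the outer print shows.

```python
def show_words(words):
    for w in words:
        print(w.lower())
    # No return statement, so Python returns None implicitly.


result = show_words(["The", "End"])  # prints "the" and "end"
print(result is None)                # True
```

Calling the function on its own line discards that None, which is why the stray word disappears.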


Yes, my mistake.
