Hi, just hoping someone can shed some light on the best way to read from a text file. I wish to create a program that will pull out specific words/numbers when chosen. I need to be able to extract data from different fields and also determine whether there are any empty entries. I can get it to read line by line, but not by words or numbers.

In the text file I have the words separated by a comma to create the separate sections, e.g.

Barry, Crocker, 17, 19, 30

I am using the following code to input the whole line

def main():
    f = open("Exp_2010_input.txt", "r")  # Opening the text file
    line = f.readline()  # Reading the whole line
    while len(line) != 0:
        print line
        line = f.readline()

main()

Also, if someone could please explain while len(line) != 0 to me, it would be greatly appreciated! I think it means that while the line length is not equal to 0, print line. Does the ! mean not?

Thanks in advance! I'm just trying to get my head around all of this and it's difficult to find good help!

Cheers,
Todd

I'm not sure what you mean by "the best way to read from a text file." In what respect? Speed? Memory Consumption? Elegance?

The while len(line) != 0: literally means "while the line is not an empty string" (Personally, I'd use the much cleaner while line: . It means the same thing with strings). In this context, it means to execute the block until you reach the end of the file, at which point the line will be an empty string. Note that a blank line within the file is seen as "\n" and not "" , and as such would not be seen as the end of the file.
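To make the blank-line versus end-of-file difference concrete, here is a minimal sketch. It uses io.StringIO as an in-memory stand-in for the file (an assumption for the demo, as is the sample text), and it is written with print(...) so it runs on Python 3.

```python
import io

# In-memory stand-in for a text file; the middle line is blank.
f = io.StringIO(u"Barry, Crocker, 17\n\nSally, Strocker, 8\n")

seen = []
line = f.readline()
while line:                 # same test as: while len(line) != 0
    seen.append(line)       # a blank line arrives as "\n", which is not empty
    line = f.readline()

print(len(seen))            # number of lines read, blank one included
print(repr(line))           # what readline() handed back at end of file
```

The loop does not stop at the blank line, because "\n" has length 1; it stops only when readline() returns the empty string.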


Edit: yeah the ! in != means not.


I'm just after the best practice for reading from a text file. I guess it would fall under the elegance category.

Would I be correct in saying that the ! can be used anywhere I need to specify not equal to?

If you want to say "not equal to", use != .

For cases where you mean not, use the not keyword.

Examples:

>>> "5" not in "12345"
False
>>> not True
False
>>> not False
True
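A short sketch putting != and not side by side on a line like the one from the file (the sample string is made up for the demo). Note that a bare ! on its own is not valid Python; it only exists as part of != .

```python
line = "Barry, Crocker, 17, 19, 30\n"

length_check = len(line) != 0   # the length is not equal to 0
string_check = line != ""       # the string is not equal to the empty string
negated = not line              # a non-empty string counts as true, so this is False

print(length_check)
print(string_check)
print(negated)
```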

Thanks for clearing that up! How do I get the program to only get one word from the line in the text file?
With the example I gave earlier

Barry, Crocker, 17, 19, 30

Say I wanted a program to grab the surname (2nd word) from the list,
how would I achieve that?

Cheers

line = f.readline() # Grabs your line
sline = line.strip().split(", ") # Breaks your string into a list, separating it at ", "
surname = sline[1] # Grabs the second part of the list, which is the surname
print surname

Assuming that a comma followed by a space separates each word, then you can use split:

words = open("myfile.txt").read().split(", ")
print words

A bit more flexible, assuming just the comma:

words = open("myfile.txt").read().split(",")
# At this point, each word in words may have a whitespace at the beginning.
# You can strip them off in a for loop
stripped_words = []
for word in words:
    stripped_words.append(word.lstrip())

print stripped_words

Hang on a second. If this is a CSV file, then you can just use the csv module.
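A minimal sketch of the csv approach, written for Python 3. The sample data and the io.StringIO stand-in for the file are assumptions for the demo; with a real file you would pass the object returned by open("myfile.txt", newline="") to csv.reader instead.

```python
import csv
import io

# Hypothetical sample data standing in for the text file.
sample = u"Barry, Crocker, 17, 19, 30\nSally, Strocker, 8, 9\n"

# skipinitialspace=True drops the space that follows each comma.
reader = csv.reader(io.StringIO(sample), skipinitialspace=True)
rows = [row for row in reader]   # one list of fields per line

for row in rows:
    print(row[1])                # second field: the surname
```

Each row comes back as a list of strings, so the numbers still need int() if you want to do arithmetic on them.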

Wouldn't it be better to do this?

f = open("file.txt")
for line in f:
    line = line.strip()
    words = line.split(",")
    for word in words:
        print word.strip()

I know it's 2 loops, but it seems more flexible.
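As a sketch of how the two-loop idea could be packaged up (Python 3 syntax), the hypothetical helper read_records below returns one list of stripped fields per line and uses a with block so the file is closed automatically. The temporary file and its contents are made up for the demo.

```python
import os
import tempfile

def read_records(path):
    """Return one list of stripped fields per non-blank line of the file."""
    records = []
    with open(path) as f:              # the file is closed when the block ends
        for line in f:
            line = line.strip()
            if not line:               # skip blank lines
                continue
            records.append([field.strip() for field in line.split(",")])
    return records

# Demo with a temporary file holding made-up data.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("Barry, Crocker, 17, 19, 30\n\nSally, Strocker, 8, 9\n")

records = read_records(path)
os.remove(path)

for record in records:
    print(record[1])                   # surname of each person
```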

Thanks for all the help, I am starting to understand it a bit more.
I can get it to print out particular words in the first line (0 - 5) but I cannot get the other lines to do the same. I get IndexError: list index out of range. I have tried using
line=f.readline(1)
to get it to read the 2nd line, is this correct or am I going the wrong way about it all?

IndexError means that the index you are trying to get is not available.

For example, if a list (let's call it li) has 5 items, there are only li[0], li[1], li[2], li[3] and li[4].

Since it's a text file, I'm going to assume that there are different numbers of words in different lines.
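One way to avoid the IndexError, and to spot empty entries at the same time, is to check the length before indexing. This is only a sketch: get_field is a hypothetical helper and the sample lines are made up.

```python
# Made-up lines: the second has an empty third field, the third is short.
lines = [
    "Barry, Crocker, 17, 19, 30",
    "Sally, Strocker, , 9",
    "Larry, Brocker",
]

def get_field(line, index):
    """Return the field at index, or None if it is missing or empty."""
    fields = [field.strip() for field in line.split(",")]
    if index >= len(fields):      # fields[index] would raise IndexError here
        return None
    return fields[index] or None  # an empty string means an empty entry

for line in lines:
    print(get_field(line, 2))
```

Only the first line has a third field with something in it; the other two come back as None instead of raising an error.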

If you look at the Python docs, you will see that the parameter for readline is not the index of the line; rather, it is the size:

file.readline([size])

Read one entire line from the file. A trailing newline character is kept in the string (but may be absent when a file ends with an incomplete line). If the size argument is present and non-negative, it is a maximum byte count (including the trailing newline) and an incomplete line may be returned. An empty string is returned only when EOF is encountered immediately.
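A quick sketch of what readline(1) actually does, using io.StringIO as a stand-in for the file (an assumption for the demo, along with the sample text). In Python 3 text mode the size is counted in characters rather than bytes.

```python
import io

f = io.StringIO(u"Barry, Crocker, 17\nSally, Strocker, 8\n")

first = f.readline(1)    # at most 1 character of the first line, not "line 1"
rest = f.readline()      # the remainder of that same first line
second = f.readline()    # only now do we reach the second line

print(repr(first))
print(repr(rest))
print(repr(second))
```

Splitting a one-character string gives a list with a single item, which would explain an IndexError as soon as you ask for index 1 or higher.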

I would still suggest using the for loop. If you need to keep track of the line number, you can do this:

f = open("file.txt")
linenum = 0
for line in f:
    linenum += 1
    line = line.strip()
    words = line.split(",")
    if linenum == 5:
        # do whatever for the line 5. For example, I want to retrieve the last word.
        print words[-1].strip()
## file.txt has:
"""
Larry, Brocker, 12, 4, 11
Sally, Strocker, 8, 9
Barry, Crocker, 17, 19, 30, 18
"""

## or keep it simple

lines=open("file.txt").readlines()
print lines

lines=[x.split() for x in lines]
print lines

print 'Last number in each line is:'
for i in lines: print i[-1]

print "Family name of 2. person is",lines[1][1].strip(',')

"""
>>> 
['Larry, Brocker, 12, 4, 11\n', 'Sally, Strocker, 8, 9\n', 'Barry, Crocker, 17, 19, 30, 18\n']
[['Larry,', 'Brocker,', '12,', '4,', '11'], ['Sally,', 'Strocker,', '8,', '9'], ['Barry,', 'Crocker,', '17,', '19,', '30,', '18']]
Last number in each line is:
11
9
18
Family name of 2. person is Strocker
"""
