Analyzing text files

Question

wutwutwut 0 Newbie Poster

15 Years Ago

I know how to get a program to read a raw text file that has been piped to it, and do some simple things like counting letters, but I don't know how to read the actual words and store the entire word as a string.

For example, if a text file is (all words and values are separated by a tab):

Tree Branches Leaves Height
Cedar 12 80 100
Pine 9 70 80
Maple 15 100 120

How do I then assign the word Tree (a string) to a value, and then assign the word Branches to a different value?

All my previous attempts have just been able to either look at the entire text file as one string, or an entire line at a time as one string.

So basically what I'm asking is how do I assign just one word at a time within a line to a value which can later be recalled?

Is something like this on the right track?:

textfile = open("file.txt").read()
words_list = list(textfile)

python


All 4 Replies

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

This might help you ...

# test data (tab separated)
data_test = """\
Tree    Branches    Leaves  Height
Cedar   12          80      100
Pine    9           70      80
Maple   15          100     120
"""

fname = "file.txt"

# write the test file
fout = open(fname, "w")
fout.write(data_test)
fout.close()

# read the test file back in
fin = open(fname, "r")
data_str = fin.read()
fin.close()

# create a header list and  
# a data_list of [tree, branches, leaves, height] lists
data_list = []
for index, line in enumerate(data_str.split('\n')):
    if index == 0:
        header_list = line.split()
    elif line:
        word_list = []
        for word in line.split():
            if word.isdigit():
                # convert string to integer value
                word_list.append(int(word))
            else:
                word_list.append(word)
        data_list.append(word_list)


print( header_list )
print( '-'*40 )
print( data_list )

""" my result -->
['Tree', 'Branches', 'Leaves', 'Height']
----------------------------------------
[['Cedar', 12, 80, 100], ['Pine', 9, 70, 80], ['Maple', 15, 100, 120]]
"""

# now you can access the data_list
# for instance you want to just get the type of tree and its height
for tree, branches, leaves, height in data_list:
    print( "%-15s %4d" % (tree, height) )
    
""" my result -->
Cedar            100
Pine              80
Maple            120
"""
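
If the file really is tab separated, the standard library's csv module is another option. This is only a sketch of an alternative, not what the code above does: the sample string and the helper name read_rows are made up for illustration, and io.StringIO stands in for a real file so the snippet runs on its own.

```python
import csv
import io

# Hypothetical sample matching the thread's tab-separated example
sample = (
    "Tree\tBranches\tLeaves\tHeight\n"
    "Cedar\t12\t80\t100\n"
    "Pine\t9\t70\t80\n"
    "Maple\t15\t100\t120\n"
)

def read_rows(stream):
    """Return (header, rows) from a tab-separated text stream."""
    reader = csv.reader(stream, delimiter="\t")
    header = next(reader)  # the first row holds the column names
    rows = []
    for fields in reader:
        if not fields:
            continue  # csv.reader yields [] for blank lines
        # convert digit-only fields to int, leave names as strings
        rows.append([int(f) if f.isdigit() else f for f in fields])
    return header, rows

header, rows = read_rows(io.StringIO(sample))
print(header)
print(rows)
```

Unlike split() with no arguments, this keeps fields that contain spaces (a name like "Red Maple") in one piece, because only tabs count as separators.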


mattloto 0 Light Poster · Answer 1 · 2010-04-04T08:16:47+00:00

Doing what you are doing will make a list of each individual character, which is not what you want. To get an array of each line, use this:

lines=textfile.split('\n')

Every time it sees a newline in the text (which is represented by "\n" in Python), it starts a new element in a list called lines. I'm not exactly sure how you want to set up your data, but I'll assume you want to use lists (Python's arrays). So to start out, declare the lists:

Tree=[]
Branches=[]
Leaves=[]
Height=[]

Next, loop through each line, starting at the second one (since you don't want to have the heading included as data):

for line_number in range(1,len(lines)):
    current_line=lines[line_number]
    items_in_line=current_line.split()
    # skip blank lines, e.g. the empty string left after a trailing "\n"
    if not items_in_line:
        continue
    Tree.append(items_in_line[0])
    Branches.append(items_in_line[1])
    Leaves.append(items_in_line[2])
    Height.append(items_in_line[3])

Just in case you don't know, list indexes start at 0 instead of 1. Called with no arguments, the split() function makes a list out of all the items in the line that are separated by whitespace (spaces or tabs). Then we use the append() function, which just adds a value to the end of a list, to add each item to its respective list. The first line with the for statement makes a loop where line_number starts out as 1 (skipping the heading at index 0) and goes up to len(lines) - 1, because range() stops just before its second argument. The check on items_in_line skips blank lines, such as the empty string that split('\n') leaves behind when the text ends with a newline. Then to call or use a value later on, you need to use:

Tree[x]

with x being the row of data you want (0 for the first data row). Tree can be replaced by any of the other lists.
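
To make the difference between these calls concrete, here is a small sketch. The two-line text value is a made-up sample, not the full file from the question.

```python
# Hypothetical two-line sample, tab separated, ending with a newline
text = "Tree\tBranches\nCedar\t12\n"

chars = list(text)           # one element per character
lines = text.split('\n')     # one element per line, plus an empty tail
words = lines[1].split()     # one element per whitespace-separated item

print(chars[:4])   # ['T', 'r', 'e', 'e']
print(lines)       # ['Tree\tBranches', 'Cedar\t12', '']
print(words)       # ['Cedar', '12']
```

Note the empty string at the end of lines: it appears whenever the text ends with "\n", so a loop over the lines has to skip it one way or another.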

Hope that helped. I'm sorry if I made it too simple; I just don't know how much you know, so to be on the safe side I wrote a lot.

Krstevski 7 Junior Poster · Answer 2 · 2010-04-04T08:21:42+00:00

I'm not sure if this is what you're looking for, but anyway...

#!/usr/bin/python

fd = open("hehe.txt")
# read and discard the header line
content = fd.readline()

while True:
    # read line by line; an empty string means end of file
    content = fd.readline()
    if content == "":
        break
    # split the string on whitespace (tabs or spaces)
    line = content.split()
    if not line:
        # skip blank lines
        continue
    """
    line[0] = Tree
    line[1] = Branches
    line[2] = Leaves
    line[3] = Height
    """
    print(line[0], line[1], line[2], line[3])
    """
    or
    tree_list.append(line[0])
    branches_list.append(line[1])
    ...
    """

fd.close()
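
The same loop can also be written by iterating over the file object directly, which yields one line at a time and avoids the explicit end-of-file check. This is a sketch only: the temporary file and its two data rows are invented so the snippet is self-contained, and the tree_list and branches_list names follow the comment in the code above.

```python
import os
import tempfile

# Write a small hypothetical test file so the sketch runs on its own
path = os.path.join(tempfile.mkdtemp(), "trees.txt")
with open(path, "w") as fout:
    fout.write("Tree\tBranches\tLeaves\tHeight\n"
               "Cedar\t12\t80\t100\n"
               "Pine\t9\t70\t80\n")

tree_list = []
branches_list = []
with open(path) as fd:            # the file is closed automatically
    next(fd)                      # skip the header line
    for content in fd:            # one line per iteration
        line = content.split()
        if not line:
            continue              # skip blank lines
        tree_list.append(line[0])
        branches_list.append(int(line[1]))

print(tree_list)       # ['Cedar', 'Pine']
print(branches_list)   # [12, 9]
```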

wutwutwut 0 Newbie Poster · Answer 3 · 2010-04-04T14:42:54+00:00

Thank you very much for the two posts, they've really helped me out.

But in terms of what I actually want the program to do: it will read a text file that has been piped to it, assign several variables to the relevant data depending on where it is within the text file, and then create another piped file with all of the relevant data stored in several named variables.

mattloto, you mentioned that it takes the entire line and then puts it all into one array, but I just want one word at a time within the line to be stored and assigned to a value.

For example, if I get the text file from the first post, I want something like {Datatype1: Tree}, {Datatype2: Branches}, {Datatype3: Leaves}, and so on. Then, after the first line, I want the first word in each subsequent line to have one type of value (like {Specifictype1: Cedar}, {Specifictype2: Pine}, {Specifictype3: Maple}, etc.), and the remaining values on that line to be stored within that one Specifictype variable.

In my mind it seems really simple to do, but I've tried several different types of code but I don't know if I'm looking for a string, list, dictionary, tuple, and then what sort of loop I'd need to go through the whole text file.

Again, any help is greatly appreciated, and if you want me to be more clear on something please let me know.
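
One structure that seems to fit this description is a dictionary keyed by the first word of each row, with each value being a smaller dictionary of that row's remaining columns. The sketch below is only one possible reading of the request: parse_table, datatypes, and records are hypothetical names, every column after the first is assumed to hold an integer, and io.StringIO stands in for piped input (in a real pipeline the same function could be handed sys.stdin).

```python
import io

def parse_table(stream):
    """Return (datatypes, records) from a whitespace-separated table."""
    datatypes = None
    records = {}
    for raw in stream:
        fields = raw.split()
        if not fields:
            continue                      # skip blank lines
        if datatypes is None:
            datatypes = fields            # first non-blank line is the header
            continue
        name, values = fields[0], fields[1:]
        # pair each remaining column name with its value
        # (assumes the values are integers)
        records[name] = dict(zip(datatypes[1:], (int(v) for v in values)))
    return datatypes, records

# Stand-in for piped input; the data is the example from the first post
sample = io.StringIO(
    "Tree\tBranches\tLeaves\tHeight\n"
    "Cedar\t12\t80\t100\n"
    "Pine\t9\t70\t80\n"
    "Maple\t15\t100\t120\n"
)
datatypes, records = parse_table(sample)
print(datatypes)                    # ['Tree', 'Branches', 'Leaves', 'Height']
print(records["Cedar"]["Height"])   # 100
```

With this layout, records["Pine"] recalls everything stored for Pine, and records["Pine"]["Leaves"] recalls a single value by name instead of by position.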
