Hello all,

I have a question. I have a file which contains data in this format:

index 388.315813
index 311.214286
syndrome 289.708333
factor 184.246753
loss 168.578313
index 451.123455
factor 321.676544

What I want to do is read every line and print it. However, if I encounter the same word again (in the above case, "index"), I want to print the word along with the maximum number that has been read for it. How can I do that? So far I've written this piece of code:

word=''
cvalue=''
for line in open('example.txt','r'):
    words=string.split(line)
    wordcount=len(words) 
    if wordcount == 2:
       word=words[0]
       cvalue=words[1]
       if word == words[0]:
          max_cvalue=max(words[1],cvalue)
          print word, max_cvalue

I know it is not much, but I am a new Python user and I really like it :*

The thing I do not understand is how to write in Python: "if you encounter the same word, then look at the value and print the word as well as the max value".

To be more specific, for the above example this is what I would like to have as a result:

syndrome 289.708333
loss 168.578313
index 451.123455
factor 321.676544

Any help will be deeply appreciated.

I'm very glad that you like Python.

When you get more and more experienced in Python, it will get easier to translate your thought process into code.
So, when I think of a list of words that have individual values connected to them, then I usually think of Python's dictionaries.

Here's one way to do what you want:

wordsdict = {}
f = open('example.txt')
for line in f:
    words = line.split()
    if len(words) != 2:
        continue #skip to next line
    word = words[0]
    cvalue = float(words[1])
    if word not in wordsdict:
        wordsdict[word] = cvalue
    else:
        wordsdict[word] = max(cvalue,
                              wordsdict[word])
f.close()  #close it after you are done

for item in wordsdict:
    print " ".join((item, str(wordsdict[item])))

One reason I chose a dictionary is that dictionaries cannot have duplicate keys,
so there is no way for the same word to be listed twice.
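For anyone on Python 3, here is a minimal sketch of the same dictionary approach, assuming the two-column "word value" format from the question. The function name max_per_word is only for illustration, and the sample list is the data from the question. Because iterating over an open file yields its lines, a file object can be passed in place of the list.

```python
def max_per_word(lines):
    """Map each word to the largest value seen for it.

    Lines that do not split into exactly two fields are skipped.
    """
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, value = fields[0], float(fields[1])
        # Dict keys are unique: a repeated word updates its existing entry.
        if word not in best or value > best[word]:
            best[word] = value
    return best


# The sample data from the question, standing in for example.txt.
sample = [
    "index 388.315813\n",
    "index 311.214286\n",
    "syndrome 289.708333\n",
    "factor 184.246753\n",
    "loss 168.578313\n",
    "index 451.123455\n",
    "factor 321.676544\n",
]

# Print once, after all lines are read, so each word appears only once.
for word, value in max_per_word(sample).items():
    print(word, value)
```

With a real file, `with open('example.txt') as f: result = max_per_word(f)` reads it the same way, and the with block closes the file for you.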

Except when the same word appears as word. or "word or Word. I mean that punctuation or case can be a problem if the input is unprepared text and you want to count the words. That is not the case here, but I want to remind you not to oversimplify. Remember the Python-do axiom: 'strong testing instead of strong typing'.
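A small sketch of the normalization meant here, assuming it is enough to lower-case each token and strip punctuation from its ends with str.strip(string.punctuation). Apostrophes inside a word are kept; Unicode punctuation and hyphenation are out of scope. normalize and count_words are illustrative names.

```python
import string


def normalize(word):
    """Lower-case a token and strip punctuation from both ends only."""
    return word.strip(string.punctuation).lower()


def count_words(text):
    """Count normalized words, ignoring tokens that were pure punctuation."""
    counts = {}
    for token in text.split():
        word = normalize(token)
        if word:  # "--" normalizes to "" and is skipped
            counts[word] = counts.get(word, 0) + 1
    return counts


# word., "word, word, and WORD all count as the same word.
print(count_words('Word. "word word, -- WORD'))
```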

My style of coding for my 5 cents:

words = {}
for line in (line for line in open('example.txt','r') if len(line.split())==2):
    word, cvalue=line.lower().split()
    words[word]=(max(float(cvalue),words[word]) if word in words
                 else float(cvalue))
    print word,words[word]

I would take out the print statement at the end,
as I think it would cause repeated words to be displayed multiple times.
It is better to place it outside the for loop.

Oh, I never would have thought of using dictionaries. Still, there are so many things I neglect to think about while writing a program. Programming is so cool, at least for these small applications. Thanks for the help! This is what I've written, and so far it works perfectly. I adjusted it slightly because I wanted the results sorted by the numeric value :)

import string

wordsdict = {} #creation of a dictionnary

word=''
cvalue=''
f=open('example.txt','r') #open a file
for line in f:
    words = line.split()
    #print words
    if len(words) == 2:
        word = words[0]
        cvalue = float(words[1])
        if word not in wordsdict:
            wordsdict[word] = cvalue
        else:
            wordsdict[word] = max(cvalue,wordsdict[word])
    elif len(words) == 3:
         word = words[0] + " " + words [1]
         cvalue = float(words[2])
         if word not in wordsdict:
             wordsdict[word] = cvalue
         else:
             wordsdict[word] = max(cvalue,wordsdict[word])


cvalue_list=[(val,key) for key,val in wordsdict.items()]
cvalue_list.sort(reverse=True)

file=open('max_cv.txt','a')
for word,cvalue in cvalue_list:
     file.write(str(cvalue)+ "   " + str(word)+ "\n")
     print cvalue, word
file.close()

Thanks again for the help :P

The request was to print a value for every input line:

What I want to do is read every line and print it. However, if I encounter the same word again (in the above case, "index"), I want to print the word along with the maximum number that has been read for it. How can I do that? So far I've written this piece of code.
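Read literally, that request asks for output on every input line. A minimal Python 3 sketch of that reading, assuming the same two-column format: for each line it prints the word with the largest value seen for it so far. running_max is a hypothetical name.

```python
def running_max(lines):
    """Yield (word, largest value seen so far for that word) for each valid line."""
    best = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, value = fields[0], float(fields[1])
        # best.get(word, value) falls back to the current value for a new word.
        best[word] = max(value, best.get(word, value))
        yield word, best[word]


# The sample data from the question.
sample = [
    "index 388.315813", "index 311.214286", "syndrome 289.708333",
    "factor 184.246753", "loss 168.578313", "index 451.123455",
    "factor 321.676544",
]

for word, value in running_max(sample):
    print(word, value)
```

Here printing inside the loop is intentional; for one line per word, collect the results and print after the loop instead.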

OK, I missed the real input at the beginning; you are right. Then it is better to use a list instead of a dict if you want the original order (newer Python versions have collections.OrderedDict, though). A later post asked to sort the dict by value, so I fixed my code:

words = {}
for line in (line for line in open('example.txt','r') if len(line.split())==2):
    word, cvalue=line.lower().split()
    words[word]=(max(float(cvalue),words[word]) if word in words
                 else float(cvalue))

print '\n'.join("%s %s" % (word,cvalue)
                for word,cvalue in sorted(words.items(),
                                          key=lambda x : x[1])) # lambda takes the second item (the value) as key
""" Output with right input from original post:
loss 168.578313
syndrome 289.708333
factor 321.676544
index 451.123455
"""
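On the ordering question: since Python 3.7, plain dicts iterate in insertion order, so the first-seen order of the words comes for free, and operator.itemgetter(1) does the same job as lambda x: x[1] when sorting by value. A minimal Python 3 sketch, with the per-word maxima for the sample typed in by hand:

```python
from operator import itemgetter

# Per-word maxima for the sample input, in first-seen order.
best = {"index": 451.123455, "syndrome": 289.708333,
        "factor": 321.676544, "loss": 168.578313}

# Python 3.7+: dicts remember insertion order, so this is the input order.
in_input_order = list(best)

# (word, value) pairs sorted by value, smallest first.
by_value = sorted(best.items(), key=itemgetter(1))

# reverse=True gives largest first, like the max_cv.txt code.
by_value_desc = sorted(best.items(), key=itemgetter(1), reverse=True)

for word, value in by_value:
    print("%s %s" % (word, value))
```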
#import string      <--unneeded

wordsdict = {} #creation of a dictionnary


#word='' unneeded
#cvalue=''
f=open('example.txt', 'r')
for line in f:
    words = line.split()
    #print words
    if len(words) == 2:
        word = words[0]
        cvalue = float(words[1])
    elif len(words) == 3:
        word = words[0] + " " + words[1]
        cvalue = float(words[2])
    else:
        continue  #skip other lines, or word/cvalue would be stale (or undefined on the first line)
    #this way, code is not repeated
    if word not in wordsdict:
        wordsdict[word] = cvalue
    else:
        wordsdict[word] = max(cvalue,wordsdict[word])


cvalue_list=[(val,key) for key,val in wordsdict.items()]
cvalue_list.sort(reverse=True)

fil=open('max_cv.txt','a')  #never name a variable file,
#because file is a built-in type in Python 2;
#you would shadow (override) it
for word,cvalue in cvalue_list:
     fil.write(str(cvalue)+ "   " + str(word)+ "\n")
     print cvalue, word
fil.close()

Those are some small mistakes that I found,
so you can remember to avoid them in the future.
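The two branches above differ only in how many leading fields form the term, so one option is to treat the last field as the value and join everything before it. This is a generalization, not what the thread's code does: it also accepts terms of three or more words. parse_line and max_per_term are illustrative names, and "bone loss" below is a made-up two-word term.

```python
def parse_line(line):
    """Split a line into (term, value); the value is the last field.

    Returns None for lines with fewer than two fields. A non-numeric
    last field raises ValueError from float().
    """
    fields = line.split()
    if len(fields) < 2:
        return None
    # Same as words[0] + " " + words[1] when there are three fields.
    return " ".join(fields[:-1]), float(fields[-1])


def max_per_term(lines):
    """Map each term (one or more words) to its largest value."""
    best = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # blank or malformed line
        term, value = parsed
        best[term] = max(value, best.get(term, value))
    return best


print(max_per_term(["index 388.315813", "bone loss 12.5", "index 451.123455", ""]))
```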

:) Thanks guys for the help and for setting me on the right path :-)

You truly rock!

All hail Python! :twisted:

Just a quick question: I enhanced that program to calculate the average of the numeric values of words that appear multiple times (if a word appears more than once, I want the average of its values). I do not get an error, but I only get the sum of each word's values. Any thoughts on why this may happen? Is there something I do not see? :-/

f=open('example.txt','r') #open a file
for line in f:
    words = line.split()
    if len(words) == 2:   #we have only 2 or three words in the terminological heads file (along with cvalue)
        word = words[0]
        cvalue = float(words[1])  #float because the cvalue is a number      
    elif len(words) == 3:
         word = words[0] + " " + words [1] #slightly different here since we have two words
         cvalue = float(words[2])
         count = 0
         if word not in wordsdict:
             wordsdict[word] = cvalue
         else:
             count += 1
             wordsdict[word] = sum([cvalue,wordsdict[word]]) / count

I could not figure out your code, but here is how I changed mine to do the average:

## average counting added
words = {}
for line in (line for line in open('example.txt','r') if len(line.split())==2):
    word, cvalue=line.lower().split()
    if word in words:
        words[word].append(float(cvalue)) 
    else:
        words[word] = [float(cvalue)]

## replace multiple value list by their average
for word,cvalue in words.items(): 
    words[word]= sum(cvalue)/len(cvalue)

print '\n'.join("%s %s" % (word,cvalue)
                for word,cvalue in sorted(words.items(),
                                          key=lambda x : x[1])) # lambda takes second value as key
""" Output with right input from original post:
loss 168.578313
factor 252.9616485
syndrome 289.708333
index 383.551184667
"""
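As for why the version in the question prints sums: count = 0 runs again for every three-word line, so count += 1 always makes it 1 and dividing by count changes nothing; each repeat just adds the new value to what is stored. (In that code the two-word branch also never updates wordsdict, because the dictionary logic sits inside the elif.) Even with a correct per-word count, dividing a running value by the count would not give the mean. One fix, sketched here in Python 3 under the assumption that the value is always the last field, is to keep a running total and a count per term and divide once at the end; average_per_term is an illustrative name.

```python
def average_per_term(lines):
    """Map each term to the mean of all values seen for it.

    A running total and a count are kept per term; the division happens
    only once, after every line has been read.
    """
    totals = {}
    counts = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        term, value = " ".join(fields[:-1]), float(fields[-1])
        totals[term] = totals.get(term, 0.0) + value
        counts[term] = counts.get(term, 0) + 1
    return {term: totals[term] / counts[term] for term in totals}


# The sample data from the question.
sample = [
    "index 388.315813", "index 311.214286", "syndrome 289.708333",
    "factor 184.246753", "loss 168.578313", "index 451.123455",
    "factor 321.676544",
]

for term, mean in sorted(average_per_term(sample).items(), key=lambda x: x[1]):
    print(term, mean)
```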