I have a question. I have a file which contains data in this format:
index 388.315813
index 311.214286
syndrome 289.708333
factor 184.246753
loss 168.578313
index 451.123455
factor 321.676544
What I want to do is read every line and print it. However, if I encounter the same word again (in the example above, "index"), I want to print the word together with the maximum number that has been read for it. How can I do that? So far I have written this piece of code:
word=''
cvalue=''
for line in open('example.txt','r'):
    words=string.split(line)
    wordcount=len(words)
    if wordcount == 2:
        word=words[0]
        cvalue=words[1]
        if word == words[0]:
            max_cvalue=max(words[1],cvalue)
            print word, max_cvalue
I know it is not much, but I am a new user of Python and I really like it :*
The thing I do not understand is how to write in Python "if you encounter the same word, then look at the value and print the word as well as the max value".
To be more specific, for the example above, here is what I would like to have as a result:
syndrome 289.708333
loss 168.578313
index 451.123455
factor 321.676544
Any help will be deeply appreciated.
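An observation on the desired result above: the four words are listed in the order of their last occurrence in the input (syndrome, loss, index, factor). Below is a minimal sketch of one way to get both the maximum and an ordering, written in Python 3 rather than the Python 2 used in this thread. The function name `max_per_word` and its `by_last_seen` flag are made up for illustration, and the `sample` list stands in for the lines of `example.txt`.

```python
from collections import OrderedDict

def max_per_word(lines, by_last_seen=False):
    """Return an OrderedDict mapping each word to the largest value seen.

    Hypothetical helper: with by_last_seen=False the keys keep first-seen
    order; with by_last_seen=True a word moves to the end each time it
    occurs, so the keys end up in order of last occurrence.
    """
    best = OrderedDict()
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, value = parts[0], float(parts[1])
        best[word] = max(value, best[word]) if word in best else value
        if by_last_seen:
            best.move_to_end(word)
    return best

# Sample data copied from the question; stands in for example.txt.
sample = [
    "index 388.315813",
    "index 311.214286",
    "syndrome 289.708333",
    "factor 184.246753",
    "loss 168.578313",
    "index 451.123455",
    "factor 321.676544",
]

first_seen = max_per_word(sample)
last_seen = max_per_word(sample, by_last_seen=True)

for word, value in last_seen.items():
    print(word, value)
```

Whether the ordering in the question was intentional is not stated, so the flag is only there to show both possibilities.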
When you get more and more experienced in Python, it will get easier to translate your thought process into code.
So, when I think of a list of words that have individual values connected to them, then I usually think of Python's dictionaries.
Here's one way to do what you want:
wordsdict = {}
f = open('example.txt')
for line in f:
    words = line.split()
    if len(words) != 2:
        continue #skip to next line
    word = words[0]
    cvalue = float(words[1])
    if word not in wordsdict:
        wordsdict[word] = cvalue
    else:
        wordsdict[word] = max(cvalue,
                              wordsdict[word])
f.close() #close it after you are done
for item in wordsdict:
    print " ".join((item, str(wordsdict[item])))
One reason I chose to use a dictionary is because dictionaries cannot have duplicate keys.
So there would be no way for the same word to be listed twice.
Except if the word shows up as word. or "word or Word. I mean that punctuation or case can be a problem if the input is unprepared text and you want to count the words. That is not the case here, but I want to remind you not to oversimplify. Remember the Python axiom 'strong testing instead of strong typing'.
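To illustrate the point about punctuation and case, here is a small Python 3 sketch. The helper `normalize` is hypothetical and only one of several reasonable choices: it lowercases a token and strips punctuation from both ends, so that the variants mentioned above end up under one dictionary key.

```python
import string

def normalize(token):
    """Lowercase a token and strip leading/trailing punctuation.

    Hypothetical helper for illustration; punctuation inside the token
    (for example an apostrophe in "don't") is left alone.
    """
    return token.strip(string.punctuation).lower()

variants = ["word.", '"word', "Word"]
keys = {normalize(v) for v in variants}
print(keys)  # all three variants collapse to the single key 'word'
```

For the data file in this thread no such cleanup is needed, since every line is already a bare word followed by a number.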
words = {}
for line in (line for line in open('example.txt','r') if len(line.split())==2):
    word, cvalue=line.lower().split()
    words[word]=(max(float(cvalue),words[word]) if word in words
                 else float(cvalue))
    print word,words[word]
I would take out the print statement at the end,
as I think it would cause repeated entries to be displayed multiple times.
It is better to place it outside the for loop.
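A rough Python 3 sketch of the difference, using the sample data from the question in place of `example.txt`. Instead of printing, it collects the lines each placement would produce so they can be compared; the names `inside_loop` and `after_loop` are made up for this example.

```python
# Sample data copied from the question; stands in for example.txt.
sample = [
    "index 388.315813",
    "index 311.214286",
    "syndrome 289.708333",
    "factor 184.246753",
    "loss 168.578313",
    "index 451.123455",
    "factor 321.676544",
]

best = {}
inside_loop = []  # what a print placed inside the loop would emit
for line in sample:
    word, value = line.split()
    value = float(value)
    best[word] = max(value, best.get(word, value))
    inside_loop.append("%s %s" % (word, best[word]))

# What a loop placed after the reading loop would emit: one line per word.
after_loop = ["%s %s" % (word, value) for word, value in best.items()]

print(len(inside_loop), len(after_loop))  # 7 lines versus 4
```

Inside the loop, every input line produces output, including intermediate maxima for repeated words; after the loop, each word appears exactly once with its final maximum.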
Oh, I had never thought of using dictionaries. There are still so many things I neglect to think about while writing a program. Programming is so cool, at least for these small applications. Thanks for the help! This is what I have written, and so far it works perfectly. I adjusted it slightly because I wanted the results sorted by numeric value :)
import string
wordsdict = {} #creation of a dictionary
word=''
cvalue=''
f=open('example.txt','r') #open a file
for line in f:
    words = line.split()
    #print words
    if len(words) == 2:
        word = words[0]
        cvalue = float(words[1])
        if word not in wordsdict:
            wordsdict[word] = cvalue
        else:
            wordsdict[word] = max(cvalue,wordsdict[word])
    elif len(words) == 3:
        word = words[0] + " " + words [1]
        cvalue = float(words[2])
        if word not in wordsdict:
            wordsdict[word] = cvalue
        else:
            wordsdict[word] = max(cvalue,wordsdict[word])
cvalue_list=[(val,key) for key,val in wordsdict.items()]
cvalue_list.sort(reverse=True)
file=open('max_cv.txt','a')
for word,cvalue in cvalue_list:
    file.write(str(cvalue)+ " " + str(word)+ "\n")
    print cvalue, word
file.close()
OK, I missed the real input at the beginning; you are right. In that case a list is better than a dict if you want the original order (the latest Python has ordered dictionaries, though). A later post asked to sort the dict by value, so I fixed my code:
words = {}
for line in (line for line in open('example.txt','r') if len(line.split())==2):
    word, cvalue=line.lower().split()
    words[word]=(max(float(cvalue),words[word]) if word in words
                 else float(cvalue))
print '\n'.join("%s %s" % (word,cvalue)
                for word,cvalue in sorted(words.items(),
                                          key=lambda x : x[1])) # lambda takes second value as key
""" Output with right input from original post:
loss 168.578313
syndrome 289.708333
factor 321.676544
index 451.123455
"""
#import string <--unneeded
wordsdict = {} #creation of a dictionary
#word='' unneeded
#cvalue=''
f=open('example.txt', 'r')
for line in f:
    words = line.split()
    #print words
    if len(words) == 2:
        word = words[0]
        cvalue = float(words[1])
    elif len(words) == 3:
        word = words[0] + " " + words [1]
        cvalue = float(words[2])
    else:
        continue #skip lines that have neither 2 nor 3 parts
    #this way, code is not repeated
    if word not in wordsdict:
        wordsdict[word] = cvalue
    else:
        wordsdict[word] = max(cvalue,wordsdict[word])
cvalue_list=[(val,key) for key,val in wordsdict.items()]
cvalue_list.sort(reverse=True)
fil=open('max_cv.txt','a') #never have variable named file
                           #because file is built-in type
                           #you would override it
for word,cvalue in cvalue_list:
    fil.write(str(cvalue)+ " " + str(word)+ "\n")
    print cvalue, word
fil.close()
Those are some small mistakes that I found,
so you can remember to avoid them in the future.
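Along the same lines, here is a small Python 3 sketch of the output step. It is only one possible arrangement: the `with` statement closes the file automatically, the loop variables are named after what they actually hold, and no built-in name is reused. The names `ranked`, `out_path` and `out_file` are illustrative, a temporary directory stands in for the real location of `max_cv.txt`, and mode `'w'` is used here instead of the `'a'` (append) mode from the posted code.

```python
import os
import tempfile

# Maxima per word, as produced by the reading loop in the posts above.
wordsdict = {"index": 451.123455, "factor": 321.676544,
             "syndrome": 289.708333, "loss": 168.578313}

# Sort (word, value) pairs by value, largest first.
ranked = sorted(wordsdict.items(), key=lambda pair: pair[1], reverse=True)

# Assumption: a temporary directory is used so the sketch leaves no stray file.
out_path = os.path.join(tempfile.mkdtemp(), "max_cv.txt")
with open(out_path, "w") as out_file:  # closed automatically on leaving the block
    for word, cvalue in ranked:
        out_file.write("%s %s\n" % (word, cvalue))

# Read the file back to check what was written.
with open(out_path) as in_file:
    written = in_file.read().splitlines()
print(written)
```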
Just a quick question: I extended the program to calculate the average of the numeric values of words that appear multiple times (if a word appears more than once, I want the average of its values). I do not get an error, but I get only the sum of each word's values instead. Any thoughts on why this may happen? Is there something I do not see? :-/
f=open('example.txt','r') #open a file
for line in f:
    words = line.split()
    if len(words) == 2: #we have only 2 or three words in the terminological heads file (along with cvalue)
        word = words[0]
        cvalue = float(words[1]) #float because the cvalue is a number
    elif len(words) == 3:
        word = words[0] + " " + words [1] #slightly different here since we have two words
        cvalue = float(words[2])
    count = 0
    if word not in wordsdict:
        wordsdict[word] = cvalue
    else:
        count += 1
        wordsdict[word] = sum([cvalue,wordsdict[word]]) / count
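A possible explanation for the sums, assuming `count = 0` runs on every pass through the loop: `count` is then always 1 at the point of division, so each repeat just adds the new value to the stored one. A single shared counter would not help either, because every word needs its own count. The Python 3 sketch below keeps a running total and a count per word and divides only at the end; the function name `average_per_word` and the `sample` list (standing in for `example.txt`) are made up for illustration.

```python
def average_per_word(lines):
    """Return {word: average of its values} (hypothetical helper).

    Accepts 'word number' and 'two words number' lines, as in the
    posted code, and skips anything else.
    """
    totals = {}  # word -> [running sum, number of occurrences]
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            word = parts[0]
        elif len(parts) == 3:
            word = parts[0] + " " + parts[1]
        else:
            continue
        entry = totals.setdefault(word, [0.0, 0])
        entry[0] += float(parts[-1])
        entry[1] += 1
    return {word: total / count for word, (total, count) in totals.items()}

# Sample data copied from the question; stands in for example.txt.
sample = [
    "index 388.315813",
    "index 311.214286",
    "syndrome 289.708333",
    "factor 184.246753",
    "loss 168.578313",
    "index 451.123455",
    "factor 321.676544",
]

averages = average_per_word(sample)
for word, value in averages.items():
    print(word, value)
```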
I could not figure out your code but here is how I changed mine to do the average:
## average counting added
words = {}
for line in (line for line in open('example.txt','r') if len(line.split())==2):
    word, cvalue=line.lower().split()
    if word in words:
        words[word].append(float(cvalue))
    else:
        words[word] = [float(cvalue)]
## replace multiple value list by their average
for word,cvalue in words.items():
    words[word]= sum(cvalue)/len(cvalue)
print '\n'.join("%s %s" % (word,cvalue)
                for word,cvalue in sorted(words.items(),
                                          key=lambda x : x[1])) # lambda takes second value as key
""" Output with right input from original post:
loss 168.578313
factor 252.9616485
syndrome 289.708333
index 383.551184667
"""