Hi guys, I'm trying to convert the words that have capital letters to lowercase, then count the number of times each word appears in the text.

I managed to count the number of times a word appears, but now I'm having trouble converting the uppercase letters to lowercase in all words.

Here is my code:

words = {}
file = open('input.txt', 'r')
for line in file:
    wordlist = line.split()

    for word in wordlist:

        # <<< the next two lines are not working for the conversion, and I'm not sure they are correct
        if words.upper(word):
            words[word] = words.lower(word)

        if words.has_key(word):
            words[word] = words[word] + 1
        else:
            words[word] = 1
for word in words:
    print word, words[word]

thank you
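
The two conversion lines fail because `upper` and `lower` are string methods, not dictionary methods: `words` is the dict, so `words.upper(word)` raises an AttributeError. Below is a minimal sketch of one possible fix, assuming the goal is to lowercase every word before counting it. The helper name `count_words` is made up for illustration, and it takes a list of lines instead of a file so it is easy to try out.

```python
def count_words(lines):
    """Count lowercased words across an iterable of lines (hypothetical helper)."""
    words = {}
    for line in lines:
        for word in line.split():
            word = word.lower()      # lower() is called on the string, not on the dict
            if word in words:        # `in` works on Python 2 and 3; has_key is Python 2 only
                words[word] = words[word] + 1
            else:
                words[word] = 1
    return words


counts = count_words(["The cat", "the CAT sat"])
for word in sorted(counts):
    print("%s %d" % (word, counts[word]))
```

Since iterating over an open file yields its lines, something like `count_words(open('input.txt'))` should work the same way.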

how about this

from collections import defaultdict
words = defaultdict(int)
for word in (w.lower() for w in open("input.txt").read().split()):
    words[word] += 1
for word, cnt in sorted(words.items()):
    print word, cnt

Thanks mate! There's an error on the first line,
"from collections import defaultdict":

", line 1, in ?
from collections import defaultdict
ImportError: cannot import name defaultdict

Any ideas?

Well, defaultdict is new in Python 2.5. Maybe your Python is 2.4?
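
If upgrading is not an option, the same counting can be done with a plain dict. This is only a sketch of one alternative, not the code from the post above: it swaps `defaultdict(int)` for `dict.get` with a default of 0. The function name `count_words_plain` is made up for illustration.

```python
def count_words_plain(text):
    """Count lowercased words using a plain dict (hypothetical helper)."""
    words = {}
    for word in text.lower().split():
        # get() returns 0 when the word has not been seen yet
        words[word] = words.get(word, 0) + 1
    return words


for word, cnt in sorted(count_words_plain("Spam spam Eggs").items()):
    print("%s %d" % (word, cnt))
```

With a file, the call would look like `count_words_plain(open("input.txt").read())`.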

Yes, that, and it also says "print word, cnt" on the last line is invalid syntax? Very weird.

Sorry, I was using 3.0. I went down to 2.5 and it worked. Silly versions, always changing things.
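
For anyone who wants to stay on Python 3: `print` became a function there, so `print word, cnt` is a syntax error and needs parentheses. Here is a rough sketch of the same counting written for Python 3. The helper name `sorted_counts` is invented for this example, and it takes a string instead of reading `input.txt` so it runs on its own.

```python
from collections import defaultdict


def sorted_counts(text):
    """Return (word, count) pairs for the lowercased words in text, sorted by word."""
    words = defaultdict(int)
    for word in (w.lower() for w in text.split()):
        words[word] += 1
    return sorted(words.items())


for word, cnt in sorted_counts("Bread butter bread"):
    print(word, cnt)  # Python 3: print is a function, so the arguments go in parentheses
```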

Are you able to explain, with little comments beside each line, what they do? If you can't, I understand. Much appreciated.

Ok

from collections import defaultdict
# create a defaultdict (a dictionary with a default constructor for missing keys)
# see http://docs.python.org/dev/library/collections.html#id3
words = defaultdict(int)
# open("input.txt").read() returns the whole content of the file as a single string
# the .split() method cuts this string on whitespace, returning a list of
# non-whitespace blocks (words?)
# w.lower() returns the word w with all letters in lowercase
# (w.lower() for w in ...) is a generator (an iterator) over all words in the file, in lowercase
for word in (w.lower() for w in open("input.txt").read().split()):
    # add 1 to the count of this word in the dictionary words
    # if the word isn't already there, defaultdict creates an initial value by
    # calling int() which returns 0
    words[word] += 1
# words.items() is a list of all pairs (key, value) in the dictionary
# sorted(theList) returns a new list with the same items, but sorted
for word, cnt in sorted(words.items()):
    print word, cnt

I hope it explains a little :)
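
To see the missing-key behaviour from the comments in isolation, here is a tiny experiment. It is only an illustration with made-up keys, written so that it runs on Python 3.

```python
from collections import defaultdict

words = defaultdict(int)

default_value = int()        # int() with no arguments returns 0
before = "spam" in words     # nothing has been stored yet, so this is False
words["spam"] += 1           # missing key: defaultdict stores int() first, then 1 is added
words["spam"] += 1           # existing key: behaves like a normal dict
looked_up = words["eggs"]    # just reading a missing key also creates it, with value 0

print(default_value, before, words["spam"], looked_up, sorted(words))
```

The last point is worth remembering: looking up a word that was never counted adds it to the dictionary with a count of 0, so use `in` for a membership test if that is not wanted.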

Appreciate it, man! I understand it now.
I'm trying to view it on an HTML page now.

This is my code so far:

from collections import defaultdict
outfile = open("test.html", "w")
words = defaultdict(int)

print >>outfile, """<html>
<head>
<title>Words &amp; frequency table</title>
</head>
<body>
<table border="1">"""


print >>outfile, "<tr><th>Words</th><th>Frequency</th></tr>"


for word in (wordz.lower() for wordz in open("input.txt").read().split()):
    words[word] = words[word] + 1
for word, cnt in sorted(words.items()):
    print >> outfile, "<tr><td>",word,"</td><td>", cnt,"</td></tr>"

print >>outfile, "</table></body></html>"

But it's outputting word and cnt several times. Any ideas? Cheers.
EDIT: I just fixed it!! It's the code above. Thanks anyway :)
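
One thing to watch with HTML output: if a word in the input contains characters such as `&` or `<`, they should be escaped before going into the table. The sketch below is a Python 3 variant, not the poster's Python 2 code. It assumes `html.escape` from the standard library is acceptable, and the function name `build_table_html` is made up for illustration. It builds the page as a string so it can be tested without touching any files.

```python
from collections import defaultdict
from html import escape


def build_table_html(text):
    """Return an HTML page with a word/frequency table for text (hypothetical helper)."""
    words = defaultdict(int)
    for word in (w.lower() for w in text.split()):
        words[word] += 1
    lines = [
        "<html>",
        "<head>",
        "<title>Words &amp; frequency table</title>",
        "</head>",
        "<body>",
        '<table border="1">',
        "<tr><th>Words</th><th>Frequency</th></tr>",
    ]
    for word, cnt in sorted(words.items()):
        # escape() turns &, < and > into HTML entities
        lines.append("<tr><td>%s</td><td>%d</td></tr>" % (escape(word), cnt))
    lines.append("</table></body></html>")
    return "\n".join(lines)


print(build_table_html("Fish & chips fish"))
```

To write the page out, the returned string can be passed to the `write` method of a file opened with `open("test.html", "w")`.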
