Hello, I'm new to this forum and to Python. I want to create a function that counts the number of words in a .txt file and also builds a dictionary from it, with each word as a key and the number of occurrences of that word as its value.
Thanks and best regards,
Luis Ventura

You have to show some coding effort.

Sounds like a good project, good luck. If you get stuck, post your effort and the regulars can point out where to look for a solution. When posting, note that pressing the (CODE) button gives you tags; paste your code between them so it is posted nicely with its indentation intact.

Sounds like an application for collections.Counter.
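A minimal sketch of what collections.Counter gives you, assuming the text is already in a string (the sample sentence is made up for illustration):

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of the .txt file
text = "the cat sat on the mat the end"

# Counter builds the word -> count mapping in one step
counts = Counter(text.split())

print(counts["the"])           # → 3
print(counts["dog"])           # → 0 (missing words count as zero, no KeyError)
print(counts.most_common(1))   # → [('the', 3)]
```

Counter is a dict subclass, so it can be used anywhere the hand-built dictionary would be.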

First do the Python tutorial to learn the Python language.

http://docs.python.org/py3k/tutorial/index.html

Then review the library reference to get an overview of Python's built-in functionality.

http://docs.python.org/py3k/library/index.html

Finally, read the documentation for collections and try to write your program.

http://docs.python.org/py3k/library/collections.html

We can't help you if you don't do the work.

So, I managed to write code that opens a text file, reads it line by line and then prints the values.

infile = open("text.txt", mode='r')
line = infile.readline()
while line:
    values = line.split()
    print("QB", values[0], values[1], "had a rating of", values[2])
    line = infile.readline()

infile.close()  # close() must be called, with parentheses

Then I managed to create dictionaries in the shell:

spanglish={}
spanglish["perro"]="dog"
spanglish["botella"]="bottle"
print(spanglish)

However, I couldn't put them together. I know I have to add the elements of the text to an empty dictionary, but I couldn't do it. What do you reckon?

Not a bad start.

You are counting words, so your keys are going to be words and the values are going to be integers.

If a word isn't present in the dictionary you need to add it. Otherwise you need to increment the counter.

if word not in counter:
    counter[word] = 1
else:
    counter[word] += 1

You have enough of a grasp to do it. The only thing you need to do is learn about string methods, so that when you decide how you want to handle things like whitespace and punctuation, you know what to do.
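As one possible illustration of those string methods (only one option among several), here is a sketch on a made-up sentence:

```python
import string

# Hypothetical raw tokens as they might come out of line.split()
tokens = "Hello, world!  Hello again.".split()

# split() with no arguments already handles runs of spaces and newlines.
# strip(string.punctuation) removes punctuation only at the ends of each token,
# and lower() makes "Hello" and "hello" count as the same word.
cleaned = [t.strip(string.punctuation).lower() for t in tokens]
print(cleaned)  # → ['hello', 'world', 'hello', 'again']
```

A token made only of punctuation (for example a lone "--") becomes an empty string this way, so you may want to skip empty results before counting.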

Gee thanks! I know it's kinda sketchy, but is the counting part right?

infile = open("text.txt", mode='r');
line=infile.readline()
while line:
    count+=0
    word=line.split
    if word not in count:
        count[word]=1
    else:
        count[word]+=1

"count" needs to be a dictionary.
"line.split" is a function so it should be "line.split()".
"line.split()" returns a list of words. You need to use a for loop to iterate over each word and increment its counter.
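To see the three points above on a single line of text, here is a small sketch; the sample line is made up and stands in for what readline() would return:

```python
# A single hypothetical line, as readline() would return it (note the trailing newline)
line = "the cat and the dog\n"

print(line.split)      # without (): a method object, not a list of words
words = line.split()   # with (): ['the', 'cat', 'and', 'the', 'dog']

count = {}  # count must be a dictionary, created once before the loop
for word in words:  # loop over each word, not over the whole list at once
    if word not in count:
        count[word] = 1
    else:
        count[word] += 1

print(count)  # → {'the': 2, 'cat': 1, 'and': 1, 'dog': 1}
```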

Hmm, interesting. I coded this:

infile = open("texto.txt", mode='r');
line=infile.readline()
words=line.split()
for words in line:
    count={}
    if words not in count:
        count[words]=1
    else:
        count[words]+=1
print(count)

But I'm pretty sure I need the indexes in words. The problem is that I don't know how to do it with lines whose length I don't know. The only feedback I got was

{'\n': 1}

You are counting the characters in the first line only (read by readline at line 2), and you have an empty line at the beginning of the file. In your loop, words is a single character of the line, which is a very misleading name. Use extra prints to debug and check your assumptions!
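To make the debugging advice concrete, here is a sketch with made-up strings that shows why the loop ended up counting '\n':

```python
# Hypothetical first line of the file: an empty line is just "\n"
first_line = "\n"
print(repr(first_line))     # repr() makes the newline visible
print(first_line.split())   # → [] (a blank line has no words)

# Looping directly over a string yields one character at a time
for item in "hi there":
    print(repr(item))       # 'h', 'i', ' ', 't', ...

# Looping over the result of split() yields whole words
for item in "hi there".split():
    print(repr(item))       # 'hi', then 'there'
```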

Should I use

text=infile.readlines()

?

Why not try it? BTW, how many code snippets have you looked through? (You might check the post linked in my signature.)

Sorry. Last post was idiotic. Anywho,

infile = open("text.txt", mode='r');
aline=infile.readline()
words=aline.split()
while aline:
    for words in aline:
        count={}
        if words not in count:
            count[words]=1
        else:
            count[words]+=1
        

print(count)

With this, I think it should do the following:
While there are lines in the text to read (while aline):
for each word (aline.split()) in that specific line, add the word to the dictionary with a count of 1; if it is already in count, do count+=1. But it doesn't print(count). What am I doing wrong?

You are reading only one line before the while loop. Try

for aline in infile:

instead.
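A small sketch of how that for loop behaves. It assumes io.StringIO as a stand-in for open("text.txt") so it runs without a file; the three lines of text are made up:

```python
import io

# io.StringIO behaves like an opened text file for reading purposes
infile = io.StringIO("first line\nsecond line\nthird line\n")

# Iterating over a file object yields each line exactly once, until the end
lines_seen = []
for aline in infile:
    lines_seen.append(aline)
print(lines_seen)  # → ['first line\n', 'second line\n', 'third line\n']

# A readline() call before the loop consumes a line, so the loop never sees it
infile = io.StringIO("first line\nsecond line\nthird line\n")
skipped = infile.readline()
remaining = [aline for aline in infile]
print(remaining)   # → ['second line\n', 'third line\n']
```

So with a for loop over the file there is no need for readline() or a while loop at all.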

Wow. Sounds nice. I applied it to my code, but now it gives an error.

infile = open("texto.txt", mode='r');
aline=infile.readline()
words=aline.split()
while aline:
    for aline in infile:
        count={}
        if words not in count:
            count[words]=1
        else:
            count[words]+=1
        

print(count)
File "C:\Users\Utilizador\Desktop\Luis\IP\Workspace\py3\src\Try1.py", line 7, in <module>
    if words not in count:
TypeError: unhashable type: 'list'

What now? =/

You should remove the while loop and the readline call. The for loop should come after opening the file, after current line 6.

Well, I still get the same error.

infile = open("texto.txt", mode='r');
for aline in infile:
        aline=infile.readline()
        words=aline.split()
        count={}
        if words not in count:
            count[words]=1
        else:
            count[words]+=1
        

print(count)
File "C:\Users\Utilizador\Desktop\Luis\IP\Workspace\py3\src\Try1.py", line 7, in <module>
    if words not in count:
TypeError: unhashable type: 'list'

Each word in words should be a key.

for word in words:
    if word not in count:
        count[word] = 1
    else:
        count[word] += 1
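The reason for the TypeError in the earlier attempts can be reproduced in a few lines. This sketch uses a made-up word list; a list is mutable and therefore unhashable, so it cannot be a dictionary key, while each individual word (a str) can:

```python
count = {}
words = "the cat the".split()  # a list, just like aline.split() returns

error_message = None
try:
    count[words] = 1           # using the whole list as a key fails
except TypeError as err:
    error_message = str(err)   # the message mentions "unhashable type: 'list'"
print(error_message)

# Each word is a str, which is hashable, so it works as a key
for word in words:
    if word not in count:
        count[word] = 1
    else:
        count[word] += 1

print(count)  # → {'the': 2, 'cat': 1}
```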

So I should replace

for aline in file

with

for word in words

?

No. Put it after "words=aline.split()".
Also, move "count={}" to above "for aline in infile:".

I couldn't understand in what order you put them. Can you show me how you did that part?

If anyone could post an example I would be very pleased. I'm sooo close!

infile = open("afile.txt")
count = {}
for aline in infile:
    words = aline.split()
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1

print(count)
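The if/else in that loop is the classic pattern; for comparison, here is a sketch of two shorter standard-library variants (dict.get with a default, and collections.defaultdict). The list of lines is a hypothetical stand-in for the file:

```python
from collections import defaultdict

lines = ["the cat\n", "the dog\n"]  # hypothetical stand-in for the lines of afile.txt

# Variant 1: dict.get returns 0 for words that are not keys yet
count = {}
for aline in lines:
    for word in aline.split():
        count[word] = count.get(word, 0) + 1

# Variant 2: defaultdict(int) creates a missing key with value int(), i.e. 0
count2 = defaultdict(int)
for aline in lines:
    for word in aline.split():
        count2[word] += 1

print(count)         # → {'the': 2, 'cat': 1, 'dog': 1}
print(dict(count2))  # → {'the': 2, 'cat': 1, 'dog': 1}
```

All three versions produce the same dictionary; which one to use is mostly a matter of taste.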

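Another way: this code also converts to lower case and gets rid of punctuation. Note that it does this by keeping only the words that consist entirely of letters, so a word with punctuation attached (like "dog.") is dropped rather than cleaned.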

from collections import Counter

with open('afile.txt') as f:
    text = f.read().lower()
words = [c for c in text.split() if c.isalpha()]

print(Counter(words))
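A quick sketch of that isalpha() behaviour on a made-up sentence with punctuation attached to some words:

```python
from collections import Counter

# Hypothetical sample text; note "dog." and "cat," carry punctuation
text = "The dog. The cat, the bird"

words = [c for c in text.lower().split() if c.isalpha()]
print(words)           # → ['the', 'the', 'the', 'bird']
print(Counter(words))  # → Counter({'the': 3, 'bird': 1})
```

"dog" and "cat" are not counted at all, because "dog." and "cat," fail the isalpha() test.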

Thanks very much, it was a great help. I'm trying to glue those two together, but I've got a problem.

infile = open("texto.txt")
f=infile.read().lower()
count = {}
for aline in f:
    words = aline.split()
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1
 
print(count)

So read().lower() reads all the text into a single string. Isn't there a way to get one string per line (like readline) and put it all in lower case?

You can do it like this.

infile = open("afile.txt")
count = {}
for aline in infile:
    words = aline.lower().split()
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1

print(count)

To also remove punctuation, here is another way than isalpha(): use string.punctuation.
This has the advantage that it also counts words that contain numbers.
So "car nr5" will be two words, whereas isalpha() would drop "nr5" entirely because it contains a digit.

from string import punctuation

infile = open("afile.txt")
count = {}
for aline in infile:
    words = ''.join(c for c in aline.lower() if c not in punctuation)
    words = words.split()
    for word in words:
        if word in count:
            count[word] += 1
        else:
            count[word] = 1

print(count)
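A side-by-side sketch of the two approaches on a made-up line containing a number, to show the difference described above:

```python
from string import punctuation

aline = "My car nr5."  # hypothetical line of text

# string.punctuation approach: delete punctuation characters, keep digits
cleaned = ''.join(c for c in aline.lower() if c not in punctuation)
with_punctuation_removed = cleaned.split()

# isalpha() approach: keep only tokens made up entirely of letters
alpha_only = [w for w in aline.lower().split() if w.isalpha()]

print(with_punctuation_removed)  # → ['my', 'car', 'nr5']
print(alpha_only)                # → ['my', 'car']
```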

To take that line out on its own and show you how it works in IDLE:

>>> from string import punctuation
>>> s = 'The quick,,, brown?? fox... jumps!! over the lazy++ dog....'
>>> ''.join(c for c in s.lower() if c not in punctuation)
'the quick brown fox jumps over the lazy dog'
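The same cleanup can also be done with the built-in str.translate, which deletes characters using a table from str.maketrans. This is an alternative technique, not the one used above; here is a sketch on the same sample string:

```python
from string import punctuation

s = 'The quick,,, brown?? fox... jumps!! over the lazy++ dog....'

# str.maketrans('', '', chars) builds a table that deletes every char in chars;
# translate() applies that table in a single pass over the string
table = str.maketrans('', '', punctuation)
result = s.lower().translate(table)
print(result)  # → the quick brown fox jumps over the lazy dog
```

Like the join version, this also turns a contraction such as "don't" into "dont".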

A little Python power ;)

Very nice indeed. I wasn't aware of words = aline.lower().split(). Very good, problem solved. Thanks guys! :D

One last thing. I tried to put it inside a function, so it would work for any given filename. Can anyone spot an error?

def ziword(filename):
    filename=open("text.txt")
    count={}
    for aline in filename:
        words = aline.lower().split()
        for word in words:
            if word in count:
                    count[word]+=1
            else:
                    count[word]=1

def ziword(filename):
    count = {}
    for aline in filename:
        words = aline.lower().split()
        for word in words:
            if word in count:
                count[word] += 1
            else:
                count[word] = 1
    return count  # after both loops, so every line is counted first


# The point is that the file is opened outside the function
filename = open("fox.txt")
print(ziword(filename))
filename.close()

The point is that you pass the file to the function as an argument; don't overwrite it inside the function.
You also forgot to return something from the function.
Watch your indentation: always use 4 spaces, and look at how I put 1 space around an operator (= + - *).
Look at the PEP 8 Python style guide.

Gee, thanks very much. N00b mistake. But why can't I define filename first? =O

You could also just replace the literal file name in the function with filename (that is, open(filename)), but better style is generally to accept any iterable source of lines and rename the parameter from filename to iterable (or similar) to reflect the changed assumption. Then you can put the call inside a with statement that opens the file, and pass the file handle (not the filename) to the function.
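A sketch of that suggestion, assuming the parameter is renamed to lines; the with block is left as a comment so the snippet runs even where "fox.txt" does not exist:

```python
def ziword(lines):
    """Count words in any iterable of lines: an open file, a list of strings, ..."""
    count = {}
    for aline in lines:
        for word in aline.lower().split():
            if word in count:
                count[word] += 1
            else:
                count[word] = 1
    return count  # outside both loops, so every line is processed first


# The caller opens the file; the with statement closes it automatically:
# with open("fox.txt") as infile:
#     print(ziword(infile))

# Because the function only needs an iterable of lines, a list works too:
print(ziword(["The fox\n", "the dog\n"]))  # → {'the': 2, 'fox': 1, 'dog': 1}
```

Passing in an iterable instead of a name also makes the function easy to try out in IDLE without creating a file first.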

Be a part of the DaniWeb community
