I am trying to read from a txt file and count the number of times each word appears. The problem is that it counts the EOL characters as well. I tried using rstrip(), but it made no difference. How can I handle these end-of-line characters?
Please help.

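# Read the whole file; the with block closes it automatically.
with open('w.txt', 'r') as Object:
    L = Object.read().rstrip()

occurrences = {}

# split() with no argument splits on any run of whitespace, including \r and \n.
for word in L.split():
    occurrences[word] = occurrences.get(word, 0) + 1

for word in occurrences:
    print(occurrences[word], word)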

All 3 Replies

On my machine (Windows), when providing a file that contains "a b c d e", I have this output:

(1, 'a')
(1, 'c')
(1, 'b')
(1, 'e')
(1, 'd')

There is no EOL character in the output, which is expected. So the code did not count the EOL character on my machine.

Are you running your script on another OS?
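
One way to check whether an EOL character really ends up among the counted words is to print each key with repr(), which shows \r and \n explicitly instead of as invisible characters. This is only a minimal sketch: it assumes the test text "a b c d e" followed by a Windows line ending is held in a string rather than read from w.txt, and the names sample and suspicious are made up for illustration.

```python
# Hypothetical sample text; in the real script this would come from Object.read().
sample = "a b c d e\r\n"

occurrences = {}
for word in sample.split():
    occurrences[word] = occurrences.get(word, 0) + 1

# repr() shows escape sequences explicitly, so a hidden '\r' or '\n' would be visible.
for word in sorted(occurrences):
    print(occurrences[word], repr(word))

# Collect any keys that still contain a carriage return or line feed.
suspicious = [w for w in occurrences if "\r" in w or "\n" in w]
print("keys with EOL characters:", suspicious)
```

If the list printed on the last line is empty, the EOL characters are not being counted and the problem is probably somewhere else, for example in how the output is being read.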

I am running Windows 7. Let's say that I had the following line in my txt file:
My \r name \r is Nana \n

I want my program to skip \n and \r. I don't want them to be counted.
I have tried everything, but it still doesn't work.

PLEASE HELP :(

Your code should just work, because split() with no argument splits on any run of whitespace, including the various newline characters. If you want to spell out the separators yourself, note that split(string.whitespace) would not do it: a string argument is treated as one multi-character separator, not as a set of characters. You can use a regular expression instead:

import re
# ...
for word in re.split(r'\s+', L):
    if word:
        occurrences[word] = occurrences.get(word,0)+1
#...

The extra test is needed because re.split() returns an empty string when the text starts or ends with a separator.
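
The difference between calling split() with and without an argument can be sketched as follows. This is an illustration only: it assumes the sample line from this thread contains real carriage-return and line-feed characters, and the variable names (text, words_default, words_space, words_regex, counts) are made up. It also swaps in collections.Counter for the dictionary, which is a standard-library alternative and not something the original script uses.

```python
import re
from collections import Counter

# Sample line from this thread, assumed to contain real CR and LF characters.
text = "My \r name \r is Nana \n"

# No argument: any run of whitespace is one separator, and no empty strings are produced.
words_default = text.split()

# Explicit separator: only the space is split on, so '\r' and '\n' survive as "words".
words_space = text.split(" ")

# Regular expression: an empty string appears at the end here, so it is filtered out.
words_regex = [w for w in re.split(r"\s+", text) if w]

print(words_default)
print(words_space)
print(words_regex)

# Counter does the same job as the occurrences.get(word, 0) + 1 idiom.
counts = Counter(words_default)
for word, n in counts.items():
    print(n, word)
```

Only the explicit-separator version keeps \r and \n as separate items, which would match the symptom described in the question if a separator were being passed to split().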
