Count given word

Hi there,

I am new to Python.
Can somebody tell me how I can count the occurrences of a given word in a file?
I found lots of solutions for counting all the words in a file, but none for counting one particular word.

Thanks in advance

atsuko
Newbie Poster
4 posts since Jun 2008
Reputation Points: 10
Solved Threads: 0
 
string = "hi"

f = open("filename.txt")
contents = f.read()
f.close()

# count the word stored in string rather than a second hard-coded "hi"
print "Number of '" + string + "' in your file is:", contents.count(string)


Basically, replace "hi" with the word you want to count.

Feel free to ask any more questions!

Shadow14l
Light Poster
39 posts since May 2008
Reputation Points: 10
Solved Threads: 8
 

Thank you so much!!!

atsuko
Newbie Poster
4 posts since Jun 2008
Reputation Points: 10
Solved Threads: 0
 

There is a problem with count(): it counts substring matches, not whole words, as shown below:

string = "hi"

# test text
text = "hi, I am a history buff with a hideous hidrosis history"

print "Number of '" + string + "' in your file is:", text.count(string)

"""
my result -->
Number of 'hi' in your file is: 5
"""
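
One possible way around this, as a rough sketch only: split the text on whitespace, strip punctuation from the ends of each piece, and compare whole tokens. The helper name count_word is made up for illustration, and the code is written for Python 3, unlike the Python 2 print statements used elsewhere in this thread.

```python
import string

def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of word in text."""
    target = word.lower()
    total = 0
    for token in text.split():
        # strip() only removes punctuation at the two ends of the token
        if token.strip(string.punctuation).lower() == target:
            total += 1
    return total

text = "hi, I am a history buff with a hideous hidrosis history"
print(text.count("hi"))        # substring matches
print(count_word(text, "hi"))  # whole-word matches
```

Because only the ends of each token are stripped, a token like "hi-fi" is compared as a whole and would not count as "hi".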
Ene Uran
Posting Virtuoso
1,723 posts since Aug 2005
Reputation Points: 625
Solved Threads: 213
 

Hi atsuko,

Here is my version of a word finder. It has some serious limitations: it cannot search for words such as "It's" because it strips punctuation, and it cannot search for multiple words at one time. At least in my tests, though, it found the word I was looking for the correct number of times. Here is an example of it working...

paragraph = '''This is a test sentence.  We will look for hi in this sentence.
If we find 'hi', we want to keep count of it.  Remember, hi
can be Hi or hi.  Hi can also have characters before or
after it ie (hi. hi, hi:).  There should be a total of 10 'hi'
in this sentence, not any more for words like 'this' or
'hippo' or 'hiccups' '''

word = 'hi'

# strip ASCII 33-63: punctuation such as ! ' ( ) , . : and also the digits 0-9
for x in range(33,64):
    char = chr(x)
    paragraph = paragraph.replace(char, '')

# strip ASCII 91-96: [ \ ] ^ _ `
for x in range(91,97):
    char = chr(x)
    paragraph = paragraph.replace(char, '')

listParagraph = paragraph.split()
# keep only words as long as the target, so 'this' or 'hippo' cannot match
reducedList = [n for n in listParagraph if len(n) == len(word)]
reducedParagraph = ' '.join(reducedList)
reducedParagraph = reducedParagraph.lower()

count = reducedParagraph.count(word.lower())
if count == 0:
    print "The word", "'" + word + "'", "does not occur in this section of text."
else:
    print "The word", "'" + word + "'", "occurs", count, "times in this section."


I read an article online about tokenization, but at my current knowledge level I couldn't really do anything with it. From what I read, though, it is a more complicated but more exact way of finding words.
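
For what it's worth, a very small tokenizer might look something like the sketch below. It is only one simple reading of "tokenization", not what any particular article describes: it treats a word as a run of letters that may contain inner apostrophes, so "It's" stays in one piece. The names tokenize and sample are invented for the example, and the code assumes Python 3.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens, keeping inner apostrophes."""
    # A token is a run of letters, optionally followed by 'letters (as in "it's")
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

sample = "Hi there. It's 'hi' again, not this or hippo. It's fine."
counts = Counter(tokenize(sample))
print(counts["hi"])    # how often the token "hi" occurs
print(counts["it's"])  # words with an apostrophe can be looked up too
```

Since Counter holds a count for every token, several words can be looked up after a single pass over the text.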

Dunganb
Newbie Poster
10 posts since Jun 2008
Reputation Points: 10
Solved Threads: 4
 

You could also replace the following:

reducedList = [n for n in listParagraph if len(n) == len(word)]
## replace with this (listParagraph is the list built by paragraph.split() above)
reduced_list = [n for n in listParagraph if word == n.lower()]
if len(reduced_list):
    print "The word", "'" + word + "'", "occurs", len(reduced_list), "times in this section."
else:
    print "The word", "'" + word + "'", "does not occur in this section of text."
#
# anyway, here is another solution
#
paragraph = '''This is a test sentence.  We will look for hi in this sentence.
If we find 'hi', we want to keep count of it.  Remember, hi
can be Hi or hi.  Hi can also have characters before or
after it ie (hi. hi, hi:).  There should be a total of 10 'hi'
in this sentence, not any more for words like 'this' or
'hippo' or 'hiccups' '''

word='hi'
p_list = paragraph.split()
ctr=0
for p_word in p_list:
   p_word = p_word.lower()
   if p_word == word:
      ctr += 1
   elif not p_word.isalpha():     ## has non-alpha characters
      new_word = ""
      for ch in p_word:           ## ch, not chr, so the built-in chr() is not shadowed
         if ch.isalpha():
            new_word += ch
      if new_word == word:
         ctr += 1
print "The word '%s' was found %d times" % (word, ctr)
woooee
Nearly a Posting Maven
2,454 posts since Dec 2006
Reputation Points: 777
Solved Threads: 714
 

How can I develop a program in Pascal that counts the number of words in a paragraph?

pherro
Newbie Poster
1 post since May 2010
Reputation Points: 10
Solved Threads: 1
 
How can I develop a program in Pascal that counts the number of words in a paragraph?

Pascal is a much older language. I don't think it comes close to Python's modern syntax and features.

You could try the Delphi/Python forum at DaniWeb.

The closest thing you could use to take advantage of modern language concepts is Python for Delphi: http://mmm-experts.com/Products.aspx?ProductID=3

sneekula
Nearly a Posting Maven
2,427 posts since Oct 2006
Reputation Points: 961
Solved Threads: 212
 

Some fun.

import re
print 'Word was found %s times' % len(re.findall(r'\bhi\b', open('my_file.txt').read()))

Or a more readable version.

import re

search_word = 'hi'
# re.escape keeps any regex metacharacters in the word literal
comp = r'\b%s\b' % re.escape(search_word)
f = open('my_file.txt')
my_file = f.read()
f.close()
find_word = re.findall(comp, my_file)
print 'Word was found %s times' % len(find_word)
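
If the file is large, or "Hi" and "hi" should both count, the same idea could be wrapped in a function roughly like this. It is a sketch under a few assumptions: Python 3, a made-up helper name count_word_in_file, and a search word that starts and ends with a letter, digit or underscore (otherwise \b will not behave as expected). The demo writes a throwaway temporary file so that it runs without my_file.txt.

```python
import os
import re
import tempfile

def count_word_in_file(path, word):
    """Count whole-word, case-insensitive matches of word in the file at path."""
    # re.escape stops characters in word from being read as regex syntax
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    total = 0
    with open(path) as handle:
        for line in handle:  # one line at a time, so the whole file is never in memory
            total += len(pattern.findall(line))
    return total

# Demo on a temporary file (hypothetical contents)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Hi there, hi again.\nThis hippo says hi.\n")
demo_path = tmp.name
print(count_word_in_file(demo_path, "hi"))
os.remove(demo_path)
```

Reading line by line works here because a single word cannot span a line break.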
snippsat
Practically a Posting Shark
808 posts since Aug 2008
Reputation Points: 353
Solved Threads: 294
 

Maybe my earlier multimatcher could be adapted to be more restrictive:

# multiple searches of a string for a substring
# using s.find(sub[ ,start[, end]])
import string

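def multis(search,text,start=0):
    while start>-1:
        f=text.find(search,start)
        start=f
        if start>-1:
            end=start+len(search)
            # a match at the very start or end of the text has no neighbour,
            # so treat the edge of the text as a word boundary
            before_ok = start==0 or text[start-1] not in string.ascii_letters
            after_ok = end==len(text) or text[end] not in string.ascii_letters
            if before_ok and after_ok:
                yield f
            start+=1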

paragraph = '''This is a test sentence.  We will look for hi in this sentence.
If we find 'hi', we want to keep count of it.  Remember, hi
can be Hi or hi.  Hi can also have characters before or
after it ie (hi. hi, hi:).  There should be a total of 10 'hi'
in this sentence, not any more for words like 'this' or
'hippo' or 'hiccups' '''

word = 'hi'
print(paragraph)
print(word)

print("Searching %s:" % word)
for i in multis(word,paragraph):
    w,_,_ = paragraph[i:].partition(' ')
    print( "%s found at index %d: %s" % (word, i, w) )


Output:

This is a test sentence.  We will look for hi in this sentence.
If we find 'hi', we want to keep count of it.  Remember, hi
can be Hi or hi.  Hi can also have characters before or
after it ie (hi. hi, hi:).  There should be a total of 10 'hi'
in this sentence, not any more for words like 'this' or
'hippo' or 'hiccups' 
hi
Searching hi:
hi found at index 43: hi
hi found at index 76: hi',
hi found at index 121: hi
can
hi found at index 137: hi.
hi found at index 193: hi.
hi found at index 197: hi,
hi found at index 201: hi:).
hi found at index 239: hi'
in
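
Note that the search above is case-sensitive, so "Hi" is not reported. For comparison, here is a rough sketch of the same position-reporting idea done with re.finditer and case-insensitive matching. It is not the multimatcher itself; find_word and sample are invented names, and the code assumes Python 3.

```python
import re

def find_word(text, word):
    """Yield (index, matched_text) for each whole-word, case-insensitive match."""
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    for match in pattern.finditer(text):
        yield match.start(), match.group()

sample = "Hi can be hi. This hippo is not hi!"
hits = list(find_word(sample, "hi"))
for index, found in hits:
    print("%s found at index %d" % (found, index))
print("total: %d" % len(hits))
```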
pyTony
pyMod
Moderator
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
 

This question has already been solved
