Hi there,

I am new to Python.
Can somebody tell me how I can count the occurrences of a given word in a file?
I found lots of solutions for counting all the words in a file, but none for counting one particular word.

Thanks in advance

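string = "hi"

# Read the whole file; the with block closes it automatically
with open("filename.txt") as f:
    contents = f.read()

# Count how often the search string occurs in the file contents
print("Number of '%s' in your file is: %d" % (string, contents.count(string)))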

Basically, replace "hi" with the word you want to count.

Feel free to ask any more questions!

Thank you so much!!!

There is a problem with count(): it counts substrings, not whole words, so "hi" is also found inside longer words, as shown below:

string = "hi"

# test text
text = "hi, I am a history buff with a hideous hidrosis history"

print("Number of '%s' in your file is: %d" % (string, text.count(string)))

"""
my result -->
Number of 'hi' in your file is: 5
"""
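
To make the difference concrete, here is a minimal sketch that compares whole tokens instead of substrings. It is written in Python 3 syntax (the snippets in this thread are Python 2), and the helper name count_whole_word is made up for this illustration. It assumes words are separated by whitespace and only carry punctuation at their ends.

```python
import string

def count_whole_word(text, word):
    """Count tokens equal to word, ignoring case and surrounding punctuation."""
    target = word.lower()
    total = 0
    for token in text.split():
        # strip punctuation from both ends, e.g. "hi," -> "hi"
        if token.strip(string.punctuation).lower() == target:
            total += 1
    return total

text = "hi, I am a history buff with a hideous hidrosis history"
print(text.count("hi"))              # substring matches: 5
print(count_whole_word(text, "hi"))  # whole-word matches: 1
```

Punctuation inside a token (such as the apostrophe in "It's") is left alone, so this still needs an exact match on the stripped token.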

Hi atsuko,

Here is my version of a word finder. It has some serious limitations: it cannot search for words such as "It's" because it strips punctuation, and it cannot search for multiple words at once. In my tests, though, it found the word I was looking for the correct number of times. Here is an example of it working...

paragraph = '''This is a test sentence.  We will look for hi in this sentence.
If we find 'hi', we want to keep count of it.  Remember, hi
can be Hi or hi.  Hi can also have characters before or
after it ie (hi. hi, hi:).  There should be a total of 10 'hi'
in this sentence, not any more for words like 'this' or
'hippo' or 'hiccups' '''

word = 'hi'

# Strip punctuation and digits (ASCII 33-63)
for x in range(33, 64):
    paragraph = paragraph.replace(chr(x), '')

# Strip the characters [ \ ] ^ _ ` (ASCII 91-96)
for x in range(91, 97):
    paragraph = paragraph.replace(chr(x), '')

# Keep only the words that have the same length as the search word
listParagraph = paragraph.split()
reducedList = [n for n in listParagraph if len(n) == len(word)]
reducedParagraph = ' '.join(reducedList).lower()

count = reducedParagraph.count(word.lower())
if count == 0:
    print("The word '%s' does not occur in this section of text." % word)
else:
    print("The word '%s' occurs %d times in this section." % (word, count))

I read an article online about tokenization, but at my current knowledge level I couldn't really do anything with it. From what I read, though, it is a more complicated but more exact way of finding words.
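
For what it's worth, a very small tokenizer is not that complicated. The following is only a sketch in Python 3 syntax, not a full tokenizer: the function names tokenize and count_words and the sample sentence are invented for this example, and it assumes plain English text where a word is a run of letters, optionally with apostrophes inside it. That keeps "It's" as one token and allows several words to be counted in one pass.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens; apostrophes inside words are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def count_words(text, words):
    """Return a dict mapping each requested word to its number of occurrences."""
    counts = Counter(tokenize(text))
    return {w: counts[w.lower()] for w in words}

sample = "It's a test. Say hi, Hi or 'hi'; it's not 'this' or 'hippo'."
print(count_words(sample, ["hi", "it's", "hippo"]))
```

A Counter returns 0 for words it never saw, so asking for a word that is not in the text does not raise an error.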

You could also replace the following:

reducedList = [n for n in listParagraph if len(n) == len(word)]
## replace with this
reduced_list = [n for n in listParagraph if word.lower() == n.lower()]
if reduced_list:
    print("The word '%s' occurs %d times in this section." % (word, len(reduced_list)))
else:
    print("The word '%s' does not occur in this section of text." % word)
#
# anyway, here is another solution
#
paragraph = '''This is a test sentence.  We will look for hi in this sentence.
If we find 'hi', we want to keep count of it.  Remember, hi
can be Hi or hi.  Hi can also have characters before or
after it ie (hi. hi, hi:).  There should be a total of 10 'hi'
in this sentence, not any more for words like 'this' or
'hippo' or 'hiccups' '''

word = 'hi'
p_list = paragraph.split()
ctr = 0
for p_word in p_list:
    p_word = p_word.lower()
    if p_word == word:
        ctr += 1
    elif not p_word.isalpha():     ## has non-alpha characters
        # keep only the letters, e.g. "(hi." becomes "hi"
        new_word = ""
        for ch in p_word:          # "ch" avoids shadowing the built-in chr()
            if ch.isalpha():
                new_word += ch
        if new_word == word:
            ctr += 1
print("The word '%s' was found %d times" % (word, ctr))

How can I develop a program in Pascal that counts the number of words in a paragraph?

Pascal is a very old-style language. I don't think it comes even close to Python's modern syntax and features.

You could try the Delphi/Python forum at DaniWeb.

The closest thing you could use to take advantage of modern language concepts is Python for Delphi:
http://mmm-experts.com/Products.aspx?ProductID=3

Some fun.

import re
print('Word was found %s times' % len(re.findall(r'\bhi\b', open('my_file.txt').read())))

Or a more readable version.

import re

search_word = 'hi'
# \b matches a word boundary, so 'this' and 'hippo' are not counted
pattern = r'\b%s\b' % re.escape(search_word)
with open('my_file.txt') as f:
    my_file = f.read()
find_word = re.findall(pattern, my_file)
print('Word was found %d times' % len(find_word))
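
One caveat: the pattern as written is case-sensitive, so "Hi" at the start of a sentence is not counted. A possible variant, sketched in Python 3 syntax with a made-up helper name count_word and a made-up sample string, passes re.IGNORECASE and escapes the search word so that regex metacharacters in it are matched literally.

```python
import re

def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of word in text."""
    # re.escape guards against regex metacharacters in the search word
    pattern = r"\b%s\b" % re.escape(word)
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

sample = "Say hi, Hi or (hi.) but not this, hippo or hiccups"
print(count_word(sample, "hi"))
```

Note that \b treats digits and underscores as word characters too, so "hi_there" or "hi5" would not count as a match for "hi".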

Maybe I could adapt my earlier multimatcher to be more restrictive:

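# multiple searches of a string for a substring
# using s.find(sub[ ,start[, end]])

def multis(search, text, start=0):
    """Yield each index where search occurs as a whole word in text."""
    while start > -1:
        start = text.find(search, start)
        if start > -1:
            end = start + len(search)
            # a match counts only if no letter touches it on either side;
            # the bounds checks cover matches at the very start or end of text
            before_ok = start == 0 or not text[start - 1].isalpha()
            after_ok = end == len(text) or not text[end].isalpha()
            if before_ok and after_ok:
                yield start
            start += 1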

paragraph = '''This is a test sentence.  We will look for hi in this sentence.
If we find 'hi', we want to keep count of it.  Remember, hi
can be Hi or hi.  Hi can also have characters before or
after it ie (hi. hi, hi:).  There should be a total of 10 'hi'
in this sentence, not any more for words like 'this' or
'hippo' or 'hiccups' '''

word = 'hi'
print(paragraph)
print(word)

print("Searching %s:" % word)
for i in multis(word,paragraph):
    w,_,_ = paragraph[i:].partition(' ')
    print( "%s found at index %d: %s" % (word, i, w) )

Output:

This is a test sentence.  We will look for hi in this sentence.
If we find 'hi', we want to keep count of it.  Remember, hi
can be Hi or hi.  Hi can also have characters before or
after it ie (hi. hi, hi:).  There should be a total of 10 'hi'
in this sentence, not any more for words like 'this' or
'hippo' or 'hiccups' 
hi
Searching hi:
hi found at index 43: hi
hi found at index 76: hi',
hi found at index 121: hi
can
hi found at index 137: hi.
hi found at index 193: hi.
hi found at index 197: hi,
hi found at index 201: hi:).
hi found at index 239: hi'
in
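
The output above lists eight hits, although the test paragraph promises ten: find() is case-sensitive, so the two capitalized "Hi" are skipped. One way around that, sketched here in Python 3 syntax, is to search a lowercased copy of the text. The function name find_word_positions and the sample string are invented for this illustration, and the sketch assumes plain ASCII text, where lowercasing does not change string length and the indexes therefore still fit the original text.

```python
def find_word_positions(text, word):
    """Yield indexes of whole-word, case-insensitive matches of word in text."""
    haystack = text.lower()
    needle = word.lower()
    start = haystack.find(needle)
    while start != -1:
        end = start + len(needle)
        # accept the hit only if no letter touches it on either side
        before_ok = start == 0 or not haystack[start - 1].isalpha()
        after_ok = end == len(haystack) or not haystack[end].isalpha()
        if before_ok and after_ok:
            yield start
        start = haystack.find(needle, start + 1)

sample = "Hi there, hi again. This hippo says (hi)"
print(list(find_word_positions(sample, "hi")))
```

Wrapping the generator in len(list(...)) gives the count the original question asked for.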