A simple program to count the words, lines and sentences contained in a text file. It assumes that words are separated by whitespace and that sentences end with a period, question mark or exclamation mark.

# count lines, sentences, and words of a text file

# set all the counters to zero
lines, blanklines, sentences, words = 0, 0, 0, 0

print '-' * 50

# use a text file you have, or google for this one ...
filename = 'GettysburgAddress.txt'
try:
  textf = open(filename, 'r')
except IOError:
  print 'Cannot open file %s for reading' % filename
  import sys
  sys.exit(1)  # non-zero exit status signals the failure

# reads one line at a time
for line in textf:
  print line,   # test
  lines += 1
  
  if line.startswith('\n'):
    blanklines += 1
  else:
    # assume that each sentence ends with . or ! or ?
    # so simply count these characters
    sentences += line.count('.') + line.count('!') + line.count('?')
    
    # create a list of words
    # use None to split at any whitespace regardless of length
    # so for instance double space counts as one space
    tempwords = line.split(None)
    print tempwords  # test
    
    # word total count
    words += len(tempwords)

    
textf.close()

print '-' * 50
print "Lines      : ", lines
print "Blank lines: ", blanklines
print "Sentences  : ", sentences
print "Words      : ", words

# optional console wait for keypress
from msvcrt import getch
getch()

This code is most likely more portable:
# optional console wait for keypress
raw_input('Press Enter...')

I need a program that counts the words in each sentence and highlights any sentence that has 30 or more words. I need to be able to load an article into the program and then have it highlight every sentence that has more words than a selected limit, e.g. 20 words or 30 words.
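
One possible starting point for this request, as a rough sketch rather than a finished tool: split the text into sentences with a regular expression and keep the ones at or above a word limit. The names long_sentences, limit and article are made up for this example, and the sentence split is a simple heuristic that will break on abbreviations such as "e.g.".

```python
import re

def long_sentences(text, limit=30):
    """Return (word_count, sentence) pairs for sentences with at least `limit` words."""
    # Split after ., ! or ? when followed by whitespace. This is only a
    # heuristic (assumption): abbreviations like "e.g. this" get split too.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    flagged = []
    for sentence in parts:
        count = len(sentence.split())
        if count >= limit:
            flagged.append((count, sentence))
    return flagged

# hypothetical sample text; in practice this would be the loaded article
article = "Short one. This sentence is a little bit longer than the first one! Tiny?"
for count, sentence in long_sentences(article, limit=5):
    print("%d words: %s" % (count, sentence))
```

To use it on a file, read the whole article into a string first and pass that string in, with limit set to 20 or 30 as needed.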

I am using this code to compute some lexical statistics for a text. However, it does not recognize the ends of sentences (., ?, ! and so on) and returns 1 sentence. I think that line.count is not working. The line counting works. Finally, for the word count, the program only considers the last sentence and not the whole text. Can someone help me with this issue?

I modified the program to use this test text, and it works correctly ...

# count lines, sentences, and words of a text file

# set all the counters to zero
lines, blanklines, sentences, words = 0, 0, 0, 0

# test text ...
text = """\
Just a simple text.
We can count the sentences!
Why do sentences have to end?

Every now and then a blank line.
Perhaps it will snow!

Wow, another blank line for the count.
That should do it for the test!"""

# write the test file
fname = "MyText1.txt"
fout = open(fname, "w")
fout.write(text)
fout.close()

# read the file back in
textf = open(fname, "r")

# reads one line at a time
for line in textf:
    #print line,   # test
    lines += 1

    if line.startswith('\n'):
        blanklines += 1
    else:
        # assume that each sentence ends with . or ! or ?
        # so simply count these characters
        sentences += line.count('.') + line.count('!') + line.count('?')

        # create a list of words
        # use None to split at any whitespace regardless of length
        # so for instance double space counts as one space
        tempwords = line.split(None)
        #print tempwords  # test

        # word total count
        words += len(tempwords)

textf.close()

print '-' * 50
print "Lines      : ", lines
print "Blank lines: ", blanklines
print "Sentences  : ", sentences
print "Words      : ", words

"""my result -->
Lines      :  9
Blank lines:  2
Sentences  :  7
Words      :  40
"""

Edited 7 Years Ago by vegaseat: n/a
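
For anyone who wants to reuse the counting loop, here is a small sketch that wraps the same logic in a function working on a string. The name count_text and the sample string are only for illustration. It differs from the snippet in one respect: a line containing only spaces is also counted as blank, which startswith('\n') would miss.

```python
def count_text(text):
    """Return (lines, blank_lines, sentences, words) for a string."""
    lines = blank_lines = sentences = words = 0
    for line in text.splitlines():
        lines += 1
        if not line.strip():
            # empty or whitespace-only line
            blank_lines += 1
            continue
        # same assumption as the snippet: every . ! or ? ends a sentence
        sentences += sum(line.count(mark) for mark in ".!?")
        # split() without arguments splits at any run of whitespace
        words += len(line.split())
    return lines, blank_lines, sentences, words

sample = "Just a simple text.\nWe can count the sentences!\n\nPerhaps it will snow!"
print(count_text(sample))
```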

Thanks. Will this program work with any type of encoding (ASCII or UTF-8, for example)? I found the following instruction on the web, which is supposed to allow Python to work with UTF-8:
# -*- coding: utf-8 -*-
The problem is that when I debug the program, it does not seem to recognize the instruction (probably because it starts with #), but if I erase it, it does not work either. Do you know this command?
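
A note on the coding line: it is a special comment that tells Python how the source file itself is encoded (it has to be on the first or second line). It does not change how a text file is decoded when the program reads it. A minimal sketch of decoding a UTF-8 file while reading, using io.open, which accepts an encoding argument in both Python 2 and 3; the file name and the sample sentence are made up for the example:

```python
import io
import os
import tempfile

# hypothetical sample file, written here only so the sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "utf8_sample.txt")
with io.open(path, "w", encoding="utf-8") as fout:
    fout.write(u"Caf\u00e9 au lait. \u00c7a va?\n")

# decode while reading, so each accented letter is one character
with io.open(path, "r", encoding="utf-8") as fin:
    content = fin.read()

words = content.split()
print(len(words))
```

The counting logic itself stays the same; only the way the file is opened changes.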

Thanks again for the info. Is there a way to differentiate letters from numbers, for example in the string "the wine is 7 years old"? Do you need a function for that? I tried to use line.count, but it did not work (I guess I need a generic term for numbers).

Try this code sample ...

s = "the wine is 7 years old"
for c in s:
    if c.isdigit():
        print( "%s is numeric" % c )

# Count number of lines, words, and characters

filename = raw_input("Enter filename: ")
try:
    infile = open(filename, "r")

    word, lines, char = 0, 0, 0

    for ligne in infile:
        # split at any whitespace; str.split() replaces the old string.split()
        split_word = ligne.split()
        print split_word # test

        lines += 1
        word += len(split_word)

        # count characters word by word, so whitespace is not included
        for i in split_word:
            char += len(i)

    infile.close()

    print """
words = %d
lines = %d
characters = %d""" % (word, lines, char)

except IOError:
    print "file not found!"

#Roshan S. University Of Mauritius, dept. of Computer Science (student).

Edited 7 Years Ago by drumkill: n/a

If I may propose a semantically equivalent but much shorter alternative...

import re

def analyzeText(text):
    sentences = re.findall(r'\s*(.+?)[.!?]\s*', text)
    wordsets = map(str.split, sentences)
    wordcounts = map(len, wordsets)
    charcounts = [ sum(len(word) for word in words) for words in wordsets ]
    return zip(sentences, wordcounts, charcounts)

Test:

text = """\
Just a simple text.
We can count the sentences!
Why do sentences have to end?

Every now and then a blank line.
Perhaps it will snow!

Wow, another blank line for the count.
That should do it for the test!"""

for sentence, wordCount, charCount in analyzeText(text):
    print 'There are %d words and %d chars in "%s".' % (wordCount, charCount, sentence)

Output:

There are 4 words and 15 chars in "Just a simple text".
There are 5 words and 22 chars in "We can count the sentences".
There are 6 words and 23 chars in "Why do sentences have to end".
There are 7 words and 25 chars in "Every now and then a blank line".
There are 4 words and 17 chars in "Perhaps it will snow".
There are 7 words and 31 chars in "Wow, another blank line for the count".
There are 7 words and 24 chars in "That should do it for the test".

What a difference 4 1/2 years make! I am surprised that Python has made it that long.

With this code:

from itertools import groupby
import doctest

print '-' * 50
 
#try:
  # use a text file you have, or google for this one ...
#  filename = 'text.txt' #'GettysburgAddress.txt'
#  text = open(filename, 'r')
#except IOError:
#  print 'Cannot open file %s for reading' % filename
#  import sys
#  sys.exit(0)

# test text ...
text = """\
Just a simple text.
We can count the sentences!
Why do sentences have to end?
 
Every now and then a blank line.
Perhaps it will snow!
 
Wow, another blank line for the count.
That should do it for the test!"""
 
# write the test file
fname = "MyText1.txt"
fout = open(fname, "w")
fout.write(text)
fout.close()
 
# read the file back in
try:
    text = open(fname, "r")
except IOError:
  print 'Cannot open file %s for reading' % fname
  import sys
  sys.exit(0)

print text

def printWordFrequencies(text):
    #"""
    #>>> printWordFrequencies("Ob la di ob la da")
    #1 da
    #1 di
    #2 la
    #2 ob"""
    for w, g in groupby(sorted(text.lower().split())):
        print "%s %s" % (len(list(g)), w)

doctest.testmod(verbose=True)

I get this error:

<open file 'MyText1.txt', mode 'r' at 0x15a69b0>

Can someone explain? I'm on a Mac.

Edited 6 Years Ago by halophyte: n/a

<open file 'MyText1.txt', mode 'r' at 0x15a69b0>
Can someone explain? I'm on a Mac.

You don't do anything with the file object. open() only returns the file object, which is why print shows the object's description instead of the text. Call read() on it:
text = open(fname, "r").read()
Now the file is read into memory as a string and you can print it out.

If you then call "printWordFrequencies" like this, it will work:
printWordFrequencies(text)
doctest.testmod(verbose=True)

Don't ask questions in a Code Snippet thread; make a new post next time.
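
To make the difference between the file object and its contents concrete, here is a short self-contained sketch. It swaps groupby for collections.Counter, and word_frequencies is just a name chosen for this example, so treat it as one possible variant and not the code from the post above:

```python
import os
import tempfile
from collections import Counter

# write a small test file into a temporary directory (path is hypothetical)
path = os.path.join(tempfile.mkdtemp(), "MyText1.txt")
with open(path, "w") as fout:
    fout.write("Ob la di ob la da")

# open() returns a file object; read() returns its contents as a string
with open(path, "r") as fin:
    text = fin.read()

def word_frequencies(text):
    """Return a sorted list of (word, count) pairs, ignoring case."""
    return sorted(Counter(text.lower().split()).items())

for word, count in word_frequencies(text):
    print("%d %s" % (count, word))
```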
