Wordcount of a text file (Python)

20 Years Ago vegaseat 2 2K Views

A simple program to count the words, lines and sentences in a text file. It assumes that words are separated by whitespace and that sentences end with a period, question mark or exclamation mark.

python

# count lines, sentences, and words of a text file

# set all the counters to zero
lines, blanklines, sentences, words = 0, 0, 0, 0

print '-' * 50

try:
  # use a text file you have, or google for this one ...
  filename = 'GettysburgAddress.txt'
  textf = open(filename, 'r')
except IOError:
  print 'Cannot open file %s for reading' % filename
  import sys
  sys.exit(1)  # non-zero exit status signals the failure

# reads one line at a time
for line in textf:
  print line,   # test
  lines += 1
  
  if line.startswith('\n'):
    blanklines += 1
  else:
    # assume that each sentence ends with . or ! or ?
    # so simply count these characters
    sentences += line.count('.') + line.count('!') + line.count('?')
    
    # create a list of words
    # use None to split at any whitespace regardless of length
    # so for instance double space counts as one space
    tempwords = line.split(None)
    print tempwords  # test
    
    # word total count
    words += len(tempwords)

    
textf.close()

print '-' * 50
print "Lines      : ", lines
print "Blank lines: ", blanklines
print "Sentences  : ", sentences
print "Words      : ", words

# optional console wait for keypress (msvcrt is Windows-only)
from msvcrt import getch
getch()
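
For readers on Python 3, here is a minimal sketch of the same counting logic wrapped in a function. It is not the original author's code: the name `count_text` is hypothetical, and unlike the snippet above it treats whitespace-only lines as blank rather than only lines that start with a newline.

```python
# Python 3 sketch of the counting logic (count_text is a hypothetical name)
def count_text(lines_iter):
    """Return (lines, blank_lines, sentences, words) for an iterable of lines."""
    lines = blank_lines = sentences = words = 0
    for line in lines_iter:
        lines += 1
        if not line.strip():
            # assumption: whitespace-only lines count as blank too
            blank_lines += 1
            continue
        # assume each '.', '!' or '?' ends one sentence
        sentences += sum(line.count(mark) for mark in ".!?")
        # split() with no argument splits on any run of whitespace
        words += len(line.split())
    return lines, blank_lines, sentences, words


if __name__ == "__main__":
    sample = ["Four score and seven years ago.\n", "\n", "Is that so? Yes!\n"]
    print(count_text(sample))
```

An open file object can be passed in directly, for example `with open(filename) as textf: print(count_text(textf))`, because iterating over a file yields one line at a time.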

vegaseat 1,735 DaniWeb's Hypocrite

19 Years Ago

This code is most likely more portable:
# optional console wait for keypress
raw_input('Press Enter...')

eclark53 0 Newbie Poster

16 Years Ago

I need a program to count the words in each sentence and highlight any sentence that has 30 or more words. I need to be able to load an article into the program and have it highlight the sentences that have more words than a selected limit, e.g. 20 words or 30 words.
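
One possible starting point for this request, as a rough Python 3 sketch rather than a finished tool: split the text into sentences at `.`, `!` or `?` followed by whitespace, count the words in each, and keep those at or above a limit. The function name `long_sentences` and the `limit` parameter are made up for illustration, and the sentence splitting is naive (abbreviations such as "e.g." will split early).

```python
import re


def long_sentences(text, limit=30):
    """Return (word_count, sentence) pairs for sentences with at least `limit` words."""
    # naive split: a sentence ends at '.', '!' or '?' followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        count = len(sentence.split())
        if count >= limit:
            flagged.append((count, sentence))
    return flagged


if __name__ == "__main__":
    article = "Short one. This sentence has exactly six words. Tiny!"
    for count, sentence in long_sentences(article, limit=6):
        print("%d words: %s" % (count, sentence))
```

How the flagged sentences are highlighted (printed, colored, marked up) is left open here; the function only finds them.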

pelupelu 0 Newbie Poster

15 Years Ago

I am using this code to compute some lexical statistics in a text. However, it is not recognizing the end of the sentences (example . ? ! etc) and returns 1 sentence. I think that the command line.count is not working. The counting of the lines in the text is functional. Finally for the word counting, the program is only considering the last sentence and not the whole text. Can someone help me with this issue?

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

I modified the program to use this test text, and it works correctly ...

# count lines, sentences, and words of a text file

# set all the counters to zero
lines, blanklines, sentences, words = 0, 0, 0, 0

# test text ...
text = """\
Just a simple text.
We can count the sentences!
Why do sentences have to end?

Every now and then a blank line.
Perhaps it will snow!

Wow, another blank line for the count.
That should do it for the test!"""

# write the test file
fname = "MyText1.txt"
fout = open(fname, "w")
fout.write(text)
fout.close()

# read the file back in
textf = open(fname, "r")

# reads one line at a time
for line in textf:
    #print line,   # test
    lines += 1

    if line.startswith('\n'):
        blanklines += 1
    else:
        # assume that each sentence ends with . or ! or ?
        # so simply count these characters
        sentences += line.count('.') + line.count('!') + line.count('?')

        # create a list of words
        # use None to split at any whitespace regardless of length
        # so for instance double space counts as one space
        tempwords = line.split(None)
        #print tempwords  # test

        # word total count
        words += len(tempwords)

textf.close()

print '-' * 50
print "Lines      : ", lines
print "Blank lines: ", blanklines
print "Sentences  : ", sentences
print "Words      : ", words

"""my result -->
Lines      :  9
Blank lines:  2
Sentences  :  7
Words      :  40
"""


pelupelu 0 Newbie Poster

15 Years Ago

Thanks. Will this program work for any type of encoding (ASCII or UTF-8, for example)? I found the following instruction on the web, which is supposed to allow Python to work with UTF-8:
# -*- coding: utf-8 -*-
The problem is that when I debug the program it does not seem to recognize the instruction (probably because it starts with #), but if I erase it, it does not work either. Do you know this command?

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

To define a source code encoding, a magic comment must
be placed into the source files either as first or second
line in the file:
# -*- coding: <encoding name> -*-

see: http://www.python.org/peps/pep-0263.html
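
Note that the magic comment only declares the encoding of the Python source file itself (for example, non-ASCII characters in string literals). It does not change how a data file is decoded when it is read. A minimal sketch of reading a text file with an explicit encoding, using `io.open`, which accepts an `encoding` argument in both Python 2.6+ and Python 3; the function name `count_words_in_file` is hypothetical:

```python
import io


def count_words_in_file(path, encoding="utf-8"):
    """Count whitespace-separated words in a text file decoded with `encoding`."""
    total = 0
    # io.open decodes the file's bytes to text using the given encoding
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            total += len(line.split())
    return total
```

If the file is not actually stored in the encoding passed in, decoding may raise `UnicodeDecodeError`, so the encoding has to match how the file was saved.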

pelupelu 0 Newbie Poster

15 Years Ago

Thanks again for the info. Is there a way to differentiate letters from numbers, for example in the string "the wine is 7 years old"? Do you need a function for that? I tried to use line.count but it did not work (I guess I need a generic term for numbers).

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

Try this code sample ...

s = "the wine is 7 years old"
for c in s:
    if c.isdigit():
        print( "%s is numeric" % c )
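
The same string methods also work on whole words. As a small Python 3 sketch (the function name `classify_tokens` is made up for this example), the line can be split on whitespace and each token sorted into numbers, plain words, or anything else:

```python
def classify_tokens(text):
    """Split text on whitespace and separate numeric tokens from alphabetic ones."""
    numbers, words, other = [], [], []
    for token in text.split():
        if token.isdigit():
            numbers.append(token)      # every character is a digit
        elif token.isalpha():
            words.append(token)        # every character is a letter
        else:
            other.append(token)        # mixed or punctuated, e.g. "7th" or "old."
    return numbers, words, other


if __name__ == "__main__":
    print(classify_tokens("the wine is 7 years old"))
```

Tokens with attached punctuation land in the third list, so stripping punctuation first may be needed for real text.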

drumkill 0 Newbie Poster

15 Years Ago

# Count number of lines, words, and characters

filename = raw_input("Enter filename: ")
try:
    infile = open(filename, "r")

    word, lines, char = 0, 0, 0

    for ligne in infile:

        # str.split() replaces the deprecated string.split()
        split_word = ligne.split()
        print split_word # test

        lines += 1
        word += len(split_word)

        for i in split_word: # each word in the line
            # count the non-whitespace characters
            char += len(i)

    infile.close()

    print """
words = %d
lines = %d
characters = %d""" % (word, lines, char)

except IOError:
    print "file not found!"

#Roshan S. University Of Mauritius, dept. of Computer Science (student).


pythopian 10 Junior Poster in Training

15 Years Ago

If I may propose a semantically equivalent but much shorter alternative...

import re

def analyzeText(text):
    sentences = re.findall(r'\s*(.+?)[.!?]\s*', text)
    wordsets = map(str.split, sentences)
    wordcounts = map(len, wordsets)
    charcounts = [ sum(len(word) for word in words) for words in wordsets ]
    return zip(sentences, wordcounts, charcounts)

Test:

text = """\
Just a simple text.
We can count the sentences!
Why do sentences have to end?

Every now and then a blank line.
Perhaps it will snow!

Wow, another blank line for the count.
That should do it for the test!"""

for sentence, wordCount, charCount in analyzeText(text):
    print 'There are %d words and %d chars in "%s".' % (wordCount, charCount, sentence)

Output:

There are 4 words and 15 chars in "Just a simple text".
There are 5 words and 22 chars in "We can count the sentences".
There are 6 words and 23 chars in "Why do sentences have to end".
There are 7 words and 25 chars in "Every now and then a blank line".
There are 4 words and 17 chars in "Perhaps it will snow".
There are 7 words and 31 chars in "Wow, another blank line for the count".
There are 7 words and 24 chars in "That should do it for the test".
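
A caveat for Python 3 users: there `map` returns a lazy iterator, so `wordsets` is used up by the list comprehension and `zip` ends up yielding nothing. A minimal Python 3 sketch of the same idea that builds each result in a single loop (the name `analyze_text` is just a renamed variant for this example):

```python
import re


def analyze_text(text):
    """Return a list of (sentence, word_count, char_count) tuples."""
    # same pattern as above: text up to the next '.', '!' or '?'
    sentences = re.findall(r"\s*(.+?)[.!?]\s*", text)
    results = []
    for sentence in sentences:
        words = sentence.split()
        # char_count excludes whitespace, matching the output shown above
        results.append((sentence, len(words), sum(len(w) for w in words)))
    return results


if __name__ == "__main__":
    sample = "Just a simple text.\nPerhaps it will snow!"
    for sentence, word_count, char_count in analyze_text(sample):
        print('There are %d words and %d chars in "%s".' % (word_count, char_count, sentence))
```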

vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

What a difference 4 1/2 years make! I am surprised that Python has made it that long.

halophyte 0 Newbie Poster

14 Years Ago

With this code:

from itertools import groupby
import doctest

print '-' * 50
 
#try:
  # use a text file you have, or google for this one ...
#  filename = 'text.txt' #'GettysburgAddress.txt'
#  text = open(filename, 'r')
#except IOError:
#  print 'Cannot open file %s for reading' % filename
#  import sys
#  sys.exit(0)

# test text ...
text = """\
Just a simple text.
We can count the sentences!
Why do sentences have to end?
 
Every now and then a blank line.
Perhaps it will snow!
 
Wow, another blank line for the count.
That should do it for the test!"""
 
# write the test file
fname = "MyText1.txt"
fout = open(fname, "w")
fout.write(text)
fout.close()
 
# read the file back in
try:
    text = open(fname, "r")
except IOError:
  print 'Cannot open file %s for reading' % fname
  import sys
  sys.exit(1)

print text

def printWordFrequencies(text):
    #"""
    #>>> printWordFrequencies("Ob la di ob la da")
    #1 da
    #1 di
    #2 la
    #2 ob"""
    for w, g in groupby(sorted(text.lower().split())):
        print "%s %s" % (len(list(g)), w)

doctest.testmod(verbose=True)

I get this output:

<open file 'MyText1.txt', mode 'r' at 0x15a69b0>

Can someone explain? I'm on a Mac.


snippsat 661 Master Poster

14 Years Ago

<open file 'MyText1.txt', mode 'r' at 0x15a69b0>
Can someone explain? I'm on a Mac.

You open the file but never do anything with the file object, so print shows the object itself. Call read() on it:
text = open(fname, "r").read()
Now the file's contents are read into memory as a string and you can print them out.

So if you call "printWordFrequencies" like this it will work.
printWordFrequencies(text)
doctest.testmod(verbose=True)

Don't ask questions in a Code Snippet thread; make a new post next time.
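
For comparison, a short Python 3 sketch of the word-frequency idea using `collections.Counter` instead of `groupby`. This is an alternative technique, not the code posted above, and the function name `word_frequencies` is hypothetical. It returns the pairs instead of printing them, which also makes it easier to test:

```python
from collections import Counter


def word_frequencies(text):
    """Return (word, count) pairs sorted alphabetically, ignoring case."""
    counts = Counter(text.lower().split())
    return sorted(counts.items())


if __name__ == "__main__":
    for word, count in word_frequencies("Ob la di ob la da"):
        print("%s %s" % (count, word))
```

To run it on a file, read the contents first, for example `with open(fname) as f: pairs = word_frequencies(f.read())`.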
