Story Statistics (Python)

vegaseat | Oct 6th, 2009, 5:44 pm
This Python code computes selected statistics of a story text. It counts lines, blank lines, sentences and words, lists words and characters by frequency, and gives the average word length. More statistics should be easy to add using the dictionaries it creates.
Last edited by vegaseat; Oct 6th, 2009 at 6:06 pm.
Python Syntax
```python
# get some statistics of a story text
# count lines, sentences, words, frequent words ...
# tested with Python 2.5.4 and Python 3.1.1
# vegaseat 06oct2009

# test text (9 lines total, 2 blank lines, 8 sentences) ...
text = """\
Just a simple text we can use to count the sentences.
Looks like fun! Why do sentences have to end so soon?

Every now and then let's put in a blank line, so we
track those too. Perhaps something with a multitude of
characters.

Ah, another blank line for the count. Time for lunch!
That should do it for this longwinded test!"""

# write the test file
fname = "MyText1.txt"
fout = open(fname, "w")
fout.write(text)
fout.close()

# read the test file back in
# or change the filename to a text you have
textf = open(fname, "r")

# set all the counters to zero
lines = 0
blanklines = 0
# start with empty word list and character frequency dictionary
word_list = []
cf_dict = {}
# reads one line at a time
for line in textf:
    # count lines and blanklines
    lines += 1
    # a line holding only whitespace counts as blank
    # (line.startswith('\n') would miss lines with stray spaces)
    if not line.strip():
        blanklines += 1
    # create a list of words
    # split at any whitespace regardless of length
    word_list.extend(line.split())
    # create a character:frequency dictionary
    # all letters adjusted to lower case
    for char in line.lower():
        cf_dict[char] = cf_dict.get(char, 0) + 1

textf.close()

# create a word frequency dictionary
# all words in lower case
word_dict = {}
# punctuation marks to strip (could use string.punctuation)
punctuations = ",.!?;:"
for word in word_list:
    # remove all trailing punctuation marks from the word,
    # so that "end?!" becomes "end" and not "end?"
    word = word.rstrip(punctuations)
    # convert to all lower case letters
    word = word.lower()
    word_dict[word] = word_dict.get(word, 0) + 1

# assume that each sentence ends with '.' or '!' or '?'
sentences = 0
for key in cf_dict.keys():
    if key in '.!?':
        sentences += cf_dict[key]

number_words = len(word_list)

#print(word_list)  # test
#print(cf_dict)  # test
#print(word_dict)  # test

# formatted prints will work with Python2 and Python3
print( "Total lines: %d" % lines )
print( "Blank lines: %d" % blanklines )
print( "Sentences : %d" % sentences )
print( "Words : %d" % number_words )

print('-' * 30)
# optional things ...
# average word length
# (total letters of the stripped words divided by the word count)
num = float(number_words)
avg_wordsize = sum(len(k) * v for k, v in word_dict.items()) / num

# most common words
mcw = sorted([(v, k) for k, v in word_dict.items()], reverse=True)

# most common characters
mcc = sorted([(v, k) for k, v in cf_dict.items()], reverse=True)

print( "Average word length : %0.2f" % avg_wordsize )
print( "3 most common words : %s" % mcw[:3] )
print( "3 most common characters: %s" % mcc[:3] )

"""my result -->
Total lines: 9
Blank lines: 2
Sentences : 8
Words : 62
------------------------------
Average word length : 4.08
3 most common words : [(3, 'for'), (3, 'a'), (2, 'we')]
3 most common characters: [(57, ' '), (31, 'e'), (30, 't')]
"""
```
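The script builds its frequency dictionaries by hand with dict.get(), which keeps it running on Python 2.5. On Python 2.7 or 3.1 and later, collections.Counter can build the same tables, and its most_common() method replaces the sorted (count, item) lists. The sketch below is one possible alternative, not the author's code: the function name story_stats and the keys of the returned dictionary are hypothetical, and it works on a string rather than on a file.

```python
from collections import Counter

# trailing punctuation stripped from each word (same marks as the script)
PUNCTUATION = ",.!?;:"


def story_stats(text):
    """Return basic statistics of a story text as a dictionary (sketch)."""
    lines = text.splitlines()
    # a line with nothing but whitespace counts as blank
    blank_lines = sum(1 for line in lines if not line.strip())
    # split at any whitespace, strip trailing punctuation, lower-case
    words = [w.rstrip(PUNCTUATION).lower() for w in text.split()]
    # unlike the script, tokens made only of punctuation are dropped
    words = [w for w in words if w]
    word_freq = Counter(words)
    char_freq = Counter(text.lower())
    # same assumption as the script: every '.', '!' or '?' ends a sentence
    sentences = sum(char_freq[c] for c in ".!?")
    avg_len = sum(len(w) for w in words) / float(len(words)) if words else 0.0
    return {
        "lines": len(lines),
        "blank_lines": blank_lines,
        "sentences": sentences,
        "words": len(words),
        "avg_word_length": avg_len,
        "common_words": word_freq.most_common(3),
        "common_chars": char_freq.most_common(3),
    }
```

Calling story_stats(text) on the test text should give the same line, blank-line, sentence and word counts as the script. One difference to keep in mind: on Python 3.7+, most_common() keeps words with equal counts in first-seen order, whereas sorting (count, word) tuples with reverse=True breaks ties by reverse alphabetical order, so the top-3 lists can differ.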
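The script's sentence count rests on the stated assumption that every '.', '!' or '?' ends a sentence. Because it sums character frequencies, an ellipsis such as "Wait..." counts as three sentences and "Really?!" as two. A slightly more tolerant sketch, assuming that a run of consecutive marks ends a single sentence, can use a regular expression. This is a swapped-in heuristic rather than the author's method, and the function names are hypothetical.

```python
import re

# a run of one or more terminal marks ("." or "?!" or "...") counts once
SENTENCE_END = re.compile(r"[.!?]+")


def count_sentences(text):
    """Count sentence endings, treating a run of '.', '!' and '?' as one.

    Still a heuristic: abbreviations such as "Mr." and decimal numbers
    such as "3.5" are counted as sentence endings too.
    """
    return len(SENTENCE_END.findall(text))


def count_sentences_by_char(text):
    """Mirror the script's rule: every '.', '!' or '?' character counts."""
    return sum(text.count(mark) for mark in ".!?")
```

The script's test text contains no repeated marks, so both functions should return 8 for it. They only disagree on texts like "Wait... Really?! Yes.", where the run-based count gives 3 and the per-character count gives 6.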
bipratikgoswami | Oct 19th, 2009
Hi, how do I calculate the Fibonacci series in just 2 lines of Python? Please help!
 
 


©2003 - 2009 DaniWeb® LLC