Counting capitalized words from a text

Question

Finki 0 Newbie Poster

14 Years Ago

I just want to know whether this is done right, and I would welcome suggestions on the commented line.

msg = ...

import re
msg = re.sub("[^\w ]", " ", msg)

names = []

for x in msg.split():
    if (x[0].isupper()) and not (x[1].isupper()):# I think this should be done in a different way, if the word is PS or something like that it must not count
        names.append(x)

from collections import defaultdict

wordsCount = defaultdict(int)
for word in imena:
    wordsCount[word] += 1

for word, num in wordsCount.items():
    print word, num

python

All 2 Replies

griswolf 304 Veteran Poster

14 Years Ago

You can precompile the regular expression: nonWordRE = re.compile(r'[^\w ]') (I don't get why you have the space after the \w, and I think you need a trailing '+' so it matches runs of non-word characters...) (the ellipsis is a tiny self-referential joke)
You can use the regular expression directly to split the sentence (http://docs.python.org/library/re.html#re.RegexObject.split), without first doing the substitution at line 4.
You can add the capitalized words directly into wordsCount; there is no need for the intermediate names list. (What is imena in your line 15?)
Instead of using split(), you can use a regular expression that matches only capitalized words and findall() them in the sentence (the docs are just after split(), mentioned above). This solves your issue at line 9, too. Note the r'\b' regular expression special character.
I would prefer line 18 to be: for word, num in sorted(wordsCount.items()):
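
The suggestions above could be combined roughly as follows. This is only a sketch, not griswolf's own code: the pattern, the name CAPITALIZED_RE, the helper count_capitalized and the sample sentence are all assumptions made for illustration. The pattern treats a "capitalized word" as one ASCII uppercase letter followed by one or more lowercase letters, so all-caps tokens such as PS are skipped, and so are single-letter words such as I.

```python
import re
from collections import defaultdict

# Hypothetical pattern: one uppercase letter followed by lowercase letters only,
# bounded by \b on both sides so "PS" or "NASA" never match.
CAPITALIZED_RE = re.compile(r'\b[A-Z][a-z]+\b')


def count_capitalized(text):
    """Count capitalized words directly, without an intermediate list."""
    counts = defaultdict(int)
    for word in CAPITALIZED_RE.findall(text):
        counts[word] += 1
    return counts


# Made-up sample text, not the asker's msg.
sample = "PS: Anna met Bob. Then Anna left, and NASA called Bob."

# Sorted output, as suggested for line 18.
for word, num in sorted(count_capitalized(sample).items()):
    print("%s %d" % (word, num))
```

Because findall() only returns tokens that match the whole pattern, neither the substitution at line 4 nor the x[1] check at line 9 is needed in this version.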

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2010-11-06T03:18:26+00:00

Here is one non-re way as well:

from itertools import groupby
import string
text = """Just a simple text.
We can count the words!
Why do words have to end?
 
Every now and then a blank line.
Perhaps it will snow!
 
Wow, another blank line for the count.
That should do it for the test!"""

# generator to get Title cased words
test = (word.strip(string.punctuation) for word in text.split() if word.istitle())

for word, thewords in groupby(sorted(test)):
    print("%s %s" % (len(list(thewords)), word))
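
A note on why the istitle() filter also covers the "PS" case from the question: str.istitle() is false for all-caps tokens, and punctuation does not affect the result. The sketch below shows the same filter with collections.Counter doing the tallying instead of sorted() plus groupby(). It is an alternative under stated assumptions, not TrustyTony's code: count_title_words and the sample sentence are made up for illustration, and Counter requires Python 2.7 or later.

```python
import string
from collections import Counter


def count_title_words(text):
    """Tally title-cased tokens; hypothetical helper for illustration."""
    # Same filter as the generator above: test istitle() on the raw token,
    # then strip surrounding punctuation before counting.
    words = (w.strip(string.punctuation) for w in text.split() if w.istitle())
    return Counter(words)


# istitle() rejects all-caps tokens and ignores punctuation.
print("PS".istitle())
print("Wow,".istitle())

# Made-up sample text.
sample = "PS: Anna met Bob. Then Anna left."
for word, num in sorted(count_title_words(sample).items()):
    print("%s %s" % (num, word))
```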
