
Stop words in Python

I am trying to write a Python function that removes every occurrence of any word in stopwords from the wordlist, but I can't tell what exactly is wrong with this program. Any suggestions?

STOPWORDS = ['a','able','about','across','after','all','almost','also','am','among',
             'an','and','any','are','as','at','be','because','been','but','by','can',
             'cannot','could','dear','did','do','does','either','else','ever','every',
             'for','from','get','got','had','has','have','he','her','hers','him','his',
             'how','however','i','if','in','into','is','it','its','just','least','let',
             'like','likely','may','me','might','most','must','my','neither','no','nor',
             'not','of','off','often','on','only','or','other','our','own','rather','said',
             'say','says','she','should','since','so','some','than','that','the','their',
             'them','then','there','these','they','this','tis','to','too','twas','us',
             'wants','was','we','were','what','when','where','which','while','who',
             'whom','why','will','with','would','yet','you','your']

def remove_stop_words(wordlist, stopwords=STOPWORDS):
    wordlist = raw_input("type a sentence: ")
    marked = []
    for t in wordlist:
        if t.lower() in stopwords:
            marked.append('*')
        else:
            marked.append(t)

remove_stop_words('')
boiishuvo

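You never split the user input into words, so the loop walks over single characters instead of words. You should be building a new sentence without the stop words; the marked list does not make sense for that, and the function does not return anything.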
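
Something along these lines might be what is meant. It is only a rough sketch, assuming the input is a plain sentence string and that splitting on whitespace is good enough (punctuation is not handled). The shortened STOPWORDS list and the sentence parameter are just for illustration; the full list from the question would work the same way.

```python
# Shortened stop-word list for illustration only;
# substitute the full STOPWORDS list from the question.
STOPWORDS = ['a', 'an', 'and', 'or', 'should', 'the', 'you']

def remove_stop_words(sentence, stopwords=STOPWORDS):
    """Return the sentence with every stop word dropped."""
    words = sentence.split()  # split the string into words first
    # keep only the words that are not stop words (case-insensitive)
    kept = [w for w in words if w.lower() not in stopwords]
    return ' '.join(kept)     # build the new sentence

print(remove_stop_words("Should you beg or steal"))  # prints: beg steal
```

Prompting the user can then stay outside the function, so the function itself is easy to test with fixed strings.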

pyTony

You might want to do it this way:

STOPWORDS = ['a','able','about','across','after','all','almost','also','am','among',
             'an','and','any','are','as','at','be','because','been','but','by','can',
             'cannot','could','dear','did','do','does','either','else','ever','every',
             'for','from','get','got','had','has','have','he','her','hers','him','his',
             'how','however','i','if','in','into','is','it','its','just','least','let',
             'like','likely','may','me','might','most','must','my','neither','no','nor',
             'not','of','off','often','on','only','or','other','our','own','rather','said',
             'say','says','she','should','since','so','some','than','that','the','their',
             'them','then','there','these','they','this','tis','to','too','twas','us',
             'wants','was','we','were','what','when','where','which','while','who',
             'whom','why','will','with','would','yet','you','your']

def remove_stop_words(wordlist, stopwords=STOPWORDS):
    # ask for sentence if wordlist is empty
    if not wordlist:
        # raw_input is Python 2; on Python 3 use input() instead
        sentence = raw_input("type a sentence: ")
        wordlist = sentence.split()
    marked = []
    for t in wordlist:
        if t.lower() in stopwords:
            marked.append('*')
        else:
            marked.append(t)
    return marked

# test empty list
wordlist = []
marked_list = remove_stop_words(wordlist)
print(marked_list)

# test given list
wordlist = "should you beg or steal".split()
marked_list = remove_stop_words(wordlist)
print(marked_list)

''' example ...
type a sentence: hello there
['hello', '*']
['*', '*', 'beg', '*', 'steal']
'''
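
If the list gets long, the same marking idea could also be written with a set lookup and no prompting inside the function. This is just a sketch under my own assumptions: the names mark_stop_words and STOPWORD_SET are made up for the example, and the set is shortened for illustration.

```python
# Hypothetical shortened set; build it from the full STOPWORDS list in practice,
# e.g. STOPWORD_SET = frozenset(STOPWORDS)
STOPWORD_SET = frozenset(['or', 'should', 'there', 'you'])

def mark_stop_words(wordlist, stopwords=STOPWORD_SET):
    """Return a copy of wordlist with each stop word replaced by '*'."""
    # set membership is a hash lookup, so it does not scan the whole list
    return ['*' if w.lower() in stopwords else w for w in wordlist]

print(mark_stop_words("should you beg or steal".split()))
# prints: ['*', '*', 'beg', '*', 'steal']
```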
HiHe

© 2013 DaniWeb® LLC