I am beginning to learn Python, so please go easy on me. I am trying to create a function that accepts user input and stores it in a list, separating the words and ignoring punctuation. Example:


input = i.am/lost

stores in list as "[i][am][lost]"

this is what i got so far

def text():
	sentence = 0
	while line !="EOF":
		sentence= raw_input()
		line1 = []
		processed_line = ""
		for char in sentence:
			if char.isalpha():
				processed_line = processed_line+char
			else:
				processed_line = processed_line+" "
				
		line1.append(processed_line)	
		line1.split()
		
		print processed_line
		
		print line1
	
text()

My problem is that it's adding everything as one item in the list instead of separating it word by word.

thanks in advance

Check itertools.groupby and use it with str.isalpha as the grouping (key) function.

I was trying to use .split(), but I am implementing it wrong somehow.

Your indentation is funky, probably from using tabs instead of spaces. First, put "print processed_line.split()" at the end of your code in place of "line1.split()". Lists do not have a split() method.
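
A minimal sketch of that point, reusing the asker's variable names with a made-up sample value: split() is a string method, so it works on "processed_line" but raises AttributeError when called on the list "line1".

```python
processed_line = "i am lost"    # sample string, non-letters already replaced by spaces
line1 = [processed_line]        # a list holding that single string

words = processed_line.split()  # str.split() breaks the string on whitespace

try:
    line1.split()               # lists have no split() method
    list_has_split = True
except AttributeError:
    list_has_split = False

print(words)
print(list_has_split)
```

This also explains the symptom in the question: appending the whole string to the list produces a one-item list, and nothing ever splits it.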

You can use split() on the original string, but then you still have to iterate letter by letter and test for alpha as you do in your code. To get your code to run correctly, you should append "processed_line" when a non-alpha character is found (under the else) and then set "processed_line" to an empty string, ready to accept the next new word. Note that if you have some input like "I am a dog, I am not a cat.", you will get a break on the comma and again on the space that follows it, leading to an empty "processed_line" being appended to the list. So, check "processed_line" for a positive length before appending. It would also be a good idea to print "processed_line" and "line1" before and after the append while testing this code, so you know what is going on.

Also, the while loop cannot work as written: "line" is never defined, so the test raises a NameError instead of ever comparing against "EOF".
while line !="EOF":
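
One way to structure the loop so the sentinel test has something to compare is sketched below. The names read_until_sentinel and fake_input are hypothetical, and the reader function is passed in as a parameter only so the example runs without a keyboard; in the thread's Python 2 code that reader would be raw_input (input in Python 3).

```python
def read_until_sentinel(read_line, sentinel="EOF"):
    """Collect lines from read_line() until one equals the sentinel."""
    lines = []
    while True:
        line = read_line()      # assign the variable before comparing it
        if line == sentinel:
            break               # stop without storing the sentinel itself
        lines.append(line)
    return lines

# Stand-in for keyboard input so the sketch runs unattended.
fake_input = iter(["i.am/lost", "i.going.to.work", "EOF", "never read"])
collected = read_until_sentinel(lambda: next(fake_input))
print(collected)
```

Reading first and testing second also avoids processing the "EOF" line as if it were ordinary input.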

I meant this:

>>> import itertools
>>> input = "i.am/lost"
>>> words = [''.join(word) for letters, word in itertools.groupby(input, lambda x: x.isalpha())]
>>> words
['i', '.', 'am', '/', 'lost']
>>> words = [''.join(word) for letters, word in itertools.groupby(input, lambda x: x.isalpha()) if letters]
>>> words
['i', 'am', 'lost']
>>>
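
For readers new to groupby, the following sketch (same sample string, with "text", "pairs" and "is_letters" as illustrative names) prints the key next to each group, which may make it clearer why the "if letters" filter keeps only the words.

```python
import itertools

text = "i.am/lost"  # sample value from the question

# Each run of consecutive characters with the same key becomes one group.
pairs = [(is_letters, ''.join(group))
         for is_letters, group in itertools.groupby(text, key=str.isalpha)]
print(pairs)

# Keep only the groups whose key is True, i.e. the runs of letters.
words = [chunk for is_letters, chunk in pairs if is_letters]
print(words)
```

The key is True for runs of letters and False for runs of anything else, so filtering on it drops the punctuation groups.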

I made some changes, but there is a flaw with putting the append in the else statement. For example, if I input this:

i.going.to.work
it will only list "i", "going" and "to" and skip "work", because there is no non-alpha character after "work"

and thanks for looking at my thread


I really didn't want to get into itertools yet, since I am still learning and don't want to get ahead of myself, but thanks for showing me that way.

maybe I'm misunderstanding, but wouldn't something simple like

def make_norm_lis(entry):
	lis=[]
	for letter in entry:
		if letter.isalpha():
			lis.append(letter)
		else:
			lis.append(' ')
	lis=''.join(lis)
	lis=lis.split()
	return lis

work?

>>> st='!@#this@#$is//sparta'
>>> print(make_norm_lis(st))
['this', 'is', 'sparta']
>>> 
>>> work='i.going.to.work'
>>> 
>>> print(make_norm_lis(work))
['i', 'going', 'to', 'work']

it will only list "i", "going" and "to" and skip "work", because there is no non-alpha character after "work"

Correct, so after the loop you have to test whether "processed_line" has any length (the input could end with a period, which leaves it empty) and, if so, append it to the list. Post your new code.

i got this

def text():
	sentence = 0
	while sentence !="EOF":
		sentence = raw_input()
		line1 =[]
		processed_line = ""
		for char in line:
			if char.isalpha() or char.isalnum():
				processed_line = processed_line+char

			else:
				line1.append(processed_line)
				processed_line = " "
		print processed_line

		print line1
	
text()

See the comments.

def text():
    sentence = 0
    while sentence != "EOF":
        sentence = raw_input()
        line1 = []
        processed_line = ""
        for char in sentence:  ## use the input string
            ## isalnum() includes isalpha()
            #if char.isalpha() or char.isalnum():
            if char.isalnum():
                processed_line = processed_line + char
            else:
                if len(processed_line):  ## test for empty string
                    line1.append(processed_line)
                processed_line = ""  ## empty, not a space
        if len(processed_line):  ## append the final word
            line1.append(processed_line)
        print processed_line
        print sentence.split()  ## for comparison: split() breaks on whitespace only
        print line1

text()
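
The same logic can be pulled out into a function that takes a string and returns the list, which makes it easier to try without typing input each time. This is only a sketch: the name split_words is made up here, and print() is written with parentheses so the example runs on Python 3 as well as on the Python 2 used in the thread.

```python
def split_words(sentence):
    """Return the runs of alphanumeric characters in sentence as a list."""
    words = []
    current = ""
    for char in sentence:
        if char.isalnum():
            current += char
        else:
            if current:             # skip empty strings from repeated separators
                words.append(current)
            current = ""            # start the next word
    if current:                     # the last word has no separator after it
        words.append(current)
    return words

print(split_words("i.going.to.work"))
print(split_words("I am a dog, I am not a cat."))
```

The check after the loop is what picks up "work" in the first call, and the check inside the else is what avoids an empty entry between the comma and the space in the second.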

My version of yours:

def text():
    # docstring, shown in the function's tooltip/help
    """ Strip non-alphanumeric characters from user input and print the words.
        The function exits when the user enters an empty line (just Enter).

    """
    # loop forever until we break out
    while True:
        # more descriptive name
        words = []
        # another variable name change; it really does matter, doesn't it
        this_word = ""
        # prompt for input and append punctuation to avoid special-casing the last word
        for char in raw_input('Your words (Enter to finish): ') + '.':
            if char.isalnum():  # one test is enough, .isalpha() is not needed
                this_word += char
            else:
                if this_word:
                    words.append(this_word)
                    this_word = ""  # clear only when something was appended
        if words:
            # as string
            print ' '.join(words)
        else:
            break
    
text()
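
The trailing '.' trick in the version above can be isolated as follows. This is a sketch with a hypothetical function name (split_words_sentinel) that works on a string instead of calling raw_input, so it can be run unattended.

```python
def split_words_sentinel(line):
    """Split line into alphanumeric words by appending a separator first."""
    words = []
    this_word = ""
    # The extra '.' guarantees a non-alphanumeric character after the last
    # word, so no separate check is needed once the loop ends.
    for char in line + ".":
        if char.isalnum():
            this_word += char
        elif this_word:
            words.append(this_word)
            this_word = ""
    return words

print(split_words_sentinel("i.going.to.work"))
print(split_words_sentinel(""))
```

An empty line gives an empty list, which is the condition the loop above uses to stop. The same happens for a line containing only punctuation.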