Hi,

I have downloaded a feed from a news site and am trying to show only the first paragraph of each story. At present, the output to email looks like this:

"http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10568455&ref=rss" Man arrested after pointing fake gun at police (A 59-year-old man has been arrested after pointing an imitation gun at police from his car in Paekakariki, north of Wellington, this morning.

When police approached his vehicle on Paekakariki Hill Rd after receiving a call over...)

Looking in the shell, the output is:
"http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10568455&ref=rss" Man arrested after pointing fake gun at police (A 59-year-old man has been arrested after pointing an imitation gun at police from his car in Paekakariki, north of Wellington, this morning.\r\n\r\nWhen police approached his vehicle on Paekakariki Hill Rd after receiving a call over...)\r\n

So it appears the issue is the \r\n\r\n. I have tried removing it with re.sub, but it doesn't want to go. I got a little excited when I read about rstrip(), but I cannot get that working (at a guess because I'm trying to run it on a text file rather than a string).
I read on one site that using readlines() would turn a text file into a list, which would allow the use of rstrip(), but that doesn't seem to work for me either.
Should I persevere with re.sub or rstrip(), or am I on the wrong track entirely?

Blair

PS: Apologies for length: noticed the note that help would not be given to those who could not demonstrate they had at least tried to solve their problem themselves. And no, it's not homework... left that behind about 15 years ago.

Read on one site that using readlines() would make a txt.file a list, which would allow the use of rstrip(), but that doesn't seem to work for me either.

If you have a file object, you can call readlines() to get a list of lines; as long as each paragraph sits on its own line, item 0 is the first paragraph (with its line ending still attached). If all you have is a string, you can get the same effect with split("\r\n").
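To make the two routes concrete, here is a minimal sketch. It assumes the feed text uses \r\n line endings, as in the shell output above; the sample text is shortened, and io.StringIO is only a stand-in for the real file object.

```python
import io

# Shortened stand-in for the feed text (assumption: the real story is longer).
news_text = "Man arrested (A 59-year-old man has been arrested.\r\n\r\nWhen police approached...)\r\n"

# From a string: everything before the first "\r\n" is the first paragraph.
first_from_string = news_text.split("\r\n")[0]

# From a file object: readlines() keeps the line ending on each item,
# so strip the trailing "\r" and "\n" characters off the first one.
news_file = io.StringIO(news_text)
first_from_file = news_file.readlines()[0].rstrip("\r\n")

print(first_from_string)
print(first_from_file)
```

Note that rstrip() is called on the string taken out of the list, not on the file or the list itself, which may be why it did not seem to work earlier.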

Thanks Targ, will persevere with readlines() and hopefully make a breakthrough in the next few days.

Cheers,

Blair

Yeah, I think split() would probably be your best bet. Either that or replace():

>>> some_text = '"http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10568455&ref=rss" Man arrested after pointing fake gun at police (A 59-year-old man has been arrested after pointing an imitation gun at police from his car in Paekakariki, north of Wellington, this morning.\r\n\r\nWhen police approached his vehicle on Paekakariki Hill Rd after receiving a call over...)\r\n'
>>> for each_line in some_text.split('\r\n'):
...     if each_line: print(each_line)
...     
"http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10568455&ref=rss" Man arrested after pointing fake gun at police (A 59-year-old man has been arrested after pointing an imitation gun at police from his car in Paekakariki, north of Wellington, this morning.
When police approached his vehicle on Paekakariki Hill Rd after receiving a call over...)
>>>
>>>
>>> some_text.replace('\r\n', ' ')
'"http://www.nzherald.co.nz/nz/news/article.cfm?c_id=1&objectid=10568455&ref=rss" Man arrested after pointing fake gun at police (A 59-year-old man has been arrested after pointing an imitation gun at police from his car in Paekakariki, north of Wellington, this morning.  When police approached his vehicle on Paekakariki Hill Rd after receiving a call over...) '

Hope that helps

EDIT: Note that I used if each_line: in the for loop because splitting on '\r\n' leaves an empty string wherever two \r\n sequences sit back to back (and another after a trailing \r\n). Those empty strings will still be in your list unless you skip them.
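A small sketch of that behaviour, using a shortened made-up string rather than the real feed text, with a list comprehension to drop the empty pieces:

```python
# Shortened, made-up sample with the same "\r\n\r\n" pattern as the feed.
some_text = "First paragraph.\r\n\r\nSecond paragraph.\r\n"

# Splitting leaves empty strings between and after the line breaks.
pieces = some_text.split("\r\n")

# Keep only the non-empty pieces.
paragraphs = [piece for piece in pieces if piece]

print(pieces)
print(paragraphs)
```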

I thought he wanted to display just the first paragraph, not replace the line breaks with spaces. So the options are:

paragraphs = news_file.readlines()
# readlines() keeps the line ending on each item, so strip it off
first_paragraph = paragraphs[0].rstrip('\r\n')

or:

paragraphs = news_text.split('\r\n')
first_paragraph = paragraphs[0]

or:

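end_of_paragraph = news_text.find('\r\n')
# find() returns -1 when there is no line break; keep the whole text in that case
first_paragraph = news_text if end_of_paragraph == -1 else news_text[:end_of_paragraph]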
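
If the feed might mix \r\n and plain \n endings, a possible variation is to let splitlines() handle the line breaks. This is only a sketch: the function name get_first_paragraph and the sample text are made up for illustration.

```python
def get_first_paragraph(text):
    """Return the first non-blank line of text, or '' if there is none."""
    # splitlines() splits on "\r\n", "\n" and "\r" and drops the endings.
    for line in text.splitlines():
        if line.strip():
            return line
    return ""


# Shortened, made-up sample in the same shape as the feed text.
sample = "Man arrested (A 59-year-old man has been arrested.\r\n\r\nWhen police approached...)\r\n"
print(get_first_paragraph(sample))
```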