text files.

Question

revenge2 0 Junior Poster in Training

15 Years Ago

How do you manipulate text files?

import urllib2  # Python 2 module

# fetch the page
dload = urllib2.Request('http://tvnz.co.nz/content/tv2_listings_data/portable_listings_skin')
text_file = urllib2.urlopen(dload)
text_file_read = text_file.read()

print text_file_read

# save the page to a text file
f = open('site.txt', 'w')
f.write(text_file_read)
f.close()

This returns the page and saves it as a text file, but how would I do the following things?
- Delete lines of text, for example lines 1-10.
- Return or delete everything between '<body>' and '</body>', etc.
- Search for and delete all '<b>' and '</b>', etc.

I've read about 'parsers' but I'm not quite sure how to use them. Either way, wouldn't this way be easier?

-Thanks

python

4 Contributors
6 Replies
147 Views
2 Days Discussion Span
Latest Post 15 Years Ago by Ene Uran

All 6 Replies

jice 53 Posting Whiz in Training

15 Years Ago

For the line count, it is very easy (put this after your own code):

f = open('out.txt', 'w')
# enumerate() counts from 0, so i >= 10 skips the first 10 lines
for i, l in enumerate(open('site.txt', 'r')):
    if i >= 10:
        f.write(l)
f.close()
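
If the page is still in memory as a string (like text_file_read in the question), the first lines can also be dropped without going back through site.txt. A minimal sketch in Python 3 syntax; drop_first_lines and sample are made-up names for illustration only:

```python
def drop_first_lines(text, count):
    """Return text without its first `count` lines."""
    # keepends=True preserves each line's own line ending
    lines = text.splitlines(True)
    return "".join(lines[count:])


# twelve numbered lines as stand-in data
sample = "".join("line %d\n" % n for n in range(1, 13))
print(drop_first_lines(sample, 10))  # leaves "line 11" and "line 12"
```

Slicing the list of lines avoids the off-by-one that is easy to make with a manual counter.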

To delete part of the text, you can use a regular expression (this example erases the body part):

import re
# re.DOTALL lets '.' match newlines too, so the pattern can span several lines
pat = re.compile("<body.*</body>", re.DOTALL)
f = open('out2.txt', 'w')
f.write(pat.sub("", text_file_read))
f.close()

You can also extract the content of that part the same way (have a look at the re module).
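
For instance, returning the body content instead of deleting it could look roughly like this. It is only a sketch in Python 3 syntax: body_content and page are hypothetical names, and the pattern assumes a single, well-formed <body> element:

```python
import re


def body_content(html_text):
    """Return the text between <body ...> and </body>, or None if absent."""
    # [^>]* allows attributes on the opening tag; (.*?) captures the content
    match = re.search(r"<body\b[^>]*>(.*?)</body>", html_text,
                      re.DOTALL | re.IGNORECASE)
    return match.group(1) if match else None


page = "<html><body bgcolor='#FFFF00'>\n<p>Hello</p>\n</body></html>"
print(body_content(page))
```

The parentheses form a capture group, so match.group(1) holds only what sits between the two tags.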

You could also parse your file, but if the part to erase does not depend on any condition on its content, it may be simpler to do it this way :)

To parse HTML files, you can use the HTMLParser module.
For XML or XHTML files, you can use the xml.sax or xml.dom modules.
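
As a rough idea of the parser route, here is a small sketch that keeps the text and throws away every tag. It assumes Python 3, where the module is html.parser (in Python 2 it was HTMLParser); TextExtractor and strip_tags are names invented for this example:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # called for each run of text outside of tags
        self.parts.append(data)


def strip_tags(html_text):
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any text still buffered
    return "".join(parser.parts)


print(strip_tags("<p>Some <b>bold</b> text</p>"))
```

Overriding handle_starttag and handle_endtag in the same class would let you react to specific tags instead of dropping all of them.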

Stefano Mtangoo 455 Senior Poster

15 Years Ago

http://www.regular-expressions.info/

http://www.google.co.tz/search?q=re+python+module&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a


revenge2 0 Junior Poster in Training · Answer 1 · 2009-01-30T18:43:05+00:00

Thanks jice, is there any site which shows this in depth?

jice 53 Posting Whiz in Training · Answer 2 · 2009-01-30T22:04:18+00:00

Also, for XML:
http://www.devshed.com/c/a/Python/Working-with-XML-Documents-and-Python/
http://www.devshed.com/c/a/Python/Parsing-XML-with-SAX-and-Python/

http://www.google.fr/search?hl=fr&q=python+xml+tutorial&btnG=Rechercher&meta=

And for string manipulation:
http://www.devshed.com/c/a/Python/String-Manipulation/

revenge2 0 Junior Poster in Training · Answer 3 · 2009-02-01T09:16:25+00:00

Cheers :), the XML parsing tutorial was very helpful.
Is there a way to find keywords and delete them?
For example, look for '<title>' and delete every '<title>' in the text?

Ene Uran 638 Posting Virtuoso · Answer 4 · 2009-02-01T21:31:08+00:00

You can use simple string functions:

html = """\
<html>
<head>
   <title>Ordered List Example</title>
</head>
<body bgcolor="#FFFF00" text="#FF0000">
<OL STYLE = "font-family: comic sans ms; font-size:10pt">
   <LI>Religious tolerance</LI>
   <LI>Exact estimate</LI>
   <LI>Military Intelligence</LI>
   <LI>Passive aggression</LI>
   <LI>Tight slacks</LI>
   <LI>Business ethics</LI>
   <LI>Advanced BASIC</LI>
   <LI>Extinct Life</LI>
   <LI>Pretty ugly</LI>
   <LI>Genuine imitation</LI>
</OL>
</body>
</html>
"""

# erase just "<title>" 
print(html.replace("<title>", ""))

print('-'*50)

# erase "<title>" and "</title>"
print(html.replace("<title>", "").replace("</title>", ""))
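
If the tag may appear in upper case or carry attributes, str.replace needs one call per variant. A regular expression can cover them in one pass. This is a sketch only, with remove_tag as a made-up helper name, and it is not a substitute for a real HTML parser on messy pages:

```python
import re


def remove_tag(html_text, tag):
    """Delete every opening and closing `tag`, keeping the text inside."""
    # </? matches both <b> and </b>; \b stops "b" from matching "body";
    # [^>]* skips any attributes
    pattern = re.compile(r"</?%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    return pattern.sub("", html_text)


print(remove_tag("<B>bold</B> and <b class='x'>more</b>", "b"))
```

Calling remove_tag(html, "title") on the example above would strip both "<title>" and "</title>" in one go.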
