Hey people, I would really appreciate a hand here.
OK, this is my task:
I have been given a bunch of URLs to webpages containing blog posts.
I need to extract the author name, date, content, and metadata of each blog post and put it all into a database.
Here is what I thought I would do:
Use a module called gadfly to act as a bridge between Python and SQL.
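A minimal sketch of the storage step, assuming the standard-library sqlite3 module as a stand-in for gadfly. The `posts` table, its columns, and the helper names `open_store` and `save_post` are made up for illustration and are not taken from any existing code.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the posts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "url TEXT PRIMARY KEY, author TEXT, posted TEXT, content TEXT, meta TEXT)"
    )
    return conn


def save_post(conn, url, author, posted, content, meta):
    """Store one blog post; a second save for the same URL overwrites the first."""
    conn.execute(
        "INSERT OR REPLACE INTO posts (url, author, posted, content, meta) "
        "VALUES (?, ?, ?, ?, ?)",
        (url, author, posted, content, meta),
    )
    conn.commit()
```

Keying the table on the URL means re-running the script over the same list of pages should not create duplicate rows.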
Using urlopen, I got the HTML source of each webpage. Now I am planning to read this code line by line and use the HTML tags to recognise the author name, date, content, and metadata. After identifying each part, I need to detag it and then store it in the database.
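To make the detagging step concrete, here is a rough sketch using `html.parser` from the Python 3 standard library. The class name `TagStripper` and the function `detag` are hypothetical names picked for this example, and skipping `<script>` and `<style>` contents is an assumption about what counts as unwanted text.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects the text between tags and discards the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a script or style block.
        if self._skip_depth == 0:
            self.chunks.append(data)


def detag(html_text):
    """Return the text of an HTML fragment with tags removed and whitespace collapsed."""
    stripper = TagStripper()
    stripper.feed(html_text)
    stripper.close()
    # Joining with a space keeps text from adjacent blocks apart, at the cost of
    # splitting a word that has a tag in the middle of it.
    return " ".join(" ".join(stripper.chunks).split())
```

Calling `detag(fragment)` on a piece of HTML that has already been identified (say, the element holding the author name) gives back plain text. `feed()` can also be called once per line, since the parser buffers incomplete tags between calls, so reading the page line by line should still work.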
Could anyone suggest a way to detag the HTML code line by line (it should work with Linux and Python)?
I am really stuck! Any help would be really welcome!