Plan for parsing HTML

#1 Tom Tolleson
Dec 3rd, 2008
Hi! I just inherited a rather large legacy site here at work that has no database behind it. It consists of a large number of HTML pages with the content written directly into the markup. I need to extract that content and move it into a database or into XML files.

Each section of the HTML pages has a header tag and a standard title, so I'm thinking I should write a Perl script to parse the pages based on the header tags and insert the sections into MySQL.
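
To make the idea concrete, here is a minimal sketch of splitting a page at its header tags. It is written in Python rather than Perl, and it uses the standard library's sqlite3 as a stand-in for MySQL; both are substitutions for illustration. The names `SectionParser`, `extract_sections`, `store_sections` and the `sections` table layout are assumptions, not anything taken from the actual site.

```python
import sqlite3
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionParser(HTMLParser):
    """Collect (title, body) pairs, starting a new section at each heading tag."""

    def __init__(self):
        super().__init__()
        self.sections = []        # finished (title, body) pairs
        self._title = []          # text fragments inside the current heading
        self._body = []           # text fragments after the current heading
        self._in_heading = False
        self._open = False        # True once the first heading has been seen

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._flush()         # close the previous section, if any
            self._in_heading = True
            self._open = True

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading:
            self._title.append(data)
        elif self._open:          # text before the first heading is ignored
            self._body.append(data)

    def _flush(self):
        if self._open:
            # Collapse runs of whitespace so the stored text is tidy.
            title = " ".join("".join(self._title).split())
            body = " ".join("".join(self._body).split())
            self.sections.append((title, body))
        self._title, self._body = [], []

    def close(self):
        super().close()
        self._flush()             # emit the final section
        self._open = False


def extract_sections(html):
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections


def store_sections(conn, page, sections):
    # sqlite3 stands in for MySQL here; the table layout is an assumption.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sections (page TEXT, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO sections VALUES (?, ?, ?)",
        [(page, title, body) for title, body in sections],
    )
    conn.commit()
```

This keeps only the text of each section and drops the inner markup. It also ignores anything before the first header tag and does not filter out script or style blocks, so real pages would probably need extra handling.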

Before I begin, I thought I'd check with you guys to see whether any of you have had similar experience or have recommendations.

Thanks!

Tom Tolleson
Last edited by Tom Tolleson; Dec 3rd, 2008 at 12:02 pm. Reason: typo
#2 pritaeas
Dec 4th, 2008
I've done the same, only with PHP. Using a regex, I extracted the actual content from each page and put it into the DB. You may need to escape the content, but that depends on your insertion method and column type.
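
A rough sketch of the regex approach, in Python rather than PHP. The pattern is hypothetical: it assumes every page wraps its content in `<div id="content">` with no nested `<div>` inside, which may well not match the real pages. sqlite3 stands in for whatever database is actually used, and the `pages` table is an assumption. The insert uses placeholders, which is one insertion method where no manual escaping is needed.

```python
import re
import sqlite3

# Hypothetical pattern: assumes the content sits in <div id="content">...</div>
# and contains no nested <div>; nested markup would need a real parser.
CONTENT_RE = re.compile(
    r'<div id="content">(.*?)</div>', re.DOTALL | re.IGNORECASE
)


def extract_content(html):
    """Return the inner HTML of the content div, or None if it is missing."""
    match = CONTENT_RE.search(html)
    return match.group(1).strip() if match else None


def insert_content(conn, page, content):
    # Placeholders let the driver handle quoting, so quotes and semicolons
    # in the content are stored as-is without manual escaping.
    conn.execute(
        "INSERT INTO pages (name, content) VALUES (?, ?)", (page, content)
    )
    conn.commit()
```

If the SQL string is built by concatenation instead, the content has to be escaped with the database's own escaping function before it goes into the query.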
#3 MidiMagic
Dec 8th, 2008
I did it with Notepad.

I used find and replace to replace each tag with either nothing or the separators needed to import the data into the database.
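
The same find-and-replace idea can be scripted once there are too many pages to do by hand. This is only a sketch in Python: the `SEPARATORS` mapping (closing `</h2>` becomes a tab, closing `</p>` becomes a newline) is a made-up example, and the right mapping depends on the pages and on the import format the database expects.

```python
import re

# Hypothetical mapping from tags to import separators.
SEPARATORS = {"</h2>": "\t", "</p>": "\n"}


def tags_to_delimited(html):
    """Replace selected tags with separators and every other tag with nothing."""
    text = html
    for tag, separator in SEPARATORS.items():
        text = re.sub(re.escape(tag), separator, text, flags=re.IGNORECASE)
    # Any tag still present at this point is replaced with nothing.
    return re.sub(r"<[^>]+>", "", text)
```

Content that already contains tabs or line breaks would corrupt the delimited output, so it would have to be cleaned first.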
©2003 - 2009 DaniWeb® LLC