Hi! I just inherited a rather large legacy site here at work that has no database behind it. It's a large volume of HTML pages with the content written directly into each page. I need to extract the content and bring it into a database or into XML files.

Each section of the HTML pages has a header tag and a standard title, so I'm thinking I should write a Perl script to parse the pages based on the header tags and insert the content into MySQL.
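Roughly the shape of what I have in mind (sketched in Python just to show the idea; the real script would be in Perl). It assumes every section title lives in an <h2> tag, the pages sit under htdocs/, and a simple "sections" table is enough; SQLite is used as a stand-in so the sketch runs without a MySQL driver, but the DB-API calls look the same:

    # Walk every page, split it on <h2> headers, and store (page, title, body) rows.
    import glob
    import sqlite3
    from html.parser import HTMLParser

    class SectionParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.sections = []       # finished (title, body) pairs
            self.in_h2 = False
            self.title = None
            self.body = []

        def handle_starttag(self, tag, attrs):
            if tag == "h2":
                self.flush_section() # a new header ends the previous section
                self.in_h2 = True

        def handle_endtag(self, tag):
            if tag == "h2":
                self.in_h2 = False

        def handle_data(self, data):
            if self.in_h2:
                self.title = (self.title or "") + data
            elif self.title is not None:
                self.body.append(data)

        def flush_section(self):
            if self.title is not None:
                self.sections.append((self.title.strip(), "".join(self.body).strip()))
            self.title, self.body = None, []

    conn = sqlite3.connect("site.db")
    conn.execute("CREATE TABLE IF NOT EXISTS sections (page TEXT, title TEXT, body TEXT)")
    for path in glob.glob("htdocs/**/*.html", recursive=True):
        parser = SectionParser()
        parser.feed(open(path, encoding="utf-8", errors="replace").read())
        parser.flush_section()       # keep the last section on the page
        for title, body in parser.sections:
            conn.execute("INSERT INTO sections (page, title, body) VALUES (?, ?, ?)",
                         (path, title, body))
    conn.commit()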

Before I begin, I thought I'd check with you guys to see if you've had any similar experience or have recommendations.

Thanks!

Tom Tolleson

All 2 Replies

I've done the same, only with PHP. Using a regex, I stripped out the actual content and put it into the DB. You may need to escape the content, but that depends on your insertion method and column type.
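Something along these lines, roughly (Python rather than PHP here, and untested; it assumes a sections table already exists). The point is that a parameterized query lets the driver handle the escaping, whatever turns up in the content:

    import re
    import sqlite3  # stand-in for a MySQL connection; the DB-API calls look the same

    html = open("page.html", encoding="utf-8", errors="replace").read()
    db = sqlite3.connect("site.db")

    # Grab "<h2>title</h2> ... body ..." pairs; each body runs to the next <h2> or end of file.
    for title, body in re.findall(r"<h2[^>]*>(.*?)</h2>(.*?)(?=<h2|\Z)", html, re.S | re.I):
        body = re.sub(r"<[^>]+>", " ", body)  # crude tag strip
        # Placeholders let the driver handle quoting/escaping of the content.
        db.execute("INSERT INTO sections (title, body) VALUES (?, ?)",
                   (title.strip(), body.strip()))
    db.commit()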

I did it with Notepad.

I used find and replace to replace each tag with either nothing or the separators needed to import the data into the database.
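For example, replacing every </h2> with a tab and every end-of-section tag with a newline leaves you with a tab-separated file that MySQL's LOAD DATA INFILE (with FIELDS TERMINATED BY '\t') can pull straight in.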
