Regular expression to extract content between tags from an html output

Question

screenedcreamy 0 Newbie Poster

9 Years Ago

Hello,
I need to extract a particular value from this HTML snippet. I would prefer not to use any external libraries, so as far as I can tell the only way to do this in core Java is with regular expressions. I have never used regular expressions, so it would be great if you could suggest how to retrieve the integer value from the input below.

<tr><td>GLOBALID=123245</td></tr>

I need to extract the integer value assigned to GLOBALID.

html-css java

All 5 Replies

JamesCherrill 4,733 Most Valuable Poster

9 Years Ago

I'm no regex expert, but standard Java does include classes for parsing XML (and therefore HTML) without any "external libraries". That's probably the safest way to ensure your parsing isn't going to fail on some obscure but legal example of real data.
http://docs.oracle.com/javase/tutorial/jaxp/index.html
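
As a minimal sketch of that suggestion, the snippet below parses the posted row with the JAXP DOM classes that ship with the JDK and reads the number from the matching <td> cell. The class name JaxpGlobalId and the method extract are made up for illustration. It also assumes the input is well-formed XML, which the posted snippet is; real-world HTML with unclosed tags or entities such as &nbsp; would likely be rejected by this parser.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original thread.
class JaxpGlobalId {
    static final String PREFIX = "GLOBALID=";

    // Returns the integer after "GLOBALID=" in the first matching <td> cell.
    // Assumes the markup is well-formed XML.
    static int extract(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(markup)));
            NodeList cells = doc.getElementsByTagName("td");
            for (int i = 0; i < cells.getLength(); i++) {
                String text = cells.item(i).getTextContent().trim();
                if (text.startsWith(PREFIX)) {
                    return Integer.parseInt(text.substring(PREFIX.length()).trim());
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("markup could not be parsed as XML", e);
        }
        throw new IllegalArgumentException("no GLOBALID cell found");
    }

    public static void main(String[] args) {
        // prints 123245
        System.out.println(extract("<tr><td>GLOBALID=123245</td></tr>"));
    }
}
```

If the page is not well-formed, the parser throws before any cell is inspected, so this approach only fits output whose markup you can rely on.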

Gribouillis 1,391 Programming Explorer

9 Years Ago

I'm not a Java programmer, but any Pythonista would tell you to use the Beautiful Soup library. From what I have read here and there, the Java equivalent is named JSoup. I think you could try this library.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 1 · 2016-05-12T09:23:19+00:00

JSoup looks like an excellent solution, except for OP's "i would not like to use any external libraries". Personally I find nothing wrong with external libraries, provided they are open source and I can bundle their classes into my own distribution jar.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 2 · 2016-05-12T09:52:20+00:00

Hm, I missed that part of the post.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 3 · 2016-05-12T10:32:36+00:00

On the other hand, you could just hack it...

If there's just one <tr><td>GLOBALID= prefix in the text, and the </td> suffix is on the same line, then you can simply use String's indexOf to find the prefix's position, then indexOf again to find the first suffix after that position. That gives you the two indexes you need to substring the actual value.
Depending on the file, you may first have to deal with distracting whitespace anywhere in that.
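
The indexOf approach described above can be sketched roughly as follows, using only core Java. Since the original question asked for a regular expression, a java.util.regex variant is included alongside it. The class and method names (GlobalIdExtractor, extractWithIndexOf, extractWithRegex) are hypothetical, and the pattern assumes the value is a plain run of digits inside a single <td> cell, as in the posted snippet.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original thread.
class GlobalIdExtractor {
    static final String PREFIX = "<tr><td>GLOBALID=";
    static final String SUFFIX = "</td>";

    // Assumed pattern: optional whitespace around the name, '=' and digits.
    static final Pattern GLOBALID =
            Pattern.compile("<td>\\s*GLOBALID\\s*=\\s*(\\d+)\\s*</td>");

    // Finds the prefix, then the first suffix after it, and parses what lies between.
    static int extractWithIndexOf(String html) {
        int start = html.indexOf(PREFIX);
        if (start < 0) {
            throw new IllegalArgumentException("prefix not found");
        }
        start += PREFIX.length();
        int end = html.indexOf(SUFFIX, start);
        if (end < 0) {
            throw new IllegalArgumentException("suffix not found");
        }
        // trim() handles stray whitespace around the number itself.
        return Integer.parseInt(html.substring(start, end).trim());
    }

    // Regex variant: the digits are captured in group 1.
    static int extractWithRegex(String html) {
        Matcher m = GLOBALID.matcher(html);
        if (!m.find()) {
            throw new IllegalArgumentException("GLOBALID cell not found");
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        String html = "<tr><td>GLOBALID=123245</td></tr>";
        System.out.println(extractWithIndexOf(html)); // prints 123245
        System.out.println(extractWithRegex(html));   // prints 123245
    }
}
```

The indexOf version requires the prefix to appear exactly as written, so whitespace between or inside the tags would need to be normalised first. The regex version tolerates whitespace inside the cell but still assumes a bare <td> tag with no attributes.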
