Hi everybody,
I have a file with some HTML-like tags:
<instance id="bass.1000000" docsrc = "BNC/A0C">
<answer instance="bass.1000000" senseid="bass%fish"/>
Try it with grilled sea <head>bass</head> and fennel.

I need to parse it and retrieve some information, such as the id attribute of the instance tag and the senseid attribute of the answer tag.
How can I do this?
I added <html> and <body> tags around the content and wrote this code, but it does not work:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
HtmlNodeCollection collection = doc.DocumentNode.SelectNodes("//instance");
foreach (HtmlNode n in collection)

All 4 Replies

You can use a simple regex to achieve what you want, but in my experience this can get quite verbose and inefficient. I would suggest trying this instead: Click Here
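For reference, the simple regex route might look roughly like the sketch below. It is only an illustration, not a robust parser: the class name SenseTagDemo and the helper GetAttribute are made up for this example, and the pattern assumes attribute values are always wrapped in double quotes, as in the sample from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SenseTagDemo
{
    // Hypothetical helper: returns the value of `attribute` on the first
    // <tag ...> found in `input`, or an empty string if nothing matches.
    // Assumes double-quoted attribute values.
    public static string GetAttribute(string input, string tag, string attribute)
    {
        string pattern = "<" + Regex.Escape(tag) + @"\b[^>]*?\b"
                       + Regex.Escape(attribute) + @"\s*=\s*""([^""]*)""";
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : "";
    }

    public static void Main()
    {
        // The sample lines from the question.
        string input = @"<instance id=""bass.1000000"" docsrc = ""BNC/A0C"">
<answer instance=""bass.1000000"" senseid=""bass%fish""/>
Try it with grilled sea <head>bass</head> and fennel.";

        Console.WriteLine(GetAttribute(input, "instance", "id"));    // bass.1000000
        Console.WriteLine(GetAttribute(input, "answer", "senseid")); // bass%fish
    }
}
```

This only reads the first matching tag. For a file with many instance blocks you would switch to Regex.Matches and loop, which is where the verbosity starts to show.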

Parsing HTML content without first interpreting it into a data structure is bound to be sensitive to pages built in different ways. At the same time, HTML found in the wild is frequently invalid. Worse, some of it is actually XHTML, while documents that claim to be XHTML often contain HTML quirks such as missing closing tags and uppercase tag names.

Well, my suggestion might be a little biased because I love using it, but: regex.

A while back I actually wrote a custom library that parses HTML and relies heavily on regex to do the work (it is a little weird, and I need to clean it up). With regex you can easily pull out whole nodes or individual attributes. I guess the next question, though, is: how familiar are you with regex?

If you are not, there is the HTML Agility Pack mentioned by AleMonterio. I have never used it myself, but when I was looking for a way to parse HTML (which ended with me building my own library), it came up a lot.
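For what it is worth, here is a rough, untested sketch of how the HTML Agility Pack route might look for the sample in the question. The snippet as posted does not show any Load or LoadHtml call, so this sketch assumes the markup still has to be loaded into the document before querying it. The class name AgilityPackDemo and the method ReadAttributes are made up for the example, and it needs the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class AgilityPackDemo
{
    // Hypothetical helper: collects the value of `attribute` from every
    // element matched by `xpath` in the given markup.
    public static List<string> ReadAttributes(string html, string xpath, string attribute)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // load the markup before running any XPath query

        var values = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null)
        {
            return values;
        }

        foreach (HtmlNode n in nodes)
        {
            values.Add(n.GetAttributeValue(attribute, ""));
        }
        return values;
    }

    public static void Main()
    {
        // The sample lines from the question.
        string input = @"<instance id=""bass.1000000"" docsrc = ""BNC/A0C"">
<answer instance=""bass.1000000"" senseid=""bass%fish""/>
Try it with grilled sea <head>bass</head> and fennel.";

        foreach (string id in ReadAttributes(input, "//instance", "id"))
        {
            Console.WriteLine("id: " + id);
        }
        foreach (string sense in ReadAttributes(input, "//answer", "senseid"))
        {
            Console.WriteLine("senseid: " + sense);
        }
    }
}
```

The null check matters: looping directly over the result of SelectNodes throws a NullReferenceException when the query matches nothing, which is one likely way for code like the original to fail.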
