This is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don't actually have to understand XPath or XSLT to use it, don't worry). It is a .NET code library that lets you parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml provides, but for HTML documents (or streams).
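For what it's worth, pulling the paragraphs out of an HTML string with this library might look roughly like the sketch below. It assumes the HtmlAgilityPack NuGet package is referenced; the sample `html` string and the `Program` class are made up for illustration.

```csharp
using System;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack package is installed

class Program
{
    static void Main()
    {
        // Hypothetical input; in practice this is the string you already downloaded.
        string html = "<html><body><p>First paragraph.</p><p>Second <b>one</b>.</p></body></html>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string rather than a file or stream

        // "//p" selects every <p> element in the document.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs != null)
        {
            foreach (HtmlNode p in paragraphs)
            {
                // InnerText drops nested tags such as <b> and keeps only the text.
                Console.WriteLine(p.InnerText.Trim());
            }
        }
    }
}
```

The null check matters: iterating directly over the result throws if the page has no `<p>` elements.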
@Chris I can't understand that. I already have all the HTML in a string, and now I just want to pull out the paragraphs written between the "p" tags. I am trying to fetch the whole article.
Well, I can't vouch for HAP, but I do agree with __avd. In any case, by default the dot operator (.) matches any character except the line feed character (\n). You can change the default behaviour by doing something like this:
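using System.Text.RegularExpressions;
// Singleline makes '.' match \n as well; the lazy (.*?) stops at the first </p>.
Regex regex = new Regex(@"<p>\s*(.*?)\s*</p>", RegexOptions.Singleline);
Match m = regex.Match(htmlstring); // m.Groups[1].Value is the first paragraph's text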
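Just note that a lot of things can go wrong when doing this; for example, a pattern that starts with a literal <p> will not match a tag that carries attributes. If you want to capture multiple paragraphs, one option is to wrap the whole pattern in parentheses () and add a * after it, so that each repetition is captured by the group. In .NET the individual repetitions are then available through the group's Captures collection.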
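If the paragraphs are not directly adjacent, a simpler route than repeating a group may be to call Regex.Matches and read one match per paragraph. Here is a minimal sketch of that, not a robust HTML parser; the `ParagraphExtractor` class and `ExtractParagraphs` method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ParagraphExtractor
{
    // Hypothetical helper: returns the inner content of every <p>...</p> pair.
    public static List<string> ExtractParagraphs(string html)
    {
        // <p\b[^>]*> tolerates attributes on the opening tag.
        // Singleline lets '.' cross line breaks; IgnoreCase also accepts <P>.
        // The lazy (.*?) stops at the nearest closing </p>.
        var regex = new Regex(@"<p\b[^>]*>\s*(.*?)\s*</p>",
                              RegexOptions.Singleline | RegexOptions.IgnoreCase);

        var result = new List<string>();
        foreach (Match m in regex.Matches(html))
        {
            result.Add(m.Groups[1].Value); // group 1 is the text between the tags
        }
        return result;
    }
}
```

Calling `ParagraphExtractor.ExtractParagraphs(htmlstring)` gives one list entry per paragraph. Any nested markup (a `<b>` inside the paragraph, say) is left in the captured text, and unclosed `<p>` tags are simply skipped, which is part of why a real parser is usually the safer choice.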