Hi guys, I've been banging my head all day with this

I've got a regular expression and I'm trying to extract a menu from the HTML page source so I can assert on it within a test. For the life of me I can't get it to work correctly.

Here's the page source:

<span id="sitemap"><a href="#"></a><span><a href="#"">Home</a></span>
<span style="text-decoration:none;"> : </span>
<span><a href="#" style="text-decoration:none;">Hello</a></span>
<span style="text-decoration:none;"> : </span>
<span><a href="#" style="text-decoration:none;">World</a>
</span><span style="text-decoration:none;"> : </span>
<span style="text-decoration:none;">Today</span>
<a id="sitemap"></a></span>

And here's the regex...

Basically I want 'Home : Hello : World : Today'

But this is in the middle of an HTML page, so I want to ignore everything else.

Here's my attempt at a regex:
(<span\s*id="sitemap"\s.* </span>)

but this doesn't appear to be working.

The problem is that the text you want to capture is wrapped in markup that the regular expression can also match.

You might need to remove some unwanted elements before using a Regex.
Are you reading this as one string or multiple?
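
If a regex is still preferred, here is a minimal sketch of that "strip the unwanted elements first" idea. It assumes the page is read as one string and that the `<a id="sitemap">` anchor from the sample marks the end of the menu. Two things in the posted pattern stand out: the `\s` right after `id="sitemap"` requires whitespace where the sample has `>`, and `.` does not match newlines in .NET unless `RegexOptions.Singleline` is set. The class name and the hard-coded sample string are for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

class SitemapRegexSketch
{
    static void Main()
    {
        // Sample input copied from the question (assumption: read as one string).
        string html =
            "<span id=\"sitemap\"><a href=\"#\"></a><span><a href=\"#\"\">Home</a></span>\n" +
            "<span style=\"text-decoration:none;\"> : </span>\n" +
            "<span><a href=\"#\" style=\"text-decoration:none;\">Hello</a></span>\n" +
            "<span style=\"text-decoration:none;\"> : </span>\n" +
            "<span><a href=\"#\" style=\"text-decoration:none;\">World</a>\n" +
            "</span><span style=\"text-decoration:none;\"> : </span>\n" +
            "<span style=\"text-decoration:none;\">Today</span>\n" +
            "<a id=\"sitemap\"></a></span>";

        // Step 1: isolate the menu. Singleline lets '.' match newlines, and the
        // lazy '.*?' stops at the first <a id="sitemap"> end marker.
        Match block = Regex.Match(
            html,
            @"<span\s+id=""sitemap""\s*>(.*?)<a\s+id=""sitemap""\s*>",
            RegexOptions.Singleline);

        if (!block.Success)
        {
            Console.WriteLine("sitemap block not found");
            return;
        }

        // Step 2: replace every tag with a space, then collapse runs of whitespace.
        string text = Regex.Replace(block.Groups[1].Value, @"<[^>]*>", " ");
        text = Regex.Replace(text, @"\s+", " ").Trim();

        Console.WriteLine(text);
    }
}
```

With the sample above this should print `Home : Hello : World : Today`. If the real page has no closing `<a id="sitemap">` marker, the end of the pattern would need a different delimiter.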

Does it really need to be a regular expression?
[Assuming you can use LINQ]
Treated as one big string, you could split it on the tag delimiters and keep only the text fragments, with something like:

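// Requires: using System; using System.Linq;
Console.WriteLine(
    string.Join(" ",
        strRawHtml.Split("<>\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries)
            .Select(s => s.Trim())         // " : " becomes ":" so the join adds single spaces
            .Where(
                s => s.Length > 0          // drop whitespace-only fragments
                && !s.Contains("/")        // closing tags such as /span
                && !s.Contains("\"")       // opening tags that carry attributes
                && !s.Equals("span")       // bare <span> tags
            ).ToArray()));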