I have the code below:

string txt = "<table SUMMARY=\"Game Match\" border=0 cellpadding=2 cellspacing=0><tr valign=top><td width=25><b>2.</b></td><td width=60><a href=\"/game/gameboy-advance/legend-of-zelda-a-link-to-the-past\"><img alt=\"Game Boy Advance Front Cover\" border=\"0\" src=\"/images/i/36/26/365576.jpeg\" height=\"59\" width=\"60\" ></a></td><td><a href=\"/game/gameboy-advance/legend-of-zelda-a-link-to-the-past\">The Legend of Zelda: A Link to the Past</a><br>by Nintendo of America Inc. -- 2002<br>Game Boy Advance<table SUMMARY=\"Game Match\" border=0 cellpadding=2 cellspacing=0><tr valign=top><td width=25><b>2.</b></td><td width=60><a href=\"/game/gameboy-advance/legend-of-zelda-a-link-to-the-past\"><img alt=\"Game Boy Advance Front Cover\" border=\"0\" src=\"/images/i/36/26/365576.jpeg\" height=\"59\" width=\"60\" ></a></td><td><a href=\"/game/gameboy-advance/legend-of-zelda-a-link-to-the-past\">The Legend of Zelda: A Link to the Past</a><br>by Nintendo of America Inc. -- 2002<br>Game Boy Advance<table SUMMARY=\"Game Match\" border=0 cellpadding=2 cellspacing=0><tr valign=top><td width=25><b>1.</b></td><td width=60><a href=\"/game/snes/legend-of-zelda-a-link-to-the-past\"><img alt=\"SNES Front Cover\" border=\"0\" src=\"/images/i/41/28/365728.jpeg\" height=\"41\" width=\"60\" ></a></td><td><a href=\"/game/snes/legend-of-zelda-a-link-to-the-past\">The Legend of Zelda: A Link to the Past</a><br>by Nintendo Co., Ltd. -- 1991<br>SNES";
            string title = "Monopoly";
            string url = "http://www.mobygames.com/search/quick?q=" + title.Replace(' ', '+');
            StreamReader oSR = null;
            //WebRequest.DefaultWebProxy.Credentials = CredentialCache.DefaultNetworkCredentials;
            HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create(url);
            WebResponse objResponse = objRequest.GetResponse();
            oSR = new StreamReader(objResponse.GetResponseStream());
            txt = oSR.ReadToEnd();
            String unixpath1 = "";
            string name = "";
            string[] name2;
            string re1 = ".*?";	// Non-greedy match on filler
            string re2 = "(?:\\/[\\w\\.\\-]+)+";	// Uninteresting: unixpath
            string re3 = ".*?";	// Non-greedy match on filler
            string re4 = "(?:\\/[\\w\\.\\-]+)+";	// Uninteresting: unixpath
            string re5 = ".*?";	// Non-greedy match on filler
            string re6 = "((?:\\/[\\w\\.\\-]+)+)";	// Unix Path 1
            string re7 = ".*?";	// Non-greedy match on filler
            string re8 = "((?:(?:[1]{1}\\d{1}\\d{1}\\d{1})|(?:[2]{1}\\d{3})))(?![\\d])";	// Year 1

            Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6 + re7 + re8, RegexOptions.IgnoreCase);
            MatchCollection m = r.Matches(txt);
            int i = 0;
            Console.WriteLine(txt);
            while (i < m.Count)
            {
                name = m[i].Groups[1].Value.Replace('-', ' ');
                name2 = name.Split('/');
                unixpath1 += "Name: " + name2[3] + "\nPlatform: " + name2[2] + "\nLink: " + m[i].Groups[1].Value + "\nYear: " + m[i].Groups[2].Value + "\n--------------------------------------\n";
                unixpath1 += "Link: " + m[i].Groups[1].Value + "\nYear: " + m[i].Groups[2].Value + "\n--------------------------------------\n";
                i++;
            }
            Console.WriteLine(unixpath1);
            //Console.WriteLine(year1);
            //Console.Write("(" + unixpath1.ToString() + ")" + "(" + year1.ToString() + ")" + "\n");

            Console.ReadLine();

Searching the predefined string txt works fine. After grabbing the web page, everything seems to go fine, but then the program stalls.

Stepping through the code indicates the code stops at the while statement.

Various tests show that it's not the while statement that is wrong, but something to do with the variable m. If I try to access any property or method of m, the program stalls.

I assume the regular expression is working fine, as the Matches statement runs without a problem.

Is the result set too big and causing issues maybe?

Any help would be much appreciated.

TIA

All 2 Replies

Realize that the regular expressions don't get evaluated against the input string until you call m.Count. This is because MatchCollection hopes that you'll enumerate through the MatchCollection with a foreach loop, so that it can perform one regex match at a time, possibly avoiding any unneeded computation.
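To make the lazy evaluation concrete, here is a minimal sketch. The class name, the helper CollectMatches, and the sample input are made up for illustration and are not from the original code. It walks the collection with foreach, so matches are requested one at a time rather than all at once through Count.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LazyMatchDemo
{
    // Hypothetical helper: walks the matches one at a time instead of asking for Count.
    public static List<string> CollectMatches(string input, string pattern)
    {
        var results = new List<string>();
        Regex r = new Regex(pattern);

        // Matches() hands back a lazily populated collection.
        MatchCollection m = r.Matches(input);

        // foreach asks for one match per iteration; breaking out early
        // would skip the work for the remaining matches.
        foreach (Match match in m)
        {
            results.Add(match.Value);
        }
        return results;
    }

    static void Main()
    {
        foreach (string s in CollectMatches("a1 b2 c3", @"[a-z]\d"))
        {
            Console.WriteLine(s);
        }
    }
}
```

This does not make a slow pattern fast. It only moves the cost to the point where each match is requested, which is why the original program appeared to hang at the while statement rather than at Matches.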

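Your regex is most likely experiencing catastrophic backtracking: it chains several .*? fillers, so on a large page the engine can retry a huge number of combinations at each starting position before giving up.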
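One way to keep a runaway pattern from hanging the program is a match timeout. The sketch below assumes .NET Framework 4.5 or later, where the Regex constructor accepts a TimeSpan. The helper name TryIsMatch and the sample pattern are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class TimeoutDemo
{
    // Hypothetical helper: returns null when matching exceeds the time budget,
    // otherwise the ordinary IsMatch result.
    public static bool? TryIsMatch(string pattern, string input, TimeSpan budget)
    {
        Regex r = new Regex(pattern, RegexOptions.None, budget);
        try
        {
            return r.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // The engine gave up after the budget elapsed.
            return null;
        }
    }

    static void Main()
    {
        bool? result = TryIsMatch(@"^\d+$", "12345", TimeSpan.FromSeconds(1));
        Console.WriteLine(result.HasValue ? result.Value.ToString() : "timed out");
    }
}
```

A timeout only limits the damage. The pattern itself still needs to be tightened so that it cannot backtrack across the whole page.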

OK... point taken. And I can only assume it is well and truly catastrophic, considering the size of the page.

So I should first regex out the bit that contains the information I need and ignore the rest and see how that goes maybe...

Thanks for the help
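The narrowing idea above could be sketched roughly as follows. This assumes the live page uses the same markup as the sample string in the question (a title link followed by "by PUBLISHER -- YEAR"). The class GameLinkExtractor and its Extract method are made-up names. Instead of .*? fillers, the pattern uses character classes such as [^<] that cannot run past a tag boundary, so a failed attempt is abandoned quickly.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GameLinkExtractor
{
    // Assumed markup, taken from the sample string in the question:
    //   <a href="/game/PLATFORM/SLUG">TITLE</a><br>by PUBLISHER -- YEAR
    // [^<] and [\w\-] cannot cross a tag boundary, so backtracking stays local.
    private static readonly Regex Entry = new Regex(
        @"<a href=""(/game/([\w\-]+)/[\w\-]+)"">([^<]+)</a><br>by [^<]*-- (\d{4})",
        RegexOptions.IgnoreCase);

    public static List<string> Extract(string html)
    {
        var lines = new List<string>();
        // foreach pulls matches one at a time rather than forcing Count up front.
        foreach (Match m in Entry.Matches(html))
        {
            lines.Add("Name: " + m.Groups[3].Value
                + " | Platform: " + m.Groups[2].Value
                + " | Link: " + m.Groups[1].Value
                + " | Year: " + m.Groups[4].Value);
        }
        return lines;
    }

    static void Main()
    {
        string html = "<td><a href=\"/game/snes/legend-of-zelda-a-link-to-the-past\">"
            + "The Legend of Zelda: A Link to the Past</a><br>by Nintendo Co., Ltd. -- 1991<br>SNES";
        foreach (string line in Extract(html))
        {
            Console.WriteLine(line);
        }
    }
}
```

The title now comes from the link text instead of the URL slug, so the Replace and Split steps are no longer needed. If the real page differs from the sample markup, the pattern would need adjusting.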
