I would like to find out what this code snippet does. I know it involves extracting data from a web page encoded as gb2312, but can anyone explain it line by line? I appreciate the help... :)

        if (RgxNextPage.IsMatch(page))
        {


            nurl = RgxNextPage.Match(page).Value.Replace("pagepanel", string.Empty);
            Regex WebURL = new Regex(patternURL, RegexOptions.IgnoreCase);
            if (WebURL.IsMatch(nurl))
            {
                nurl = WebURL.Match(nurl).Value;
            }
            else
            {
                nurl = string.Empty;
            }


        }
        if (!nurl.Equals(url) && !string.IsNullOrEmpty(nurl))
        {
            string temp = readPage(nurl, "gb2312");
            ExtractResults(nurl, temp, list);
        }

All 6 Replies

The code sets a variable "nurl" based on some matching, which is done using regular expressions.
After that (in the 2nd if statement) it checks the "nurl" variable and gets a string from the readPage() method.
Next (and last), it calls the ExtractResults() method, which, judging by its name, does something with the results (like storing or showing them), based on parameters that were gathered and set earlier in the code above.

Hope it helps.
bye

This is a poorly designed method. The first if statement runs a Regex match (RgxNextPage.IsMatch), then the first statement inside it repeats the same match (RgxNextPage.Match). Regex is slow; do the match once and save the result. It does the same thing again with WebURL.IsMatch and WebURL.Match.

In the last if statement, if nurl is null, nurl.Equals(url) will throw a NullReferenceException, because the test for null (string.IsNullOrEmpty) comes after the variable is used.
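
A rough sketch of the match-once idea, with a null-safe final test. The two patterns and the names NextPageFinder, FindNextUrl and ShouldFollow are made up for illustration, since the thread never shows the real RgxNextPage or patternURL. Note one assumption: this version returns an empty string when no next-page link is found, whereas the original leaves nurl at whatever value it had before.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NextPageFinder
{
    // Hypothetical patterns, for illustration only; the real ones are not shown in the thread.
    private static readonly Regex RgxNextPage =
        new Regex("pagepanel[^>]*>", RegexOptions.IgnoreCase);
    private static readonly Regex WebURL =
        new Regex(@"https?://[^\s""'<>]+", RegexOptions.IgnoreCase);

    // Returns the next-page URL found in the page text, or an empty string if there is none.
    public static string FindNextUrl(string page)
    {
        Match next = RgxNextPage.Match(page);   // match once and keep the result
        if (!next.Success)
            return string.Empty;

        string candidate = next.Value.Replace("pagepanel", string.Empty);
        Match url = WebURL.Match(candidate);    // again, a single match
        return url.Success ? url.Value : string.Empty;
    }

    // Null-safe form of the final test: the empty/null check runs before the comparison.
    public static bool ShouldFollow(string nurl, string url)
    {
        return !string.IsNullOrEmpty(nurl) && !string.Equals(nurl, url);
    }
}
```

The caller would then do something like: string nurl = NextPageFinder.FindNextUrl(page); and only call readPage and ExtractResults when ShouldFollow(nurl, url) is true.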

A lot will depend on what was set before the code gets to this point.
Execution might never reach it, depending on other values (or it could crash, as Momerath said).
Also, if patternURL is not properly formed, you'll get unexpected results.
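
One way to catch a bad pattern early is to try it against a sample you already know should match. This is only a sketch; PatternCheck and MatchesSample are hypothetical names, and the sample text is something you would have to supply from a real page.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // Returns true when the pattern compiles and matches the given sample text.
    public static bool MatchesSample(string pattern, string sample)
    {
        try
        {
            return new Regex(pattern, RegexOptions.IgnoreCase).IsMatch(sample);
        }
        catch (ArgumentException)
        {
            // The Regex constructor throws this when the pattern cannot be parsed.
            return false;
        }
    }
}
```

A pattern that compiles but matches the wrong text will still pass through the constructor, which is why the check compares against a known sample instead of only testing that the pattern parses.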

Are you going to rewrite this function?

Yes, I have done so, and tidied up what needed it. This is part of my project to build a system that retrieves information from 3 different websites that show results for book queries. For each website the program extracts all the results and stores them in a text file. Now that I have done that, the next tasks are:

  1. Combine the three programs for the different websites into one program (because each website is unique I had to handle them differently), then sort the results according to their rank.

  2. Then remove duplicate results.

  3. Lastly, present the results to the users through an interface that I have to design myself.

My main problem is: what material can I go through to learn about result merging and sorting algorithms, and how do I apply them to my code?

All types of advice are really appreciated. Positive and negative criticism are both welcome.

Result merging?
It sounds like you need a collection of items (either strings, key-value pairs or custom objects).
You can place the results together and evaluate them after they are collected.
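
Something along these lines, using LINQ. This is only a sketch under assumptions: BookResult and ResultMerger are hypothetical names, a lower Rank number is assumed to mean a better result, and two results are treated as duplicates when their titles match after trimming and ignoring case. Ranks from different sites may not be directly comparable, so you may need to normalize them first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type; the real fields depend on what each site returns.
public class BookResult
{
    public string Title { get; set; }
    public string Url { get; set; }
    public int Rank { get; set; }   // assumed: lower number = better rank
}

public static class ResultMerger
{
    // Combines the per-site lists, drops duplicates, and sorts by rank.
    public static List<BookResult> Merge(params IEnumerable<BookResult>[] sources)
    {
        return sources
            .SelectMany(s => s)                                 // one combined sequence
            .GroupBy(r => r.Title.Trim().ToLowerInvariant())    // assumed duplicate key: title
            .Select(g => g.OrderBy(r => r.Rank).First())        // keep the best-ranked copy
            .OrderBy(r => r.Rank)                               // final sort by rank
            .ToList();
    }
}
```

You would call it as ResultMerger.Merge(siteA, siteB, siteC) once each site's results have been loaded into a list, then hand the merged list to the interface.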
