kobalt

Hello All,
I'm having trouble with a little string search. My code downloads a webpage containing a table that lists share names, company names and margin rates in the following format:

<tr class="odd">
<td>AAC</td>
<td>Australian Agricultural Company Limited.</td>
<td>35.00%</td>
<td>

No
</td>
<td>
No
</td>	
<td>
Yes
</td>
</tr>
<tr class="even">
<td>AAD</td>
<td>ARDENT LEISURE GROUP</td>
<td>35.00%</td>
<td>

No
</td>
<td>
No
</td>	
<td>
Yes
</td>
</tr>
<tr class="odd">
<td>AAE</td>
<td>Agri Energy Limited</td>
<td>100.00%</td>
<td>

No
</td>
<td>
No
</td>	
<td>
No
</td>
</tr>

This is read into a string from a StreamReader:
string Result = sr.ReadToEnd();

I am looking to return the margin rate, e.g. <td>35.00%</td>, for any given share name (Name is a 3-letter ID, e.g. AAC).

Code I have so far:

string regMatch = "<td>" + Name + "</td>";
if (Regex.IsMatch(Result, regMatch))
{
    MessageBox.Show(Name + " Found");
}
else
    MessageBox.Show(Name + " Not Found");

All this tells me is that the share name is found. How do I then return the margin rate, and perhaps even the company name?

When you use Regex you can use named groups to return values found within your pattern.
I created an example app for you to look through. The form has three text boxes: one for input, which contains the sample data you posted here; an optional search box for finding a specific company; and one for output. There is also a checkbox. If it is checked, the input is searched for the abbreviated name in the search box; if it isn't, every company in the input data is output.

public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void btnGo_Click(object sender, EventArgs e)
        {
            if (chbSearch.Checked)
            {
                Share share = Find(txtInput.Text, txtSearch.Text);
                txtOutput.Text = share.ToString();
            }
            else
            {
                txtOutput.Clear();
                List<Share> shares = FindAll(txtInput.Text);
                foreach (Share share in shares)
                {
                    txtOutput.Text += share.ToString() + Environment.NewLine;
                }
            }
        }
        private Share Find(string input, string Name)
        {
            string pattern = @"(?:<tr.+?<td>)(?<Name>" + Name + @")(?:</td>)(?:<td>)(?<FullName>.+?)(?:</td>)(?:<td>)(?<Rate>.+?)(?:%</td>)";

            Match match = System.Text.RegularExpressions.Regex.Match(input, pattern);
            if (match.Success)
            {
                Share share = new Share();
                share.Name = match.Groups["Name"].Value;
                share.FullName = match.Groups["FullName"].Value;
                Double rate;
                bool success = double.TryParse(match.Groups["Rate"].Value, out rate);
                if (!success)
                    rate = -1;

                share.Rate = rate;

                return share;
            }
            else
            {
                return null;
            }
        }
        private List<Share> FindAll(string input)
        {
            List<Share> shares = new List<Share>();
            const string pattern = @"(?:<tr.+?<td>)(?<Name>.+?)(?:</td>)(?:<td>)(?<FullName>.+?)(?:</td>)(?:<td>)(?<Rate>.+?)(?:%</td>)";

            MatchCollection matches = System.Text.RegularExpressions.Regex.Matches(input, pattern);
            foreach(Match match in matches)
            {
                Share share = new Share();
                share.Name = match.Groups["Name"].Value;
                share.FullName = match.Groups["FullName"].Value;
                Double rate;
                bool success = double.TryParse(match.Groups["Rate"].Value, out rate);
                if(!success)
                    rate = -1;

                share.Rate = rate;
                shares.Add(share);
            }

            return shares;
        }

        class Share
        {
            public string Name { get; set; }
            public string FullName { get; set; }
            public double Rate { get; set; }

            public override string ToString()
            {
                return Name + " - " + FullName + " : " + Rate + "%";
            }
        }
    }

Let me know if you need me to explain anything in more detail. Regex can be very powerful when you learn some of the more advanced syntax. But it can be quite hard to get the hang of.
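As a smaller illustration of the named-group idea, here is a minimal console sketch rather than the form code above. The names ShareLookup and TryFindRate are hypothetical, the input is assumed to be on a single line with no whitespace between cells, and Regex.Escape is added as a precaution in case the share code ever contains regex metacharacters.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class ShareLookup
{
    // Hypothetical helper: reports whether the code was found instead of
    // returning null, so the caller cannot forget the "not found" case.
    public static bool TryFindRate(string html, string code, out double rate)
    {
        rate = 0;
        // (?<Rate>...) is a named group; Regex.Escape treats the code literally.
        string pattern = "<td>" + Regex.Escape(code) + "</td><td>(?<Rate>[0-9.]+)%</td>";
        Match match = Regex.Match(html, pattern);
        return match.Success
            && double.TryParse(match.Groups["Rate"].Value, NumberStyles.Float,
                               CultureInfo.InvariantCulture, out rate);
    }

    public static void Main()
    {
        // Assumed single-line sample data; real pages may contain line breaks.
        string html = "<table><tr><td>AAD</td><td>35%</td></tr><tr><td>hola</td><td>40%</td></tr></table>";
        double rate;
        if (TryFindRate(html, "AAD", out rate))
            Console.WriteLine("AAD margin rate: " + rate.ToString(CultureInfo.InvariantCulture) + "%");
        else
            Console.WriteLine("AAD not found");
    }
}
```

Returning a bool with an out parameter is only one option; checking the result of a Find method for null before calling ToString() would work just as well.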

string find = "AAD";
string rate = "";
string result = "<table><tr><td>AAD</td><td>35%</td></tr><tr><td>hola</td><td>40%</td></tr></table>";
// Split the table into rows, then split each row into cells.
string[] arrLines = Regex.Split(result, @"<tr.*?>", RegexOptions.IgnoreCase);

foreach (string strLine in arrLines)
{
    string[] strCol = Regex.Split(strLine, @"<td.*?>", RegexOptions.IgnoreCase);
    // Stop one short of the end so that strCol[i + 1] is always a valid index.
    for (int i = 0; i < strCol.Length - 1; i++)
    {
        // Strip any remaining tags to leave just the cell text.
        string test = Regex.Replace(strCol[i], @"<[^>]*>", "");
        if (test == find)
        {
            // The cell after the share name holds the rate.
            rate = Regex.Replace(strCol[i + 1], @"<[^>]*>", "");
        }
    }
}

This is not the best approach, but it will do the job. The result here is 35%.

Regards

kobalt

Thanks for your help and for providing your code. However, I can't seem to get it to work. I copied it directly into Visual C# and created a form. It runs, but when I click the Go button with the Search checkbox checked, I get a NullReferenceException at txtOutput.Text = share.ToString();
If I don't check the Search box, txtOutput displays nothing.

However, I will continue to play around with the code you've provided, as it looks like it'll do exactly what I want (I obviously still have some more reading to do on Regex!)

kobalt

WOOHOO!! Thanks heaps - I've solved the issue: the string pattern needed \r\n characters.

const string pattern = @"(?:<tr.+?\r\n<td>)(?<Name>.+?)(?:</td>\r\n)(?:<td>)(?<FullName>.+?)(?:</td>\r\n)(?:<td>)(?<Rate>.+?)(?:%</td>\r\n)";

All works beautifully now. Thank you for pointing me in the right direction!

kobalt

OK... still having problems :( This is really frustrating me! The above code by Ryshad works fine. However, when I use that code in the project I'm working on, I can't get the pattern to match. Here is the code I've got:

private void GetMarginRate(string Name)
        {
            //GET MARGIN RATES FROM 
            HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("https://www.macquarie.com.au/emgonline/portal/web/guest/marginrates");
            webRequest.Method = "GET";
            WebResponse webResponse = webRequest.GetResponse();
            // Read from the response already obtained rather than calling GetResponse() a second time.
            StreamReader sr = new StreamReader(webResponse.GetResponseStream(), System.Text.Encoding.UTF8);

            string Result = sr.ReadToEnd();
            sr.Close();

            webResponse.Close();

            //need to search for text "<td> + name + </td>" as entire string
            string regMatch = "<td>" + Name + "</td>";
            if (Regex.IsMatch(Result, regMatch))
            {
                Find(Result, Name);
            }
            else
                MessageBox.Show(Name + " Not Found");
         
        }

        private void Find(string input, string Name)
        {
            string pattern = @"(?:<tr.+?\r\n<td>)(?<Name>" + Name + @")(?:</td>\r\n)(?:<td>)(?<FullName>.+?)(?:</td>\r\n)(?:<td>)(?<Rate>.+?)(?:%</td>)";

            Match match = System.Text.RegularExpressions.Regex.Match(input, pattern);
            if (match.Success)
            {
                //share.Name = match.Groups["Name"].Value;
                //share.FullName = match.Groups["FullName"].Value;

                Double rate;
                bool success = double.TryParse(match.Groups["Rate"].Value, out rate);
                if (!success)
                    rate = -1;

                MessageBox.Show(Name + " Margin Rate: " + rate);
            }
            else
            {
                MessageBox.Show(Name + " Margin Rate not found");
            }
        }

If I open the webpage, view the source, copy and paste it into a text box and then use Ryshad's solution, I get a result. However, if I use the StreamReader to read the page into a string and then search it with the same pattern, as in my code above, I get a 'Margin Rate not found' message box. I know the string contains the webpage, and I know the first check can find the name within it, because it goes on to call the next method.
Can anyone see what I'm doing wrong here?

kobalt

I finally solved the problem after much frustration!

Apparently, when the webpage is read in through the StreamReader, the lines don't end with a carriage return character, just a newline character. However, when the same data is cut and pasted into a text box, for example, each line ends with a carriage return as well. I don't understand why, and perhaps my interpretation is wrong, but when I removed the carriage return \r from the string pattern I got the result I was looking for.

string pattern = @"(?:<td>)(?<Name>" + Name + @")(?:</td>\n)(?:<td>)(?<FullName>.+?)(?:</td>\n)(?:<td>)(?<Rate>.+?)(?:%</td>)";

Is this normal behaviour? Am I missing something here?

Different environments use different newline sequences: some use \n (Unix-style line endings), others use \r\n (Windows-style).

There are a few workarounds:

1) Remove the line breaks from the source text.

input = input.Replace("\r\n", "");
input = input.Replace("\n", "");

2) Include the line breaks in the regex explicitly (as you have). You can allow for either style with a non-capturing group such as (?:\r\n|\n)*?. This matches either '\r\n' OR '\n' any number of times (non-greedy).

string regex = @"(?:<tr.+?(?:\r\n|\n)*?<td>)(?<Name>.+?)(?:</td>)(?:(?:\r\n|\n)*?<td>)(?<FullName>.+?)(?:</td>)(?:(?:\r\n|\n)*?<td>)(?<Rate>.+?)(?:%</td>)";

3) Use options to make the '.' match line feeds:

string regex = @"(?s:(?:<tr.+?<td>)(?<Name>.+?)(?:</td>.*?)(?:<td>)(?<FullName>.+?)(?:</td>.*?)(?:<td>)(?<Rate>.+?)(?:%</td>))";

Here we have wrapped the pattern in an option group (?s: ). The 's' turns on the Singleline option, which makes '.' match line feeds as well as all other characters. We can then put '.*?' after each closing tag to skip over whatever separates the cells, line breaks included: (?:</td>.*?)

Personally I prefer number 1. By removing the line breaks you can take a lot of unnecessary symbols out of the regex, which makes your code more readable.
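A variation on workaround 2, offered only as a sketch: instead of spelling out (?:\r\n|\n), the pattern below uses \s*, which matches any run of whitespace including both \r and \n. It assumes that nothing but whitespace separates the cells and that cell text contains no '<'. NewlineDemo and Describe are hypothetical names, and the sample row is taken from the first post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NewlineDemo
{
    // \s* tolerates zero or more whitespace characters between cells,
    // so the same pattern works for \n and \r\n line endings.
    private const string Pattern =
        @"<td>(?<Name>[^<]+)</td>\s*<td>(?<FullName>[^<]+)</td>\s*<td>(?<Rate>[^<%]+)%</td>";

    public static string Describe(string html)
    {
        Match m = Regex.Match(html, Pattern);
        return m.Success
            ? m.Groups["Name"].Value + " - " + m.Groups["FullName"].Value + " : " + m.Groups["Rate"].Value + "%"
            : "no match";
    }

    public static void Main()
    {
        string unix = "<tr class=\"odd\">\n<td>AAC</td>\n<td>Australian Agricultural Company Limited.</td>\n<td>35.00%</td>\n</tr>";
        string windows = unix.Replace("\n", "\r\n");

        // Both calls are expected to produce the same description.
        Console.WriteLine(Describe(unix));
        Console.WriteLine(Describe(windows));
    }
}
```

The trade-off is the same one described above: stripping the line breaks first keeps the pattern shortest, while \s* leaves the input untouched at the cost of a slightly busier regex.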

EDIT: Also, apologies for any frustration my not replying over the weekend might have caused. It was my birthday, so I was otherwise engaged :)

kobalt


Thanks Ryshad, I'll use #1 too - it makes sense. No need to apologise at all, you've been more than helpful and I've learnt heaps from your original post as well. Happy Birthday and thanks again!

<table ><tr><td>Type</td><td>%@</td></tr><tr><td>Name</td><td>%@</td></tr><tr><td>Email</td><td>%@</td></tr><tr><td>App Version</td><td>%@</td></tr><tr><td>iOS Version</td><td>%@</td></tr><tr><td>Message</td><td>%@</td></tr></table>

I want to get the values from this table and save them into the database, each in a specific column. The table is a string message in my method; I want to parse the table and get specific values. Kindly help me.
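The post does not say which language the method is written in, so the following is only a sketch in C#, the language used in the rest of this thread. It applies the same named-group approach to a two-column table of label and value cells. TableFields and Parse are hypothetical names, the values "Bug" and "Sam" are made-up sample data standing in for the %@ placeholders, and the pattern assumes cell text contains no '<'.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TableFields
{
    // Collects each <tr><td>label</td><td>value</td></tr> row into a dictionary.
    public static Dictionary<string, string> Parse(string table)
    {
        var fields = new Dictionary<string, string>();
        const string pattern =
            @"<tr>\s*<td>(?<Key>[^<]*)</td>\s*<td>(?<Value>[^<]*)</td>\s*</tr>";

        foreach (Match m in Regex.Matches(table, pattern))
        {
            fields[m.Groups["Key"].Value.Trim()] = m.Groups["Value"].Value.Trim();
        }
        return fields;
    }

    public static void Main()
    {
        // Made-up sample values in place of the %@ placeholders.
        string table = "<table ><tr><td>Type</td><td>Bug</td></tr><tr><td>Name</td><td>Sam</td></tr></table>";

        foreach (KeyValuePair<string, string> field in Parse(table))
            Console.WriteLine(field.Key + " = " + field.Value);
    }
}
```

Each dictionary entry can then be written to its own database column. For anything beyond simple, predictable markup like this, a proper HTML parser is usually a safer choice than a regex.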