Hello friends! I've been playing around with the MS Chart class type (System.Windows.Forms.DataVisualization.Charting) and have a situation where I would like to know how to code a query of my data using LINQ. The data I'm playing with is the PowerBall drawings data that is updated each drawing: http://www.powerball.com/powerball/winnums-text.txt

I'm loading a DataTable from the file above:

public const int MAX_WB = 59;
        public const int MAX_PB = 42; // only 39 now, but must have been upto 42 at one time???
        public const int MAX_PP = 5;

        public void LoadData(string filename)
        {
            try
            {
                //first make sure the file exists
                if (!File.Exists(filename))
                {
                    throw new FileNotFoundException("The file " + filename + " could not be found");
                }

                //create a StreamReader and open our text file
                using (StreamReader reader = new StreamReader(filename))
                {
                    Filename = filename;

                    // Using DataSet; structure same as commented code below...
                    dt = dsPowerBall.Tables["Drawings"];
#if false
                    // Add data columns
                    dt.Columns.Add(new DataColumn("DrawDate", typeof(DateTime)));
                    dt.Columns.Add(new DataColumn("WB1", typeof(int)));
                    dt.Columns.Add(new DataColumn("WB2", typeof(int)));
                    dt.Columns.Add(new DataColumn("WB3", typeof(int)));
                    dt.Columns.Add(new DataColumn("WB4", typeof(int)));
                    dt.Columns.Add(new DataColumn("WB5", typeof(int)));
                    dt.Columns.Add(new DataColumn("PB", typeof(int)));
                    dt.Columns.Add(new DataColumn("PP", typeof(int)));
#endif
                    dt.Rows.Clear();
                    string data = reader.ReadToEnd();
                    string[] rows = data.Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
                    
                    // skip first row of data containing the column names
                    for (int i=1; i<rows.Count(); i++)
                    {
                        bool error = false;

                        string r = rows[i];

                        string[] values = r.Split(new char[]{' '}, StringSplitOptions.RemoveEmptyEntries);

                        // scrub data...
                        if (values.Count() < 7)
                            error = true;

                        DateTime date = (DateTime)Convert.ToDateTime(values[0]);
                        int wb1 = (int)Convert.ToInt32(values[1]);
                        int wb2 = (int)Convert.ToInt32(values[2]);
                        int wb3 = (int)Convert.ToInt32(values[3]);
                        int wb4 = (int)Convert.ToInt32(values[4]);
                        int wb5 = (int)Convert.ToInt32(values[5]);
                        int pb = (int)Convert.ToInt32(values[6]);
                        // power play started later in game and not all records have a pp value...
                        int pp = 0;
                        if (values.Count() > 7)
                            pp = (int)Convert.ToInt32(values[7]);

                        if (date.Date.CompareTo(new DateTime(1997, 11, 5)) < 0
                            || date.Date.CompareTo(System.DateTime.Now) > 0)
                        {
                            error = true;
                        }
                        else if ((wb1 < 1 || wb1 > MAX_WB)
                            || (wb2 < 1 || wb2 > MAX_WB)
                            || (wb3 < 1 || wb3 > MAX_WB)
                            || (wb4 < 1 || wb4 > MAX_WB)
                            || (wb5 < 1 || wb5 > MAX_WB))
                        {
                            error = true;
                        }
                        else if (pb < 1 || pb > MAX_PB)
                            error = true;
                        else if (pp < 0 || pp > MAX_PP)
                            error = true;

                        if (error)
                        {
                            DumpInvalidRecord(values);
                            continue;
                        }

                        DataRow dr = dt.Rows.Add();
                        dr["DrawDate"] = date;
                        dr["WB1"] = wb1;
                        dr["WB2"] = wb2;
                        dr["WB3"] = wb3;
                        dr["WB4"] = wb4;
                        dr["WB5"] = wb5;
                        dr["PB"] = pb;
                        dr["PP"] = pp;

                    }
                }
            }
            catch (FileNotFoundException ex)
            {
                Console.WriteLine("Exception: {0}\r\n   Stack Trace: {1}", ex.Message, ex.StackTrace);
            }
            catch (Exception ex)
            {
                Console.WriteLine("Exception: {0}\r\n   Stack Trace: {1}", ex.Message, ex.StackTrace);
            }
        }

What I would like is an example projection of using LINQ to obtain a count, by ball number, of each of the top 10 winning balls ("WB1" THRU "WB5") from the DataTable. In other words, I'd like to create a list of 10 "numbered" balls having the highest/greatest number of hits.

Thanks in advance for the help!

If I'm interpreting your request correctly, maybe something like this meets your needs.

for (int i = 1; i <= 5; i++)
    {
        var query = (from DataRow dr in dt.Rows
                     group dr by int.Parse(dr["WB" + i.ToString()].ToString())
                         into ballgroup
                         select new
                         {
                             Ball = ballgroup.Key,
                             BallCount = ballgroup.Count()
                         }).OrderByDescending(ball => ball.BallCount)
                         .Take(10);

        foreach (var ball in query)
            Console.WriteLine("PB{0}: {1} - Count: {2}", i, ball.Ball, ball.BallCount);

        Console.WriteLine("\n\n");
    }

That would be just outside of your for (int i=1; i<rows.Count(); i++) looping construct, just before the close of your using (StreamReader reader = new StreamReader(filename)) bracket.

apegram, thanks for your help! I believe the query you gave is partitioning the projection into the top 10 ball counts for each column ("WB1" THRU "WB5"). However, what I am looking for is a query statement that would partition the top 10 accumulated ball count "across all five columns". I'm not sure if I have worded that coherently enough for you to interpret...

Let us say that "WB1" contains 37 instances of ball number 2 and each of the remaining columns ("WB2" THRU "WB5") contain only 2 instances of ball number 2. When determining the top 10 ball counts, I want to evaluate 45 instances of ball 2 and not just 37; where 37 is a high (top 10) count in column "WB1" only. Does that make sense? Also, these counts I'm throwing out are arbitrary and not real. Here is what I am using to tally the actual ball instances for comparison:

int[] wb = new int[MAX_WB];
        int[] pb = new int[MAX_PB];
        int[] pp = new int[MAX_PP];

        public void TallyPicks(DateTime dateStart, DateTime dateEnd)
        {
            Array.Clear(wb, 0, wb.Count());
            Array.Clear(pb, 0, pb.Count());
            Array.Clear(pp, 0, pp.Count());
                
            try
            {
                string query = "DrawDate >= '" + dateStart.Date + "'"
                    + " AND DrawDate <= '" +dateEnd.Date + "'";
                
                if (PowerBall > -1)
                    query += " AND PB = " + PowerBall;

                DataRow[] rows = dt.Select(query);
                //for (int i = 0; i < dt.Rows.Count; i++)
                for (int i=0; i<rows.Count(); i++)
                {
                    //DataRow dr = dt.Rows[i];
                    DataRow dr = rows[i];

                    //System.Diagnostics.Debug.Assert(dr.ItemArray.Count() == 8);

                    int num;

                    // skip date, powerball and power play...
                    for (int c = 1; c < dt.Columns.Count - 2; c++)
                    {
                        num = (int)dr[c];
                        System.Diagnostics.Debug.Assert(num > 0 && num <= MAX_WB);
                        wb[num - 1]++;
                    }

                    // power ball...
                    num = (int)dr[6];
                    System.Diagnostics.Debug.Assert(num > 0 && num <= MAX_PB);
                    pb[num-1]++;

                    // power play...
                    num = (int)dr[7];
                    if (num == 0)
                        continue; // prior to power play implementation...
                    System.Diagnostics.Debug.Assert(num > 0 && num <= MAX_PP);
                    pp[num-1]++;
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Exception: {0}\r\n   Stack Trace: {1}", ex.Message, ex.StackTrace);
            }
        }

I know I could more easily take the above array buckets and apply descending/Take, but I'm trying to learn how to use LINQ to do it and I want to be able to throw in other where clauses like when the Power Ball is a particular number too, but I've got to learn to walk before I can run ;), so that's why I didn't throw in the extra check for a particular power ball at this point. Am I making sense?

As I will certainly expand how I mine the data, I want to get a better grasp using LINQ to do it. Thanks again for helping me with this endeavor.

OK, this version will take the balls from each column and load them into a single generic list of ints. Then it modifies the original query from the first example to work against that list instead of each individual column.

List<int> allBalls = new List<int>();

    for (int i = 1; i <= 5; i++)
    {
        var firstQuery = from DataRow dr in dt.Rows
                          select int.Parse(dr["WB" + i.ToString()].ToString());

        allBalls.AddRange(firstQuery);
    }

    var query = (from ball in allBalls
                 group ball by ball
                     into ballgroup
                     select new
                     {
                         Ball = ballgroup.Key,
                         BallCount = ballgroup.Count()
                     })
                     .OrderByDescending(ball => ball.BallCount)
                     .Take(10);

    foreach (var ball in query)
        Console.WriteLine("Ball {0} - Count: {1}", ball.Ball, ball.BallCount);

    Console.WriteLine("\n\n");

If you want to limit the date ranges for the tally, you would apply a where clause to the first query.

var firstQuery = from DataRow dr in dt.Rows
                         where DateTime.Parse(dr["DrawDate"].ToString()) >= new DateTime(2009, 12, 1)
                         && DateTime.Parse(dr["DrawDate"].ToString()) < new DateTime(2010, 1, 1)
                         select int.Parse(dr["WB" + i.ToString()].ToString());

Edited 6 Years Ago by apegram: n/a

This appears to have accomplished what I requested--very cool! Let me mull this over tomorrow and I will get back to you. Thanks again for showing me how this can be done using LINQ!

EDIT >>> Actually, this was more straightforward than I thought. If I have understood correctly, all I need in this case to apply a criteria of a particular powerball number is this modification (hardcoded powerball number of 2):

var firstQuery = from DataRow dr in dt.Rows
                                 where int.Parse(dr["PB"].ToString()) == 2
                                 select int.Parse(dr["WB" + i.ToString()].ToString());

If I have understood correctly enough, I will mark this thread as solved and just create a new thread if I have additional questions... Let me know and thanks!

Edited 6 Years Ago by DdoubleD: n/a

Yes, that's how you would apply a limit to your results. You can hardcode it, use a variable input, combine it with other limits, etc.

This question has already been answered. Start a new discussion instead.