This is as much a question about algorithms as it is about code.
I’m training monkeys to enter data on a keyboard in the form of a table. I have built a special keyboard with nothing on it but the digits 0-9, a period, a tab key and a return key. The goal is to get them to enter data in rows and columns. So far I have been unsuccessful, because I can't get them to enter a consistent number of columns. The input I get looks like this:

443 3439 3.932
4 55 9.99.9 12 293837493892
1 . 000.441893 78 74939.392938 4421229
8 77 .0 4.4 78909

…and so on.

Part of the problem is that as a reward for entering a consistent number of columns, I give them Jello shots. They really love the red ones.

As a programmer, I feel it’s my job to handle whatever erratic input might come my way, and so I have decided to try to deal with this input and create a table anyway. Much of this could be handled by simply throwing an exception or two, but this is an exercise and I believe it’s good for me. And since research monkeys are not cheap, I’m going to have to stick with the same ones. Besides, we’ve sort of bonded, I guess you could say. Some obvious concerns:

How do I code for varying column widths? I could set the number of rows to the number of lines present (incrementing by one for each “row”). Using the same procedure, I could then set the number of columns to the largest number of values found on any line. For example, row three of the “table” above has possibly six values, the most of any row. Thus I would have a table with four rows and six columns (depending on how the lone dot is handled). So far, the code to read the input and set the number of rows and columns looks like this:

string endLine;
double value = 0; // placeholder so that something can be read; nothing else is done with this value
while(fin.good())
{
    for(int r = 0; r < numRows; r++)
    {
        fin >> value;
        fout << value << "\t";
        for(int c = 0; c < numCols; c++)
        {
            fin >> value;
            fout << value << "\t";
            if(!(getline(fin, endLine)))
                numCols++;
        }
        if(getline(fin, endLine))
            numRows++;
        fout << endl;
    }
}

cout << "Number of rows: " << numRows << ".\n";
cout << "Number of columns: " << numCols << ".\n";

The idea is to keep reading values in the same row until the code reaches the end-of-line character, and then add to the number of columns. However, simply incrementing numCols by one isn’t the answer, because any given row might have more than one extra value. For example, row x might have two values (and thus two columns) and row x + 1 might have four values (and thus four columns). The answer would seem to be to count the whitespace in each row and add that up to get the row's number of columns (actually, I think it would be the number of whitespace gaps + 1).

Assume that I get all this read and that I have numRows rows and numCols columns. I then build that table and attempt to read the values from the same set of data:

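double **table;
int numRows;   // assumed already computed by the first pass over the data
int numCols;   // likewise: the largest number of values on any line

table = new double*[numRows];
for(int r = 0; r < numRows; r++)
{
    table[r] = new double[numCols];
    for(int c = 0; c < numCols; c++)
        in >> table[r][c];   // reading the same data again from the start
}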

What will this table do when it tries to read a column that doesn’t have a value? Does it get to the end-of-line marker and just read in the next value? And what if it simply runs out of values to read? So far, my best idea has been to simply read in all the values as one stream, so to speak, and then pick the two factors of the value count that are closest to each other to shape the table. For example, if I read in 24 values, I would set the rows to 6 and the columns to 4.

And finally, what’s the function to ignore that lone (or extra) dot? Is it in.ignore('.')?

Wait until I try to teach them to drive……………..

Your logic in your first code listing just will not work. For example, in for(int r = 0; r < numRows; r++) you use numRows as the loop limit, but numRows is what you are trying to calculate. What will its value be the first time the loop is entered? 0 seems most likely, in which case the loop never executes. The same applies to numCols.

for(int c = 0; c < numCols; c++)
{
    fin >> value;
    fout << value << "\t";
    if(!(getline(fin, endLine)))
        numCols++;
}

Do you realise that getline will consume all the data left on the current line? That means that the second time round the loop you will have finished processing the current line, and the first value read by fin >> value; will be the first value on the next line.

Also, you do not do nearly enough error checking: fin >> value; could result in one or more of the stream status bits being set, and nothing checks for that.

As far as input is concerned, a newline is just another whitespace character; you get no particular indication that you have gone past one. However, the newlines are important to you because they indicate the number of rows. I would suggest reading the file a line at a time and then parsing each line with an istringstream to get the values (and the number of values) on that line.

How do I code for various column widths?

By using an appropriate data structure, reading each line as a whole, and then parsing out the columns. This is C++, so a vector of vectors or something similar would be far less awkward than a manually allocated double** table.

As for parsing the fields, it really depends on what you want. Here's an example assuming all of the fields in your sample are valid except for the lone dot:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    using namespace std;

    istringstream in(
        "443 3439 3.932\n"
        "4 55 9.99.9 12 293837493892\n"
        "1 . 000.441893 78 74939.392938 4421229\n"
        "8 77 .0 4.4 78909\n");
    string line;

    while (getline(in, line))
    {
        istringstream split(line);
        string field;

        while (split>> field)
        {
            if (field.find_first_of("0123456789") != string::npos)
            {
                cout<< field <<'\t';
            }
        }

        cout<<'\n';
    }
}

Thanks to all for the responses.

I realized right off that I would need to run through the loops at least once, so I initialized numRows and numCols to 1; I just didn't include that part of the code.

I messed around with peeking and ignoring for a bit.

char b;
double value = 0; // placeholder so that something can be read; nothing else is done with this value
while(fin.good())
{
    b = fin.peek();
    switch(b)
    {
    case ' ':
        numCols++;
        fin.ignore(1, b);
        break;
    case '\n':
        numRows++;
        fin.ignore(1, b);
        break;
    default:
        value = b;
        fout << value << '\t';
        //fin.ignore(1, b);
    }
    fout << endl;
}//end while()

cout << "Number of rows: " << numRows << ".\n";
cout << "Number of columns: " << numCols << ".\n";

However, this was even worse than running through the loops.

So, it seems the best answers were the ones you posted, which I will try.

And yes, same monkeys, on loan. They were returned to me miserable and demoralized.
