Hello,

I am trying to read in a tab delimited text file. Most lines in the file are in the following format:

Col1     Col2     Col3
data1     data2     data3
data1     data2     data3

But some lines are missing the first value:

Col1     Col2     Col3
data1     data2     data3
     data2     data3
data1     data2     data3

I am trying to parse each line with the following code:

line = new Scanner(scanner.nextLine());
line.useDelimiter("\t");
variable = line.next();

However, this does not work when the value for the first column is missing. In such cases everything is off by one:
I get the value for col2 where I should be getting the value for col1, the value for col3 where I should be getting the value for col2, and so on.

Has someone encountered this problem before? Would someone make a suggestion regarding how to solve this problem?

All 4 Replies

>Has someone encountered this problem before?
>Would someone make a suggestion regarding how to solve this problem?
Uhm... You could have your program throw an error when it can't get data for all three columns. If the data is incomplete, your program can't work with it anyway, so I guess that would be a good solution.
(But you'll have to keep track of the number of values read in order to know that.)
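A rough sketch of that check, in case it helps. The class name `ColumnCountCheck`, the method `readRow`, and the constant `EXPECTED_COLUMNS` are made up for illustration, and the three-column width is taken from the sample in the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ColumnCountCheck {
    // Assumption: the file has 3 columns, as in the sample from the question.
    static final int EXPECTED_COLUMNS = 3;

    // Reads every tab-separated token on one line and throws if the
    // number of values read differs from the expected column count.
    static List<String> readRow(String rawLine) {
        List<String> values = new ArrayList<>();
        try (Scanner line = new Scanner(rawLine)) {
            line.useDelimiter("\t");
            while (line.hasNext()) {
                values.add(line.next());
            }
        }
        if (values.size() != EXPECTED_COLUMNS) {
            throw new IllegalArgumentException(
                "Expected " + EXPECTED_COLUMNS + " values but read " + values.size());
        }
        return values;
    }
}
```

With this, a complete line comes back as a list of three values, while a line that starts with a tab is rejected because only two values are read from it.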

Or...
Assume your file looks like this:

col1      col2      col3
hello     world

Then we can use the following approach to achieve what you want:
I assume that you'll read these values into an array first; I'll call this array 'temp'. You'll also need a counter to keep track of the number of values read; I'll call this variable 'count'.
So to begin, you read all the values on that line into the array 'temp'.
(Each time you read a value, you increase the variable 'count' by one.)

After reading the first line from the file, the array will look like:

[0] = "hello";
[1] = "world";
( [2] = ?; )

The annoying thing is that these values aren't in the order we want them, so let's create an array where the values are in the right order. I'll call this array 'columns'.
All the information we've got about the columns is this:

Number of columns in the file: 3 (probably a constant within your program, or a hardcoded value)
Number of values read: 2 (stored in the variable 'count')

Now, all that's left to do is subtract the number of values read from the total number of columns in the file: 3 - 2 = 1 (store this value in a variable called 'startIdx', for example).

So, we've got a number now...
It's not just a number, it's the index where we need to start writing to get all the values at their correct place in the 'columns' array.

At this stage you just run a loop over the 'temp' array (element by element) for as long as the loop index is lower than the number of values read from the file.
In each iteration you write the current value into the 'columns' array at position 'startIdx'.
At the end of each iteration, you increase the value of 'startIdx' by one.

When the whole loop has finished, your 'columns' array should look like this:

[0] = ?;
[1] = "hello";
[2] = "world";
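Put into code, the steps above might look roughly like this. The names 'temp', 'count', 'columns' and 'startIdx' come from the description; the class `RightAlign` and the method `alignRight` are invented for the sketch. It assumes the missing values are always at the start of the line, and the unfilled slots simply stay null:

```java
class RightAlign {
    // Copies the 'count' values that were read into the last slots of a
    // row that is 'totalColumns' wide. Assumption: every missing value
    // sits at the start of the line, never in the middle.
    static String[] alignRight(String[] temp, int count, int totalColumns) {
        String[] columns = new String[totalColumns]; // unfilled slots stay null
        int startIdx = totalColumns - count;         // e.g. 3 - 2 = 1
        for (int i = 0; i < count; i++) {
            columns[startIdx] = temp[i];
            startIdx++;
        }
        return columns;
    }
}
```

For the example line, `alignRight(temp, 2, 3)` leaves slot 0 empty and puts "hello" and "world" in slots 1 and 2.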

>Uhm...You could maybe let your program throw an error when it wasn't able to get data from the three columns, if the data is incomplete, your program cannot work with it so I guess that would be a good solution.

The actual file that I am working with has 124 columns.
There are other columns with empty values, but that is okay since the scanner returns empty strings for those columns. Everything is off by one column only when the first column is empty, because the scanner sees a tab at the very beginning of the line and skips it as a leading delimiter when it splits the string using:

line.useDelimiter("\t");

Thus, I am getting one less column for those lines.
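One alternative that nobody in this thread proposed, so take it as a side note: `String.split` with a negative limit keeps empty strings at the start and at the end of the line, so an empty first column shows up as "" without any placeholder. The class `SplitColumns` and the method `splitRow` are hypothetical names for the sketch:

```java
class SplitColumns {
    // Splits one line on tabs. The limit of -1 tells split to keep
    // trailing empty strings too, so every column keeps its position.
    static String[] splitRow(String rawLine) {
        return rawLine.split("\t", -1);
    }
}
```

Here a line that begins with a tab still yields three entries, with the first one being the empty string.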

Try this:

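lineStr = scanner.nextLine();

// A leading tab means the first column is empty. Prepend a
// placeholder so the Scanner does not skip that column.
// startsWith is also safe when the line is empty.
if (lineStr.startsWith("\t")) {
    lineStr = "NA" + lineStr;
}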

simple :)
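For completeness, here is how that fix might fit together with the Scanner parsing from the question. It is only a sketch: the class `LeadingTabFix` and the method `parseLine` are made-up names, and "NA" is just the placeholder used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class LeadingTabFix {
    // Placeholder for an empty first column (the value used in this reply).
    static final String PLACEHOLDER = "NA";

    // Prepends the placeholder when the line starts with a tab, then
    // reads all tab-separated values with a Scanner as in the question.
    static List<String> parseLine(String lineStr) {
        if (lineStr.startsWith("\t")) {
            lineStr = PLACEHOLDER + lineStr;
        }
        List<String> values = new ArrayList<>();
        try (Scanner line = new Scanner(lineStr)) {
            line.useDelimiter("\t");
            while (line.hasNext()) {
                values.add(line.next());
            }
        }
        return values;
    }
}
```

A line with an empty first column then comes back with "NA" in position 0 and the remaining values in their proper columns.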

commented: Good suggestion, simple, clever, and better than mine :) +18

:$

I see it.
