Java Program Help

Question

Sailor_Jerry 3 Junior Poster in Training

18 Years Ago

I have 10 large .txt files with over 13,000 rows of data. Each “column” is separated with a comma.
I want to find the largest value in one of the columns across all the .txt files. (By largest I mean the greatest number of characters for a particular column.)
I wanted to write something in Java to find the largest value for the column. Any ideas on how I would go about doing this?

Thanks
sj.

java

iamthwee

18 Years Ago

1. File i/o
2. Parse the data by virtue of their columns
3. Use the String length function to ascertain the longest word.
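As a rough sketch of those three steps (one way among several): the class and method names below are made up for illustration, and an in-memory string stands in for one of the .txt files. For a real file you would pass in a FileReader instead of the StringReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, named only for this example.
class LongestColumnValue {

    // Returns the longest value found in the given zero-based column,
    // or "" if no row has that many columns.
    static String longestInColumn(Reader source, int columnIndex) throws IOException {
        String longest = "";
        BufferedReader reader = new BufferedReader(source);      // step 1: file I/O
        String line;
        while ((line = reader.readLine()) != null) {
            String[] columns = line.split(",", -1);              // step 2: parse into columns
            if (columnIndex < columns.length
                    && columns[columnIndex].length() > longest.length()) {
                longest = columns[columnIndex];                  // step 3: compare lengths
            }
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        // Sample data standing in for a real file.
        String sample = "a,bb,ccc\nlonger,b,c\nx,yyyy,z\n";
        System.out.println(longestInColumn(new StringReader(sample), 1));
    }
}
```

This assumes plain comma-separated rows with no quoted fields; rows that are too short are simply skipped.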

iamthwee

18 Years Ago

Look up the String.split method.

Effectively you're using the comma as a delimiter to separate the line into tokens.

Then you can count the commas to find the fourth or fifth token, or just count the tokens themselves.
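Something along these lines shows the idea. The class name, method name and sample row are invented for the example; only String.split itself comes from the standard library. Note that the content between the 4th and 5th comma is the token at index 4, since arrays are zero-based.

```java
// Hypothetical example class, not part of any library.
class SplitExample {

    // Returns the field at the given zero-based index, or null if the row is too short.
    static String fieldAt(String line, int index) {
        // The -1 limit keeps trailing empty fields, so "a,b,," yields 4 tokens.
        String[] tokens = line.split(",", -1);
        return index < tokens.length ? tokens[index] : null;
    }

    public static void main(String[] args) {
        String line = "id,name,city,zip,comment,date";   // made-up sample row
        System.out.println(fieldAt(line, 4));            // fifth token, i.e. index 4
    }
}
```

Without the -1 argument, split silently drops trailing empty strings, which can make short rows look shorter than they are.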

Dark_Omen 5 Posting Pro

18 Years Ago

Once you have read in the data, go line by line and use StringTokenizer to get each element of the line. Depending on which token it is, test it against the right data (meaning the data for that column). For example: StringTokenizer toks = new StringTokenizer(line, ", ");
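A minimal sketch of the StringTokenizer approach might look like this. The class and method names are hypothetical. One deliberate difference from the line above: the delimiter string here is "," rather than ", ", because StringTokenizer treats every character in that string as a separate delimiter, so ", " would also break fields apart at spaces.

```java
import java.util.StringTokenizer;

// Hypothetical example class, named only for this sketch.
class TokenizerExample {

    // Returns the n-th token (zero-based) of a comma-separated line,
    // or null if the line has fewer tokens than that.
    static String tokenAt(String line, int n) {
        StringTokenizer toks = new StringTokenizer(line, ",");
        String token = null;
        for (int i = 0; i <= n; i++) {
            if (!toks.hasMoreTokens()) {
                return null;              // ran out of tokens before reaching n
            }
            token = toks.nextToken();
        }
        return token;
    }

    public static void main(String[] args) {
        System.out.println(tokenAt("red,green,blue,cyan,magenta", 4));
    }
}
```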

Sailor_Jerry 3 Junior Poster in Training · Answer 1 · 2006-05-23T02:50:37+00:00

I'm having trouble with the parsing part. What class should I use if I want to pull out the content between the 4th and 5th comma of each row in my file?

Thanks

iamthwee · Answer 2 · 2006-05-23T14:00:35+00:00

I think the general consensus is that String.split should be used over StringTokenizer. It has better functionality and is generally considered the better choice.

Perhaps its only downside is a slight difference in speed, although the difference is negligible.
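One practical difference that matters for column data is how the two handle empty fields. The comparison below is only an illustration, with made-up class and method names and a made-up sample row: split (with a limit of -1) keeps empty columns in place, while StringTokenizer skips them, so token positions no longer line up with column positions.

```java
import java.util.StringTokenizer;

// Hypothetical comparison class for illustration only.
class SplitVersusTokenizer {

    // Counts fields using String.split, keeping empty ones.
    static int countWithSplit(String line) {
        return line.split(",", -1).length;
    }

    // Counts tokens using StringTokenizer, which skips empty ones.
    static int countWithTokenizer(String line) {
        return new StringTokenizer(line, ",").countTokens();
    }

    public static void main(String[] args) {
        String line = "a,,c,,e";   // columns 2 and 4 are empty
        System.out.println("split: " + countWithSplit(line));               // 5 fields
        System.out.println("StringTokenizer: " + countWithTokenizer(line)); // 3 tokens
    }
}
```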

Sailor_Jerry 3 Junior Poster in Training · Answer 3 · 2006-05-24T00:43:27+00:00

Thanks for the help. This is what I came up with.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CSVFileParser
{

    public static void main(String[] args)
    {
        CSVFileParser csvFile = new CSVFileParser();
        csvFile.readCSVFile();
    }

    void readCSVFile()
    {
        String record = null;
        String currentWord = "";
        int recCount = 0;

        // try-with-resources closes the reader automatically, even on error
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader("myfile.txt")))
        {
            while ((record = bufferedReader.readLine()) != null)
            {
                recCount++;
                String csvRow = record;
                String[] column = csvRow.split(",");
                boolean isValidData = false;
                for (int i = 0; i < column.length; i++)
                {
                    boolean isValidColumn = true;
                    // needed this for the following case: test1,test2,"test3,test,test,test",test4,test5
                    if (column[i].startsWith("\""))
                    {
                        isValidColumn = false;
                    }
                    
                    if (i == 4 && isValidColumn)
                    {
                        // If true don't need to bother with the else below.  Already have the data needed for the row.
                        isValidData = true;
                        String cellData = column[i];
                        if (cellData.length() > currentWord.length())
                        {
                            currentWord = cellData;
                        }
                    }
                    else
                    {
                        // needed this for the following case: test1,test2,"test3,test,test,test",test4,test5
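                        // the bounds check avoids an exception when the quoted field is the last one in the row
                        if (column[i].endsWith("\"") && !isValidData && i + 1 < column.length)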
                        {
                            String cellData = column[i + 1];
                            if (cellData.length() > currentWord.length())
                            {
                                currentWord = cellData;
                            }
                        }
                    }
                }
            }
            System.out.println(currentWord);
        }
        catch (IOException e)
        {
            System.out.println("error");
            e.printStackTrace();
        }
    }
}

iamthwee · Answer 4 · 2006-05-24T00:52:41+00:00

So does it do what you want?

Sailor_Jerry 3 Junior Poster in Training · Answer 5 · 2006-05-24T01:04:19+00:00

Yep, it did the job.

I would have liked the program to loop through all the files, so that I didn't have to update the name of the file I was reading each time.
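One possible way to do that, sketched under a few assumptions: the file names are passed on the command line, the files are UTF-8 text, the rows contain no quoted fields, and the column of interest is index 4. The class and method names are invented for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, named only for this sketch.
class LongestAcrossFiles {

    // Scans every file in turn and returns the longest value seen
    // in the given zero-based column, or "" if none was found.
    static String longestInColumn(List<Path> files, int columnIndex) throws IOException {
        String longest = "";
        for (Path file : files) {
            // Files.newBufferedReader assumes UTF-8; the reader is closed after each file.
            try (BufferedReader reader = Files.newBufferedReader(file)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] columns = line.split(",", -1);
                    if (columnIndex < columns.length
                            && columns[columnIndex].length() > longest.length()) {
                        longest = columns[columnIndex];
                    }
                }
            }
        }
        return longest;
    }

    public static void main(String[] args) throws IOException {
        // Usage (file names are placeholders): java LongestAcrossFiles file1.txt file2.txt
        List<Path> files = new ArrayList<>();
        for (String name : args) {
            files.add(Paths.get(name));
        }
        System.out.println(longestInColumn(files, 4));
    }
}
```

Keeping a single running "longest" value across all files means no file name ever has to be edited in the source.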

Also, I could have used the split method to split on the quotes first, then on the commas.
I think the condition checks would be simpler if I did it this way.
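For what it's worth, here is a sketch of a different technique from the split-on-quotes idea: a single pass over the characters that tracks whether it is currently inside double quotes. It is only an illustration (class and method names are made up), and it does not handle escaped quotes inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, named only for this sketch.
class QuotedCsvSplitter {

    // Splits one line on commas, ignoring commas inside double quotes.
    // The quote characters themselves are dropped from the returned fields.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;             // entering or leaving a quoted field
            } else if (c == ',' && !inQuotes) {
                fields.add(current.toString());   // a comma outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());           // the last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        String line = "test1,test2,\"test3,test,test,test\",test4,test5";
        List<String> fields = splitLine(line);
        System.out.println(fields.size());   // 5
        System.out.println(fields.get(4));   // test5
    }
}
```

With the fields split this way, the fifth column is always at index 4, so the special cases for quoted data are no longer needed.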

iamthwee · Answer 6 · 2006-05-24T01:08:51+00:00

I would have liked the program to loop through all the files, so that I didn't have to update the name of the file I was reading each time.

Have you had a look at this