Hello,

I'm looking for a CSV parser, and I found this example:

package com.ibm.ccd.api.samplecode.parser;


import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;

public class CSVParser
{

    String oneRes;
    private String line = "";
    private int nbLines = 0;
    private BufferedReader reader;

    public CSVParser(BufferedReader reader)
    {
        this.reader = reader;
    }

    public String[] splitLine() throws  Exception
    {
        nbLines = 0;
        ArrayList<String> al = new ArrayList<String>();
        line = nextLine();
        if (line == null)
            return null;

        nbLines = 1;
        int pos = 0;

        while (pos < line.length())
        {
            pos = findNextComma(pos);
            al.add(oneRes);
            pos++;
        }

        if (line.length() > 0 && line.charAt(line.length() - 1) == ',')
        {
            al.add("");
        }

        return al.toArray(new String[0]); // was an IBM-specific constant (com.ibm.ccd.common.util.Const); new String[0] is the plain JDK equivalent
    }

    private int findNextComma(int p) throws  Exception
    {
        char c;
        int i;
        oneRes = "";
        c = line.charAt(p);

        // empty field
        if (c == ',')
        {
            oneRes = "";
            return p;
        }

        // not escape char
        if (c != '"')
        {
            i = line.indexOf(',', p);
            if (i == -1)
                i = line.length();
            oneRes = line.substring(p, i);
            return i;
        }

        // start with "
        p++;

        StringBuffer sb = new StringBuffer(200);
        while (true)
        {
            c = readNextChar(p);
            p++;

            // not a "
            if (c != '"')
            {
                sb.append(c);
                continue;
            }

            // ", last char -> ok
            if (p == line.length())
            {
                oneRes = sb.toString();
                return p;
            }

            c = readNextChar(p);
            p++;

            // "" -> just print one
            if (c == '"')
            {
                sb.append('"');
                continue;
            }

            // ", -> return
            if (c == ',')
            {
                oneRes = sb.toString();
                return p - 1;
            }

            throw new Exception("Unexpected token found");
        }
    }

    private char readNextChar(int p) throws Exception
    {
        if (p == line.length())
        {
            String newLine = reader.readLine();
            if (newLine == null)
                throw new Exception("Error occurred while parsing");
            line += "\n" + newLine;
            nbLines++;
        }
        return line.charAt(p);
    }

    public String nextLine()
        throws Exception
    {
        do
        {
            line = reader.readLine();
            if (line == null)
                return null;
        }
        while (line.trim().equals(""));
        return line;
    }

    public static void main(String[] args) throws Exception
    {
        BufferedReader reader = null;
        try
        {
            String doc = "a,a  ab,c,d a\n" +
                         ",1 a\n" +
                         "1, \n" +
                         "a,\n" +
                         "1," +
                         "\"v \"\"a v\"";

            System.out.println("String to be parsed = " + doc);
            reader = new BufferedReader(new StringReader(doc));
            CSVParser parser = new CSVParser(reader);
            String[] res;

            ArrayList<String> tokens = new ArrayList<String>();

            while ((res = parser.splitLine()) != null)
            {
                for (int i = 0; i < res.length; i++)
                {
                    System.out.println("Token Found ["+res[i]+"] \n");
                }
            }
        }
        catch(Exception e )
        {
            e.printStackTrace();
        }
        finally
        {
            if (reader != null) reader.close();
        }

    }

}

It's relatively good, IMO; it checks for several things. I just want to parse some data from a file. No output is needed, since I'll do other things with the data. I don't quite understand the "tokenizing" here; it seems somewhat complicated to me. Is there an easier way to do this? I can't find any other examples right now. Short and simple code, more or less.
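If the file is simple (no field ever contains the delimiter, a double quote, or a line break), a much shorter approach can be enough: read each line and split it on the delimiter. This is a minimal sketch under exactly those assumptions; the class and method names (SimpleDelimitedReader, splitLine, readAll) are hypothetical, and it does not handle quoted fields the way the parser above tries to.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SimpleDelimitedReader {

    // Splits one line on a literal delimiter character.
    // Assumption: no field contains the delimiter, a double quote, or a line break.
    public static String[] splitLine(String line, char delimiter) {
        // Pattern.quote makes delimiters such as '|' match literally rather than as regex syntax.
        // A limit of -1 keeps trailing empty fields, so "x;" yields ["x", ""].
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }

    // Reads every non-blank line from the reader and splits it into fields.
    public static List<String[]> readAll(BufferedReader reader, char delimiter) throws IOException {
        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines, as nextLine() in the code above does
            }
            rows.add(splitLine(line, delimiter));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String doc = "a;b;c\n1;;3\n\nx;\n";
        for (String[] row : readAll(new BufferedReader(new StringReader(doc)), ';')) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The -1 limit matters: with the default limit, String.split drops trailing empty strings, so "a," would come back as a single field instead of two.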

Edit:

Wouldn't this do essentially the same thing?

http://code.google.com/p/jcsv/

Edited 4 Years Ago by XodoX

The way the code does the tokenizing is really bad. A brute-force approach is not too bad in itself, but the way it is implemented is. I believe this code was written by a student (and I hope so) who is quite new to Java.

When you parse a CSV data file, you should already know what the delimiter is (it can be anything: a comma, tab, semicolon, etc.). The code above assumes that a CSV file only ever uses a comma as the delimiter. There are many flaws in the code:

1) If the delimiter is not a comma, the code won't split anything: each line comes back as a single field, or the parser throws "Unexpected token found" after a quoted field.
2) The flow of the program is not intuitive. The code mixes instance fields that act like global variables (oneRes, line, nbLines, etc.) with values passed around as parameters.
3) The way the code looks for each token (if I were to do it that way at all) is inefficient. The method should not be findNextComma() but getNextToken(). The position of the comma is not what the caller needs; returning the substring that forms the next token is what matters.
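4) Line 74 creates a StringBuffer with an unexplained magic number, 200. That is only the initial capacity (the buffer grows as needed), so it does not actually limit a token to 200 characters, but the number serves no clear purpose, and a StringBuilder would be the usual choice for a local buffer.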
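5) Line 77 really throws me off. The method it calls, readNextChar(), is sneaky: whenever the position reaches the end of the current line, it reads the next line from the BufferedReader and appends it (with a "\n") to the current line. That is how the code supports quoted fields that span several lines, but hiding I/O inside a "read one character" method makes the flow hard to follow. Why not read a new line explicitly, at the point where it is needed? Also, the method only checks for p == line.length(); if a caller ever passed a larger position, line.charAt(p) would throw a StringIndexOutOfBoundsException.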
6) Line 158 declares a tokens list that serves no purpose, because the variable is never used anywhere.
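As a rough sketch of how points 1, 3 and 5 could be addressed (this is not the original author's code, and the class name CsvRecordReader is hypothetical): the delimiter is a constructor parameter, each call returns the field strings themselves rather than a comma position, all parsing state is local to readRecord(), and the next physical line is read only when a quoted field is still open at the end of a line. It assumes RFC 4180-style quoting ("" inside a quoted field stands for one quote) and handles stray quotes leniently instead of rejecting them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class CsvRecordReader {

    private final BufferedReader reader;
    private final char delimiter; // supplied by the caller instead of hard-coding ','

    public CsvRecordReader(BufferedReader reader, char delimiter) {
        this.reader = reader;
        this.delimiter = delimiter;
    }

    // Returns the fields of the next record, or null at end of input.
    // All parsing state (current line, position, field buffer) is local to this call.
    public List<String> readRecord() throws IOException {
        String line;
        do {
            line = reader.readLine();
            if (line == null) {
                return null;
            }
        } while (line.trim().isEmpty()); // skip blank lines, like nextLine() in the original

        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder(); // grows as needed
        boolean inQuotes = false;
        int i = 0;
        while (true) {
            if (i == line.length()) {
                if (!inQuotes) {
                    break; // end of the record
                }
                // A quoted field continues on the next physical line: read it only now.
                String next = reader.readLine();
                if (next == null) {
                    throw new IOException("Unterminated quoted field");
                }
                field.append('\n'); // the line break belongs to the quoted value
                line = next;
                i = 0;
                continue;
            }
            char c = line.charAt(i++);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);
                } else if (i < line.length() && line.charAt(i) == '"') {
                    field.append('"'); // "" inside quotes stands for one literal quote
                    i++;
                } else {
                    inQuotes = false; // closing quote
                }
            } else if (c == '"') {
                inQuotes = true; // lenient: a quote outside quotes opens a quoted section
            } else if (c == delimiter) {
                fields.add(field.toString()); // hand back the token text, not a position
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString()); // last field; empty if the line ended with a delimiter
        return fields;
    }

    // Convenience wrapper for in-memory text.
    public static List<List<String>> parseAll(String text, char delimiter) {
        CsvRecordReader csv = new CsvRecordReader(new BufferedReader(new StringReader(text)), delimiter);
        List<List<String>> rows = new ArrayList<>();
        try {
            List<String> row;
            while ((row = csv.readRecord()) != null) {
                rows.add(row);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String doc = "a,a  ab,c,d a\n,1 a\n1, \na,\n1,\"v \"\"a v\"\n\"multi\nline\",x\n";
        for (List<String> row : parseAll(doc, ',')) {
            System.out.println(row);
        }
    }
}
```

Since no output is needed, the caller can simply loop over readRecord() and use each returned list directly.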

Overall, you should not try to follow this code. Instead, learn what the CSV file format looks like, then work out how to parse strings in that format. A simple way to parse strings is with regular expressions (regex). Using them can save you a lot of time and make your code much more readable (and shorter).
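Following the regex suggestion, here is a minimal sketch, assuming comma-delimited lines, RFC 4180-style "" escaping, and that no quoted field spans a line break (the class name RegexCsvSplitter is hypothetical). Each Matcher.find() call matches exactly one field plus the delimiter that follows it; \G anchors every match to the end of the previous one, so the matcher cannot silently skip over malformed text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCsvSplitter {

    // One field per match, anchored with \G to the end of the previous match:
    //   group 1: contents of a quoted field (with "" still doubled)
    //   group 2: an unquoted field (no commas or quotes)
    //   group 3: the delimiter that follows, or "" at end of line
    private static final Pattern FIELD =
        Pattern.compile("\\G(?:\"([^\"]*(?:\"\"[^\"]*)*)\"|([^,\"]*))(,|$)");

    // Splits a single comma-delimited line. Quoted fields that span lines are NOT supported.
    public static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (true) {
            if (!m.find()) {
                // \G failed: e.g. a stray quote inside an unquoted field
                throw new IllegalArgumentException("Malformed CSV line: " + line);
            }
            if (m.group(1) != null) {
                fields.add(m.group(1).replace("\"\"", "\"")); // unescape "" to "
            } else {
                fields.add(m.group(2));
            }
            if (m.group(3).isEmpty()) {
                return fields; // reached end of line rather than a delimiter
            }
        }
    }

    public static void main(String[] args) {
        String[] lines = { "a,a  ab,c,d a", ",1 a", "1, ", "a,", "1,\"v \"\"a v\"" };
        for (String line : lines) {
            System.out.println(splitLine(line));
        }
    }
}
```

To use a different delimiter, the two commas in the pattern would need to change; a quoted field containing a line break would still need line-joining logic like the tokenizer sketch above.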
