DaniWeb, 2011-10-31T16:50:14+00:00

Complete CSV Reader

Tom Gunn

CSV is more than just comma-delimited fields: there are quoting and white space rules too. This short function is an example of a complete way to read CSV records one record at a time.

code snippet
#include <istream>
#include <string>
#include <vector>

/// <summary>loads a CSV record from the stream is</summary>
/// <remarks>
/// * leading and trailing white space is removed outside of
///   quoted sections when trimWhiteSpace is true
/// * line breaks are preserved in quoted sections
/// * quote literals consist of two adjacent quote characters
/// * quote literals must be in quoted sections
/// </remarks>
/// <param name=is>input stream for CSV records</param>
/// <param name=trimWhiteSpace>trims white space on unquoted fields</param>
/// <param name=fieldDelim>field delimiter. defaults to ',' for CSV</param>
/// <param name=recordDelim>record delimiter. defaults to '\n' for CSV</param>
/// <param name=quote>delimiter for quoted fields. defaults to '"'</param>
/// <returns>a list of fields in the record</returns>
std::vector<std::string> CsvGetLine(std::istream& is, 
                                    bool trimWhiteSpace=true,
                                    const char fieldDelim=',',
                                    const char recordDelim='\n',
                                    const char quote='"')
{
    using namespace std;

    vector<string> record; // result record list. default empty
    string field;          // temporary field construction zone
    int start = -1,        // start of a quoted section for trimming
        end = -1;          // end of a quoted section for trimming
    char ch;

    while (is.get(ch))
    {
        if (ch == fieldDelim || ch == recordDelim)
        {
            // fieldDelim and recordDelim mark the end of a
            // field. save the field, reset for the next field,
            // and break if there are no more fields
            if (trimWhiteSpace)
            {
                // trim all external white space
                // exclude chars between start and end
                const string wsList = " \t\n\f\v\r";
                int ePos, sPos;

                // order dependency: right trim before left trim
                // left trim will invalidate end's index value
                if ((ePos = field.find_last_not_of(wsList)) != string::npos)
                {
                    // ePos+1 because find_last_not_of returns the index
                    // of the last character that is not white space
                    field.erase((end > ePos) ? end : ePos + 1);
                }

                if ((sPos = field.find_first_not_of(wsList)) != string::npos)
                {
                    field.erase(0, (start != -1 && start < sPos) ? start : sPos);
                }

                // reset the quoted section
                start = end = -1;
            }

            // save the new field and reset the temporary
            record.push_back(field);
            field.clear();

            // exit case 1: !is, managed by loop condition
            // exit case 2: recordDelim, managed here
            if (ch == recordDelim) break;
        }
        else if (ch == quote)
        {
            // save the start of the quoted section
            start = field.length();

            while (is.get(ch))
            {
                if (ch == quote)
                {
                    // consecutive quotes are an escaped quote literal
                    // only applies in quoted fields
                    // 'a""b""c' becomes 'abc'
                    // 'a"""b"""c' becomes 'a"b"c'
                    // '"a""b""c"' becomes 'a"b"c'
                    if (is.peek() != quote)
                    {
                        // save the end of the quoted section
                        end = field.length();
                        break;
                    }
                    else field.push_back(is.get());
                }
                else field.push_back(ch);
            }
        }
        else field.push_back(ch);
    }

    return record;
}

#if defined(TEST)
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

int main()
{
    string csv =
        "a,\" a\"a,b\"b \"\n"
        "aa,b\"b,c\"c,   d   ,,ee,ff,  g\"g,h\"h  \n"
        "aa,  bb,cc  ,\"  dd\",\"ee  \",\"f,  g\n"
        ",  h\",\"i,\"\"j,k\"\",l\"\n";
    istringstream is(csv);

    while (true)
    {
        typedef vector<string> rec_t;

        rec_t rec = CsvGetLine(is);

        // an empty record means the input is exhausted
        if (rec.size() == 0) break;

        for (rec_t::iterator x = rec.begin(); x != rec.end(); ++x)
            cout << '>' << *x << "<\n";

        cout << string(20, '*') << '\n';
    }
}
#endif

When compiling it in Microsoft Visual Studio 2008, I got the following errors: "error LNK2019" and "fatal error LNK1120"!

You need to #define TEST to run the code as is. main() is conditionally compiled because this is a library function.

This is a very useful snippet. Kudos.

This is quite useful code and a good example of a simple CSV parser. But I think it has one defect: if the last input line isn't terminated by the recordDelim character, the last returned record doesn't contain the last value.

- bug: the last field will NOT be read when the input ends (EOF) without a record delimiter
- dangerous: according to the specification, the return value of std::istream::get reports what was read only for some overloads; get(char&) returns the stream itself. For EOF detection, istream::good() should be checked instead, at the beginning and after each get operation
- no error check for a missing closing '"'
- some implementations might use unsigned int for the variables "start" and "end"
- string::npos should be used instead of "-1". Operations should not depend on string::npos having a specific value
- field.erase((end > ePos) ? end : ePos + 1); should also check the end != string::npos condition
- the code is not really elegant, as several variables are involved, each with different states. It would be better for the routine to have only ONE variable with several states
- the code is probably much slower than straight C code using fast pointer operations. It is questionable whether C++ is really an advantage here
- perhaps TCHAR should be used instead of char

All in all, I would not recommend this code for production use across several platforms.
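For the points above about unsigned indices and string::npos, here is a small sketch of the trimming step written with std::string::size_type throughout. TrimOutsideQuoted is a hypothetical helper, not something from the snippet; start and end are the positions of the quoted section inside the field, or npos when there is none. One assumed difference from the original: a field that is all white space and has no quoted section is trimmed to an empty string here.

```cpp
#include <string>

// Hypothetical helper: trims white space outside the quoted range
// [start, end). Pass npos for both when the field has no quoted section.
std::string TrimOutsideQuoted(std::string field,
                              std::string::size_type start = std::string::npos,
                              std::string::size_type end = std::string::npos)
{
    const std::string::size_type npos = std::string::npos;
    const std::string wsList = " \t\n\f\v\r";

    // right trim first: the left trim shifts every index
    std::string::size_type keepEnd = field.find_last_not_of(wsList);
    keepEnd = (keepEnd == npos) ? 0 : keepEnd + 1; // one past the last kept char

    // white space inside the quoted section must survive
    if (end != npos && end > keepEnd) keepEnd = end;
    field.erase(keepEnd);

    std::string::size_type keepStart = field.find_first_not_of(wsList);
    if (keepStart == npos) keepStart = field.size();

    if (start != npos && start < keepStart) keepStart = start;
    field.erase(0, keepStart);

    return field;
}
```

For example, the raw text `  "  dd"` arrives as the field "    dd" with start 2 and end 6, and only the two leading spaces outside the quotes are removed.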
