I've seen a lot of posts over time relating to importing and exporting CSV. Most of the answers involve using TextReader.readline and String.split, which will not work with any but the simplest data. When you are writing for users who have the expectation that your application will work with anything they export from Excel, you will have to cover all the CSV bases.

I hope that this post will give you a clear idea of what you're involving yourself in when you decide you need to read a CSV file.

The characteristics of CSV data are:

  1. The comma character is used to separate the fields. (This is obvious, but I thought I'd start with a couple of obvious points to warm up)
  2. The last field on the line is ended by the end-of-line character or the physical end of the file.
  3. Any field that contains commas, double-quotes or control characters will be surrounded by double-quote marks.
  4. Any double-quote marks within the field will be "escaped" with an extra double-quote character. For instance, the value Paul "Chuck" Norris would be converted to "Paul ""Chuck"" Norris".
  5. After the opening double-quote ANY character is valid. Including line breaks. This is why you can't use readline.

I have a parser class that works nicely and I'll post that on this thread in a little while, along with some notes on using it.

Cheers all!

5 Years
Discussion Span
Last Post by Paul Norris

My answer to the CSV import problem is the CsvParser class. The code will follow shortly, but using the class is illustrated with a little example. This code takes a CSV input file, which we will assume has a header row, with some specific header values assumed to be present. The data is assumed to follow in subsequent rows and the values are assumed to be in the columns corresponding to the headers.

Dim parser As New CsvParser(new IO.StreamReader(txtFilePath.Text))
dim hdr as List(Of String) = parser.parseToList
dim idxName as Integer = hdr.IndexOf("Name")
dim idxEmail as Integer = hdr.IndexOf("Email")
While Not parser.eof
  dim value as List(Of String) = parser.parseToList
  dim name as String = value(idxName)
  dim email as String = value(idxEmail)
  ' Do something with the values...
End While

And that's it. The class itself follows:

Public Class CsvParser

        Private Const CR As Integer = Asc(vbCr)
        Private Const LF As Integer = Asc(vbLf)

        Private src As TextReader
        Private ch As Integer

        Public Sub New(ByVal reader As TextReader)
            src = reader
        End Sub

        Public Sub close()
        End Sub

        Protected Sub advance()
            ch = src.Read
        End Sub

        Public ReadOnly Property eof() As Boolean
                Return (ch < 0)
            End Get
        End Property

        Public ReadOnly Property eol() As Boolean
                Return (ch = CR) OrElse (ch = LF) OrElse eof
            End Get
        End Property

        Public Function parseToList() As List(Of String)
            Dim fld As New List(Of String)
            While Not eol
            End While
            If Not eof Then
                Dim nextCh As Integer = src.Read
                If ch = CR And nextCh = LF Then
                    ch = nextCh
                End If
            End If
            Return fld
        End Function

        Protected Function nextField() As String
            Dim firstCh As Char = Chr(ch)
            Dim fld As String
            Select Case firstCh
                Case "0"c To "9"c
                    fld = numericField()
                Case """"
                    fld = quotedField()
                Case Else
                    fld = unquotedField()
            End Select
            Return fld
        End Function

        Protected Function numericField() As String
            Dim fld As String = ""
            Dim theCh As Char = Chr(ch)
            While Not eol AndAlso theCh <> ","c
                fld += theCh
                theCh = Chr(ch)
            End While
            If theCh = ","c Then advance()
            Return fld
        End Function

        Protected Function unquotedField() As String
            Return numericField()
        End Function

        Protected Function quotedField() As String
            Dim fld = ""
            Dim stillGoing As Boolean = True
            advance() ' Skip the opening quote
            Dim thisCh As Char = Chr(ch)
            Dim follCh As Char = Chr(src.Peek)
            While stillGoing
                If thisCh = """"c And follCh = """"c Then ' escaped double-quote
                    fld += """"
                    advance() ' Skip the extra quote mark
                ElseIf thisCh = """"c Then
                    stillGoing = False
                    fld += thisCh
                End If
                thisCh = Chr(ch)
                follCh = Chr(src.Peek)
            End While
            If thisCh = ","c Then advance() ' Skip the delimiter
            Return fld
        End Function

    End Class

Edited by Paul Norris: Activated Post button inadvertently

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.