0

I've seen a lot of posts over time relating to importing and exporting CSV. Most of the answers involve using TextReader.readline and String.split, which will not work with any but the simplest data. When you are writing for users who have the expectation that your application will work with anything they export from Excel, you will have to cover all the CSV bases.

I hope that this post will give you a clear idea of what you're involving yourself in when you decide you need to read a CSV file.

The characteristics of CSV data are:

  1. The comma character is used to separate the fields. (This is obvious, but I thought I'd start with a couple of obvious points to warm up)
  2. The last field on the line is ended by the end-of-line character or the physical end of the file.
  3. Any field that contains commas, double-quotes or control characters will be surrounded by double-quote marks.
  4. Any double-quote marks within the field will be "escaped" with an extra double-quote character. For instance, the value Paul "Chuck" Norris would be converted to "Paul ""Chuck"" Norris".
  5. After the opening double-quote ANY character is valid. Including line breaks. This is why you can't use readline.

I have a parser class that works nicely and I'll post that on this thread in a little while, along with some notes on using it.

Cheers all!

1
Contributor
1
Reply
3
Views
6 Years
Discussion Span
Last Post by Paul Norris
0

My answer to the CSV import problem is the CsvParser class. The code will follow shortly, but using the class is illustrated with a little example. This code takes a CSV input file, which we will assume has a header row, with some specific header values assumed to be present. The data is assumed to follow in subsequent rows and the values are assumed to be in the columns corresponding to the headers.

Dim parser As New CsvParser(new IO.StreamReader(txtFilePath.Text))
dim hdr as List(Of String) = parser.parseToList
dim idxName as Integer = hdr.IndexOf("Name")
dim idxEmail as Integer = hdr.IndexOf("Email")
While Not parser.eof
  dim value as List(Of String) = parser.parseToList
  dim name as String = value(idxName)
  dim email as String = value(idxEmail)
  ' Do something with the values...
End While
parser.close()

And that's it. The class itself follows:

Public Class CsvParser

        Private Const CR As Integer = Asc(vbCr)
        Private Const LF As Integer = Asc(vbLf)

        Private src As TextReader
        Private ch As Integer

        Public Sub New(ByVal reader As TextReader)
            src = reader
            advance()
        End Sub


        Public Sub close()
            src.Close()
        End Sub


        Protected Sub advance()
            ch = src.Read
        End Sub


        Public ReadOnly Property eof() As Boolean
            Get
                Return (ch < 0)
            End Get
        End Property


        Public ReadOnly Property eol() As Boolean
            Get
                Return (ch = CR) OrElse (ch = LF) OrElse eof
            End Get
        End Property


        Public Function parseToList() As List(Of String)
            Dim fld As New List(Of String)
            While Not eol
                fld.Add(nextField)
            End While
            If Not eof Then
                Dim nextCh As Integer = src.Read
                If ch = CR And nextCh = LF Then
                    advance()
                Else
                    ch = nextCh
                End If
            End If
            Return fld
        End Function


        Protected Function nextField() As String
            Dim firstCh As Char = Chr(ch)
            Dim fld As String
            Select Case firstCh
                Case "0"c To "9"c
                    fld = numericField()
                Case """"
                    fld = quotedField()
                Case Else
                    fld = unquotedField()
            End Select
            Return fld
        End Function


        Protected Function numericField() As String
            Dim fld As String = ""
            Dim theCh As Char = Chr(ch)
            While Not eol AndAlso theCh <> ","c
                fld += theCh
                advance()
                theCh = Chr(ch)
            End While
            If theCh = ","c Then advance()
            Return fld
        End Function


        Protected Function unquotedField() As String
            Return numericField()
        End Function


        Protected Function quotedField() As String
            Dim fld = ""
            Dim stillGoing As Boolean = True
            advance() ' Skip the opening quote
            Dim thisCh As Char = Chr(ch)
            Dim follCh As Char = Chr(src.Peek)
            While stillGoing
                If thisCh = """"c And follCh = """"c Then ' escaped double-quote
                    fld += """"
                    advance() ' Skip the extra quote mark
                ElseIf thisCh = """"c Then
                    stillGoing = False
                Else
                    fld += thisCh
                End If
                advance()
                thisCh = Chr(ch)
                follCh = Chr(src.Peek)
            End While
            If thisCh = ","c Then advance() ' Skip the delimiter
            Return fld
        End Function

    End Class

Edited by Paul Norris: Activated Post button inadvertently

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.