What I'm trying to do is parse a CSV file that has column headers, and map those headers - ideally to a class.

The tricky part is that I need to be able to support various mappings because it will be interacting with different systems that give the headers different names. Additionally, being able to combine fields would be extremely helpful.

Fortunately, the CSV will always be correctly formatted with all fields encapsulated in double quotes.

Example

system1Data.csv

"Address", "City", "State", "Zip"
"001 Test Rd", "New York", "New York", "10001"

system2Data.csv

"Street Number", "StreetName", "StreetSuffix" "City", "State", "Zip-5"
"001", "Test", "Rd", "New York", "New York", "10001"

myClass.vb

Public Class HouseInfo
    Private address as String
    Private city As String
    Private state As String
    Private zipcode As String
End Class

I need to some way of mapping the header fields to properties in a class.


Below is simply a LONGsummary of the solutions that are on my mind; possibly worth looking into, etc. Only read below if you've got time to kill XD

  • Perhaps XML/XSLT would work? Im not quite sure how, unfortunately my knowledge of XML/XSLT is very limited.
    The XML would simply map the fields. Not sure how to implement it, but what I had in mind was something like the following.
    <field>
    	<property>Address</property>
    	<header>[Street Number] [Street Name] [Street Suffix]</header>
    </field>
    <field>
    	<property>City</property>
    	<header>[City]</header>
    </field>
    etc


  • Another notion I have is to use a DataSet and a DataAdapter that defines the schema.

    Similar to the XML solution, I have very little knowledge of how to go about this solution, if it is even feasible. I've looked up numerous examples CSV + DataSet and haven't been able to find any that make use of the DataAdapter outside of Database oriented projects.

  • EditI tried using FileHelpers 2.0 a bit
    It has a fairly nice and elegant implementation - creating a class and casting the FileHelper engine to that class so that the data becomes directly accessible via class properties. (IE: HouseInfo.Address)

    Couldn't quite get the mapping that I am looking for though. Unless if Ive overlooked something, FileHelpers simply maps Column Indexs with Class Properties one after the other. Since it doesn't make use of the files headers, if the columns are moved around, the result is undesirable.

The example data/class is a *highly* simplified example. The real world application will be using over 100 fields across over a dozen different systems. That is why having an isolated Mapping / Schema solution would be ideal.

I've been racking my brain over how to do this for a few days now. I've come up with a few other solutions, but none of them are all too elegant or efficient.

Recommended Answers

All 8 Replies

See if this helps to get individual values from a CSV file.

Public Class Form1
    Private myFile As String = "C:\test.csv" '// your file.
    Private arlCSVlines As New ArrayList '// similar to a ListBox.
    Private arTemp() As String = Nothing '// String Array to .Split each line.

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        If IO.File.Exists(myFile) Then
            For Each fileLine As String In IO.File.ReadAllLines(myFile)
                arlCSVlines.Add(fileLine) '// add all lines to the ArrayList.
            Next
        End If
        '// if you do not need the header line, remove it.
        arlCSVlines.RemoveAt(0) '// remove first item from ArrayList.
        '// .Replace all the double quotes and .Split the line into arrays by the comma.
        arTemp = arlCSVlines(0).ToString.Replace("""", "").Split(","c) '// line 1 from file.
        '// display results for line 1.
        MsgBox(arTemp(0) & vbNewLine & arTemp(1) & vbNewLine & arTemp(2) & vbNewLine & arTemp(3) & vbNewLine & arTemp(4) & vbNewLine & arTemp(5))
        arTemp = arlCSVlines(1).ToString.Replace("""", "").Split(","c) '// line 2 from file.
        '// display results for line 2.
        MsgBox(arTemp(0) & vbNewLine & arTemp(1) & vbNewLine & arTemp(2) & vbNewLine & arTemp(3) & vbNewLine & arTemp(4) & vbNewLine & arTemp(5))
        End '// exit while testing.
    End Sub
End Class

To mapping fields like "Street Number", "StreetName", "StreetSuffix" to one field "address", you may need to define a dictionary. When you read a file, first thing is to check the dictionary and find which fields to combine. In the application, you may want a form to help the input to the dictionary.

I could not figure out how to do this automatically. If you could be able to know all the fields, that information would be used to biuld the dictionary.

To mapping fields like "Street Number", "StreetName", "StreetSuffix" to one field "address", you may need to define a dictionary. When you read a file, first thing is to check the dictionary and find which fields to combine. In the application, you may want a form to help the input to the dictionary.

I could not figure out how to do this automatically. If you could be able to know all the fields, that information would be used to biuld the dictionary.

The Mapping doesn't need to be automated. For the program, all the fields will be known, all the column headers will be known, and I do expect to write out a map for each "System".

The reason why I dont want to / cant simply use column indexes is because as the program progresses, the fields used will change - in order and in which fields are utilized. Thus writing purely based on the index would require an entirely new map from scratch for every minor change.

Not really sure how to go about implementing a dictionary. Though both Dictionary and NameValueCollections look like they could be used to accomplish the mapping, somehow.

See if this helps to get individual values from a CSV file.

Hey codeorder, thank you but I think you mis-understood :).

I need to map the values, not parse the CSV. For parsing the CSV I will likely use FileHelpers or CsvReader as they will be able to handle all the edge cases (ie: commas within quoted fields, multi line fields, etc).

What I need to be able to Map the data based on the Headers in the CSV file.

Ex:

"House Number", "Street Name", "Street Suffix", "City", "State"

HouseInfo.Address = [House Number] & [Street Name] & [Street Suffix]
HouseInfo.City = [City]
HouseInfo.State = [State]

Member Avatar for Unhnd_Exception

This will map anything.

No error checking. Only rough draft concept.

Public Class Form1

    Private Sub ButtonMap_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles ButtonMap.Click

        Dim HouseInfo As New HouseInfo

        'Add the fields in any order.
        HouseInfo.AddField("Street Suffix", "St.")
        HouseInfo.AddField("House Number", "1234")
        HouseInfo.AddField("Street Name", "Sesame")

        'Returns 1234 Sesame St.
        Dim Address As String = HouseInfo.Address

        HouseInfo = New HouseInfo

        'Add address field only 
        HouseInfo.AddField("Address", "1234 Sesame St.")

        'Returns 1234 Sesame St.
        Address = HouseInfo.Address

    End Sub

    Public Class HouseInfo

        Private fpAddress As String()
        Private fpFieldMap As Dictionary(Of String, ArrayAndIndex)

        Public ReadOnly Property Address() As String
            Get
                'Takes the array and combines it with
                'a space and trims it.
                'This could consist of only the first
                'index having a value: if the address
                'field was used. Or it could have a 
                'value at position 1,2,3 if the
                'number name and suffix where used.
                'Either way its going to return the
                'address in the correct form.
                Return String.Join(" ", fpAddress).Trim
            End Get
        End Property

        Sub New()

            'The address could be made up of a single address,
            'or by number, name , and suffix so it needs four
            'positions.
            ReDim fpAddress(3)

            'Add all the fields names, which array holds the name
            'and at which position the value should be stored.
            fpFieldMap = New Dictionary(Of String, ArrayAndIndex)
            fpFieldMap.Add("Address", New ArrayAndIndex(fpAddress, 0))
            fpFieldMap.Add("House Number", New ArrayAndIndex(fpAddress, 1))
            fpFieldMap.Add("Street Name", New ArrayAndIndex(fpAddress, 2))
            fpFieldMap.Add("Street Suffix", New ArrayAndIndex(fpAddress, 3))

        End Sub

        Public Sub AddField(ByVal fieldName As String, ByVal fieldValue As String)

            'The magic mapper.

            'Since the ArrayAndIndex class knows the array and index to store the value
            'all that needs to be done is pass in the value of the field and
            'it will be stored in the right array at the right index.

            If fpFieldMap.ContainsKey(fieldName) Then
                CType(fpFieldMap.Item(fieldName), ArrayAndIndex).SetValue(fieldValue)
            End If

        End Sub

    End Class

    Public Class ArrayAndIndex
        'This class simply keeps a reference to the
        'array and the index to where the value should be set.
        Private fpStringArray As String()
        Private fpIndex As Integer

        Sub New(ByVal stringArray As String(), ByVal index As Integer)
            fpStringArray = stringArray
            fpIndex = index
        End Sub

        Public Sub SetValue(ByVal value As String)
            fpStringArray(fpIndex) = value
        End Sub

    End Class

End Class
Member Avatar for Unhnd_Exception

Actually in the sub new the fpAddress would only need 3 positions.

Sub New()
  
  ReDim fpAddress(2)
    
  fpFieldMap = New Dictionary(Of String, ArrayAndIndex)            
  fpFieldMap.Add("Address", New ArrayAndIndex(fpAddress, 0))            
  fpFieldMap.Add("House Number", New ArrayAndIndex(fpAddress, 0)) 
  fpFieldMap.Add("Street Name", New ArrayAndIndex(fpAddress, 1))            
  fpFieldMap.Add("Street Suffix", New ArrayAndIndex(fpAddress, 2))      
   
End Sub

Unhnd_Exception... I wanna give you a giant e-hug right now! Haha.

I think with a little tweaking, it should be able to accomplish exactly what I want. Thank you so much. :):)

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.