I receive large pre-formatted XML files. Each XML is formatted the same way - in that they all have the same Parent Nodes and Child Nodes. Each Parent node has at least one Attribute (sometimes 3 or 4) and some fields also have attributes. In my opinion, the formatting of these xmls files is poor and very hard to work with.

The Parent Nodes are Databases Tables, the Child Nodes are Fields within the tables and the Attributes are Index Fields in the Tables.

My job - is to import these xml files into one database (lets say MSAccess in this example). This database already has the Tables, fields etc...the goal is to import one or more xml files into these tables to create a Consolidation of data.

I am successfully doing this now using XPath to navigate through the xml files basically record by record (field by field) and creating INSERT strings which then are used to import into the database tables via OLEDBCommands. Using...

-XPath XML Document
-XPath Navigator
-XPath Node Iterator

As mentioned, I am successfully doing this now. However, the reason I'm writing this post is that I am not at all happy with the performance. I've googled many times and every thing I see are examples of importing xml files that are alot more clean to work with (e.g. no attributes). The Attributes in these xml files are the number one PAIN as I need to retrieve them and include them in the sql INSERTS as indexs on rows.

I'm looking to the VB.net experts here for Ideas on how I can import this data faster.
An example of a portion of the xmls is below. Thank you in advance.

<Elements> - section of XML file
<Element> - Database Table
Nodes under <Element> are fields in the Element Table in the Database.

<Elements>
<Element id="ce1" cc_id="cc1" st_id="st1">
<WbsNum>TOTAL CONTRACT</WbsNum>
<ElemDesc>X-GEN Phase II</ElemDesc>
<ElemLevel>1</ElemLevel>
<ElemHier>1</ElemHier>
<ElemSeq>0</ElemSeq>
<NumChild>8</NumChild>
<SortVal>1P000215</SortVal>
<LinkVal>1P000215</LinkVal>
<LongDesc>X-GEN Phase II</LongDesc>
<NAFlag>0</NAFlag>
<RiskLevel>L</RiskLevel>
<CtrID ctr_id="ctr1"/>
<UserDefValue col_id="col70">ENGR</UserDefValue>
</Element>
<Element id="ce2" cc_id="cc1" st_id="st1">
<WbsNum>1</WbsNum>
<ElemDesc>X-GEN SHIP BADE</ElemDesc>
<ElemLevel>2</ElemLevel>
<ElemHier>2</ElemHier>
<ParentID ce_id="ce1"/>
<ElemSeq>0</ElemSeq>
<NumChild>5</NumChild>
<SortVal>1</SortVal>
<LinkVal>1</LinkVal>
<LongDesc>X-GEN SHIP BADESx X-GEN</LongDesc>
<NAFlag>0</NAFlag>
<RiskLevel>L</RiskLevel>
<CtrID ctr_id="ctr1"/>
<UserDefValue col_id="col70">ENGR</UserDefValue>
</Element>
</Elements>

Recommended Answers

All 3 Replies

you can use DOMdocument to process XML in VB,

you use the method LoadXML(pathname) to load and there methods and properties to iterate through the XML file. A quick google search will show you how.

you can use DOMdocument to process XML in VB,

you use the method LoadXML(pathname) to load and there methods and properties to iterate through the XML file. A quick google search will show you how.

What's different than what I'm doing now? Still have to 1) load xml and 2) iterated through the xml. I use XPath Navigator to do this. Either XPath or DOMDocument still requires node node iteration.

These files are very large and loading the entire xml file into memory has created problems. LoadXML loads the entire xml file - this would at times fail with some of the files I need to read. I also have to ignore bad xml characters - as the creation of these files are not always xml compliant. I've researched, tried, researched tried and have come to use the following as my conclusion. It works - always - but I'm still looking for a faster way to 1) read/iterate 2) save to database - predefined tables/fields

Dim sr As New System.IO.StreamReader(myfile, System.Text.Encoding.GetEncoding(1250))
Dim myreader As New XmlTextReader(sr)
myreader.XmlResolver = Nothing
xpathDoc = New XPathDocument(myreader)
xmlNav = xpathDoc.CreateNavigator
xmINI = xmlNav.Select("NodeSection/NodeTable/*")
...iterate child nodes, attributes etc...create sql insert
...navigate to next Node Section (different DB Table)...repeat

Then just open the file as at text file and read it line by line processing as you go. There is a lot more coding involved by it is much much faster to execute.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.