Hi,

I need some help extracting information from an XML file. So far I am able to extract data from elements that look like the following:

<element1>Element Information</element1>

<element2 attribute="Element 2 Attribute Information" />

But I have come across an element that has got me stumped:

<element3><![CDATA[Information that I need to extract]]></element3>

So I need some help working out how to get the information between <![CDATA[ and ]]>: either ignore the CDATA wrapper, or have its content be viewable as a sub-element of element3.

Here is the code I use to extract the information:

if (topElement.Element("element1") == null)
{
    l.element1Information = "";
}
else
{
    l.element1Information = topElement.Element("element1").Value;
}

Hi Abathurst,

The CDATA section can be referenced as a child node of element3. Specifically, it is an XmlCDataSection.

using System;
using System.Xml;

namespace ExtractingCData
{
    class Program
    {
        static void Main(string[] args)
        {
            var xmlAsString = "<root><element3><![CDATA[Information that I need to extract]]></element3></root>";

            var xmlDoc = new XmlDocument();
            xmlDoc.LoadXml(xmlAsString);

            XmlNode element3 = xmlDoc.FirstChild.ChildNodes[0];

            XmlNode cDataAsNode = element3.ChildNodes[0];

            if (cDataAsNode is XmlCDataSection)
            {
                XmlCDataSection cData = cDataAsNode as XmlCDataSection;
                Console.WriteLine(cData.Value);
            }

            Console.ReadKey();
        }
    }
}

Thanks Pat, but I will be using this for multiple XML files and don't want to have to edit this line:

var xmlAsString = "<root><element3><![CDATA[Information that I need to extract]]></element3></root>";

every time.

What exactly are you trying to do? Are all the XML files in the same format, or are they completely different and you are just trying to get all the text?

The example I gave was just to show how to access the data, not really how it would be used in the program. If you provide a little more information about what you are trying to achieve or a sample xml file, I'd be happy to help further.
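For what it's worth, the same lookup does not have to depend on a hard-coded string. Here is a minimal sketch, assuming the file path and the XPath to the element are supplied by the rest of your program. The class name CDataReader and both method names are made up for illustration.

```csharp
using System;
using System.Xml;

static class CDataReader
{
    // Returns the text of the first CDATA section directly under the element
    // selected by xpath, or "" if the element or the CDATA section is missing.
    public static string ReadCData(XmlDocument doc, string xpath)
    {
        XmlNode element = doc.SelectSingleNode(xpath);
        if (element == null)
        {
            return String.Empty;
        }

        foreach (XmlNode child in element.ChildNodes)
        {
            var cData = child as XmlCDataSection;
            if (cData != null)
            {
                return cData.Value;
            }
        }

        return String.Empty;
    }

    // Same lookup, but the XML is loaded from a file instead of a string literal.
    public static string ReadCDataFromFile(string path, string xpath)
    {
        var doc = new XmlDocument();
        doc.Load(path);
        return ReadCData(doc, xpath);
    }
}
```

You would then call something like CDataReader.ReadCDataFromFile(path, "/root/element3") once per file, so nothing has to be edited between files.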

The XML files are the same format, but some of the elements may be in one file and not in another. So I have it check whether the element is present: if so, it extracts the information and enters it in the database; otherwise it enters nothing in the database.

if (rental.Element("email") == null)
{
    l.Email = "";
}
else
{
    l.Email = rental.Element("email").Value;
}

So here is a sample of one of the XML files. Note: I have removed a lot of irrelevant data and left in two elements that I can retrieve the information from (email & telephone) and the two that are causing me grief (headline & description).

<?xml version="1.0" standalone="no" ?>
<!DOCTYPE propertyList SYSTEM "http://reaxml.realestate.com.au/propertyList.dtd">
<propertyList date="2012-04-20-13:36:01" username="Me" password="MyPass">
<rental modTime="2012-04-20-13:07" status="leased">

...

<telephone type="BH">01 2345 6789</telephone>
<email>aaron@domain.com.au</email>

...

<headline><![CDATA[PERFECT IN EVERY WAY]]></headline>
<description><![CDATA[Large 4 bedroom home situated in Ilfracombe, includes open plan living areas, very tidy,  modern kitchen and bathroom, large back deck, low maintenance established gardens all on dripper systems, large 2 bay shed, & shade house.]]></description>

...

</rental>
</propertyList>

If the nodes will only ever contain CDATA sections, then you can just use XmlNode.InnerText:

using System;
using System.Xml;

namespace ConsoleApplication
{
    class Program
    {
        static void Main(string[] args)
        {

            var xmlStringWithDescription = "<PropertyList><rental><telephone>01 2345 6789</telephone><email>a@b.c</email>" +
                "<headline><![CDATA[Perfect in every way]]></headline>" + 
                "<description><![CDATA[Large bla balb albaladkja;dfkj]]></description></rental></PropertyList>";

            var xmlStringWithoutDescription = "<PropertyList><rental><telephone>01 2345 6789</telephone><email>a@b.c</email>" + 
                "<headline><![CDATA[Perfect in every way]]></headline></rental></PropertyList>";

            var docWithDescription = new XmlDocument();
            var docWithoutDescription = new XmlDocument();

            docWithDescription.LoadXml(xmlStringWithDescription);
            docWithoutDescription.LoadXml(xmlStringWithoutDescription);

            var contactWithDescription =  RentalXmlParser.ParseRental(docWithDescription.SelectSingleNode("PropertyList/rental"));

            var contactWithoutDescription = RentalXmlParser.ParseRental(docWithoutDescription.SelectSingleNode("PropertyList/rental"));


            Console.WriteLine("Contact With:");
            Console.WriteLine("\tTelephone: " + contactWithDescription.Telephone);
            Console.WriteLine("\tEmail: " + contactWithDescription.Email);
            Console.WriteLine("\tHeadline: " + contactWithDescription.Headline);
            Console.WriteLine("\tDescription: " + contactWithDescription.Description);

            Console.WriteLine();
            Console.WriteLine("Contact Without:");
            Console.WriteLine("\tTelephone: " + contactWithoutDescription.Telephone);
            Console.WriteLine("\tEmail: " + contactWithoutDescription.Email);
            Console.WriteLine("\tHeadline: " + contactWithoutDescription.Headline);
            Console.WriteLine("\tDescription: " + contactWithoutDescription.Description);


            Console.ReadKey();
        }

    }

    class PropertyContact
    {
        public string Telephone { get; set; }
        public string Email { get; set; }
        public string Headline { get; set; }
        public string Description { get; set; }
    }

    class RentalXmlParser
    {
        public static PropertyContact ParseRental(XmlNode rental)
        {

            if (rental == null)
            {
                return null;
            }

            var telephoneNode = rental.SelectSingleNode("telephone");
            var emailNode = rental.SelectSingleNode("email");
            var headlineNode = rental.SelectSingleNode("headline");
            var descriptionNode = rental.SelectSingleNode("description");

            var contact = new PropertyContact()
            {
                Telephone = String.Empty,
                Email = String.Empty,
                Headline = String.Empty,
                Description = String.Empty
            };

            if (telephoneNode != null)
            {
                contact.Telephone = telephoneNode.InnerText;
            }

            if (emailNode != null)
            {
                contact.Email = emailNode.InnerText;
            }

            if (headlineNode != null)
            {
                contact.Headline = headlineNode.InnerText;
            }

            if (descriptionNode != null)
            {
                contact.Description = descriptionNode.InnerText;
            }

            return contact;
        }
    }
}

OK, so I used the information from your first post and came up with this:

    XmlNode Headline = doc.FirstNode.ChildNode[0];

    XmlNode cDataAsNode = Headline.ChildNodes[0];

    if (cDataAsNode is XmlCDataSection)
    {
        XmlCDataSection cData = cDataAsNode as XmlCDataSection;
        l.Headline = cData.Value;

    }

It is coming up with one error that I cannot seem to find an answer for.

'System.Xml.Linq.XNode' does not contain a definition for 'ChildNode' and no extension method 'ChildNode' accepting a first argument of type 'System.Xml.Linq.XNode' could be found (are you missing a using directive or an assembly reference?)

I believe I have all the correct references but am I missing something?

using System;
using System.Configuration;
using System.IO;
using System.ServiceProcess;
using System.Threading;
using System.Xml.Linq;
using System.Xml;
using DevExpress.Xpo;


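You just have a misspelling, and it happens to resolve against a namespace that you do not need: FirstNode is a member of the System.Xml.Linq types, which is why the error mentions XNode.

Remove using System.Xml.Linq; then, on the first line of your snippet above, you should see a red squiggly line.

doc.FirstNode.ChildNode should be doc.FirstChild.ChildNodes (assuming doc is an XmlDocument, which has FirstChild and ChildNodes but no FirstNode).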
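If doc is actually an XDocument (which the XNode in the error message suggests) and you would rather keep System.Xml.Linq, no special CDATA handling should be needed: XElement.Value already includes the text of any CDATA section inside the element. Here is a minimal sketch along the lines of your existing null checks. The class name LinqCDataExample and the helper TextOrEmpty are made up for illustration.

```csharp
using System;
using System.Xml.Linq;

static class LinqCDataExample
{
    // Returns the text of the named child element (CDATA content included),
    // or "" when the element is not present.
    public static string TextOrEmpty(XElement parent, string name)
    {
        // The explicit string conversion on XElement yields null for a
        // missing element instead of throwing, so ?? can supply the default.
        return (string)parent.Element(name) ?? String.Empty;
    }
}
```

With that in place, each assignment becomes a single line, for example l.Headline = LinqCDataExample.TextOrEmpty(rental, "headline"); and the same call works for plain elements such as email.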
