First, let me confess that I'm a beginner with little knowledge of Java. I'm having a problem with XML-to-JSON conversion. The XML I need to convert is complex, with elements that repeat throughout the document, and the order in which the elements appear needs to be preserved. Sample XML:

<?xml version="1.0" encoding="utf-8" ?>
<books>
    <book>
        <chapter>
            <number>i</number>
            <title>Introduction</title>
        </chapter>
        <volume>
            <number>1</number>
            <title>this title contains <italic>markup</italic> tags</title>
            <part>
                <title>part may contain sections and nested parts</title>
                <section>
                    <title>contains chapters</title>
                    <chapter>
                        <number>1.1.1</number>
                        <title>first chapter</title>
                    </chapter>
                    <chapter>
                        <number>1.1.2</number>
                        <title>second chapter</title>
                    </chapter>
                </section>
            </part>
            <chapter>
                <number>x</number>
                <title>references for volume one</title>
            </chapter>
        </volume>
        <chapter>
            <number>xii</number>
            <title>Acknowledgements</title>
        </chapter>
    </book>
</books>

I'm trying to build a table-of-contents page, and I need this information in JSON format. Since order matters here, I decided that every element should be transformed into the same JSON structure: an object with Type, Title, Number and List fields, where List holds the element's children in document order.

Expected JSON:

{
  "Book": [
    {
      "Type": "chapter",
      "Title": "Introduction",
      "Number": "i",
      "List": ""
    },
    {
      "Type": "volume",
      "Title": "this title contains <![CDATA[<span style='italic'>markup</span>]]> tags",
      "Number": "1",
      "List": [
        {
          "Type": "part",
          "Title": "part may contain sections and nested parts",
          "Number": "",
          "List": [
            {
              "Type": "section",
              "Title": "contains chapters",
              "Number": "",
              "List": [
                {
                  "Type": "chapter",
                  "Title": "first chapter",
                  "Number": "1.1.1",
                  "List": ""
                },
                {
                  "Type": "chapter",
                  "Title": "second chapter",
                  "Number": "1.1.2",
                  "List": ""
                }
              ]
            }
          ]
        },
        {
          "Type": "chapter",
          "Title": "references for volume one",
          "Number": "x",
          "List": ""
        }
      ]
    },
    {
      "Type": "chapter",
      "Title": "Acknowledgements",
      "Number": "xii",
      "List": ""
    }
  ]
}
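Because every element has the same shape, one possible model is a single recursive type whose values are always written as JSON strings, which also means a chapter number like "1" is never turned into a number. This is a minimal sketch assuming Java 16+ (for records); `TocEntry`, `toJson` and `quote` are hypothetical names, not part of any library.

```java
import java.util.List;

// Hypothetical model of one table-of-contents entry:
// {"Type", "Title", "Number", "List"}, with children kept in document order.
record TocEntry(String type, String title, String number, List<TocEntry> children) {

    // Every value is written as a JSON string, so no type conversion happens.
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"Type\":").append(quote(type))
          .append(",\"Title\":").append(quote(title))
          .append(",\"Number\":").append(quote(number))
          .append(",\"List\":");
        if (children.isEmpty()) {
            // The expected JSON uses "" rather than [] for entries without children.
            sb.append("\"\"");
        } else {
            sb.append('[');
            for (int i = 0; i < children.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(children.get(i).toJson());
            }
            sb.append(']');
        }
        return sb.append('}').toString();
    }

    // Escapes the characters JSON requires to be escaped inside a string.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                case '\t' -> sb.append("\\t");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }
}
```

For example, a section containing one chapter serializes to `{"Type":"section","Title":"contains chapters","Number":"","List":[{"Type":"chapter","Title":"first chapter","Number":"1.1.1","List":""}]}`.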

Two more problems I'm facing: first, most JSON libraries, such as org.json, convert data types during the conversion (for example, a number like 1 is emitted as a JSON number rather than the string "1"). Second, if an element contains markup tags such as <italic>, that markup should be wrapped in CDATA.
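For the CDATA requirement, one possible approach with the JDK's built-in DOM API is to walk the child nodes of each `<title>`: text nodes are copied as-is, and each child element is rewritten and wrapped in CDATA. This is a sketch under assumptions: `TitleText`, `titleText` and `parse` are hypothetical names, and the `<italic>` to `<span style='italic'>` mapping is inferred from the expected JSON.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: turns a <title> with mixed content into the Title string.
final class TitleText {

    // Text nodes are copied unchanged; each child element (e.g. <italic>) is
    // rewritten as <span style='TAG'>...</span> and wrapped in CDATA.
    // Markup nested inside that element is flattened to its text content.
    static String titleText(Element title) {
        StringBuilder sb = new StringBuilder();
        for (Node n = title.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                sb.append("<![CDATA[<span style='").append(n.getNodeName())
                  .append("'>").append(n.getTextContent()).append("</span>]]>");
            } else if (n.getNodeType() == Node.TEXT_NODE
                    || n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Parses an XML fragment and returns its root element (used for the demo).
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Applied to the volume title from the sample, this yields `this title contains <![CDATA[<span style='italic'>markup</span>]]> tags`. Keep in mind that inside a JSON string the CDATA markers are just ordinary characters; a JSON parser gives them no special meaning, so the page consuming the JSON has to handle them.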

I tried the usual approaches, such as org.json's XML.toJSONObject(), which (as expected) didn't work. Next I tried converting the XML to POJOs with JAXB and then converting the POJOs to JSON, but this also failed because the structure was lost when the XML was unmarshalled into objects. I also tried StAXON, but the resulting JSON, although it kept the original structure, wasn't valid.

Is there a way, other than XSLT, to do this conversion? For example, when using JAXB to convert to POJOs, would using LinkedHashMaps during unmarshalling preserve the structure?

Is DOM or SAX the right way to do this kind of complex transformation? I really don't want to use XSLT to convert XML to JSON, as I've run into many issues with it.
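DOM is one workable option for a document of this size: it keeps child nodes in document order, so a recursive walk can emit entries in exactly the order they appear. The following is a minimal sketch, not a definitive implementation, assuming Java 16+, a single `<book>` under `<books>`, and that `<title>` and `<number>` are the only "field" children (every other child element becomes a nested entry). `TocConverter` and its methods are hypothetical names; only the JDK's `javax.xml.parsers` and `org.w3c.dom` APIs are used, with no JSON library, so every value stays a string.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical DOM-based converter: walks the tree in document order and
// writes each structural element as {"Type","Title","Number","List"}.
final class TocConverter {

    static String convert(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        // Assumption: a single <book> under <books>; its children are the top level.
        Element book = firstChild(root, "book");
        if (book == null) throw new IllegalArgumentException("no <book> element");
        StringBuilder sb = new StringBuilder("{\"Book\":");
        appendList(book, sb);
        return sb.append('}').toString();
    }

    // Writes the structural children of parent, in document order, as a JSON
    // array; writes "" when there are none, as the expected JSON does.
    private static void appendList(Element parent, StringBuilder sb) {
        StringBuilder items = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element child && !isField(child)) {
                if (items.length() > 0) items.append(',');
                appendEntry(child, items);
            }
        }
        if (items.length() == 0) sb.append("\"\"");
        else sb.append('[').append(items).append(']');
    }

    private static void appendEntry(Element e, StringBuilder sb) {
        Element title = firstChild(e, "title");
        Element number = firstChild(e, "number");
        sb.append("{\"Type\":").append(quote(e.getTagName()))
          .append(",\"Title\":").append(quote(title == null ? "" : titleText(title)))
          .append(",\"Number\":").append(quote(number == null ? "" : number.getTextContent().trim()))
          .append(",\"List\":");
        appendList(e, sb);
        sb.append('}');
    }

    private static boolean isField(Element e) {
        return e.getTagName().equals("title") || e.getTagName().equals("number");
    }

    private static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element child && child.getTagName().equals(name)) return child;
        }
        return null;
    }

    // Copies text as-is and rewrites inline markup such as <italic> as a
    // CDATA-wrapped <span>, matching the volume title in the expected JSON.
    private static String titleText(Element title) {
        StringBuilder sb = new StringBuilder();
        for (Node n = title.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element inline) {
                sb.append("<![CDATA[<span style='").append(inline.getTagName())
                  .append("'>").append(inline.getTextContent()).append("</span>]]>");
            } else if (n.getNodeType() == Node.TEXT_NODE
                    || n.getNodeType() == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Minimal JSON string escaping (quotes, backslashes, control characters).
    private static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }
}
```

SAX (or StAX) could produce the same output with less memory on very large files, but then you have to track the open elements on a stack yourself; for a table of contents, the DOM version is usually simpler to get right.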

Any suggestions are welcome. Thanks, and sorry for the (very) long post.

Edited 10 Months Ago by newprogrammer14
