anisha.silva

Hi,

I am trying to read an HTML page, convert it to XML, and copy the content into a txt file on the local drive. The code below reads the HTML page:

    // Parse the HTML at `address` into HtmlCleaner's node tree
    def cleaner = new HtmlCleaner()
    def node = cleaner.clean(address)

    // Convert from HTML to XML
    def props = cleaner.getProperties()
    def serializer = new SimpleXmlSerializer(props)
    def xml = serializer.getXmlAsString(node)

    // Parse the XML into a document we can work with
    return new XmlSlurper(false,false).parseText(xml)

and the code below writes it to a local txt file:

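static writeXml(page, fname) {
    // Create the target directory under `base` if it does not exist yet
    def target = new File(base + '/' + fname)
    target.parentFile.mkdirs()
    // Serialize the parsed page to the file, then release the writer
    def fw = new FileWriter(target)
    groovy.xml.XmlUtil.serialize(page, fw)
    fw.close()
}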

The problem I'm having right now is that when the page is read, the &nbsp; entities in the HTML are converted to ? in the XML. What I want to do is replace each '&nbsp;' with an empty string ''. How do I get access to them in the HTML and replace them?
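
To make the goal concrete, here is a minimal sketch of the replacement I am after. It assumes it would be acceptable to do it on the serialized XML string, between getXmlAsString and parseText. stripNbsp is a hypothetical helper name of my own, not part of HtmlCleaner or Groovy, and the four spellings it removes are only my guess at the forms the non-breaking space can take at that point.

```groovy
// Hypothetical helper (not part of HtmlCleaner or Groovy): removes
// non-breaking spaces from an XML string in the spellings listed below.
String stripNbsp(String xml) {
    def result = xml
    // named entity, decimal reference, hex reference, literal U+00A0
    for (form in ['&nbsp;', '&#160;', '&#xA0;', '\u00A0']) {
        result = result.replace(form, '')
    }
    return result
}

// Small self-check on a made-up sample string
def sample = '<p>a&nbsp;b&#160;c\u00A0d</p>'
assert stripNbsp(sample) == '<p>abcd</p>'
```

In the reading code this would presumably be used as parseText(stripNbsp(xml)), but I do not know whether that is the right place to do it or whether HtmlCleaner offers a cleaner way.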

I would appreciate a reply.
Thanks in advance.