Sorry if the question is naive, I am new at this.

I am trying to transform an XML document using XSLT, with Saxon. When I run my code I get this error:

SERE0014: Illegal HTML character: decimal 153

After searching for a bit I discovered that the error is due to the HTMLEmitter class of net.sf.saxon.event. There is a method writeEscape that handles escape characters that includes this code:

else if (c >= 127 && c < 160) {
    // these control characters are illegal in HTML
    DynamicError err = new DynamicError(
            "Illegal HTML character: decimal " + (int) c);
    err.setErrorCode("SERE0014");
    throw err;
}

It seems that the XML document I am trying to transform contains some characters that fall into this category, and that is why I get this error. My question is, how can I correct this problem? Is there a way to either find these characters and change them manually, or to add something to my code that deals with the problem?
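For the "find these characters" part, a minimal sketch could look like the following. It assumes the document text has already been read into a String; the class name IllegalHtmlCharFinder and the method findIllegal are made up for illustration and are not part of Saxon. It only reuses the range test from the writeEscape fragment quoted above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Saxon: reports where the offending characters are.
public class IllegalHtmlCharFinder {

    // Returns the offsets of all characters in the range rejected by the
    // quoted writeEscape check (decimal 127 up to and including 159).
    public static List<Integer> findIllegal(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 127 && c < 160) {
                offsets.add(i);
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Sample string containing decimal 153, the character from the error message.
        String sample = "Acme\u0099 widgets";
        for (int offset : findIllegal(sample)) {
            System.out.println("offset " + offset + ": decimal " + (int) sample.charAt(offset));
        }
    }
}
```

Note that this only sees characters that are literally present in the text. If the document writes them as numeric character references, those would have to be searched for as text instead.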

Of course, it could be that the error is due to something else, so I am posting my code just in case. (I have tried the code with similar documents and I don't get an error. Of course they do not contain all the information that I need).

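import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

public class XSLTTransformer {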
    // Global value so it can be ref'd by the tree-adapter
    static Document document;
    static String xslt = "data/output.xsl"; 
    static String xmldoc = "data/result.xml"; 

    public static void main(String[] argv) {

        try {
            // Use a Transformer for output
            TransformerFactory tFactory = TransformerFactory.newInstance();
            StreamSource stylesource = new StreamSource(xslt);
            Transformer transformer = tFactory.newTransformer(stylesource);

            StreamSource source = new StreamSource(xmldoc);
            StreamResult result = new StreamResult(System.out);
            transformer.transform(source, result);
        } catch (TransformerConfigurationException tce) {
            // Error raised while creating the Transformer from the stylesheet
            System.out.println("\n** Transformer Factory error");
            System.out.println("   " + tce.getMessage());

            // Use the contained exception, if any
            Throwable x = tce;

            if (tce.getException() != null) {
                x = tce.getException();
            }

            x.printStackTrace();
        } catch (TransformerException te) {
            // Error raised while running the transformation
            System.out.println("\n** Transformation error");
            System.out.println("   " + te.getMessage());

            // Use the contained exception, if any
            Throwable x = te;

            if (te.getException() != null) {
                x = te.getException();
            }

            x.printStackTrace();
        } 
    } // main
}

O.K. Sorry, I have partially corrected the error. I added:

xpath-default-namespace="http://www.saxonica.com/ns/doc/functions"

to my stylesheet and I stopped getting an error. Obviously the error has something to do with the Saxonica namespace for XPath. The XPath expressions I am using do not conform to it. Come to think of it, I got this error for practically every XPath expression I used in "for-each". I should have thought of it before. Sorry for bothering you.

Yes, it was indeed the characters. The original XML file had some characters that could not be converted to ISO-8859-1. The problem is solved now and my code works, but I had to manually delete these characters from the input XML document. I am sure that this is a common problem and that there are probably many ready-made solutions for it. Does anyone know of a solution that can be incorporated into the code? Thank you in advance.

These characters are indeed invalid in Latin-1 (ISO-8859-1) text: there, the code points 128 to 159 are control characters. In the Windows-1252 character set the same byte values are printable characters (153 is the trademark sign), which is most likely where they came from. If you can make the parser read the input as Windows-1252, you can then output UTF-8 instead of Latin-1, and that should work. You cannot simply transcode from HTML to Windows-1252 to UTF-8, because the HTML to Windows-1252 step would also convert other entities (like &lt; to "<") and break your XML.
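As a rough sketch of how this could be built into the code: assuming the XML text has been read in as Latin-1, each character in the range 128 to 159 can be reinterpreted as the Windows-1252 byte of the same value. The class name Cp1252Repair and the method repair are hypothetical, and this is only one possible approach. It does not touch decimal 127, and it does not help if the characters appear in the file as numeric character references.

```java
import java.nio.charset.Charset;

// Hypothetical helper: reinterprets C1 control characters as Windows-1252 text.
public class Cp1252Repair {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    public static String repair(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x80 && c <= 0x9F) {
                // Decode the single byte with the Windows-1252 table.
                // For the few byte values Windows-1252 leaves unassigned,
                // the result is whatever the JDK decoder returns.
                out.append(new String(new byte[] { (byte) c }, CP1252));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String repaired = repair("Acme\u0099");
        // Decimal 153 becomes U+2122, the trademark sign.
        System.out.println((int) repaired.charAt(4)); // prints 8482
    }
}
```

The repaired text could then be handed to the transformer through new StreamSource(new StringReader(repaired)), and the output encoding switched with transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8") or in the stylesheet's xsl:output element, so that the remapped characters can be written out.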
