Thank you for reading,

I am trying to run through a project, or whole sites containing many pages, to determine whether each page is XHTML Strict by validating it against a local xhtml-strict.dtd file. I cannot go through a web service: it would be overkill for the server, and the host would probably block my requests. Here is how I currently do it.

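// Requires: using System; using System.Collections.Generic; using System.IO;
//           using System.Text.RegularExpressions; using System.Xml; using System.Xml.Schema;
public void Execute(string xml, string dtd) {

    Errors = new List<string>();

    XmlReaderSettings settings = new XmlReaderSettings();
    // ProhibitDtd is obsolete from .NET 4.0 on; there, set DtdProcessing = DtdProcessing.Parse instead
    settings.ProhibitDtd = false;
    settings.ValidationType = ValidationType.DTD;
    settings.ValidationEventHandler += ValidationEventHandler;

    // Create a local reference for validation; Uri turns the local path into a proper file:/// URI
    string newDoctype = string.Format("<!DOCTYPE html SYSTEM \"{0}\">", new Uri(Path.GetFullPath(dtd)).AbsoluteUri);

    // Replace the current doctype with the new local one
    xml = Regex.Replace(xml, "<!DOCTYPE.*?>", newDoctype, RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // using closes the reader even when Read() throws
    using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings)) {
        try {
            // Validation errors are reported through the event handler while reading
            while (reader.Read()) {
            }
        } catch (XmlException ex) {
            // Well-formedness error: fatal, the reader cannot continue past it
            Errors.Add(string.Format("(Fatal) {0} - [Line: {1}, Char: {2}]", ex.Message, ex.LineNumber, ex.LinePosition));
        } catch (Exception ex) {
            Errors.Add(ex.Message);
        }
    }
}

void ValidationEventHandler(object sender, ValidationEventArgs e) {
    Errors.Add(string.Format("({0}) {1} - [Line: {2}, Char: {3}]", e.Severity, e.Message, e.Exception.LineNumber, e.Exception.LinePosition));
}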

This works fine for validation errors: as soon as an XHTML error is found, an event is raised carrying the error in its ValidationEventArgs, and I capture it in the error list.

However, the weakness is that all the .NET XML objects throw on severe structural XML errors. The input is really HTML, which can be poorly coded and nowhere near XHTML Strict, so the reader runs into fatal errors such as disallowed XML characters, missing tags, etc. These are invalid in XHTML too, and I need them in my report, but I cannot keep reading past the first one because XmlReader does not recover from them. I also cannot skip the offending node or element, because the reader no longer recognizes the XML structure once it is that broken.
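To illustrate the problem in isolation, here is a minimal sketch. The class name `WellFormednessDemo`, the method `FirstFatalError` and the sample markup are made up for this post; only `XmlReader`, `XmlException` and `ReadState` are real .NET types. It shows that the first well-formedness error surfaces as an `XmlException`, after which the reader cannot be asked for more nodes.

```csharp
using System;
using System.IO;
using System.Xml;

public static class WellFormednessDemo
{
    // Hypothetical helper: returns null when the input is well-formed,
    // otherwise a description of the first fatal error the reader hits.
    public static string FirstFatalError(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            try
            {
                while (reader.Read()) { }
                return null;
            }
            catch (XmlException ex)
            {
                // After the exception the reader is no longer usable for parsing;
                // ReadState is included in the message to show the state it is left in.
                return string.Format("Line {0}, Char {1}: {2} (ReadState: {3})",
                    ex.LineNumber, ex.LinePosition, ex.Message, reader.ReadState);
            }
        }
    }

    public static void Main()
    {
        // Made-up sample: the <p> element is never closed.
        string broken = "<html><body><p>never closed</body></html>";
        Console.WriteLine(FirstFatalError(broken) ?? "well-formed");
    }
}
```

Only the first problem is ever reported this way, which is exactly the limitation: a page with ten structural mistakes yields one entry in the report.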

Does anybody know of a library that I could bring into my solution to validate XHTML Strict?

Any kind of help or ideas would be really appreciated.

Thank you!

All 2 Replies

SOAP 1.2 - W3C Markup validation service.

I'm worried about overloading the server with my requests, or that the server will refuse them. It is a huge number of pages (over 80,000).