Hi,

I have an app that generates XML output using XmlWriter.
The XML is very long, hundreds of thousands of items,
so it is not possible to keep it all in memory, and a few
errors always occur.

Is there a way to cancel part of the already written
XML within a try-catch section and move on to the next item?

Thanks !

All 4 Replies

I am not sure I understood your question.
1. Is the error in the way the XML is written, i.e. the generated XML is invalid?
2. Or is the XML valid but the values stored in the XML nodes are invalid?

For 1: AFAIK, the XML classes won't be of much help beyond letting you discard the whole file using an XML validator. One way I can think of is to write your own parser and handle the file as a simple text file rather than as XML.
For 2: That's quite easy. After all, it's your data, so you can handle it whatever way you want.

Well, the XML is built on the fly from potentially invalid data. In most cases the data passes OK, but sometimes an error occurs while creating a new instance of the data class (an embedded hierarchical structure). The XML stream is written continuously, so I was looking for some checkpoint/rollback mechanism.

It can be solved with a second XmlWriter and merging the outputs, but that is a bit messy.
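The second-writer approach mentioned above could look roughly like the sketch below. Each item is first written to a scratch XmlWriter over a StringBuilder, and only copied into the main stream with WriteRaw once it has completed without an exception. The element names, the `WriteItem` helper and the string-based item type are made up for illustration. The sketch assumes a single item fits comfortably in memory even though the whole document does not.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

static class BufferedItemExport
{
    // Hypothetical per-item serializer; it may throw part-way through an item.
    static void WriteItem(XmlWriter w, string value)
    {
        w.WriteStartElement("item");
        if (value == null)
            throw new InvalidOperationException("invalid item data");
        w.WriteElementString("value", value);
        w.WriteEndElement();
    }

    public static void Export(Stream output, IEnumerable<string> items)
    {
        // Fragment conformance lets the scratch writer emit a bare element
        // without an XML declaration.
        var fragmentSettings = new XmlWriterSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment,
            OmitXmlDeclaration = true
        };

        using (XmlWriter main = XmlWriter.Create(output))
        {
            main.WriteStartElement("items");
            foreach (string item in items)
            {
                var buffer = new StringBuilder();
                try
                {
                    using (XmlWriter scratch = XmlWriter.Create(buffer, fragmentSettings))
                    {
                        WriteItem(scratch, item);
                    }
                    // The item completed, so copy its markup into the real stream.
                    main.WriteRaw(buffer.ToString());
                }
                catch (Exception)
                {
                    // The item failed: the buffer is simply dropped and nothing
                    // reaches the main stream. Real code would catch a narrower
                    // exception type and log the failure.
                }
            }
            main.WriteEndElement();
        }
    }
}
```

WriteRaw does no checking of what it is given, so this relies on the scratch writer having produced well-formed markup for the item.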

Since it is a business-specific scenario, I guess I can offer little help :P

Still, good to know about it. I would be interested in knowing the scenario in case you can share some details :P

One thing I can add, though, is that you can use HttpUtility.HtmlEncode to encode data before it is added to the XML, and decode it later using HttpUtility.HtmlDecode. If your XML generation itself is valid, this should keep the data from damaging it.
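A small side note on encoding, as a sketch rather than a recommendation: XmlWriter already escapes markup characters in text content on its own, so values that were encoded beforehand end up double-encoded in the file (an `&lt;` produced by HtmlEncode is written as `&amp;lt;`). The element name and sample value below are made up.

```csharp
using System;
using System.Text;
using System.Xml;

static class EscapingDemo
{
    static void Main()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            // The raw value is passed in unencoded; the writer escapes it.
            w.WriteElementString("note", "a < b & c");
        }

        Console.WriteLine(sb.ToString()); // <note>a &lt; b &amp; c</note>
    }
}
```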

This is a hard project to describe, with plenty of semantic checks. The XML result is expected to be more than 100 MB, so I have to write it to a stream. Sometimes it is necessary to look back, run some checks, and decide whether to stop publishing a certain part of it. It looks like this:

VVVVVVVVVVVVVVVVVVVpppa???????????????????????????????<-stream producer

V = valid data
p = already written data with some invalid nodes
a = recently processed item
??? = not processed input of unknown size

a and p can refer to p and V, but never to ?

One way to deal with it is to write everything and later overwrite some areas with " ", which is ugly and blocks any further stream processing.

A delayed-writing scheme based on two XML writers works well, but it is messy.
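One possible shape for such a delayed-writing scheme is sketched below. Already serialized items are held in a pending queue (the "p" region of the diagram) and only copied into the main writer once they fall out of a look-back window. The fixed window size, the string keys and the `DelayedXmlSink` name are all assumptions for illustration. The thread does not say how far back a later item may need to look, so the real commit rule may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Holds serialized items back until they can no longer be revoked.
sealed class DelayedXmlSink
{
    private readonly XmlWriter _main;
    private readonly int _window; // assumed maximum look-back distance, in items
    private readonly LinkedList<KeyValuePair<string, string>> _pending =
        new LinkedList<KeyValuePair<string, string>>();

    public DelayedXmlSink(XmlWriter main, int window)
    {
        _main = main;
        _window = window;
    }

    // Queue a fully serialized item; the oldest entries are committed once
    // they fall outside the look-back window.
    public void Add(string key, string fragmentXml)
    {
        _pending.AddLast(new KeyValuePair<string, string>(key, fragmentXml));
        while (_pending.Count > _window)
        {
            _main.WriteRaw(_pending.First.Value.Value);
            _pending.RemoveFirst();
        }
    }

    // Drop a pending item after a later check invalidates it.
    // Returns false if it was already committed or never queued.
    public bool Revoke(string key)
    {
        for (var node = _pending.First; node != null; node = node.Next)
        {
            if (node.Value.Key == key)
            {
                _pending.Remove(node);
                return true;
            }
        }
        return false;
    }

    // Commit everything still pending, e.g. at the end of the input.
    public void Flush()
    {
        foreach (var entry in _pending)
            _main.WriteRaw(entry.Value);
        _pending.Clear();
    }
}
```

The fragments passed to Add would come from a second, per-item XmlWriter, so the main stream only ever receives items that survived both their own serialization and the later look-back checks.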
