Hi, I am new to this...so hope I can get some help from you all...

I get XML feeds as input and have to store this data on our server. I have worked with databases but new to XML. Can someone tell me how I can store and manage this data?

You can store the XML document in the file system (i.e. as a file on the server) and read the data from it by using standard file system functions. Most of the technologies (J2EE, .NET, PHP, CGI/Perl, etc) have in-built libraries to work with XML files.

I have data in database and a lot of data resides as XML files as well. I do not want to maintain different data stores having the data in database and XML data in files. Is there any way to manage both XML and relational data together in a single data store?

If you do not want to store the XML data as files and want to maintain the complete data in a database, you can store the xml document into the database as BLOB or CLOB type. This would store the XML file in the binary format. This approach is recommended only if you want to store or retrieve the XML file in its entirety. But retrieving a specific element/attribute from XML document would have performance constraints as the complete XML document has to be loaded to fetch even a single record.

I am working on an application for data storage which deals mostly with XML data. I am storing the XML data as CLOB data type. Is there any other way to work with XML data?

Storing the XML as CLOB data type has its drawbacks. Searching and retrieving a subset of the document would be slow. The XML document needs to be parsed for each query to determine which portions of the document satisfy the specified search criteria. Furthermore, updating large objects would also take time. Your applications must provide the entire XML document to the DBMS for update, even if it only changed a small portion of it. This can result in unacceptably high processing costs, particularly if the XML document is very large

Alternatively, you can split the elements and attributes of the XML document to relational columns in one or more tables. Storing the data like this would have several advantages. You can directly work with selected portions of the “converted” XML data. You can add index to columns to make data retrieval more efficient and fast. You can make use of SQL queries to retrieve the required piece of data.

But decomposing the XML would be difficult (some time impossible) if XML documents contain heavily nested parent/child relationships and irregular structures. Shredding such documents may also require a large number of tables. Querying a “shredded” document would require complex SQL statements and it may include many joins. It would take time to write these queries. It will give runtime productivity and performance problems. Moreover, it is very difficult , if not impossible to recover (compose ) the XML document from these relational tables

You can use pureXML (DB2 from IBM) to store the XML data in the database. It stores the XML document natively in the database which means XML is not stored as CLOB or shredded into tables. XML remains XML even inside the database. The data can be queried using SQL or XQuery. Searching and retrieving subsets of XML would be very fast as XML is not parsed during querying.

I do not want to store only XML documents. I also have other data types in my application

DB2 is not a XML-only database. DB2 provides integration of relational data and XML data within a single database. Tables created with XML data types may also contain columns with “traditional” SQL data types, including numeric data, character strings, date/time data, and others.

I have not worked with pureXML. How difficult it is to query the XML data from database?

pureXML has new indexing technology; Indexes can be created for specific elements or attributes of the XML documents to speedup the data retrieval. It helps those queries which require specific portions of XML documents. Full text search over XML documents is also supported, and specialized text indexes can be defined to improve performance of such searches

pureXML stores XML data in database retaining the XML format. Is it possible to index the columns? This could give us better query performance. :-/

Yes. pureXML has an indexing feature that can index over XML columns. With this indexing feature, retrieving XML node sets is much faster compared to retrieving nodes from XML documents stored in CLOB.

If you create a table which has an integer column and an XML column, the integer column (relational data) is stored in same table , but DB2 does not store the XML document in same table. Instead, the XML column contains an XML data descriptor that has the document ID and version ID for the XML document. The document is stored in the XML Data Area (XDA), which is separate from the base table.

If a table has both normal data and XML data, can indexing still be done?:?:

There are 3 types of XML Indexes.

XML regions index is one such index which stores the locations of each XML document that is stored in XML storage area. There is one region index created by DB2 for each table having a XML column. This region index is automatically created and maintained by DB2.

The other 2 indexes are XML column path index and Index on an XML column.

XML column path index - This index is system-generated for each XML column created or added to the table. For example, if a table with two XML columns is created, there is one XML regions index, but two XML column path indexes generated by DB2 9.

Index on an XML column (XML index) - This index is created over an XML column. You can index every XML path in an XML column using XPath. Like other relational indexes, the XML index is created as a B-tree structure and stored in the same place as the relational indexes are stored. Multiple XML indexes can be defined in one XML column. Please check section 5.1 on http://www.redbooks.ibm.com/redbooks/pdfs/sg247298.pdf for information on this.

Region index is automatically created and maintained by DB2 for each table that has a XML column. Whenever the inserted XML document has a larger size than that of the database page size, DB2 automatically splits the XML document to fit it into multiple database pages. The region index is used to maintain the logical mapping between these split sub-documents. By splitting the XML document page-wise, DB2 is able to make use of the relational database facilities like prefetching.

Like indexes on relational data, using indexes on an XML column to improve the query performance may have some cost. The performance for INSERT, UPDATE, and DELETE can decrease as the number of indexes defined on XML column increases. Indexes also take space, so you should only create indexes that are really necessary.

The CREATE INDEX statement has been enhanced to support XML indexing. XML indexes are created on columns of type XML based on path expressions (xmlpattern). When creating an XML index, the following fields are required:
Index name: Specify the name of an XML index.
Table and column names: Specify which XML column is indexed.
XMLPATTERN: Specify the node you want to index.
Data type: Specify the SQL data type for the XML index.

For eg. CREATE INDEX xidx1 on PRODUCT(XMLDOC)
GENERATE KEY USING XMLPATTERN
‘/products/product/id’ as SQL DOUBLE

This will create a XML index on the XMLDOC column of the PRODUCT table.
The index xidx1 will have all the product id values indexed with their data type being DOUBLE.

You can check section 5.1.2 on http://www.redbooks.ibm.com/redbooks/pdfs/sg247298.pdf link for more information on this.

This article has been dead for over six months. Start a new discussion instead.