cereal 1,524 Nearly a Senior Poster Featured Poster

Hello everybody,

so, I realized that when I try to load some XML into a MySQL table through the LOAD XML LOCAL INFILE ... statement, if there is a parent with the same name of the child, MySQL will insert an extra row. Here's the example to reproduce the issue:

<fruit>
    <fruit>
        <name>Orange</name>
        <variety>Valencia</variety>
    </fruit>
    <fruit>
        <name>Apple</name>
        <variety>Fuji</variety>
    </fruit>
    <fruit>
        <name>Lemon</name>
        <variety>Eureka</variety>
    </fruit>
</fruit>

And the SQL:

CREATE TABLE IF NOT EXISTS `tmp_fruits`(
    `name` VARCHAR(50) NOT NULL,
    `variety` VARCHAR(50) NOT NULL
) ENGINE = MEMORY;

LOAD XML LOCAL INFILE 'data.xml' INTO TABLE tmp_fruits ROWS IDENTIFIED BY '<fruit>';

SELECT * FROM `tmp_fruits`;

I got four rows instead of three:

+--------+----------+
| name   | variety  |
+--------+----------+
| Orange | Valencia |
| Apple  | Fuji     |
| Lemon  | Eureka   |
| Lemon  | Eureka   |
+--------+----------+
4 rows in set (0.00 sec)

I think it happens because it matches the parent node (1) and the children nodes (3):

<fruit> <!-- parent -->
    <fruit> <!-- child -->
        <name>Orange</name>
        <variety>Valencia</variety>
    </fruit>
    <fruit> <!-- child -->
        <name>Apple</name>
        <variety>Fuji</variety>
    </fruit>
    <fruit> <!-- child -->
        <name>Lemon</name>
        <variety>Eureka</variety>
    </fruit>
</fruit>

How this could be solved? The XML is an example, I cannot change the parent to a plural <fruits> at the origin, as it's generated by an external source and defined by a namespace, maybe I could alter the received file or extract those nodes, but I'm not happy with these solutions.

NOTE
ExtractValue() works fine and matches only three records, example:

SET @xml_data := load_file('data.xml');
SELECT replace(ExtractValue(@xml_data, '//fruit/child::name'), ' ', ',') AS `name`
     , replace(ExtractValue(@xml_data, '//fruit/child::variety'), ' ', ',') AS `variety`;

+--------------------+----------------------+
| name               | variety              |
+--------------------+----------------------+
| Orange,Apple,Lemon | Valencia,Fuji,Eureka |
+--------------------+----------------------+
1 row in set (0.03 sec)

So this is already something to consider. Currently I'm inclined to submit this to the MySQL bug list, but I would like to know what you think about it.

Thanks, for your time.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.