1.11M Members

XML Extra content at the end of the documen

 
0
 

Hi
I have XML file that appends on the end of the file:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml" >
    <head><title>

    </title></head>
    <body>
        <form name="form1" method="post" action="GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD" id="form1">
    <div>
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf" />
    </div>

        <div>

        </div>
        </form>
    </body>
    </html>

I am using this function:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';
        $xml = simplexml_load_file(utf8_encode($xml_url), 'SimpleXMLElement', LIBXML_NOCDATA);

How can I filter extra content from this XML? When I open it in web browser I get HTML page with text

LastMitch
Deleted Member
 
0
 

How can I filter extra content from this XML? When I open it in web browser I get HTML page with text

Was there an error when you load the XML?

 
0
 

There is no error when I load it in broswer but I do not get XML but HTML document because of that code in the end of this XML file.
It is generated with that code in the end of the file and because of that I cannot read it like XML, so I need to strip that par somehow to read it like XML if it is possible.

LastMitch
Deleted Member
 
0
 

I think need to adjusted your $xml_url. The reason why because it's not letting you read the XML.

 
0
 

What do you mean?

 
0
 

As in that article you posted, my syntax is exatcly the same, problem is not in reading XML but this XML has extra HTML as I posted above.
Example:

</ProductDescription>
<ImageLarge>http://domain.com/images/products/KOMNET201_inf.jpg</ImageLarge>
<ImageSmall>http://domain.com/images/products/KOMNET201_kat.jpg</ImageSmall>
<BarCode>6935364052034</BarCode>
<ProducerWebPage>http://www.tp-link.com/en</ProducerWebPage>
<ProductWebPage>http://www.tp-link.com/en/products/prodetail.aspx?mid=0103030106&amp;id=541</ProductWebPage>
<Warranty>12 mj.</Warranty><CategoryName>Antene i dodatna oprema</CategoryName>
<ParentCategoryName>Mrežna oprema</ParentCategoryName>
<RowNumber>436</RowNumber><NetoPrice>95,93</NetoPrice>
<ProductDescriptionShort>ohms nominal, VSWR: 1.92 max., cable 1m, SMA</ProductDescriptionShort>
<AvailableQuantity>0</AvailableQuantity>
<InfoWindowLink>http://domain.com/ProductInfo.aspx?ProductID=53299</InfoWindowLink>
<Producer>TP-LINK</Producer></Product><Product>
<IsActiveRetail>true</IsActiveRetail>
<SortOrderRetail>16506</SortOrderRetail>
<SortOrderHomePageRetail>100</SortOrderHomePageRetail>
<ProductID>370770</ProductID>
<ProductCode>KOMNET272</ProductCode>
<ProductName>ANTENA TL-ANT2412D</ProductName>
<ProductDescription />



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">



<html xmlns="http://www.w3.org/1999/xhtml" >

<head><title>



</title></head>

<body>

    <form name="form1" method="post" action="GetProductsXML.aspx?username=domain.com&amp;password=89" id="form1">

<div>

<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE2MTY2ODcyMjlkZOOJeh0Tms5Udbf1jSVwRpTz4gUg" />

</div>



    <div>



    </div>

    </form>

</body>

</html>

How to exclude HTML from XML?

LastMitch
Deleted Member
 
0
 

It is generated with that code in the end of the file and because of that I cannot read it like XML, so I need to strip that par somehow to read it like XML if it is possible.

If you mention you can't read the XML but now you can?

How to exclude HTML from XML?

You just don't want the HTML tags appear?

I don't get.

XML file is separate file.
HTML file read the XML.
You don't put XML with HTML in 1 file.

 
0
 

The problem is that that is the "XML" which is given to me but it has html tags in it, so I cannot read it like XML, I need to find a way to exclude that tags when reading this so called XML

LastMitch
Deleted Member
 
0
 

The problem is that that is the "XML" which is given to me but it has html tags in it, so I cannot read it like XML, I need to find a way to exclude that tags when reading this so called XML

This:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';
$xml = simplexml_load_file(utf8_encode($xml_url), 'SimpleXMLElement', LIBXML_NOCDATA);

Take everything except:

$xml = simplexml_load_file('GetProductsXML.xml');

I want to know can you load the GetProductsXML.xml without any issue?

If you can then there's no issue with reading the file.

Then the issue is has something to do with this:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';

If there's an issue reading the GetProductsXML.xml that will tell you that you have a issue reading the GetProductsXML.xml file.

 
0
 

That is the problem because it canot read it as xml because extra html data in it

 
0
 

Why is there html in your xml?

 
0
 

Honestly, not shure, pearson who did that said that it is OK, and it should look like that :). Because for him this is good.
This is what I have and have to find a way to deal with it :)

 
0
 

XML files should only contain XML.

 
0
 

That I know, but I cannot do anything in this case, just remove it if possible?

 
0
 

That I know, but I cannot do anything in this case, just remove it if possible?

I would, but, you could however read the file into a string and then remove the html part, and use the remainder in simplexml_load_string().

 
0
 

I have tried to exclude it in a string but with no luck, I always get something from hmtl

 
0
 

What have you tried? show us the code you used. perhaps we can tweak it.

 
0
 
return preg_replace('~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i', '',$retValue);

Also something like this:

$nedozvoljeno1 = array('<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">      <html xmlns=""http://www.w3.org/1999/xhtml"" >     <head><title>      </title></head>     <body>         <form name=""form1"" method=""post"" action=""GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD"" id=""form1"">     <div>     <input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf"" />     </div>          <div>          </div>         </form>     </body>     </html>     <!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">      <html xmlns=""http://www.w3.org/1999/xhtml"" >     <head><title>      </title></head>     <body>         <form name=""form1"" method=""post"" action=""GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD"" id=""form1"">     <div>     <input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf"" />     </div>          <div>          </div>         </form>     </body>     </html>');

                return str_replace($nedozvoljeno1, "", $retValue);

Maybe some new ideas?

You
This question has already been solved: Start a new discussion instead
Post:
Start New Discussion
View similar articles that have also been tagged: