XML Extra content at the end of the document

mehnihma
Posting Whiz in Training
239 posts since Oct 2011

Hi,
I have an XML file that has the following HTML appended at the end:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml" >
    <head><title>

    </title></head>
    <body>
        <form name="form1" method="post" action="GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD" id="form1">
    <div>
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf" />
    </div>

        <div>

        </div>
        </form>
    </body>
    </html>

I am using this function:

$xml_url = 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';
$xml = simplexml_load_file(utf8_encode($xml_url), 'SimpleXMLElement', LIBXML_NOCDATA);

How can I filter the extra content out of this XML? When I open the URL in a web browser, I get an HTML page with text.

LastMitch
Deleted Member

How can I filter the extra content out of this XML? When I open the URL in a web browser, I get an HTML page with text.

Was there an error when you loaded the XML?
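
One way to answer that question is to let libxml collect its parse errors so they can be inspected. The following is only a sketch: the `$raw` string is a made-up stand-in for the real response, not the actual feed.

```php
<?php
// Collect libxml errors in memory instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Made-up sample: a small XML document followed by an HTML page,
// mimicking the situation described in the question.
$raw = "<Products><Product/></Products>\n<!DOCTYPE html><html><body></body></html>";

$xml = simplexml_load_string($raw);
$errors = libxml_get_errors();
libxml_clear_errors();

if ($xml === false) {
    foreach ($errors as $error) {
        // Each entry is a LibXMLError object with message, line and column.
        printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
    }
}
```

If parsing fails, the printed messages show where libxml stopped, which helps confirm that the trailing HTML is the cause.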

mehnihma

There is no error when I load it in the browser, but I get an HTML document instead of XML because of that code at the end of the XML file.
The file is generated with that code at the end, so I cannot read it as XML. I need to strip that part somehow so I can read it as XML, if that is possible.

LastMitch

I think you need to adjust your $xml_url, because it's not letting you read the XML.

mehnihma

What do you mean?

LastMitch

@mehnihma

What do you mean?

Read this and you will see an example of how to read XML, since you are having an issue with it:

http://fuelyourcoding.com/reading-xml-with-php/

mehnihma

My syntax is exactly the same as in the article you posted. The problem is not in reading XML; it is that this XML has extra HTML at the end, as I posted above.
Example:

</ProductDescription>
<ImageLarge>http://domain.com/images/products/KOMNET201_inf.jpg</ImageLarge>
<ImageSmall>http://domain.com/images/products/KOMNET201_kat.jpg</ImageSmall>
<BarCode>6935364052034</BarCode>
<ProducerWebPage>http://www.tp-link.com/en</ProducerWebPage>
<ProductWebPage>http://www.tp-link.com/en/products/prodetail.aspx?mid=0103030106&amp;id=541</ProductWebPage>
<Warranty>12 mj.</Warranty><CategoryName>Antene i dodatna oprema</CategoryName>
<ParentCategoryName>Mrežna oprema</ParentCategoryName>
<RowNumber>436</RowNumber><NetoPrice>95,93</NetoPrice>
<ProductDescriptionShort>ohms nominal, VSWR: 1.92 max., cable 1m, SMA</ProductDescriptionShort>
<AvailableQuantity>0</AvailableQuantity>
<InfoWindowLink>http://domain.com/ProductInfo.aspx?ProductID=53299</InfoWindowLink>
<Producer>TP-LINK</Producer></Product><Product>
<IsActiveRetail>true</IsActiveRetail>
<SortOrderRetail>16506</SortOrderRetail>
<SortOrderHomePageRetail>100</SortOrderHomePageRetail>
<ProductID>370770</ProductID>
<ProductCode>KOMNET272</ProductCode>
<ProductName>ANTENA TL-ANT2412D</ProductName>
<ProductDescription />



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
<head><title>

</title></head>
<body>
    <form name="form1" method="post" action="GetProductsXML.aspx?username=domain.com&amp;password=89" id="form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE2MTY2ODcyMjlkZOOJeh0Tms5Udbf1jSVwRpTz4gUg" />
</div>

    <div>

    </div>
    </form>
</body>
</html>

How can I exclude the HTML from the XML?

LastMitch

The file is generated with that code at the end, so I cannot read it as XML. I need to strip that part somehow so I can read it as XML, if that is possible.

You mentioned that you can't read the XML, but now you can?

How can I exclude the HTML from the XML?

You just don't want the HTML tags to appear?

I don't get it.

An XML file is a separate file.
An HTML file reads the XML.
You don't put XML and HTML in one file.

mehnihma

The problem is that this is the "XML" I was given, but it has HTML tags in it, so I cannot read it as XML. I need to find a way to exclude those tags when reading this so-called XML.

LastMitch

The problem is that this is the "XML" I was given, but it has HTML tags in it, so I cannot read it as XML. I need to find a way to exclude those tags when reading this so-called XML.

From this:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';
$xml = simplexml_load_file(utf8_encode($xml_url), 'SimpleXMLElement', LIBXML_NOCDATA);

remove everything except:

$xml = simplexml_load_file('GetProductsXML.xml');

I want to know whether you can load GetProductsXML.xml without any issue.

If you can, then there's no issue with reading the file, and the issue has something to do with this:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';

If you cannot load GetProductsXML.xml, that tells you the issue is with reading the file itself.

mehnihma

That is the problem: it cannot be read as XML because of the extra HTML data in it.

LastMitch

That is the problem: it cannot be read as XML because of the extra HTML data in it.

So the issue is this:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';

diafol
Where are my eyes?
12,983 posts since Oct 2006
Moderator

Why is there HTML in your XML?

mehnihma

Honestly, I'm not sure. The person who built it said that it is OK and that it should look like that :). For him this is good.
This is what I have, and I have to find a way to deal with it :)

diafol

XML files should only contain XML.

mehnihma

That I know, but I cannot do anything about it in this case. Can I just remove it, if possible?

diafol

That I know, but I cannot do anything about it in this case. Can I just remove it, if possible?

I would. You could read the file into a string, remove the HTML part, and then pass the remainder to simplexml_load_string().
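
A minimal sketch of that approach, assuming the unwanted page always begins with `<!DOCTYPE html` and that everything before it is complete, well-formed XML. The helper name `strip_trailing_html()` and the sample string are hypothetical; with the real feed, the string would come from something like `file_get_contents($xml_url)`.

```php
<?php
// Hypothetical helper: keep only the text that precedes the appended HTML page.
// Assumes the unwanted part always starts with "<!DOCTYPE html".
function strip_trailing_html(string $raw): string
{
    $pos = stripos($raw, '<!DOCTYPE html');
    // If the marker is missing, return the input unchanged.
    return $pos === false ? $raw : rtrim(substr($raw, 0, $pos));
}

// Made-up sample standing in for the real response body.
$raw = '<Products><Product><ProductID>370770</ProductID></Product></Products>'
     . "\n\n<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\">\n"
     . '<html><body><form id="form1"></form></body></html>';

$xml = simplexml_load_string(strip_trailing_html($raw), 'SimpleXMLElement', LIBXML_NOCDATA);

if ($xml === false) {
    echo "Still not valid XML\n";
} else {
    echo (string) $xml->Product->ProductID, "\n";
}
```

Cutting at a fixed marker avoids having to match every HTML tag individually. If the XML before the marker is itself incomplete, parsing will still fail, so the `false` check stays in.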

mehnihma

I have tried to exclude it from the string, but with no luck. I always get something left over from the HTML.

diafol

What have you tried? Show us the code you used. Perhaps we can tweak it.

mehnihma

return preg_replace('~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i', '',$retValue);

Also something like this:

$nedozvoljeno1 = array('<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">      <html xmlns=""http://www.w3.org/1999/xhtml"" >     <head><title>      </title></head>     <body>         <form name=""form1"" method=""post"" action=""GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD"" id=""form1"">     <div>     <input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf"" />     </div>          <div>          </div>         </form>     </body>     </html>     <!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">      <html xmlns=""http://www.w3.org/1999/xhtml"" >     <head><title>      </title></head>     <body>         <form name=""form1"" method=""post"" action=""GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD"" id=""form1"">     <div>     <input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf"" />     </div>          <div>          </div>         </form>     </body>     </html>');

                return str_replace($nedozvoljeno1, "", $retValue);

Maybe some new ideas?
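
Looking at the two attempts as posted: the regular expression only matches the `<!DOCTYPE ...>`, `<html>` and `<body>` tags (opening and closing), so `<head>`, `<title>`, `<form>`, `<div>` and `<input>` are left behind. The `str_replace()` needle contains doubled quotes (`""`) and single-line spacing, so it is unlikely to match the text that actually comes back. A possible alternative, sketched here under the assumption that nothing useful follows the doctype, is to drop everything from `<!DOCTYPE html` to the end of the string in one step. The helper name `cut_from_doctype()` and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: drop everything from the HTML doctype to the end of the string.
function cut_from_doctype(string $raw): string
{
    // "s" lets "." match newlines, "i" ignores case, \z anchors at the very end.
    $clean = preg_replace('~\s*<!DOCTYPE\s+html\b.*\z~is', '', $raw);
    // preg_replace() returns null on failure; fall back to the original text.
    return $clean ?? $raw;
}

// Made-up sample standing in for $retValue from the posts above.
$retValue = "<Products><Product><ProductName>ANTENA TL-ANT2412D</ProductName></Product></Products>\n\n"
          . "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\">\n"
          . "<html><head><title></title></head><body><form id=\"form1\"><div></div></form></body></html>";

$xml = simplexml_load_string(cut_from_doctype($retValue));
echo $xml === false ? "Still not valid XML\n" : $xml->Product->ProductName . "\n";
```

This would not be safe for a feed whose XML part legitimately contains the text `<!DOCTYPE html`, so treat the marker as an assumption to verify against the real response.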
