
XML: Extra content at the end of the document

Hi,
I have an XML file that has the following appended at the end:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    <html xmlns="http://www.w3.org/1999/xhtml" >
    <head><title>

    </title></head>
    <body>
        <form name="form1" method="post" action="GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD" id="form1">
    <div>
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf" />
    </div>

        <div>

        </div>
        </form>
    </body>
    </html>

I am using this function:

    $xml_url = 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';
    $xml = simplexml_load_file(utf8_encode($xml_url), 'SimpleXMLElement', LIBXML_NOCDATA);

How can I filter the extra content out of this XML? When I open it in a web browser I get an HTML page with text.

mehnihma
Posting Whiz in Training
234 posts since Oct 2011
Reputation Points: 10
Solved Threads: 0
Skill Endorsements: 0

How can I filter the extra content out of this XML? When I open it in a web browser I get an HTML page with text.

Was there an error when you load the XML?
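
One way to answer that question is to let libxml collect its errors and print them. The sketch below is only an illustration: the `<Products>` root element and the shortened HTML tail are assumptions standing in for the real response, which is not shown in full in this thread.

```php
<?php
// Hypothetical sample: one XML root followed by a stray HTML page,
// mimicking the response described in this thread.
$response = "<Products><Product><ProductID>370770</ProductID></Product></Products>\n"
          . "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
          . "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
          . "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body></body></html>";

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($response);

$messages = [];
if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError object with message, line and column.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

print_r($messages);
```

If the parser stops at the HTML tail, the collected messages should point at the line where the XML ends and the HTML begins.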

LastMitch
Industrious Poster
4,146 posts since Mar 2012
Reputation Points: 132
Solved Threads: 334
Skill Endorsements: 45

There is no error when I load it in the browser, but I get an HTML document instead of XML because of the code at the end of this XML file.
The file is generated with that code at the end, so I can't read it as XML. I need to strip that part somehow so it can be read as XML, if that's possible.

mehnihma

I think you need to adjust your $xml_url, because it's not letting you read the XML.

LastMitch

What do you mean?

mehnihma

@mehnihma

What do you mean?

Read this and you will see an example of how to read XML, since you are having an issue with it:

http://fuelyourcoding.com/reading-xml-with-php/

LastMitch

My syntax is exactly the same as in the article you posted. The problem is not in reading XML; this XML has extra HTML at the end, as I posted above.
Example:

</ProductDescription>
<ImageLarge>http://domain.com/images/products/KOMNET201_inf.jpg</ImageLarge>
<ImageSmall>http://domain.com/images/products/KOMNET201_kat.jpg</ImageSmall>
<BarCode>6935364052034</BarCode>
<ProducerWebPage>http://www.tp-link.com/en</ProducerWebPage>
<ProductWebPage>http://www.tp-link.com/en/products/prodetail.aspx?mid=0103030106&amp;id=541</ProductWebPage>
<Warranty>12 mj.</Warranty><CategoryName>Antene i dodatna oprema</CategoryName>
<ParentCategoryName>Mrežna oprema</ParentCategoryName>
<RowNumber>436</RowNumber><NetoPrice>95,93</NetoPrice>
<ProductDescriptionShort>ohms nominal, VSWR: 1.92 max., cable 1m, SMA</ProductDescriptionShort>
<AvailableQuantity>0</AvailableQuantity>
<InfoWindowLink>http://domain.com/ProductInfo.aspx?ProductID=53299</InfoWindowLink>
<Producer>TP-LINK</Producer></Product><Product>
<IsActiveRetail>true</IsActiveRetail>
<SortOrderRetail>16506</SortOrderRetail>
<SortOrderHomePageRetail>100</SortOrderHomePageRetail>
<ProductID>370770</ProductID>
<ProductCode>KOMNET272</ProductCode>
<ProductName>ANTENA TL-ANT2412D</ProductName>
<ProductDescription />



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" >
<head><title>

</title></head>
<body>
    <form name="form1" method="post" action="GetProductsXML.aspx?username=domain.com&amp;password=89" id="form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwULLTE2MTY2ODcyMjlkZOOJeh0Tms5Udbf1jSVwRpTz4gUg" />
</div>

    <div>

    </div>
    </form>
</body>
</html>

How can I exclude the HTML from the XML?

mehnihma

The file is generated with that code at the end, so I can't read it as XML. I need to strip that part somehow so it can be read as XML, if that's possible.

You said you can't read the XML, but now you can?

How can I exclude the HTML from the XML?

You just don't want the HTML tags to appear?

I don't get it.

An XML file is a separate file.
An HTML file reads the XML.
You don't put XML and HTML in one file.

LastMitch

The problem is that this is the "XML" that was given to me, but it has HTML tags in it, so I cannot read it as XML. I need to find a way to exclude those tags when reading this so-called XML.

mehnihma

The problem is that this is the "XML" that was given to me, but it has HTML tags in it, so I cannot read it as XML. I need to find a way to exclude those tags when reading this so-called XML.

This:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';
$xml = simplexml_load_file(utf8_encode($xml_url), 'SimpleXMLElement', LIBXML_NOCDATA);

Replace that with just this:

$xml = simplexml_load_file('GetProductsXML.xml');

I want to know whether you can load GetProductsXML.xml without any issue.

If you can, then there's no issue with reading the file.

Then the issue has something to do with this:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';

If you can't load GetProductsXML.xml, that tells you the problem is with the file itself.

LastMitch

That is the problem: it cannot be read as XML because of the extra HTML data in it.

mehnihma

That is the problem: it cannot be read as XML because of the extra HTML data in it.

So the issue is this:

$xml_url= 'http://b2b.domain.com/GetProductsXML.aspx?username=USERNAME&password=PASSWORD';

LastMitch

Why is there html in your xml?

diafol
Keep Smiling
Moderator
10,644 posts since Oct 2006
Reputation Points: 1,628
Solved Threads: 1,509
Skill Endorsements: 57

Honestly, I'm not sure. The person who built it said that it is OK and that it should look like that :). For him this is good.
This is what I have, and I have to find a way to deal with it :)

mehnihma

XML files should only contain XML.

diafol

I know that, but I can't change it in this case. Is it possible to just remove it?

mehnihma

I know that, but I can't change it in this case. Is it possible to just remove it?

You could, however, read the file into a string, remove the HTML part, and pass the remainder to simplexml_load_string().
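
A minimal sketch of that idea, assuming the unwanted part always starts at the first `<!DOCTYPE html` marker. The function name `loadXmlWithoutTrailingHtml`, the `<Products>` root element and the shortened sample string are all made up for illustration.

```php
<?php
/**
 * Cut everything from the first HTML doctype onward and parse the rest.
 * Returns a SimpleXMLElement, or false if the remainder is not valid XML.
 */
function loadXmlWithoutTrailingHtml(string $raw)
{
    // stripos() finds the marker case-insensitively; false means no HTML tail.
    $pos = stripos($raw, '<!DOCTYPE html');
    $xmlPart = ($pos === false) ? $raw : substr($raw, 0, $pos);

    return simplexml_load_string(trim($xmlPart), 'SimpleXMLElement', LIBXML_NOCDATA);
}

// Hypothetical sample standing in for the string fetched from $xml_url.
$raw = "<Products><Product><ProductCode>KOMNET272</ProductCode></Product></Products>\n\n"
     . "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
     . "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
     . "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title></title></head>"
     . "<body><form name=\"form1\"></form></body></html>";

$xml = loadXmlWithoutTrailingHtml($raw);
echo $xml->Product->ProductCode, PHP_EOL;
```

With the real feed, `$raw` would come from something like `file_get_contents($xml_url)`, which needs `allow_url_fopen` enabled. Cutting at a fixed marker avoids having to match the HTML itself, which matters here because the `__VIEWSTATE` value differs between the two samples posted in this thread.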

diafol

I have tried to strip it out of a string, but with no luck; I always end up with something left over from the HTML.

mehnihma

What have you tried? Show us the code you used. Perhaps we can tweak it.

diafol

return preg_replace('~<(?:!DOCTYPE|/?(?:html|body))[^>]*>\s*~i', '',$retValue);

Also something like this:

$nedozvoljeno1 = array('<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">      <html xmlns=""http://www.w3.org/1999/xhtml"" >     <head><title>      </title></head>     <body>         <form name=""form1"" method=""post"" action=""GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD"" id=""form1"">     <div>     <input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf"" />     </div>          <div>          </div>         </form>     </body>     </html>     <!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">      <html xmlns=""http://www.w3.org/1999/xhtml"" >     <head><title>      </title></head>     <body>         <form name=""form1"" method=""post"" action=""GetProductsXML.aspx?username=UASERNAME&amp;password=PASSWORD"" id=""form1"">     <div>     <input type=""hidden"" name=""__VIEWSTATE"" id=""__VIEWSTATE"" value=""/wEPDwULLTE2MTY2ODcyMjlkZC/1D4iGqP0urqyxWR+2OEQ90eHf"" />     </div>          <div>          </div>         </form>     </body>     </html>');

return str_replace($nedozvoljeno1, "", $retValue);

Maybe some new ideas?
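
A possible explanation for both attempts, offered as a guess from the code shown: the `preg_replace` pattern only removes the `<!DOCTYPE>`, `<html>` and `<body>` tags themselves, so `<head>`, `<title>`, `<form>`, `<div>` and `<input>` are left behind, and the `str_replace` needle contains doubled quotes (`""`) and collapsed whitespace, so it never matches the response literally. A hedged sketch that instead removes the whole block from the doctype through `</html>` in one pass (the function name `stripHtmlPages` and the sample data are made up):

```php
<?php
/**
 * Remove every complete HTML page, from its doctype through </html>.
 */
function stripHtmlPages(string $retValue): string
{
    // "s" lets "." span newlines, "i" ignores case, ".*?" stops at the first </html>.
    $clean = preg_replace('~<!DOCTYPE\s+html\b.*?</html\s*>~is', '', $retValue);

    // preg_replace() returns null on failure (e.g. backtrack limit); fall back to input.
    return trim($clean ?? $retValue);
}

// Hypothetical sample: the <Products> root is an assumption about the real feed.
$retValue = "<Products><Product><ProductID>370770</ProductID></Product></Products>\n"
          . "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
          . "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\n"
          . "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n<head><title>\n</title></head>\n"
          . "<body>\n<form name=\"form1\" method=\"post\" id=\"form1\">\n<div>\n</div>\n</form>\n</body>\n</html>\n";

$clean = stripHtmlPages($retValue);
echo $clean, PHP_EOL;
```

The cleaned string can then go to `simplexml_load_string()`. On a very large feed the lazy `.*?` may hit PCRE limits, in which case cutting the string at the first `<!DOCTYPE html` with `stripos()` and `substr()` is a simpler alternative.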

mehnihma
