I'm building a scraper bot that gathers information from other websites, and I am using Simple HTML DOM Parser to do it.
I have found a bug, though: I ran into one website that doesn't parse.
Here is a sample of the markup that it cannot parse:
<div class="header"><div class="container"><ul id="nav"><li><a id="home" href="http://thinkclay.com" class="selected" title="Return to the home page">Return to home</a></li><li><a id="about" href="http://thinkclay.com/about" title="Read more about Clay McIlrath">About Clay McIlrath</a></li><li><a id="design" href="http://thinkclay.com/graphic-design" title="View my Graphic Design Portfolio">Web Design Portfolio</a></li><li><a id="development" href="http://thinkclay.com/web-development" title="View my Web Development Portfolio">Web Development Portfolio</a></li><li><a id="photography" href="http://thinkclay.com/photography" title="View my Photography Portfolio">Photography Portfolio</a></li><li><a id="wallpaper" href="http://thinkclay.com/desktop-wallpapers" title="Download free desktop wallpapers">Free Desktop Wallpapers</a></li><li><a id="wordpress" href="http://thinkclay.com/wordpress" title="Download free wordpress themes">Free Wordpress Themes</a></li></ul><div style="clear:both;"></div><p>My name is Clayton McIlrath and I am an entrepreneur currently living in CO. I personally enjoy the process of learning, exploring, and doing all things creative as well as sharing my experiences with others. Being an entrepreneur and <a href="http://bychosen.com">business owner</a>, I hope that my experiences may help someone else start their own venture and find success and freedom as I have! Feel free to <a href="http://bychosen.com/contact">contact me</a> anytime for questions or opportunities.</p> <a class="close" href="#close" title="Close the Cloud"><img src="http://thinkclay.com/wp-content/themes/thinkclay_v2/images/close.png" alt="close" /></a></div></div><div class="container"> <a
It seems that the markup has a line break after the tag name and before the first attribute.
I have tried using str_replace and preg_replace to replace whitespace characters with a single space, and that still doesn't seem to work. Does anybody have any idea why this is happening and how I can fix it?
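
For reference, a minimal sketch of the kind of whitespace normalization described above. The helper name normalize_whitespace and the shortened sample markup are illustrative assumptions, not the exact code or page involved; it only shows one way to collapse a line break between a tag name and its first attribute.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): collapse every run of
// whitespace, including \r and \n, into a single space before parsing.
function normalize_whitespace(string $html): string
{
    // \s matches spaces, tabs, carriage returns, and newlines.
    // preg_replace returns null on a regex error, so fall back to the input.
    return preg_replace('/\s+/', ' ', $html) ?? $html;
}

// Shortened, made-up sample with a line break right after the tag name.
$raw = "<div class=\"container\"> <a\r\n   href=\"http://thinkclay.com\">Home</a></div>";

$clean = normalize_whitespace($raw);
echo $clean, PHP_EOL;
```

The cleaned string could then be handed to the parser (for example via str_get_html) instead of the raw page source. Note that this also collapses whitespace inside elements such as pre or textarea, which may matter for some pages.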