gamebits

Some time ago, with the help of some programmers, I wrote a piece of code to parse results from eBay's website: title, item number, bids, sold price and date. This is the regex that used to work to get the title of an auction that ended with a sale:

$match_count1 = preg_match_all('#class\s*=\s*"vip[^"]*">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip[^"]*"))*<span\s+class\s*=\s*"sold">)#is',$source,$title_arr);

If the item didn't sell, the auction listing was skipped.
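To illustrate how the lookahead skips unsold listings, here is a minimal sketch that runs the original pattern on two simplified, hypothetical listings (not real eBay markup). The tempered part `(?:.(?!class\s*=\s*"vip[^"]*"))*` advances one character at a time but is not allowed to step onto the next `class="vip..."` link, so a title is only captured if `<span class="sold">` appears before the next listing's title.

```php
<?php
// Original regex from above, unchanged.
$pattern = '#class\s*=\s*"vip[^"]*">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip[^"]*"))*<span\s+class\s*=\s*"sold">)#is';

// Two simplified, hypothetical listings: the first did not sell, the second did.
$source = '<a href="/itm/1" class="vip">Unsold card</a><td class="bids">0 Bids</td>'
        . '<a href="/itm/2" class="vip">Sold card</a><td class="bids"><span class="sold">Sold</span></td>';

$count = preg_match_all($pattern, $source, $title_arr);

// $title_arr[1] holds the captured titles; only the sold listing survives,
// because the first listing's lookahead hits the next class="vip" before any "sold" span.
echo $count, "\n";            // 1
echo $title_arr[1][0], "\n";  // Sold card
```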

Here is an example of the text to parse:

<a name="item336fbfd20a"> </a><table class="li" r="19" cellspacing="0"><tbody><tr><td class="dtl lt"><div class="ttl"><a href="http://www.ebay.com/itm/Lone-Ranger-1940-Card-16-PSA-3-/220918174218?pt=LH_DefaultDomain_0&amp;hash=item336fbfd20a" class="vip">The Lone Ranger (1940) - Card # 16 PSA 3</a></div><div class="dyn dynS"><div class="s1"><div class="mWSpc"></div>Item: <span class="v">220918174218</span></div><div class="s2">Returns: Accepted within 7 days</div></div><div class="anchors"><div class="g-nav group"><a href="/sch/sis.html?_kw=The+Lone+Ranger+%281940%29+-+Card+%23+16+PSA+3&amp;ssPageName=SRCH%3ACMPL%3AVS&amp;_fis=2&amp;_id=220918174218&amp;_isid=0&amp;_sibeleafcat=156521">View similar active items</a><span class="vbar g-nav">&nbsp;|&nbsp;</span><a href="http://cgi5.ebay.com/ws/eBayISAPI.dll?SellLikeItem&amp;item=220918174218&amp;ssPageName=STRK:MEWN:LILTX">Sell one like this</a><div class="mi-l"><div><a class="pll ppr" id="v4-65" href="javascript:;" onmouseover='return gallery(event, {"item":"220918174218","offset":null,"images":1,"version":0,"href":null});'>Enlarge</a></div></div></div></div></td><td class="trs"></td><td class="bids"><div class="bin1">1 Bid</div><span class="sold">Sold</span></td><td class="prc"><div class="bidsold g-b">$19.95</div><span class="ship fee">+$3.00 shipping</span></td><td class="tme  rt"><b class="hidlb">End Date:</b><span>Dec-27 15:11</span></td></tr></tbody></table>

I have other regexes in my script to get the other bits of information I need.
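Since every field is pulled out with its own regex, one alternative worth considering (a different technique from the regex approach, swapped in here) is to parse the page with PHP's DOMDocument and DOMXPath classes, which usually ship enabled by default and do not care about attribute order or extra attributes such as `title`. This is a minimal sketch: the function names `has_class` and `parse_sold_listings` are hypothetical, and the class names (`li`, `vip`, `v`, `bids`, `sold`, `bidsold`, `tme`) are assumptions taken from the sample markup, so they may change again.

```php
<?php
// Hypothetical helper: XPath 1.0 test for "the class attribute contains the word $c",
// which works even when an element has several classes (e.g. class="tme  rt").
function has_class(string $c): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $c ')";
}

// Hypothetical helper: returns one array per *sold* listing found in $html.
// The class names are assumptions based on the sample markup and will break
// if eBay renames them.
function parse_sold_listings(string $html): array
{
    $doc = new DOMDocument();
    // Scraped HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];

    // Each listing is a <table class="li">; keep only those containing <span class="sold">.
    $listings = $xpath->query('//table[' . has_class('li') . '][.//span[' . has_class('sold') . ']]');
    foreach ($listings as $table) {
        // Text of the first node matching $expr inside this listing, or null if absent.
        $text = function (string $expr) use ($xpath, $table): ?string {
            $node = $xpath->query($expr, $table)->item(0);
            return $node === null ? null : trim($node->textContent);
        };
        $results[] = [
            'title' => $text('.//a[' . has_class('vip') . ']'),
            'item'  => $text('.//span[' . has_class('v') . ']'),
            'bids'  => $text('.//td[' . has_class('bids') . ']/div'),
            'price' => $text('.//div[' . has_class('bidsold') . ']'),
            'date'  => $text('.//td[' . has_class('tme') . ']/span'),
        ];
    }
    return $results;
}
```

Calling `parse_sold_listings($source)` on the page source should return one array per sold listing with `title`, `item`, `bids`, `price` and `date` keys. Extra attributes no longer matter, but a renamed class still would.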

Now the problem: they have added a title attribute to the link tag, like this:

<a href="http://www.ebay.com/itm/Lone-Ranger-1940-Card-16-PSA-3-/220918174218?pt=LH_DefaultDomain_0&amp;hash=item336fbfd20a" class="vip" title="The Lone Ranger (1940) - Card # 16 PSA 3">The Lone Ranger (1940) - Card # 16 PSA 3</a>

That is enough to break my regex. I can still get the title by using this:

$match_count1 = preg_match_all("'title=\"(.*?)\">'si", $source, $title_arr);

However, I cannot for the life of me figure out how to add the lookahead so that it parses only sold items. Any help would be greatly appreciated.
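A minimal sketch of one possible fix, assuming the only change is extra attributes (such as `title`) inside the `<a ... class="vip">` tag: the original pattern required `">` immediately after the class value, so changing `"vip[^"]*">` to `"vip[^"]*"[^>]*>` lets it skip any further attributes up to the end of the tag. The lookahead is left exactly as it was. This has only been checked against simplified, hypothetical markup, not live eBay pages, and it assumes no attribute value contains a raw `>`.

```php
<?php
// Same as the original pattern, except "vip[^"]*">" became "vip[^"]*"[^>]*>":
// [^>]* skips any further attributes (e.g. title="...") up to the closing ">".
$pattern = '#class\s*=\s*"vip[^"]*"[^>]*>([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip[^"]*"))*<span\s+class\s*=\s*"sold">)#is';

// Simplified, hypothetical listings in the new format (class="vip" followed by title="...").
$source = '<a href="/itm/1" class="vip" title="Unsold card">Unsold card</a><td class="bids">0 Bids</td>'
        . '<a href="/itm/2" class="vip" title="The Lone Ranger (1940) - Card # 16 PSA 3">The Lone Ranger (1940) - Card # 16 PSA 3</a>'
        . '<td class="bids"><div class="bin1">1 Bid</div><span class="sold">Sold</span></td>';

$match_count1 = preg_match_all($pattern, $source, $title_arr);

// Only the sold listing's title is captured in $title_arr[1].
print_r($title_arr[1]);
```

Because `[^>]*` matches "anything up to the end of the tag", the pattern should also keep working if the `title` attribute appears before `class` or if more attributes are added later.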

I'd like to add that this is not for a commercial application; it is aimed at a very specific niche (trading cards), and it is a tool to help me keep track of the value of the cards I collect.