Hello, I am trying to extract information from a Chinese website, but I don't know which keywords to use to pull out certain fields. The HTML snippet below is one result from running a query on C++ books. I want to get the following information:

Book title: C++反汇编与逆向分析技术揭秘(《程序员》杂志“2011年度十大最具技术影响力图书”,好评如潮)
Author: 钱林松;赵海旭(著) 机械工业出版社
Book price: ¥51.75

the link for this website is Link Anchor Text

I just need a method to retrieve these three fields.

 <li> <ins>3.</ins>
          <pre><a target="_blank" href="http://product.china-pub.com/198624"><img border="0" title="C++反汇编与逆向分析技术揭秘(《程序员》杂志“2011年度十大最具技术影响力图书”,好评如潮)" height="110" width="79" src="http://images.china-pub.com/ebook195001-200000/198624/cover.jpg" mysrc='/ebook195001-200000/198624' n='-1' onerror='jp.oe(this);' border='0'  onload='jp.w(this);'/></a></pre>
          <div>
            <h2>
            <a target="_blank" href="http://product.china-pub.com/198624">
            <b>c++</b>反汇编与逆向分析技术揭秘(《程序员》杂志“2011年度十大最具技术影响力图书”,好评如潮)
            </a>

            <a target="_blank" href="http://www.china-pub.com/temporary/faq/2007FAQ/wenti_gouwu.asp#gouwuxd_04" class="fourhours">4小时出库</a>
            </h2>
            <h3>&nbsp;</h3>
            <p>钱林松;赵海旭<span>(著) |</span>  机械工业出版社 <span>|</span> 9787111356332 <span>|</span> 2011-09-01</p>
            <dl>
              <dd>

              <a href="http://www.china-pub.com/member/bookpinglun/viewpinglun.asp?id=198624" target="_blank"> <img src="http://www.china-pub.com/computers/common/image/art1.gif"/><img src="http://www.china-pub.com/computers/common/image/art1.gif"/><img src="http://www.china-pub.com/computers/common/image/art1.gif"/><img src="http://www.china-pub.com/computers/common/image/art1.gif"/><img src="http://www.china-pub.com/computers/common/image/art1.gif"/> [ 44 人评价]</a>

               </dd>
              <dd class="ad">满48元全国600个城市免运费!</dd>
            </dl>
            <ul>
              <li><b>¥51.75</b><span>(4-5星会员价)</span> ¥69.00<span>(定价)</span></li>
              <li class="button">
              <a href="javascript:addcsg(198624,'c');" class="bt5">收藏</a><a href="javascript:tgwinpop(198624,'c');" class="bt2">团购</a><a href="javascript:winpop(198624,'c');" class="bt1">购买</a>

              </li>
            </ul>

          </div>
          <br class="space" />
        </li>

So, is the issue "Parsing the HTML" or "getting around the character set"?
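
If it is the parsing side, here is a minimal sketch using only Python's standard-library `html.parser`. It assumes every result follows the layout in the snippet above: the title is the text of the first `<a>` inside `<h2>`, the author and publisher are the first two `|`-separated fields of the `<p>` line, and the member price is the first `<b>` inside the item's `<ul>`. The names `BookListParser` and `parse_books` are made up for this example, and `SAMPLE` is a trimmed copy of the posted snippet with a shortened title. If it is the character set side, the raw bytes have to be decoded with whatever encoding the page declares (check the `Content-Type` header or the `<meta>` tag; I have not verified which one this site uses) before the text is passed to the parser.

```python
from html.parser import HTMLParser

# Trimmed copy of the snippet from the question (one search result,
# title shortened for readability).
SAMPLE = """
<li> <ins>3.</ins>
  <div>
    <h2>
      <a target="_blank" href="http://product.china-pub.com/198624">
      <b>c++</b>反汇编与逆向分析技术揭秘
      </a>
      <a target="_blank" href="#" class="fourhours">4小时出库</a>
    </h2>
    <p>钱林松;赵海旭<span>(著) |</span>  机械工业出版社 <span>|</span> 9787111356332 <span>|</span> 2011-09-01</p>
    <ul>
      <li><b>¥51.75</b><span>(4-5星会员价)</span> ¥69.00<span>(定价)</span></li>
    </ul>
  </div>
</li>
"""


class BookListParser(HTMLParser):
    """Collects title, author, publisher and price for each result item."""

    def __init__(self):
        super().__init__()
        self.books = []
        self._in_h2 = self._in_a = self._in_p = False
        self._in_ul = self._in_b = False
        self._reset_item()

    def _reset_item(self):
        self._title, self._byline, self._price = [], [], []
        self._links_in_h2 = 0
        self._price_done = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
        elif tag == "a" and self._in_h2:
            self._in_a = True
            self._links_in_h2 += 1
        elif tag == "p":
            self._in_p = True
        elif tag == "ul":
            self._in_ul = True
        elif tag == "b":
            self._in_b = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
        elif tag == "a":
            self._in_a = False
        elif tag == "p":
            self._in_p = False
        elif tag == "b":
            self._in_b = False
            if self._in_ul:
                self._price_done = True  # keep only the first <b> price
        elif tag == "ul":
            self._in_ul = False
            self._finish_item()

    def handle_data(self, data):
        if self._in_h2 and self._in_a and self._links_in_h2 == 1:
            self._title.append(data)  # includes the text of the nested <b>
        elif self._in_p:
            self._byline.append(data)
        elif self._in_ul and self._in_b and not self._price_done:
            self._price.append(data)

    def _finish_item(self):
        title = "".join(self._title).strip()
        if title:  # skip unrelated <ul> blocks that have no preceding title
            fields = [f.strip() for f in "".join(self._byline).split("|")]
            self.books.append({
                "title": title,
                "author": fields[0],
                "publisher": fields[1] if len(fields) > 1 else "",
                "price": "".join(self._price).strip(),
            })
        self._reset_item()


def parse_books(html_text):
    """Return a list of dicts, one per result item found in html_text."""
    parser = BookListParser()
    parser.feed(html_text)
    return parser.books


for book in parse_books(SAMPLE):
    print(book["title"], book["author"], book["publisher"], book["price"])
```

On a full results page, other `<p>` or `<ul>` elements outside the result list could leak into the fields, so the flags would probably need to be restricted to the container that holds the results. A third-party parser with CSS selectors would make that easier, but the idea is the same.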
