Manipulating a String that contains HTML source

Question

ReeciePoo 0 Light Poster

18 Years Ago

I'm using the MS HTML Object Reference to do this.

<a href="stockmarket.phtml?type=buy&ticker=VPTS">VPTS</a><br>
   <a href="stockmarket.phtml?type=profile&company_id=126"><font size=1>(profile)</font></a>
   </td>
   <td align="center">
   9   </td>
   <td align="center">
   9   </td>
   <td align="center">
   <font color="black"><b>0</b></font>
   </td>
   <td align="center">
   1,000</td>
   <td align="center">
   15,000   </td>
   <td align="center">
   9,000   </td>
   <td align="center">
   <font color="red"><b><nobr>-40.00%</nobr></b></font>
   </td>

OK, I need to pull the following values out of the above code (the numbers and ticker always change):

VPTS
9
9
0
1,000
15,000
9,000
-40.00%

I need to get the above out of the source code, and I'm not sure what to use. I understand Left$(), Mid$(), Right$() and Len() would come into this, but I have never used those functions.

Could someone offer me assistance, or even a solution?

html-css visual-basic

All 8 Replies

davidcairns 1 Junior Poster

18 Years Ago

Or you could determine what doctype the HTML uses (XHTML Strict would be ideal) and use an XML parser to cycle through the nodes.

Having said that, if the HTML is always that simple, InStr and Mid should be sufficient for the task at hand.

Start with "ticker=" and grab that, then look for "<td align" and keep going from there. Remember InStr allows you to specify a start position, so you don't keep searching from the start of the string.
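As a rough illustration of the "start position" idea, here is a minimal VB6 sketch. The names TextBetween and DemoTextBetween are made up for this example, and strHTML is assumed to already hold the page source. It also checks for a missing marker, since Mid$ raises a run-time error if it is handed a zero start or a negative length.

```vb
' Returns the text between strOpen and strClose, searching from lngPos onward.
' On success lngPos is moved past strClose, so the next call carries on from there.
' Returns "" and leaves lngPos unchanged if either marker is missing.
Public Function TextBetween(ByVal strSource As String, ByVal strOpen As String, _
                            ByVal strClose As String, ByRef lngPos As Long) As String
    Dim lngFrom As Long
    Dim lngStart As Long
    Dim lngEnd As Long

    lngFrom = lngPos
    If lngFrom < 1 Then lngFrom = 1         ' guard against a zero or negative start

    lngStart = InStr(lngFrom, strSource, strOpen)
    If lngStart = 0 Then Exit Function      ' opening marker not found
    lngStart = lngStart + Len(strOpen)      ' step over the marker itself

    lngEnd = InStr(lngStart, strSource, strClose)
    If lngEnd = 0 Then Exit Function        ' closing marker not found

    TextBetween = Mid$(strSource, lngStart, lngEnd - lngStart)
    lngPos = lngEnd + Len(strClose)
End Function

' Hypothetical usage: pull the ticker, then the first centred cell after it.
Public Sub DemoTextBetween(ByVal strHTML As String)
    Dim lngPos As Long

    lngPos = 1
    Debug.Print TextBetween(strHTML, "ticker=", """>", lngPos)
    Debug.Print TextBetween(strHTML, "<td align=""center"">", "</td>", lngPos)
End Sub
```

Because lngPos is passed ByRef, each call picks up where the previous one stopped. The cell text comes back with its surrounding spaces and line breaks still attached, so it would need tidying before use.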

davidcairns 1 Junior Poster

18 Years Ago

Here are some of the basics for option 2 (option 1 requires you to parse the XML document with whatever you wish to use). Note that, depending on how flexible the format is, you may have to be a lot more clever than this.

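' strHTML is assumed to hold the page source already.
' Positions are Long because an Integer tops out at 32,767 characters.
Dim arrVars(7) As String
Dim lngStartPos As Long
Dim lngEndPos As Long
Dim i As Integer

' Ticker: the text between "ticker=" and the closing quote of the href.
lngStartPos = InStr(strHTML, "ticker=") + 7
lngEndPos = InStr(lngStartPos, strHTML, """>")
arrVars(0) = Mid$(strHTML, lngStartPos, lngEndPos - lngStartPos)

' Next six cells: the text between each <td align="center"> and its </td>.
' Each search starts where the last one ended, so every pass moves forward.
For i = 1 To 6
    lngStartPos = InStr(lngEndPos, strHTML, "<td align=""center"">") + 19
    lngEndPos = InStr(lngStartPos, strHTML, "</td>")
    arrVars(i) = Mid$(strHTML, lngStartPos, lngEndPos - lngStartPos)
Next i

' Percentage: the text between <nobr> and </nobr>.
lngStartPos = InStr(lngEndPos, strHTML, "<nobr>") + 6
lngEndPos = InStr(lngStartPos, strHTML, "</nobr>")
arrVars(7) = Mid$(strHTML, lngStartPos, lngEndPos - lngStartPos)

' Note: the cell values keep their surrounding spaces and line breaks, and
' arrVars(3) still contains the <font><b> tags wrapped around the number.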

I haven't tested that so you may have to add or subtract 1 to some of those mid function parameters to get them to work correctly.

If you want to go the XML route, then add Microsoft XML, v3.0 to your references and use that. If your HTML gets much more complicated (and is well formed), then it will be a lot easier in the long run.
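For what the XML route might look like, here is a minimal sketch, assuming the project has a reference to Microsoft XML, v3.0 and that the markup really is well-formed XML. The name ListCells is made up for this example. The snippet as posted would not load as it stands: the <br> is never closed, the & in the href is not escaped, and size=1 is not quoted.

```vb
' Requires a project reference to "Microsoft XML, v3.0".
' Prints the text of every <td> element in a well-formed XML/XHTML string.
Public Sub ListCells(ByVal strXml As String)
    Dim objDoc As MSXML2.DOMDocument
    Dim objCells As MSXML2.IXMLDOMNodeList
    Dim objCell As MSXML2.IXMLDOMNode

    Set objDoc = New MSXML2.DOMDocument
    objDoc.async = False

    ' loadXML returns False when the markup is not well-formed XML.
    If Not objDoc.loadXML(strXml) Then
        Debug.Print "Parse error: " & objDoc.parseError.reason
        Exit Sub
    End If

    ' .Text gives the text inside the cell with any nested tags dropped.
    Set objCells = objDoc.selectNodes("//td")
    For Each objCell In objCells
        Debug.Print Trim$(objCell.Text)
    Next objCell
End Sub
```

The appeal of this route is that nested tags such as <font> and <b> stop mattering, since only the text of each cell is read. The price is that the parser rejects anything that is not strictly well formed.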

Cheers

D

ReeciePoo 0 Light Poster · Answer 1 · 2007-05-15T14:32:40+00:00

The best method, in my opinion, would be to use Replace() to swap each piece of HTML code (<htmlcode>) for a delimiter.

For instance:
<a href=ishitonyou.html>Ehh?</a>
would turn out like:
,Ehh?, with the comma as the delimiter.
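A minimal sketch of that idea in VB6 follows. The name TagsToDelimited is made up for this example. Replace() on its own only swaps fixed text, so the sketch walks the string with InStr to find each tag. It also assumes no ">" appears inside an attribute value. One caveat: a comma clashes with values such as 1,000, so a character that never appears in the data (for example "|") is a safer delimiter.

```vb
' Replaces every <...> tag in strHTML with strDelim, then drops the empty
' pieces and stray whitespace, leaving only the visible text.
Public Function TagsToDelimited(ByVal strHTML As String, ByVal strDelim As String) As String
    Dim lngOpen As Long
    Dim lngClose As Long
    Dim strOut As String
    Dim arrPieces() As String
    Dim strPiece As String
    Dim i As Long

    ' Step 1: cut out each tag and put the delimiter in its place.
    strOut = strHTML
    lngOpen = InStr(strOut, "<")
    Do While lngOpen > 0
        lngClose = InStr(lngOpen, strOut, ">")
        If lngClose = 0 Then Exit Do        ' unterminated tag: stop here
        strOut = Left$(strOut, lngOpen - 1) & strDelim & Mid$(strOut, lngClose + 1)
        lngOpen = InStr(lngOpen + Len(strDelim), strOut, "<")
    Loop

    ' Step 2: turn line breaks and tabs into spaces so Trim$ can remove them.
    strOut = Replace(strOut, vbCr, " ")
    strOut = Replace(strOut, vbLf, " ")
    strOut = Replace(strOut, vbTab, " ")

    ' Step 3: keep only the non-empty pieces and join them back together.
    arrPieces = Split(strOut, strDelim)
    strOut = ""
    For i = LBound(arrPieces) To UBound(arrPieces)
        strPiece = Trim$(arrPieces(i))
        If Len(strPiece) > 0 Then
            If Len(strOut) > 0 Then strOut = strOut & strDelim
            strOut = strOut & strPiece
        End If
    Next i

    TagsToDelimited = strOut
End Function
```

Calling Split(TagsToDelimited(strHTML, "|"), "|") would then give an array of the text pieces. Every piece of visible text is kept, including link text such as "(profile)", so unwanted entries still have to be skipped.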

ReeciePoo 0 Light Poster · Answer 2 · 2007-05-15T15:31:02+00:00

Or you could determine what doctype the HTML uses (XHTML Strict would be ideal) and use an XML parser to cycle through the nodes.
Having said that, if the HTML is always that simple, InStr and Mid should be sufficient for the task at hand.
Start with "ticker=" and grab that, then look for "<td align" and keep going from there. Remember InStr allows you to specify a start position, so you don't keep searching from the start of the string.

Thanks heaps for your reply, but do you think you could give me an example?

ReeciePoo 0 Light Poster · Answer 3 · 2007-05-15T15:31:28+00:00

It's using:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

ReeciePoo 0 Light Poster · Answer 4 · 2007-05-15T16:16:04+00:00

Is it XML, though?

davidcairns 1 Junior Poster · Answer 5 · 2007-05-15T18:15:19+00:00

Strictly speaking, only XHTML documents are XML: they use the XHTML DTDs and have to be well formed. HTML 4 documents like yours are SGML-based and allow things an XML parser will reject, like no end tags on some elements.

ReeciePoo 0 Light Poster · Answer 6 · 2007-05-16T12:44:51+00:00

Strictly speaking, only XHTML documents are XML: they use the XHTML DTDs and have to be well formed. HTML 4 documents like yours are SGML-based and allow things an XML parser will reject, like no end tags on some elements.

Thanks for your help, but it errors. Do you think you could add comments to the example, just to help? Also, I took that chunk of HTML out of a source that is about 20,000 characters long, so I'm more up the **** creek without a paddle, with no possibility of survival, so to speak.
