
Screen Scrape remove spaces/line breaks between specified tags

Hi,

I'm doing a screen scrape of a web page, which works without any problems.

What I want to do is replace the contents of a tag. I can do this if the tag matches exactly, but in this page there are a lot of blank spaces.

lbltest.Text contains the page being scraped. The tag is formatted like this:

<li class="thisclass">
			   
			     TheText
			   
             </li>


I can't do a simple replace because of all the spaces, so I need to get it to look like this:

<li class="thisclass">TheText</li>


Any ideas how I might do this?

Thanks in advance

webfort
Newbie Poster
4 posts since Aug 2008
Reputation Points: 10
Solved Threads: 0
 

Hi,
Could you specify which method you are using to scrape the page? Have you heard of the Regex class?

selvaganapathy
Posting Pro
547 posts since Feb 2008
Reputation Points: 44
Solved Threads: 100
 

Hi,

This is the method I used:
http://www.dotnetjohn.com/articles.aspx?articleid=93

I have not heard of that class.

webfort
 

Regex is a class used for regular expressions. It is useful for parsing.

For more detail, refer to http://www.regular-expressions.info/dotnet.html
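
As a small illustration of the class applied to the markup in the question, here is a minimal sketch, assuming the scraped page is already held in a string. The module name, the function name `CollapseAroundTags`, and the sample input are hypothetical and not taken from the linked page. In a .NET pattern, `\s` matches spaces, tabs, and line breaks, so `\s+` next to a tag boundary picks up the whole run of white space.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CollapseWhitespaceDemo
    ' Hypothetical helper: strips white space that touches a tag boundary
    Public Function CollapseAroundTags(html As String) As String
        ' Remove white space (line breaks included) directly after ">"
        Dim result As String = Regex.Replace(html, ">\s+", ">")
        ' Remove white space directly before "<"
        Return Regex.Replace(result, "\s+<", "<")
    End Function

    Sub Main()
        Dim nl As String = Environment.NewLine
        ' Sample markup shaped like the tag in the question (assumed input)
        Dim html As String = "<li class=""thisclass"">" & nl & "    " & nl &
                             "      TheText" & nl & "    " & nl & "    </li>"

        Console.WriteLine(CollapseAroundTags(html))
    End Sub
End Module
```

For the sample input this prints `<li class="thisclass">TheText</li>`. Note that the two patterns apply to every tag in the string, not only to the `thisclass` item, so this suits cases where white space between tags is not significant anywhere on the page.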

selvaganapathy
 

Thanks. How would I pick up on the line breaks and spaces? Would it be possible to show me an example?

webfort
 

Ever heard of Trim()? Use it!

iamthwee
Posting Expert
5,950 posts since Aug 2005
Reputation Points: 1,543
Solved Threads: 439
 
I think you have missed the point. I need to replace line breaks and spaces, and if you look at the example given, Trim will not do this.

webfort
 

Incorrect. Trim does remove line breaks and spaces from the start and end of a string. Please prove me wrong, but you won't.

I have just tested it:

Dim nl As String = System.Environment.NewLine
' Leading spaces and blank lines, then the text, then trailing spaces and blank lines
Dim test As String = "                   " & nl & nl & "     TheText  " & nl & nl

TextBox1.Text = test          ' show the original string in a multiline text box
TextBox2.Text = test.Trim()   ' show the trimmed string in a multiline text box
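
The same check can be run without a form. The following is a minimal console sketch; the module name, the function name `WrapInTag`, and the sample strings are hypothetical. It shows that `Trim()` strips white space, line breaks included, from both ends of a string, and that a string which still begins with the opening tag and ends with the closing tag has nothing at its ends to strip, so the inner text has to be isolated first.

```vbnet
Imports System

Module TrimDemo
    ' Hypothetical helper: wraps text in the tag from the question
    Public Function WrapInTag(inner As String) As String
        Return "<li class=""thisclass"">" & inner & "</li>"
    End Function

    Sub Main()
        Dim nl As String = Environment.NewLine
        Dim inner As String = "    " & nl & nl & "     TheText  " & nl & nl

        ' Trim() removes leading and trailing white space, including line breaks
        Console.WriteLine("[" & inner.Trim() & "]")

        ' The whole tag starts with "<" and ends with ">", so Trim() changes nothing
        Dim whole As String = WrapInTag(inner)
        Console.WriteLine(whole.Trim() = whole)
    End Sub
End Module
```

This prints `[TheText]` followed by `True`.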
iamthwee
 
Hi,

Please google parsing HTML with the Regex class; you will find plenty of examples. Once you are able to parse the HTML tags, you ultimately have to use String.Trim() to remove the unwanted white space.
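
A minimal sketch of that two-step idea, assuming the page is already in a string and the tag is written exactly as `<li class="thisclass">`. The module name, the function names, and the sample page are hypothetical. The pattern captures the opening tag, the inner text, and the closing tag, and the replacement rebuilds the tag with the inner text passed through `Trim()`.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TrimInsideTagDemo
    ' Rebuilds one matched tag with its inner text trimmed
    Private Function TrimInner(m As Match) As String
        Return m.Groups(1).Value & m.Groups(2).Value.Trim() & m.Groups(3).Value
    End Function

    ' Hypothetical helper: trims the contents of every <li class="thisclass"> item
    Public Function TrimInsideTag(page As String) As String
        ' Group 1: opening tag, group 2: inner text (lazy), group 3: closing tag
        Dim pattern As String = "(<li class=""thisclass"">)(.*?)(</li>)"
        ' RegexOptions.Singleline lets "." match line breaks as well
        Return Regex.Replace(page, pattern, AddressOf TrimInner, RegexOptions.Singleline)
    End Function

    Sub Main()
        Dim nl As String = Environment.NewLine
        ' Assumed sample page containing the tag from the question
        Dim page As String = "<ul>" & nl & "<li class=""thisclass"">" & nl &
                             "   TheText" & nl & "   </li>" & nl & "</ul>"

        Console.WriteLine(TrimInsideTag(page))
    End Sub
End Module
```

Only the matched items are changed; white space elsewhere on the page is left as it was. A regular expression like this is fragile against attribute order or extra attributes, so it is only a reasonable fit when the markup of the scraped page is known and stable.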

selvaganapathy
 
