String.Normalize()

Troy III

We already have built-in trim methods, but trimming doesn't get rid of internal (and unwanted) extra spaces. This is where the Normalize method comes into play. It trims left, it trims right, but most importantly it also trims on the inside; one could say it "trims inside-out". In effect, it treats the text string the way an HTML parser does when it collapses runs of whitespace.
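To make the difference concrete, here is a minimal sketch comparing the built-in trim() with whitespace collapsing. The helper name collapseSpaces is made up for this illustration; it is not the method from this post, only the same idea written as a standalone function.

```javascript
// Hypothetical helper for illustration only (not the posted method).
function collapseSpaces(text) {
  // Keep runs of non-whitespace and rejoin them with single spaces.
  return (text.match(/\S+/g) || []).join(' ');
}

var raw = '  too   many \t\n  spaces  ';
var trimmed = raw.trim();            // inner runs of whitespace survive
var collapsed = collapseSpaces(raw); // 'too many spaces'
```

Tabs and newlines are collapsed along with spaces, because \S matches anything that is not whitespace.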

I had a hard time deciding how to name the method, but I finally did.

String.Normalize();

Using it is straight and simple:

[string_var].Normalize();

I've also taken care to make it usable in a manner like:

"".Normalize("   this   string   contains to many   spaces   ");

//to return:

>> "this string contains to many spaces"//not anymore.

Which might prove useful on rare occasions, and most probably never. Yet it doesn't hurt to have it at hand, even though using it like:

"   this   string   contains to many   spaces   ".Normalize();

as with other existing methods would also be possible, but I don't consider it as clean and as readable as the previous form.
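A short sketch of the two call forms side by side, using the method definition given at the end of this post. The variable names are only for illustration.

```javascript
// The method as posted at the end of this article.
String.prototype.Normalize = function (x) {
  return (x || this).match(/\S+/gi).join(' ');
};

var sample = '   this   string   contains to many   spaces   ';

// Form 1: call on an empty string and pass the text as an argument.
var viaArgument = ''.Normalize(sample);

// Form 2: call directly on the string itself.
var viaReceiver = sample.Normalize();
```

Both forms should give the same result, since the method falls back to `this` whenever no argument is passed.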

Yet when working with strings, speed is always an issue.
So I wrote a test and ran it a few times. Turns out that 'blazing' is not an overstatement.

To make a browser 'sweat' and escape some pseudo-optimization cheats, I took a string of 1024 bytes [1KB] x 100 000 iterations = roughly 100MB worth of data processed, and the results were, well, very satisfactory (~3 seconds). To open (that is, render) a page of 100MB of plain text locally would most probably require more, or at least the same amount of time.

The test string is a 'worst-case scenario', highly atomized: every 2-letter "word" is separated by 2 white-space characters.
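The test page hard-codes its string, but a similar worst-case input could be generated. The following is only a sketch; makeTestString and its alphabet are assumptions made for this example, not part of the original test.

```javascript
// Hypothetical generator; the actual test page hard-codes its string.
function makeTestString(length) {
  var alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  var out = '';
  var i = 0;
  while (out.length < length) {
    var ch = alphabet.charAt(i % alphabet.length);
    out += ch + ch + '  '; // one 2-letter "word" plus 2 spaces
    i++;
  }
  return out.slice(0, length);
}

var testString = makeTestString(1024); // 256 words of 4 characters each
```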

The code, which as it turns out could also be used as a >>real-world<< browser benchmark (and is provided here), takes only the bare algorithm of the method presented and adds some extra optimization code that, in my experience, is necessary for this lengthy string iteration to be as fast as possible.

The loop used is my fastest.
For the regexp pattern, the constructor is used (which in modern browsers still provides a little improvement, although barely noticeable).
The result assignment line is enclosed in (), where the improvement is very noticeable, etc.
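Stripped of the page plumbing, the timing pattern described above might look like the sketch below. The function timeNormalize and its return shape are made up for illustration, and nothing here confirms or measures the speed-ups mentioned; it only shows where each choice sits.

```javascript
// Hypothetical helper: times `iterations` passes over `text`.
function timeNormalize(text, iterations) {
  var re = new RegExp('\\S+', 'g'); // built once, outside the loop
  var result = '';
  var start = Date.now();
  while (iterations--) {            // counting down to zero ends the loop
    (result = text.match(re).join(' '));
  }
  return { ms: Date.now() - start, result: result };
}

var run = timeNormalize('oo  pp  qq  rr', 1000);
// run.result is 'oo pp qq rr'; run.ms varies by machine and browser.
```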

The code is below; the results taken from my machine:

First click time:
Op 3.112 seconds
IE 3.136 seconds
Sa 3.316 seconds
Fx 3.402 seconds
Ch 4.801 seconds
[all latest release browser versions]

(your actual results will differ depending on your hardware)

p.s.:
The second click changes the string and the results, but that's not very important, because the second click will work on an already normalized string.
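The point about the already normalized string can be checked in isolation. This sketch uses a stand-in function, normalizeOnce, for the loop body of the test page; it ignores the extra markup that the page writes back into the element after the first click.

```javascript
// Standalone stand-in for the loop body on the test page.
function normalizeOnce(text) {
  return text.match(/\S+/g).join(' ');
}

var first = normalizeOnce('oo  pp  qq  rr'); // doubled spaces collapsed
var second = normalizeOnce(first);           // nothing left to collapse
```

A second pass returns the same string, so it no longer exercises the worst case.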

The Test Page:

<!doctype html>
<html>
<head>
	<title>String Normalize: 100MB worth of data</title>
	<style>
		#cnt { word-wrap: break-word }
	</style>
	<script>

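	function go(){
		var cnt = document.getElementById('cnt');
		var s = cnt.innerHTML; // the test string from the <pre> element

		// pattern built once with the constructor, outside the loop
		var re = new RegExp("\\S+","gi");

		var c, endT, iter=100000;
		var start = new Date();
		while(iter--){ //the actual workplace
			(c = s.match(re).join(' '));
		}
		endT = new Date();

		// show the elapsed time, followed by the normalized string in red
		return cnt.innerHTML=
			"parsed in: "+
			((endT.valueOf()-start.valueOf())/1000)+
			' seconds!'+'<br>'+c.fontcolor('red');
	}

	// a click anywhere in the window starts the test
	onclick=function(){go()}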
	</script>
</head>
<body>
<p>click: test/result...</p>
<pre id='cnt'>oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  qq  rr  ss  tt  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  99  oo  pp  uu  vv  xx  yy  zz  00  11  22  33  44  55  66  77  88  00</pre>

</body>
</html>

All suggestions and remarks are welcome.
Have fun.

String.prototype.Normalize=
/*b.b Troy III p.a.e*/
function(x){return((x||this).match(/\S+/g)||[]).join(' ')}
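One edge case worth checking: String.prototype.match returns null when nothing matches, so a string with no non-whitespace characters needs a fallback before join is called. Below is a sketch of that guard as a standalone function; the name safeNormalize is an assumption for this example.

```javascript
// Hypothetical standalone variant with a fallback for "no matches".
function safeNormalize(text) {
  var words = String(text).match(/\S+/g); // null when nothing matches
  return words ? words.join(' ') : '';
}

var blank = safeNormalize('   \t  '); // ''
var empty = safeNormalize('');        // ''
var plain = safeNormalize(' a  b ');  // 'a b'
```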
Troy III 272 Posting Pro
             "                     

             "                                             
Dani 4,074 The Queen of DaniWeb Administrator Featured Poster Premium Member

Umm ... were you trying to post something?

Troy III 272 Posting Pro

I had to change my mind... I get inconsistent results with this [problem]! Would you mind checking whether you are able to prototype the String object in your Firefox?
'Cause I'm having some problems with mine.

Troy III 272 Posting Pro

Nope, the info was correct!
Sorry I had to delete it:
>>"You can't prototype the String and other built-in objects in Firefox!"<<
Not anymore!
I wonder though: is Firefox missing its old NN4.7 days, and the reputation[?!]
