Remove leading and trailing white space from a string

KevinADC

This is a routine task, best done with two regexps when using Perl.

my $string = '  Mary had a little lamb.  ';
$string =~ s/^\s+//; #remove leading spaces
$string =~ s/\s+$//; #remove trailing spaces
print $string;

sut 0 Light Poster

Or:

$string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;

KevinADC 192 Practically a Posting Shark

Yes, but benchmarking the two different ways is revealing:

use Benchmark qw(timethese cmpthese);

my $string = "    Mary had a little lamb.   ";

my  $results = timethese(200000, 
        {
            'First' => sub {$string =~ s/^\s*//;$string =~ s/\s*$//;},
            'Second' => sub {$string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;}
        },
    );
cmpthese( $results ) ;

Results:

Benchmark: timing 200000 iterations of First, Second...
     First:  1 wallclock secs ( 2.25 usr +  0.00 sys =  2.25 CPU) @ 88888.89/s (n=200000)
    Second:  3 wallclock secs ( 3.73 usr +  0.00 sys =  3.73 CPU) @ 53619.30/s (n=200000)
          Rate Second  First
Second 53619/s     --   -40%
First  88889/s    66%     --

Using two regexps is 66% faster (on my computer; results will vary).

rockslammer 0 Newbie Poster

$string =~ s/\s+$//; #remove trailing spaces
Worked very nicely for me. The expression solved an LWP issue, thanks.

foobie 0 Newbie Poster

Actually, $string =~ s/^\s+//;$string =~ s/\s+$//; is faster, because it matches only when there are one or more whitespace characters to remove.
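As a rough illustration of that difference (a sketch, not a benchmark): s/// returns a true value when it substituted something, so it can show that \s* still "succeeds" on a string with nothing to trim, while \s+ simply fails. The variable names here are made up for the example.

```perl
use strict;
use warnings;

my $star = 'Mary had a little lamb.';   # nothing to trim
my $plus = 'Mary had a little lamb.';

# s/// returns the number of substitutions made (false if none).
my $star_hits = ( $star =~ s/^\s*// ) ? 1 : 0;   # \s* matches the empty string
my $plus_hits = ( $plus =~ s/^\s+// ) ? 1 : 0;   # \s+ finds no whitespace, so no match

print "star substituted: $star_hits, plus substituted: $plus_hits\n";
```

With \s*, Perl performs a (pointless) empty substitution every time; with \s+, it does nothing when there is nothing to strip.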

Your benchmark is flawed: it only strips whitespace on the very first run. $string is declared outside the timed subs, so every later iteration works on the already-trimmed string.
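A minimal sketch of that problem, using only core Perl. The $changed counter is hypothetical and exists only to count how many calls actually found whitespace to strip.

```perl
use strict;
use warnings;

# $string lives outside the sub, so every call sees the same variable.
my $string  = "    Mary had a little lamb.   ";
my $changed = 0;    # hypothetical counter, for illustration only

my $first = sub {
    my $before = $string;
    $string =~ s/^\s*//;
    $string =~ s/\s*$//;
    $changed++ if $string ne $before;
};

$first->() for 1 .. 5;

# Only the first call altered the string; the other four ran on trimmed input.
print "calls that changed the string: $changed\n";
```

Passing the string in as an argument and trimming a copy, as the subs in the next benchmark do, avoids this.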

Try this benchmark instead:

#!/usr/bin/perl

use warnings;
use strict;

use Benchmark qw(cmpthese timethese);

sub double_star {
  my $string = shift;
  $string =~ s/^\s*//;
  $string =~ s/\s*$//;
  return $string;
}

sub double_plus {
  my $string = shift;
  $string =~ s/^\s+//;
  $string =~ s/\s+$//;
  return $string;
}

sub replace {
  my $string = shift;
  $string =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
  return $string;
}

sub for_star {
  my $string = shift;
  # note: despite the name, this sub uses \s+
  for ($string) { s/^\s+//; s/\s+$//; }
  return $string;
}

sub for_plus {
  my $string = shift;
  # note: despite the name, this sub uses \s*
  for ($string) { s/^\s*//; s/\s*$//; }
  return $string;
}

sub regex_or {
  my $string = shift;
  # trims literal spaces only; the doubled || adds an empty alternative
  $string =~ s/(?:^ +)||(?: +$)//g;
  return $string;
}

cmpthese(
  -1,
  {
    'double_star' => q|double_star('    Mary had a little lamb.   ');|,
    'double_plus' => q|double_plus('    Mary had a little lamb.   ');|,
    'replace'     => q|replace(    '    Mary had a little lamb.   ');|,
    'for_star'    => q|for_star(   '    Mary had a little lamb.   ');|,
    'for_plus'    => q|for_plus(   '    Mary had a little lamb.   ');|,
    'regex_or'    => q|regex_or(   '    Mary had a little lamb.   ');|,
  }
);

Results:

                Rate regex_or  replace for_plus double_star for_star double_plus
regex_or     55855/s       --     -47%     -49%        -60%     -73%        -84%
replace     105217/s      88%       --      -5%        -25%     -49%        -70%
for_star    110277/s      97%       5%       --        -22%     -46%        -68%
double_star 140894/s     152%      34%      28%          --     -31%        -59%
for_plus    204799/s     267%      95%      86%         45%       --        -41%
double_plus 345717/s     519%     229%     213%        145%      69%          --

KevinADC 192 Practically a Posting Shark

foobie,

Excellent post, and a good observation about my flawed test.

Regards,
Kevin

acca 0 Newbie Poster

Another one:

sub chomp_plus {
  my $string = shift;
  $string =~ s/^\s+//;
  chomp $string;  # removes only a trailing newline, not trailing spaces or tabs
  return $string;
}

                Rate regex_or replace for_plus double_star for_star double_plus chomp_plus
regex_or     98642/s       --    -50%     -55%        -63%     -71%        -81%       -88%
replace     196495/s      99%      --     -11%        -26%     -42%        -63%       -77%
for_plus    220554/s     124%     12%       --        -17%     -35%        -58%       -74%
double_star 265481/s     169%     35%      20%          --     -22%        -50%       -69%
for_star    341333/s     246%     74%      55%         29%       --        -35%       -60%
double_plus 526091/s     433%    168%     139%         98%      54%          --       -38%
chomp_plus  849541/s     761%    332%     285%        220%     149%         61%         --
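One caveat worth checking before relying on these numbers: chomp removes only a trailing input record separator (a newline by default), so chomp_plus is not doing the same job as the other subs, which may account for part of its speed. A small sketch, reusing the sub body from above with two made-up inputs:

```perl
use strict;
use warnings;

# Same body as chomp_plus above.
sub chomp_plus {
    my $string = shift;
    $string =~ s/^\s+//;
    chomp $string;
    return $string;
}

my $spaces  = chomp_plus("  Mary had a little lamb.  ");
my $newline = chomp_plus("  Mary had a little lamb.\n");

print "[$spaces]\n";     # trailing spaces are still there
print "[$newline]\n";    # trailing newline is gone
```

For the benchmark input used in this thread, which ends in spaces, chomp_plus leaves the trailing whitespace in place.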

rupert160 0 Newbie Poster

Using backreferences gives you a one-liner:

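# matches only when the text between the spaces is a single run of
# letters, digits, or commas, so it will not trim a multi-word sentence
$trimmed_string =~ s/^\ *([A-Z,a-z,0-9]*)\ *$/$1/g;

bsinghrana 0 Newbie Poster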

The following is more complete.
$trimmed_string =~ s/^\s*(.*?)\s*/$1/;

flexfeed 0 Newbie Poster

bsinghrana, I don't think your solution works.

The following appears to work for cases I tried:

$string =~ s/^\s*(.*\S+)\s*$/$1/;
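One case that may be worth adding to the list: a string made up entirely of whitespace. The capture group needs at least one non-whitespace character, so the substitution does not match and the string comes back unchanged. A small sketch comparing it with the two-regexp approach from the first post (the sub names are hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper names wrapping the two approaches.
sub trim_capture {
    my $string = shift;
    $string =~ s/^\s*(.*\S+)\s*$/$1/;
    return $string;
}

sub trim_two_step {
    my $string = shift;
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    return $string;
}

# Compare both on a normal sentence and on a whitespace-only string.
for my $input ("  Mary had a little lamb.  ", "   ") {
    printf "capture: [%s]  two-step: [%s]\n",
        trim_capture($input), trim_two_step($input);
}
```

Both agree on the sentence; on the whitespace-only input the capture version keeps the three spaces while the two-step version returns an empty string.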

bsinghrana 0 Newbie Poster

I had a bunch of typos in my previous post.
Here is the complete one:
s/^\s*(.*?)\s*$/$1/
