I have a file containing data in this form:
R1 1987 or 1789 and 8585 (7654)
R2 7698 or 8656 or 74746
Now I want my file in this form:
R1 1987
R1 1789
R1 8585
R1 7654
R2 7698
R2 8656
R2 74746

Hi preeti2,
You can do the following:

Since split takes a regex as its separator, you can split each line on " or ", " and ", "(", ")" and whitespace, then print the first field (the record ID) in front of each remaining value, like this:

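use warnings;
use strict;

# Separators: " or ", " and ", whitespace before "(", ")" followed by
# whitespace (here the line's newline), or any other run of whitespace.
my $reg = qr/ or | and |\s+\(|\)\s+|\s+/;

while (<DATA>) {
    # First field is the record ID (R1, R2, ...); the rest are the values.
    my @val = split /$reg/, $_;
    # Print each value prefixed with the record ID, one per line.
    print join( ' ' => $val[0], $_ ), "\n" for @val[ 1 .. $#val ];
}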

__DATA__
R1 1987 or 1789 and 8585 (7654)
R2 7698 or 8656 or 74746

This produces the following output:

R1 1987
R1 1789
R1 8585
R1 7654
R2 7698
R2 8656
R2 74746
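If the separators in the real file vary more than the sample shows, a different approach (not the one used in the answer above) is to skip split entirely and capture the digit runs with a global match. This is a minimal sketch, assuming the first whitespace-delimited token is always the record ID and every value is a plain run of digits; expand_line is a hypothetical helper name, and the input is kept in an array instead of __DATA__ only to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: take the record ID (first token) and return one
# "ID value" string per run of digits found in the rest of the line,
# whatever separates them (" or ", " and ", parentheses, ...).
sub expand_line {
    my ($line) = @_;
    my ( $id, $rest ) = $line =~ /^(\S+)\s+(.*)$/ or return;
    return map { "$id $_" } $rest =~ /(\d+)/g;
}

my @input = (
    "R1 1987 or 1789 and 8585 (7654)\n",
    "R2 7698 or 8656 or 74746\n",
);

# Prints the same seven lines as the split version.
print "$_\n" for map { expand_line($_) } @input;
```

Because it matches the values rather than the separators, this version does not care whether the line still ends in a newline or what words sit between the numbers.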

Here is a full explanation of the regex used in the code above:

The regular expression:

(?-imsx: or | and |\s+\(|\)\s+|\s+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
   or                      ' or '
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
   and                     ' and '
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  \(                       '('
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  \)                       ')'
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
 |                        OR
----------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
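To see how these alternatives divide a line in practice, the sketch below prints what split returns for the first sample line (the helper name fields and the two test strings are illustrative, not part of the original answer). It also shows that the \)\s+ alternative relies on the line still ending in its newline: after a chomp there is no whitespace after ")", no alternative matches it, and it stays attached to the last number.

```perl
use strict;
use warnings;

# Same separator regex as in the answer above.
my $reg = qr/ or | and |\s+\(|\)\s+|\s+/;

# Illustrative helper: the fields split produces for one line.
sub fields { return split /$reg/, $_[0] }

my $with_nl    = "R1 1987 or 1789 and 8585 (7654)\n";
my $without_nl = "R1 1987 or 1789 and 8585 (7654)";

# ")\n" matches \)\s+ and split drops the trailing empty field.
print join( '|', fields($with_nl) ), "\n";      # R1|1987|1789|8585|7654

# Without the newline, ")" is never matched as a separator.
print join( '|', fields($without_nl) ), "\n";   # R1|1987|1789|8585|7654)
```

So if you chomp the lines before splitting, add a plain \) alternative (or strip the parentheses first) to get the same result.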

Edited 3 Years Ago by 2teez