d5e5 109 Master Poster

Hi all,

I need a Perl script that can open files (given as command-line arguments) and extract/print out any dates or times found in them. The dates and times can be in any reasonable format.

The problem I have is I don't know how to print out the matching part of the file once I find it. Any ideas?

-Skyrim

Let's start by assuming all dates will look like 'mm/dd/yyyy' and all times will look like 'hh:mm'. You need to express these two patterns as regular expressions, aka regexes. Hopefully, no date or time will extend beyond one record so you can read each file one line at a time. (Otherwise you would have to 'slurp' each input file so the entire file goes into a scalar variable.) Look for matches in each input record you read from your file(s) in such a way as to capture the matching portions into variables which you can print. Easier said than done, right? But once you have a script that does this, and you need to find all the other reasonable formats that dates and times could have in the file you will be reading, you will simply create and incorporate additional regex patterns into your script.
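
Here is a rough sketch of that idea, assuming only the two formats mentioned above ('mm/dd/yyyy' and 'hh:mm'). The names find_dates_and_times, $date_re and $time_re are made up for this example, and the patterns do not check that a date or time is actually valid.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed formats only: mm/dd/yyyy for dates and hh:mm for times.
my $date_re = qr{\b(\d{1,2}/\d{1,2}/\d{4})\b};
my $time_re = qr{\b(\d{1,2}:\d{2})\b};

# Return references to the lists of dates and times found in one line.
sub find_dates_and_times {
    my ($line) = @_;
    my @dates = $line =~ /$date_re/g;   # list context with /g returns every capture
    my @times = $line =~ /$time_re/g;
    return (\@dates, \@times);
}

# Only read when files were named on the command line,
# so the script does not sit waiting for keyboard input.
if (@ARGV) {
    while (my $line = <>) {
        my ($dates, $times) = find_dates_and_times($line);
        print "date: $_\n" for @$dates;
        print "time: $_\n" for @$times;
    }
}
```

To handle another format you would add another pattern and another match inside the subroutine.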

d5e5 109 Master Poster

Using File::Find and filtering the result is a good approach, but the following script having a subroutine that calls itself also seems to work.

#!/usr/bin/perl
#LowestDirs02.pl
use strict;
use warnings;
my $startdir = '/home/david/Programming';
print_dirs_without_subdirs($startdir);

sub print_dirs_without_subdirs {
    #This subroutine calls itself for each subdirectory it finds.
    #If it finds no directories, it prints the name of the directory in which
    # it is looking, $dir.
    my $dir = $_[0];
    opendir DH, $dir or die "Failed to open $dir: $!";
    my @d;
    while ($_ = readdir(DH)) {
        next if $_ eq "." or $_ eq "..";
        my $fn = $dir . '/' . $_;
        if (-d $fn) {
            push @d, $fn;
        }
    }
    if (scalar @d == 0) { #If no directories found, $dir is lowest dir in this branch
        print "$dir\n";
        return;
    }
    foreach (@d) {
        print_dirs_without_subdirs($_); #Look for directories in directory
    }
}
d5e5 109 Master Poster

k_manimuthu, I think your script overlooks the last directory entered by the File::Find. I understand that your subroutine compares each current directory with the previous, and prints the previous if it is not equal to the first part of the current. But I don't think the last directory entered by the File::Find gets pushed into @low.

k_manimuthu commented: Very Good Analyze +1
d5e5 109 Master Poster

The following incorporates a couple of improvements on the above, including open (my $fh, '<', $filenames) or die "Could not open $filenames $!";

#!/usr/bin/perl
#CheckFilesExist.pl
use 5.006;
use strict;
use warnings;
my $cur_dir = '/users/david/Programming/Perl';
my $dir2check = '/users/david/Documents';
my $filenames = "$cur_dir/files.txt";
open (my $fh, '<', $filenames) or die "Could not open $filenames $!";
while (<$fh>) {
	chomp;
	if (-e $dir2check . '/' . $_) {
		print "***FOUND*** $_ \n";
	}
	else {
		print "***NOTFOUND*** $_ \n";
	}
}
close($fh);
d5e5 109 Master Poster
#!/usr/bin/perl
#CheckFilesExist.pl
use 5.006;
use strict;
use warnings;
my $directory = '/home/david/Documents';
open (my $fh, '<', 'files.txt');
while (<$fh>) {
	chomp;
	if (-e $directory . '/' . $_) {
		print "***FOUND*** $_ \n";
	}
	else {
		print "***NOTFOUND*** $_ \n";
	}
}
d5e5 109 Master Poster

Another way to do it is to open each directory found by File::Find as a directory handle to see whether it contains subdirectories.

#!/usr/bin/perl
#LastSubdirs02.pl
use 5.006;
use strict;
use warnings;

use File::Find;
find( \&print_lowest_dirs, '/home/david/Programming' );

sub print_lowest_dirs {
	if (-d $File::Find::name) {
		my $dir = $File::Find::name;
		if (not has_subdir($dir)) {
			print "$dir \n";
		}
	}
}

sub has_subdir {
	my $dir = $_[0];
	opendir DH, $dir or die "Failed to open $dir: $!";
	
	while ($_ = readdir(DH)) {
		next if $_ eq "." or $_ eq "..";
		return 1 if -d $dir . '/' . $_; # Is the full path plus the filename a directory?
	}
	return 0; # Did not find any subdirectory in this directory
}
d5e5 109 Master Poster

Now I have an issue: I have to go through this folder structure and insert a file delete_me.txt in the lowest level, and I have many lowest levels.

By "lowest level folder" do you mean any folder that does not have sub-folders?

d5e5 109 Master Poster

[...]What is the meaning of the local declaration in this case, and what does encapsulating the code segment do (if anything)? I get the comment that I am removing the end of line character, but I don't know how it's working.

The example code modifies the Perl special variable that contains the value of the input record separator so that instead of reading one record at a time, the entire file is read into a scalar variable. You should restore the original value of $/ after the file is 'slurped' because $/ is a global variable that could affect any subroutines or imported module methods used by your script. One way to do this would be the following:

open my $fh, "<", "foo" or die $!;
my $save_input_record_separator = $/; #Save original value before changing it
undef $/; # enable slurp mode
my $content = <$fh>;
close $fh;
$/ = $save_input_record_separator; #Restore original value to this global variable

Another way is to enclose the code that needs the modified value of $/ within {brace or 'curly' brackets} and use the local operator to localise the modified value to the current block defined by the enclosing brace brackets. See http://perldoc.perl.org/perlvar.html where it says

You should be very careful when modifying the default values of most special variables described in this document. In most cases you want to localize these variables before changing them, since if you don't, the change may affect other modules which rely on the default …
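
A minimal sketch of the second way, using local inside a block. The subroutine name slurp_file is just something I made up for illustration.

```perl
use strict;
use warnings;

# Read a whole file into one scalar (slurp_file is a hypothetical helper name).
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    my $content = do {
        local $/;    # $/ is undef only until the end of this do-block
        <$fh>;       # with $/ undefined, one read returns the entire file
    };
    close $fh;
    return $content; # $/ already has its previous value again here
}
```

Because the change to $/ is undone automatically when the block ends, there is nothing to save and restore by hand.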

d5e5 109 Master Poster

With the bundling option, whether the parameter is entered with one or two dashes does make a difference. For example:

#!/usr/bin/perl
#OneDashOrTwo.pl
use 5.006;
use strict;
use warnings;

use Getopt::Long qw(:config bundling);

my ($verbose, $get) = (0, 0); #Default to 0 so the print statements below never see undef
GetOptions(
    "verbose" => \$verbose,
    "get" => \$get
);

print "verbose = $verbose\n";
print "get = $get\n";
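
To see the difference without typing command lines, here is a small sketch that parses made-up argument lists with GetOptionsFromArray. The helper name parse_args and the one-letter aliases v and g are my own additions, not part of the script above.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

Getopt::Long::Configure('bundling');   # single-dash options are read one letter at a time

# parse_args is a hypothetical helper: it parses a list instead of @ARGV.
sub parse_args {
    my @args = @_;
    my ($verbose, $get) = (0, 0);
    GetOptionsFromArray(
        \@args,
        'verbose|v' => \$verbose,   # long name plus a one-letter alias
        'get|g'     => \$get,
    );
    return ($verbose, $get);
}

my ($v1, $g1) = parse_args('-vg');         # one dash: bundle of -v and -g
my ($v2, $g2) = parse_args('--verbose');   # two dashes: the long name
print "-vg:       verbose = $v1, get = $g1\n";
print "--verbose: verbose = $v2, get = $g2\n";
```

With bundling on, a single-dash -verbose would be read as a bundle of one-letter options rather than as the long name, which is why the number of dashes matters.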
d5e5 109 Master Poster

The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities--but you may always use pass-by-reference instead to avoid this.

from http://perldoc.perl.org/perlsub.html
For example:

#!/usr/bin/perl
#PrintArraysBetter.pl
use 5.006;
use strict;
use warnings;

my @genres = qw(Action Adventure Animation Biography Bollywood Comedy Crime Documentary Drama);
my @languages = qw(English French Spanish Greek German Dutch Esperanto);

print_line(\@genres, \@languages); # The '\' prefix converts an array into an array reference.

sub print_line {
    # @_ now contains the following list:
    print "You passed the following list of scalar values to the subroutine:\n @_ \n";
    my ($array_ref1, $array_ref2) = @_; # Assign the two array references to scalar variables
    my @MGenres    = @{$array_ref1}; #Dereference 1st array ref 
    my @MLanguages = @{$array_ref2}; #Dereference 2nd array ref

    print "\n\n The genres array now contains: \n";
    for my $element (@MGenres) { 
        print $element . ",";
    }

    print "\n\n The languages array now contains: \n";
    for my $element (@MLanguages) {
        print $element . ",";
    }
    print "\n";
    
}    #end sub

For information about creating and using references see http://perldoc.perl.org/perlreftut.html

d5e5 109 Master Poster

Do you have a package manager? If you have ActivePerl from ActiveState then you probably have PPM. Just type ppm at the command prompt and PPM starts up with a GUI. I don't know if that will solve the problem, but package managers are an alternative way to install binaries while automatically notifying you of dependencies (if any).

d5e5 109 Master Poster
#!/usr/bin/perl
#PrintArraysBad.pl
use 5.006;
use strict;
use warnings;

my @genres = qw(Action Adventure Animation Biography Bollywood Comedy Crime Documentary Drama);
my @languages = qw(English French Spanish Greek German Dutch Esperanto);

print_line(@genres, @languages);

sub print_line {
    # @_ now contains the following list:
    print "You passed the following list of scalar values to the subroutine:\n @_ \n";
    
    my @MGenres    = shift; #Assign the first scalar item from @_ to @MGenres
    my @MLanguages = shift; #Assign the next scalar item from @_ to @MLanguages

    print "\n\n The genres array now contains: \n";
    for my $element (@MGenres) { 
        print $element . ",";
    }

    print "\n\n The languages array now contains: \n";
    for my $element (@MLanguages) {
        print $element . ",";
    }
    print "\n";
    
}    #end sub
d5e5 109 Master Poster

I'm trying to run "Perl Names2Long.pl data.csv"

I didn't realise you wanted to run the script with the data file as a run-time parameter. If you want to run it that way, try the following modified version of the script:

#!/usr/bin/perl
#Names2Long02.pl
use strict;
use warnings;

$_ = <>; #Read and skip the header record
my $max_length = 20;

while (<>) {
    chomp;
    my ($name) = m/"([^"]*)"/; #Captures contents of first set of double quotes (if name is not in first field, change this)
    $name = '' unless defined $name; #Line had no quoted field, so treat the name as empty
    my $length_of_name = length($name);
    if ($length_of_name > $max_length) {
        print "$_ <<===== Length of Name is $length_of_name characters (exceeds $max_length.)\n";
    }
    else {
        print "$_\n";
    }
}
d5e5 109 Master Poster

See next post...

d5e5 109 Master Poster

I borrowed sample data (but not the program) for the following from a Parsing CSV tutorial. The tutorial recommends using the Text::CSV module, which you may want to do if the "name" field is not the first field of your records.

#!/usr/bin/perl
#Names2Long.pl
use strict;
use warnings;

$_ = <DATA>; #Read and skip the header record
my $max_length = 20;
while (<DATA>) {
    chomp;
    my ($name) = m/"([^"]*)"/; #Captures contents of first field (if name is not first, change this)
    $name = '' unless defined $name; #Line had no quoted field, so treat the name as empty
    if (length($name) > $max_length) {
        print "$_ <<=========== Name exceeds $max_length characters.\n";
    }
    else {
        print "$_\n";
    }
}


__DATA__
"Name","Address","Floors","Donated last year","Contact"
"Charlotte French Cakes","1179 Glenhuntly Rd",1,"Y","John"
"Glenhuntly Pharmacy","1181 Glenhuntly Rd",1,"Y","Paul"
"Dick Wicks Magnetic Pain Relief","1183-1185 Glenhuntly Rd",1,"Y","George"
"Gilmour's Shoes","1187 Glenhuntly Rd",1,"Y","Ringo

Running this gives the following output:

"Charlotte French Cakes","1179 Glenhuntly Rd",1,"Y","John" <<=========== Name exceeds 20 characters.
"Glenhuntly Pharmacy","1181 Glenhuntly Rd",1,"Y","Paul"
"Dick Wicks Magnetic Pain Relief","1183-1185 Glenhuntly Rd",1,"Y","George" <<=========== Name exceeds 20 characters.
"Gilmour's Shoes","1187 Glenhuntly Rd",1,"Y","Ringo
d5e5 109 Master Poster

Did you try ppm before you set the http_proxy environment variable? When I was using Windows Vista Home Premium 32-bit I used ppm without knowing anything about an http_proxy and it connected OK, probably because I don't have a proxy server. Since you say you don't have a proxy server either, maybe your http_proxy variable should not be set to any value? Maybe try deleting the http_proxy environment variable by entering SET "http_proxy=" on the command line of the cmd.exe window before starting up ppm? Just guessing.

d5e5 109 Master Poster
#!/usr/bin/perl

use 5.006;
use strict;
use warnings;


foreach (@ARGV) {
	if (m/^--/) {
		print "$_ starts with two hyphens.\n";
	}
	elsif (m/^-/) {
		print "$_ starts with one hyphen.\n";
	}
}
d5e5 109 Master Poster

Hi d5e5, it works for me finally except that writing into the file is still troublesome. I can't understand why this is so as it works for you guys both by writing into the file and onto the cmd.
Thanks

Does it work if you remove the '+' from the statement opening the DATA file? Try open (DATA,">react_out.txt") or die "Can't open data"; instead of open (DATA,"+>react_out.txt") or die "Can't open data"; As I said, the original worked for me without removing the '+', but you may have a different version of Perl or a different operating system. I've never used +>filename before. It supposedly lets you read from and write to the file, instead of just writing, but your script does not read from the DATA file anyway, so you don't need the '+'. I don't know if this will help... just guessing.

d5e5 109 Master Poster

I tried Murtan's version (posted yesterday March 8) of the script. Here's how it looked when I ran it in Terminal:

david@david-laptop:~$ cd /home/david/Programming/Perl
david@david-laptop:~/Programming/Perl$ perl SearchFile.pl
Enter reaction name for searching:A1_HTTT24
Enter reaction name for searching:B3_PGAI1
Enter reaction name for searching:
Given Key isn't found in the file
david@david-laptop:~/Programming/Perl$

This created the data.txt file and here are the contents:

A1_HTTT24,GLUC_ext = GLUC 
B3_PGAI1,GLUC6P = FRUC6P

I don't know why the script wouldn't work for you, Perly. Are you sure you entered the keys correctly during the test? It has to be upper case because 'A1_HTTT24' does not equal 'a1_httt24'.

d5e5 109 Master Poster

One way to accomplish the first three tasks:

#!/usr/bin/perl
use strict;
use warnings;
use Math::Complex;

my @atoms;
#For testing, I like to read from the __DATA__ section at the end of the program.
#If you prefer, you can do the following instead
#open FH, '<', 'somefile.txt' or die $!;
#and read from <FH> instead of <DATA>
while (<DATA>) {
	next if m/^#/; #Skip line if it is a comment
	push @atoms, [split /\s+/, $_];
}

my ($stat, $size);
$size = @atoms;
foreach (0..$size-1) {
	print "\nPrinting array of atoms by ascending order of distance from atom_id $atoms[$_]->[0]\n";
	my @arr = sort_atoms_by_distance(\@atoms, $_);
	# @arr contains a sorted list of distances of all atoms relative to one atom.
	foreach my $line (@arr) { #Avoid naming a lexical $a, because sort uses $a and $b
		print $line, "\n";
	}
}

sub sort_atoms_by_distance {
	my ($aref, $i) = @_;
	my @a = @{$aref};
	my $href = {};
	my $distance;
	foreach (0..$size-1) {
		$distance = calcdist($a[$i], $a[$_]);
		$href->{$a[$_]->[0]} = $distance;
	}
	my @sortedkeys = sort { ${$href}{$a} <=> ${$href}{$b} } keys %{$href};
	my @sorted;
	foreach my $j (@sortedkeys) {
		push @sorted, "Distance between atom_id $a[$i]->[0] and atom_id $j is ${$href}{$j}";
	}
	return @sorted;
}

sub calcdist {
	my ($ref1, $ref2) = @_;
	my ($atom_id1, $x1, $y1, $z1) = ($ref1->[0], $ref1->[3], $ref1->[4], $ref1->[5]);
	my ($atom_id2, $x2, $y2, $z2) = ($ref2->[0], $ref2->[3], $ref2->[4], $ref2->[5]);
	my $dist = sqrt(($x1 - $x2)**2 + ($y1 - $y2)**2 + ($z1 - $z2)**2);
	return $dist;
}

__DATA__
#F A 1 1 1 3 3 2
#C number type mass x y …
d5e5 109 Master Poster

Note that glob will split its arguments on whitespace, treating each segment as separate pattern.

from http://perldoc.perl.org/functions/glob.html
Instead you can use the File::Glob module to override the behaviour of Perl's built-in glob function.

#!/usr/bin/perl
use 5.006;
use strict;
use warnings;

use File::Glob ':glob';
my @files = </home/david/Programming/untitled folder/*>;
foreach (@files) {
	print "$_ \n";
}
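
If you would rather not override the built-in glob, File::Glob also exports bsd_glob, which you can call directly. The following is only a sketch: it builds a temporary directory with a space in its name so there is something to list, and the file names are made up.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Build a throwaway directory whose name contains a space (illustration only).
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/untitled folder";
mkdir $dir or die "Could not create $dir: $!";
for my $name (qw(a.txt b.txt)) {
    open my $fh, '>', "$dir/$name" or die "Could not create $name: $!";
    close $fh;
}

# bsd_glob treats the whole string as one pattern, spaces included.
my @files = bsd_glob("$dir/*");
print "$_\n" for @files;
```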
d5e5 109 Master Poster

I ran your script and didn't get any error. I added some print statements to see the content and added another $mechanize->get to get a different page. They both downloaded in about a second and the output looks complete to me.

Is it working for you now? Perhaps the servers or the internet was slow when you tested it. Or maybe you have a firewall that is blocking your access to one of those sites?

I haven't any expertise using this module. Just saying your script works OK for me. (My platform is Win/DOS, but I don't see why it wouldn't work on Linux.)

#!/usr/bin/perl -w
use strict;
use warnings;
use WWW::Mechanize;
my $url="http://eutils.ncbi.nlm.nih.gov/";
my $mechanize = WWW::Mechanize->new(autocheck => 1);
$mechanize->get($url);
my $page = $mechanize->content;
print $page; #Output looked complete to me... no error
print '*' x 75, "\n"; #Printed a bunch of asterisks to separate from next page
$mechanize->get("http://www.example.com");
print $mechanize->content; #Output looked complete to me... no error
d5e5 109 Master Poster
#! /usr/local/bin/perl 
use warnings;
use strict;
my ($gene, $pos, $d, @descs);
#program to print one gene per line with all the ontologies
while (<DATA>) {
    chomp;
    if (!defined $gene or m/^$gene/) {
        #Never mind
    }
    else {
        @descs = printsummary($gene, $pos, @descs);
    }
    ($gene, $pos, $d) = split(/\s+/);
    push (@descs, $d)
}
@descs = printsummary($gene, $pos, @descs);

sub printsummary {
    my ($g, $p, @d) = @_;
    my $str = join ", ", ($g, $p, @d);
    print "$str\n";
    return ();
}


__DATA__
gene1	pos1	description1
gene2	pos2	description2a
gene2	pos2	description2b
gene2	pos2	description2c
gene3	pos3	description3
gene4	pos4	description4a
gene4	pos4	description4b
d5e5 109 Master Poster

Convert the string into a list of characters. Convert the list into a set whose members are unique characters in the list. Then print the length of this set.

>>> str1 = "abcdeb";
>>> str1
'abcdeb'
>>> charset = set(list(str1))
>>> len(charset)
5
d5e5 109 Master Poster
#!/usr/bin/env python
allthepoem = open (r"c:\users\david\programming\python\poem.txt", "U").read() # raw string so the backslashes are taken literally
allthepoem_upper_case = allthepoem.upper()
charslist = list(allthepoem_upper_case)
my_dict = dict()
for char in charslist:
    if ord(char) < 33: # space is chr(32), chr(10) is linefeed, chr(13) is carriage return, etc.
        char = "chr(" + str(ord(char)) + ")"
    if char in my_dict:
        my_dict[char] += 1
    else:
        my_dict[char] = 1
    
for char in my_dict:
    print char, " occurs ", my_dict[char], " times"
d5e5 109 Master Poster

Would 'Aa' count as A----2 or

A----1
a----1

?

d5e5 109 Master Poster

For example, if your file were called "logins.txt" and contained data like this:

20100217 11:05:18 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1011,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100218 11:07:20 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1014,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100219 11:09:22 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1011,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100220 11:11:24 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1014,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100221 11:13:26 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1011,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100222 11:15:28 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1015,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100223 11:17:30 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1011,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100224 11:19:32 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1017,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100225 11:21:34 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1018,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100226 11:23:36 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1014,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100227 11:25:38 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1011,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100228 11:27:40 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1014,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100229 11:29:42 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1011,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100230 11:31:44 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1014,ou=Internal,ou=people,dc=eis,dc=example,dc=com>				
20100231 11:33:46 0da0fbd0 <EDMV:NOTES> From MV Modify	<uid=PY1011,ou=Internal,ou=people,dc=eis,dc=example,dc=com>

The following counts logins per user, prints each count, and calls an alert subroutine for any user whose count exceeds a certain threshold (3, in this script).

#!/usr/bin/perl
use strict;
use warnings;
my $threshold = 3; # If user logs in more than 3 times, you want an alert
my $log = 'C:\users\david\programming\perl\logins.txt'; #My test log file
my %count;
open (FIN, '<', $log) or die "Could not open $log: $!"; #Open file for reading
while (<FIN>) {
    if (m/<uid=(\w+)/) {
        $count{$1}++;
    }
}
close FIN;
foreach my $user (sort (keys %count)) {
    if ($count{$user} > $threshold) {
        send_alert($user, $count{$user});
    }
    print "$user login count is $count{$user}\n";
}

sub send_alert {
    #This subroutine should send an alert somehow.
    #I don't know how to send an email from Perl
    my ($id, $c) = @_; #Save …
d5e5 109 Master Poster

That sounds like a lot of work. This may get you started on Step number 1. I'm assuming the text files contain only file names (one per line) between double quotes. If they contain other text as well you may have to fix it to skip the unwanted text. My platform is Windows.

#!/usr/bin/perl
use strict;
use warnings;
use 5.010; #Lets you say
use File::Find;
my $inidir = "C:\\Users\\David\\Programming\\INI"; #INI directory on my computer is in C:\Users\David\Programming
# my $testdir = "C:\\Users\\David\\Programming\\TestData";
my @directories_to_search = ($inidir);
find(\&wanted, @directories_to_search);

sub wanted {
    my $f = $File::Find::name;
    if ((-e $f) && (-T $f)) { #File exists and is a text file
        say "Reading $f";
        process_file($f);
    }
}

sub process_file {
    my $f = shift @_;
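    open FH, '<', $f or die "cannot open $f for reading: $!"; #Use 'or' here: '||' would bind to $f and die would never run
    while (<FH>) {
        my ($filename) = m/"(.+)"/;
        next unless defined $filename; #Skip lines that have no quoted file name
        say "We want a copy of $filename";
    }
    close FH;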
    print "\n\n";
}
d5e5 109 Master Poster

Ok, but I need to do this for an arbitrary number of file names. Is there any way to completely remove the first element in the array and make the file path the first element?

Create an empty list and append what you want from sys.argv into your list.

import sys

# there is a commandline
if len(sys.argv) > 1:
    arglist = []
    # sys.argv[0] is the program filename, slice it off
    for arg in sys.argv[1:]: #Slice of sys.argv starting at sys.argv[1] up to and including the end
        arglist.append(arg)
else:
    print "usage %s arg1 arg2 [arg3 ...]" % sys.argv[0]
    sys.exit(1)

# if the arguments were This.txt That.txt Other.log
# arglist should be ['This.txt', 'That.txt', 'Other.log']
print(arglist)
vegaseat commented: nice +10
d5e5 109 Master Poster

Good answer but permit me to suggest a minor change to the above example.

#!/usr/bin/env python
#argecho.py
import sys
if len(sys.argv) > 2: #sys.argv[0] is always present, so check that two arguments follow it
    print "The name of this program is:",sys.argv[0]
    print "The first command line argument I received is:",sys.argv[1]
    print "The second command line argument I received is:",sys.argv[2]
lllllIllIlllI commented: Woops :) my bad. Nicely done +2
d5e5 109 Master Poster

You're welcome. And thanks for marking the thread solved.:) A lot of people forget to do that.

d5e5 109 Master Poster

I put your sample data into a file called PhoneNbrs.txt in the same directory where I wrote the following script:

#!/usr/bin/perl
use strict;
use warnings;
my $phones = "PhoneNbrs.txt";
my $phones_errors = "PhoneNbrsErrors.txt";
open (FIN, '<', $phones) or die "Could not open $phones: $!"; #Open file for reading
my $header = <FIN>; #Read first record of file to get past it.
open (FOUT, '>', $phones_errors) or die "Could not open $phones_errors: $!"; #Open file for writing
while (<FIN>) { #Read each remaining record into $_ variable, one at a time
    if ($_ =~ m/,\s*\d\d\d-\d\d\d\d\s*$/) { #If the current record matches valid number pattern
        next # Go to the top of the loop to read the next record (skip this one)
    }
    else {
        print FOUT; #Write this record into output file
    }
}
print "Done. Look in $phones_errors for errors (if any).\n";
close FIN;
close FOUT;
d5e5 109 Master Poster

I don't have a script that does it but if you create and show us an example with a few correct and incorrect phone numbers I'm sure somebody could help you make a script that prints all the correct and / or the incorrect lines separately.

We need to know if your file has a header line at the start, whether the names are enclosed in quotes, etc. Not all csv files are formatted the same. Microsoft Excel creates them one way but other programs vary the format a bit. I understand you don't want to give us real names and numbers, but if you can make up a few it will give us something to test.

d5e5 109 Master Poster

One reason your attempt to remove the blank line before the line containing 'Status' doesn't work is because the loop is reading one line at a time into the $_ variable. You want to look ahead within the contents of $_ but the next line has not yet been read into $_ so you can't see it at this time. And by the time you read the next record that contains 'Status' you have already read and rewritten the blank line to the file so it is too late to skip it. Since the in-place edit method reads a file one line at a time it can't skip lines based on what it hasn't read yet.

You can accomplish what you want in a different way by reading the entire file into one string variable, changing the contents of the string as you want, and then writing the output to a new file. Please try the following:

#!/usr/bin/perl
use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location
my $path_and_file_out = substr($path_and_file, 0, -4) . '_edited.txt';
open (F, '<', $path_and_file) or die "Can not open $path_and_file: $!";
undef $/; # $/ usually contains \n to indicate end of record. 
my $string = <F>; # There's no value in $/ so entire file is read as one record.
$string =~ s/^Exec.*\n//gm; #delete all lines starting with Exec (g means global, m means multiline mode)
$string =~ s/^\s*\n(?=Status)//gm; #delete all (g for global) empty lines preceding line …
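
To show what the two substitutions do without touching a real file, here is a small sketch that runs them on a string in memory. The sample text is made up; it only stands in for the slurped contents of a file.

```perl
use strict;
use warnings;

# Sample text standing in for the slurped file (made-up content).
my $string = "Exec one\nkeep this\n\nStatus ok\n\nkeep that\n";

$string =~ s/^Exec.*\n//gm;           # delete every line that starts with Exec
$string =~ s/^\s*\n(?=Status)//gm;    # delete an empty line only when Status comes right after it

print $string;
```

The (?=Status) part is a lookahead: it checks that 'Status' follows but does not remove it. The blank line before 'keep that' stays because no 'Status' follows it.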
d5e5 109 Master Poster

As Salem says, redirect the output to a file. For example, let's say I have a script called "my_program.pl" that prints output onto the screen. If I want to have that output go into a file instead, so I can open it with Notepad or some other program, I would enter perl my_program.pl > my_output.txt on the command line (in the cmd box in Windows). This results in the output of the program going into a new file called my_output.txt instead of displaying on the screen.

d5e5 109 Master Poster

Could you explain me please what mean "$^I = '.bac';".

$^I = '.bac'; tells Perl to do an in-place edit on the file being processed by the <> construct. The <> construct opens, in turn, each file named in the @ARGV array. The @ARGV array gets a list of files from the command line that calls Perl and your script, if you added any filenames to the command line after your script name. But if you didn't put filenames on the command line, you can put a statement in your script to put one or more filenames in a local copy of @ARGV. This allows you to process your files with the <> construct.

One advantage of using the <> construct is that you can use in-place editing on the file(s) processed by <>. In-place editing means that the file being read gets renamed with its original name plus the value you give to the $^I variable (in our case, '.bac'). A new, empty file with your original filename is created and any print statement within the block processing the <> construct writes to that new file. This enables you to rewrite the file with whatever changes you wish to make. Instead of deleting a line, it's easier to do an in-place edit: write the records you want to keep and don't write the records you don't want. See this example of in-place editing by Tek-tips.
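
Here is a small self-contained sketch of in-place editing, in case it helps to try it safely. It creates its own temporary test.txt with made-up contents, so it does not touch any of your real files.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a small sample file to edit (contents are made up for illustration).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/test.txt";
open my $out, '>', $file or die "Could not create $file: $!";
print {$out} "Exec first\nkeep me\nExec second\nkeep me too\n";
close $out;

{
    local @ARGV = ($file);   # the list of files for <> to process
    local $^I   = '.bac';    # turn on in-place editing; original is kept as test.txt.bac
    while (<>) {
        next if m/^Exec/;    # never printed, so it is left out of the new file
        print;               # written to the new test.txt, not to the screen
    }
}

print "Backup kept: ", (-e "$file.bac" ? "yes" : "no"), "\n";
```

After the block ends, test.txt holds only the two 'keep' lines and test.txt.bac holds the original four.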

d5e5 109 Master Poster
<html>
<head>
<script type="text/javascript">
function Calculate(num1, num2)
{
var total = num1 + num2;
alert(num1 + " plus " + num2 + " makes " + total);
}
</script>
</head>

<body>
<form>
<input type="button" value="Click to calculate" onclick="Calculate(8, 4)" />
</form>

<p>Clicking the above button calls a function that calculates the sum of 8 plus 4 and shows an alert message.</p>

</body>
</html>
d5e5 109 Master Poster

Actually it's simpler to read and write one line at a time in a loop instead of opening, closing and reopening the file and creating arrays. The following is based on KevinADC's code snippet

#!/usr/bin/perl
use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location

{
   local @ARGV = ($path_and_file);
   local $^I = '.bac';
   while(<>){
      next if $_ =~ m/^Exec/;
      print;
   }
}
print "finished processing file.";
d5e5 109 Master Poster

I think using splice to remove some lines from an array is more difficult than just testing each member of the array and deciding whether or not to write it into your file.

use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location
my @edit = readdata();
writedata(@edit);

sub readdata {

    open( F, '<', $path_and_file ) #open in input mode
      || die "Can not open: $!\n";
    my @data = <F>;
    close(F);
    return (@data);
}

sub writedata {
    my @arrayout = @_; # @_ contains list passed when calling this subroutine
    open( F, '>', $path_and_file ) or die "Can not open: $!"; ##open in output mode
    foreach ( @arrayout ) {
        chomp; #remove trailing newline from $_
        unless ($_ =~ m/^Exec/) { #Do not write lines starting with 'Exec' etc. into your file
            print F "$_\n";
        }
    }
    close(F);
}
d5e5 109 Master Poster

Since the script worked for me and not for you we probably have different operating systems. I suspect the troublesome statement could be $pathandfilename = "$directory\\$filename"; . That assumes that putting a backslash (actually an 'escaped' backslash -- that's why there are two of them) between the directory and the filename results in a valid path. That is true for Windows/DOS systems but not for others. Try replacing $pathandfilename = "$directory\\$filename"; by $pathandfilename = "$directory/$filename"; and please let me know if it works.
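
If you want something that does not depend on guessing the separator, the core File::Spec module can join the parts for you. A minimal sketch, with made-up directory and file names:

```perl
use strict;
use warnings;
use File::Spec;

my $directory = 'Documents';   # hypothetical directory name
my $filename  = 'notes.txt';   # hypothetical file name

# catfile joins the parts with whatever separator the current system uses.
my $pathandfilename = File::Spec->catfile($directory, $filename);
print "$pathandfilename\n";
```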

d5e5 109 Master Poster

I made a few changes to get rid of error and warning messages. Also I removed the splice command because I wasn't sure what you wanted it to do. The following reads the test.txt file (you can change the path to make it work on your computer) into an array and prints only the records that start with Exec followed by other optional characters.

use warnings;
use strict;
my $i;
open (F,'C:\Users\David\Programming\Perl\Test.txt') || die "Can not open: $!\n";
my @file = <F>;

foreach (@file) {
    if ($_ =~ m/^Exec.+$/) {
        print; #print what is in $_ (the default variable)
    }
}

close (F);
d5e5 109 Master Poster

OK, I'll take a look at your text file. Meanwhile you might want to look at http://www.wellho.net/solutions/perl-using-perl-to-read-microsoft-word-documents.html These modules work only when you have MS-Word installed on your computer, which I don't so I haven't tried them.

d5e5 109 Master Poster

Today I googled some more and found another way to get a list of files using the opendir and readdir commands. Unlike glob, which returns names matching a pattern (relative to the current directory unless the pattern includes a path), these commands require you to specify a directory, and readdir returns bare filenames. This means to get the file size you need to specify the full path with the filename. The advantage of the opendir and readdir is that you use a loop to push each filename into the array, and so you can have a counter and stop adding to your array when you reach your limit. This way you don't have to adjust your array afterwards.

#!/usr/bin/perl -w
use strict;
my $directory = 'C:\Users\David\Documents';
my ($filename, $pathandfilename, @files);
my ($count, $limit) = (0, 8);
opendir DIR, $directory or die "Can not open $directory: $!";
while (defined($filename = readdir(DIR))) {
    $pathandfilename = "$directory\\$filename";
    #Test -f first so the size test only runs on plain files
    push @files, $pathandfilename if -f $pathandfilename and (-s $pathandfilename) <= 10000 and $count++ < $limit;
}
closedir DIR;
#Print each filename (including path) in array, followed by file size
foreach my $f (@files) {
    my $filesize = -s $f;
    print "$f size is: $filesize\n";
}

d5e5 109 Master Poster

Also is there a way to set your array to a certain file size? thank you in advance.

Is there a way to create the array @files a certain size?

Sorry, I don't understand this part of your question. Do you mean can you make an array of only 8 files, for example, when the my @files = grep { -f and (-s > 3000) } glob( '*.txt' ); returns an array of more than 8 files? Or do you mean the array should contain only filenames of files that are less than or equal to a certain size?

Here is an example of both:

#!/usr/bin/perl -w
use strict;
use List::Util qw(min);
#Get list of filenames matching *.* (any name containing a dot), test for file
# (not directory) attribute and test that file size is less than or equal to 3000
#Put list of filenames into an array
my @files = grep { -f and (-s $_ <= 3000) } glob( '*.*' );

#Print each filename in array, separated by newline
foreach my $f (@files) {
    my $filesize = -s $f;
    #printf "%-25s size is %15d \n", ($f, $filesize); If you want to format what you print
    print "$f size is:   $filesize\n";
}

#To limit the size of an array, take a slice of original array:
my $size_of_original_array = @files; #Number of elements in array

my @NoMoreThanEightFiles = @files[0..(min $size_of_original_array, 8) - 1];

#Print each filename in array, separated by newline
print "\nPrinting no more than eight filenames:\n";
foreach my $f …
d5e5 109 Master Poster

It is easy to read text files successfully in Perl. Reading any other type of file is more difficult because you have to know how many bytes you want to read each time you read the file and what you want to do with them. If you want to translate some of the bytes in a binary file into characters you have to know where in the file these bytes can be found and how to interpret them as characters.
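
To show what "knowing how many bytes you want to read" looks like in practice, here is a minimal sketch that opens a file in raw mode, reads a fixed number of bytes and prints them as hex. The helper name first_bytes_hex, the file name sample.bin and the sample bytes are all made up for illustration; they say nothing about what your file contains.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first $count bytes of a file as a
# string of two-digit hex values separated by spaces.
sub first_bytes_hex {
    my ($path, $count) = @_;
    open(my $fh, '<:raw', $path) or die "Can not open $path: $!";
    my $buffer = '';
    my $got = read($fh, $buffer, $count);   # ask for $count bytes at most
    die "Read failed on $path: $!" unless defined $got;
    close($fh);
    # unpack 'C*' turns each byte into its unsigned numeric value
    return join ' ', map { sprintf '%02x', $_ } unpack('C*', $buffer);
}

# Create a tiny sample file so the sketch runs on its own
my $sample = 'sample.bin';
open(my $out, '>:raw', $sample) or die "Can not create $sample: $!";
print $out "AB\x01\x02 and the rest of the data";
close($out);

print first_bytes_hex($sample, 4), "\n";   # prints: 41 42 01 02
unlink $sample;
```

Interpreting those bytes is the hard part: the program has to know in advance which positions hold text, numbers or something else.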

Did the test.docx file that you attached look like text in your program? What program created it? My Windows platform had no program associated with the file type, or couldn't guess what the file type was. I don't have MS-Word, so I tried to open it with Open Office Writer, unsuccessfully. I also tried to open it with a text editor, unsuccessfully. Is it a music file? Video? Executable program?

d5e5 109 Master Poster

My platform is Windows and there happen to be some files whose filenames end in '.txt' in the current directory. Some of them are larger than 3,000 bytes.

#!/usr/bin/perl -w
use strict;
#Get list of filenames ending in .txt, test for file (not directory) attribute
# and test that file size is greater than 3000
#Put list of filenames into an array
my @files = grep { -f and (-s > 3000) } glob( '*.txt' );

#Print each filename in array, separated by newline
foreach my $f (@files) {
    my $filesize = -s $f;
    printf "%-25s size is %15d \n", ($f, $filesize);
}

As for state machines, I think that pertains to object-oriented programming in Perl, which I haven't learned yet. Are you looking for something like what is discussed in this page about Perl design patterns?

d5e5 109 Master Poster

The above works fine? Not for me. You don't have code tags around your code and it looks like there is a smiley in there. Plus this line, if ($ch eq "?" ¦¦ $ch eq "!" ¦¦ $ch eq ".") doesn't look right, but maybe it is because your code doesn't display correctly because it is not preceded by code tags, which would allow it to display correctly.

Also, it looks like your loop reads each entire line into $ch. So if $ch eq 'Is not this a sentence?' then $ch is not equal to '?' (because it is equal to much more than '?') and will not be counted as a sentence. I think there are more errors which I don't have time to look for, such as using double quotes when you need single quotes to print a variable name without interpolating it.
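
One way around comparing the whole line to a single character is to count the terminating characters inside the line. This is only a sketch of that idea, not a fix for your script: the helper name count_terminators and the sample line are made up, and it counts every '.', '?' and '!', so abbreviations such as 'e.g.' would be over-counted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the '.', '?' and '!' characters in one line.
sub count_terminators {
    my ($line) = @_;
    # tr/// with an empty replacement list only counts the characters;
    # it leaves $line unchanged
    return ($line =~ tr/.?!//);
}

my $line = 'Is not this a sentence? Yes, it is. Good!';
print count_terminators($line), "\n";   # prints: 3
```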

d5e5 109 Master Poster

Add a next if m/^\s*$/; command in your while loop, immediately after the chomp; . Like this:

while (<FILE>) {
    chomp;
    next if m/^\s*$/;

next if m/^\s*$/; means that if the $_ variable matches a pattern of nothing but whitespace (if anything) between the start and the end of the string, then skip to the top of the loop to read the next input record from your file.
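
Here is the same idea as a complete loop you can run. It is a minimal sketch: the helper name non_blank_lines is made up, and an in-memory string stands in for your real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the non-blank lines read from a filehandle.
sub non_blank_lines {
    my ($fh) = @_;
    my @kept;
    local $_;   # keep the caller's $_ safe
    while (<$fh>) {
        chomp;
        next if m/^\s*$/;   # empty or whitespace-only record: skip it
        push @kept, $_;
    }
    return @kept;
}

# In-memory sample standing in for a real file
my $text = "first\n\n   \t\nsecond\n";
open(my $fh, '<', \$text) or die "Can not open in-memory file: $!";
print "$_\n" for non_blank_lines($fh);
close($fh);
```

Only the lines 'first' and 'second' are printed; the empty line and the line of spaces and a tab are skipped.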

d5e5 109 Master Poster

I wouldn't want to try reading sequentially from a file and appending to it within the same loop. Instead I would read from one file and print output into a new file opened for output. After you have read through your input file and written everything you want to the output file, close both files.

After doing the above, you can easily append the new file onto the end of your original file.
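
The two passes described above could be sketched like this. The helper name filter_then_append, the file names and the "keep" rule are all made up for illustration; substitute whatever decides which lines you want to write.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read $original, write the lines chosen by the
# $keep callback to $newfile, then append $newfile onto $original.
sub filter_then_append {
    my ($original, $newfile, $keep) = @_;

    # Pass 1: read from one file and print into a new file
    open(my $in,  '<', $original) or die "Can not read $original: $!";
    open(my $out, '>', $newfile)  or die "Can not write $newfile: $!";
    while (my $line = <$in>) {
        print $out $line if $keep->($line);
    }
    close($in);
    close($out) or die "Can not close $newfile: $!";

    # Pass 2: append the new file onto the end of the original
    open(my $new,    '<',  $newfile)  or die "Can not read $newfile: $!";
    open(my $append, '>>', $original) or die "Can not append to $original: $!";
    while (my $line = <$new>) {
        print $append $line;
    }
    close($new);
    close($append) or die "Can not close $original: $!";
    return;
}

# Sample run with made-up file names
open(my $fh, '>', 'original.txt') or die "Can not create original.txt: $!";
print $fh "keep me\ndrop me\n";
close($fh);
filter_then_append('original.txt', 'new.txt', sub { $_[0] =~ /^keep/ });
```

Because the original file is only opened for appending after it has been read to the end and closed, the loop never sees the lines it is adding.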

d5e5 109 Master Poster

Currently I use this: $value =~ s/\D//g; I simply need to modify it so that anything after SDCSDC is dropped, whether it's a number or not.

Is this possible in RegEx?

You can modify the RegEx you were using from s/\D//g; into s/(SDCSDC.*$|\D)//g;

#!/usr/bin/perl -w
use strict;

#Here is my guess of what one of your long strings might look like
my $value = "The first digit is 3, the second digit is 4, the third is 2, and the SDCSDC rest of the digits 4 in the substring to be eliminated 9 4862 will not be included in the resulting value.";

#Eliminate SDCSDC and all other characters to the end of the string
# as well as eliminating all non-digit characters
$value =~ s/(SDCSDC.*$|\D)//g;

print "The value of the portion of the string to the left of SDCSDC is: $value \n";