Hi everyone!!
Help me please with my script. I'm using Perl Express for scripting in .
This is my script:

use warnings;
use strict;


open (file,'C:\Documents and Settings\soea\Desktop\Test.docx') || die "Can not open: $!\n";
@file = <file>;

for  (my $i = 0;
	 $i<=scalar(@file)-1;
         $i++;)
         	{
                	if ($f[@file] =~ /^Exec.+\n/)
                        {
                        splice(@file, $f, 1);
                        }
                }

close (file);

This is the output:
Unrecognized \D passed through at ... line 6
Unrecognized \s passed through at ... line 6
Unrecognized \D passed through at ... line 6
Unrecognized \T passed through at ... line 6
Global symbol "@file" requires explicit package name at ... line 7
.....


Change your full filepath to:

open (file,'C:\\Documents and Settings\\soea\\Desktop\\Test.docx') ||

and give it a go.

HTH's
sinnerFA
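A side note on the quoting, as a small sketch rather than a diagnosis of the original error: inside single quotes Perl treats a backslash specially only before another backslash or a single quote, so the single and doubled spellings below should give the same string. Forward slashes are also generally accepted in file paths by Perl on Windows. The variable names are made up for the illustration; the path is the one from the question.

```perl
use strict;
use warnings;

# Three ways to spell the same Windows path (variable names are illustrative).
my $single  = 'C:\Documents and Settings\soea\Desktop\Test.docx';
my $doubled = 'C:\\Documents and Settings\\soea\\Desktop\\Test.docx';
my $forward = 'C:/Documents and Settings/soea/Desktop/Test.docx';

# In single quotes, \\ collapses to one backslash and \D stays as \D.
print "single and doubled spellings are identical\n" if $single eq $doubled;
print "$forward\n";
```

Inside double quotes the picture changes: "\D" or "\s" are treated as escape sequences, which is where warnings such as "Unrecognized escape \D passed through" come from.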

See comments in the following where I modified your statements to avoid the errors and warnings you were getting. I still get a runtime error because the data file I use doesn't contain the same data as your file.

#!/usr/bin/perl -w
use strict;
my ( @file, @f, $f, $i );    #Declare variables before using them

#Use capital letters for filehandle. IN is a better filehandle name than 'file'.
open( IN, 'C:\Documents and Settings\soea\Desktop\Test.docx' )
  || die "Can not open: $!\n";

@file = <IN>;

for (
    $i = 0;                     #separate the three clauses with semicolons
    $i <= scalar(@file) - 1;
    $i++                        #no semicolon after the last clause
  )
{
    if ( $f[@file] =~ /^Exec.+\n/ ) {
        splice( @file, $f, 1 );
    }
}

close(IN);

If you still encounter problems, can you attach your data file, or show us some test data?

In this line: if ( $f[@file] =~ /^Exec.+\n/ ) { the array @f is empty, so $f[@file] (the element of @f whose index is the number of lines in @file) has no value. If you want to look at the first line in your input file, you would refer to it as $file[0] . The next record from the file is found in $file[1] and so on.
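To make the indexing concrete, here is a minimal sketch. The three sample lines and the @matches array are invented for the illustration and are not taken from the poster's file.

```perl
use strict;
use warnings;

# Sample data standing in for the lines read from the file (invented).
my @file = ("first line\n", "Exec something\n", "third line\n");

# A single element is a scalar, so it takes the $ sigil.
print "First record: $file[0]";

# C-style loop: three clauses separated by semicolons, none after $i++.
my @matches;
for (my $i = 0; $i < @file; $i++) {
    push @matches, $i if $file[$i] =~ /^Exec/;
}

print "Records starting with Exec: @matches\n";
```

In the loop condition, @file is evaluated in numeric context and gives the number of elements, so $i < @file is equivalent to $i <= scalar(@file) - 1.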

Thanks for your comments!!
I did so, but now a new problem has appeared:

Can not open: No such file or directory

But I am using the correct path... What could the problem be?
I thought it was because of the C:\ drive, so I changed the file location to 'D:\\temp\\Test.doc', but nothing changed.
Help me please...

Here is the file example.

This is the example of the file.

Test.docx is a large binary file. The following script opens it, loads it into the @file array and prints the first element of the array. Since this is not a text file, printing it is pointless except to show that you can open it and it contains binary data that means nothing to me.

#!/usr/bin/perl -w
use strict;
my ( @file, @f, $f, $i );    #Declare variables before using them

#Use capital letters for filehandle. IN is a better filehandle name than 'file'.
open( IN, 'C:\Users\David\Programming\Perl\Test.docx' )#Changed path to my folder location
  || die "Can not open: $!\n";

@file = <IN>;

for ($i = 0; $i <= scalar(@file) - 1; $i++)
    {
        #if ( $f[@file] =~ /^Exec.+\n/ ) { # $f has no value. What are you looking for?
        #    splice( @file, $f, 1 );
        #}
        print $file[$i]; #Prints lots of garbage and beeps.
    }

close(IN);

Does this solve your question?

What file/data type should the file be on Windows for the Perl script to process it successfully?

It is easy to read text files successfully in Perl. Reading any other type of file is more difficult because you have to know how many bytes you want to read each time you read the file and what you want to do with them. If you want to translate some of the bytes in a binary file into characters you have to know where in the file these bytes can be found and how to interpret them as characters.
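As a rough sketch of what reading a fixed number of bytes looks like, the helper below (first_bytes_hex is a made-up name) opens a file in binary mode and returns its first few bytes as hexadecimal. It assumes nothing about what the bytes mean; interpreting them is the hard part described above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first $count bytes of a file as hex pairs.
sub first_bytes_hex {
    my ($path, $count) = @_;
    open(my $in, '<', $path) or die "Can not open $path: $!\n";
    binmode($in);                          # no newline or encoding translation
    my $buffer = '';
    my $got = read($in, $buffer, $count);  # number of bytes actually read
    die "Read failed: $!\n" unless defined $got;
    close($in);
    return join ' ', unpack('(H2)*', $buffer);
}

# Usage: perl script.pl somefile
print first_bytes_hex($ARGV[0], 8), "\n" if @ARGV;
```

binmode matters on Windows, where text mode would otherwise translate line endings while reading.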

Did the test.docx file that you attached look like text in your program? What program created it? My Windows platform had no program associated with the file type, or couldn't guess what the file type was. I don't have MS-Word so tried to open it with Open Office Writer, unsuccessfully. Also tried to open it with a text editor, unsuccessfully. Is it a music file? Video? Executable program?

No, test.docx is a Microsoft Word 2007 document.

I copied the data to a .txt document (attached). But now there is a problem with variable initialization... (attached)

Use of uninitialized value within @f in pattern match (m//) at t.pl line 13, <IN> line 255.
Use of uninitialized value within @f in pattern match (m//) at t.pl line 13, <IN> line 255.
Use of uninitialized value within @f in pattern match (m//) at t.pl line 13, <IN> line 255.

Is there a Perl library for working with Word documents? As I understand it, Word stores the data in its own internal format.

I made a few changes to get rid of error and warning messages. Also I removed the splice command because I wasn't sure what you wanted it to do. The following reads the test.txt file (you can change the path to make it work on your computer) into an array and prints only the records that start with Exec followed by at least one other character.

use warnings;
use strict;
my $i;
open (F,'C:\Users\David\Programming\Perl\Test.txt') || die "Can not open: $!\n";
my @file = <F>;

foreach (@file) {
    if ($_ =~ m/^Exec.+$/) {
        print; #print what is in $_ (the default variable)
    }
}

close (F);

Your script works well, thanks! But I need to remove the matching lines from the file itself... that's why I used "splice".

I've tried another way, but here I've run into other problems that I can't solve myself...

The result is an empty test.txt file and no warning messages.

use strict;
use warnings;

my $i;

sub readdata {

open (F, 'C:\Documents and Settings\Desktop\test.txt') || die "Can not open: $!\n";
	my @data = <F>;
		close (F);
			return (@data);
}


sub writedata {

open ( F, '>C:\Documents and Settings\Desktop\test.txt' ) or die "Can not open: $!";

	foreach (my @data){
		print F "$_\n";
			}
close (F);
}


foreach (my @edit = readdata()){
$_ = /\AExec.+\Z/ ;
splice (@edit, $_ , 0);
writedata(@edit);
}

I think using splice to remove some lines from an array is more difficult than just testing each member of the array and deciding whether or not to write it into your file.

use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location
my @edit = readdata();
writedata(@edit);

sub readdata {

    open( F, '<', $path_and_file ) #open in input mode
      || die "Can not open: $!\n";
    my @data = <F>;
    close(F);
    return (@data);
}

sub writedata {
    my @arrayout = @_; # @_ contains list passed when calling this subroutine
    open( F, '>', $path_and_file ) or die "Can not open: $!"; ##open in output mode
    foreach ( @arrayout ) {
        chomp; #remove trailing newline from $_
        unless ($_ =~ m/^Exec/) { #Do not write lines starting with 'Exec' etc. into your file
            print F "$_\n";
        }
    }
    close(F);
}
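For comparison, here is a small sketch of both approaches on an in-memory array. The sample lines are invented. grep builds a new list of the lines to keep, while the splice version has to walk the array backwards so that removing an element does not shift the indexes still to be visited.

```perl
use strict;
use warnings;

# Invented sample data standing in for the lines of test.txt.
my @lines = ("Exec one\n", "keep me\n", "Exec two\n", "Status ok\n");

# Option 1: keep only the lines that do not start with Exec.
my @kept = grep { !/^Exec/ } @lines;

# Option 2: remove in place with splice, iterating from the end.
my @spliced = @lines;
for (my $i = $#spliced; $i >= 0; $i--) {
    splice(@spliced, $i, 1) if $spliced[$i] =~ /^Exec/;
}

print "grep:   ", @kept;
print "splice: ", @spliced;
```

A forward loop with splice tends to skip the element that slides into the slot just vacated, which is one reason the filter style is usually less error-prone.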

Actually it's simpler to read and write one line at a time in a loop instead of opening, closing and reopening the file and creating arrays. The following is based on KevinADC's code snippet

#!/usr/bin/perl
use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location

{
   local @ARGV = ($path_and_file);
   local $^I = '.bac';
   while(<>){
      next if $_ =~ m/^Exec/;
      print;
   }
}
print "finished processing file.";

Could you please explain what "$^I = '.bac';" means?


$^I = '.bac'; tells Perl to do an in-place edit on the file being processed by the <> construct. The <> construct opens whatever files are named in the @ARGV array. The @ARGV array gets a list of files from the command line that calls Perl and your script, if you added any filenames to the command line after your script name. But if you didn't put filenames on the command line, you can put a statement in your script to put one or more filenames in a local copy of @ARGV. This allows you to process your files with the <> construct.

One advantage of using the <> construct is that you can use in-place editing on the file(s) processed by <>. In-place editing means that the file being read gets renamed with its original name plus the value you give to the $^I variable (in our case, '.bac'). A new, empty file with your original filename is created, and any print statement within the block processing the <> construct writes to that file. This enables you to rewrite the file with whatever changes you wish to make. Instead of deleting a line, it's easier to do an in-place edit: write the records you want to keep and don't write the records you don't want. See this example of in-place editing by Tek-tips.
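To see the rename-and-rewrite behaviour without touching a real file, the sketch below builds a throwaway file in a temporary directory, edits it in place, and then prints both the edited file and the .bac backup. The file name demo.txt, its contents, and the slurp helper are all invented for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the whole contents of a file as one string.
sub slurp {
    my ($path) = @_;
    open(my $in, '<', $path) or die "Can not open $path: $!\n";
    local $/;                      # read everything in one go
    my $content = <$in>;
    close($in);
    return $content;
}

my $dir  = tempdir(CLEANUP => 1);  # removed automatically at exit
my $file = "$dir/demo.txt";

open(my $out, '>', $file) or die "Can not open $file: $!\n";
print $out "Exec step 1\nkeep this\nExec step 2\nStatus ok\n";
close($out);

{
    local @ARGV = ($file);         # files for <> to process
    local $^I   = '.bac';          # turn on in-place editing with a backup
    while (<>) {
        next if /^Exec/;           # lines not printed are dropped
        print;                     # goes into the new demo.txt
    }
}

print "Edited file:\n", slurp($file);
print "Backup file:\n", slurp("$file.bac");
```

The backup should hold the original four lines, while demo.txt keeps only the lines that were printed inside the loop.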

Thanks a lot for your help!!
So, one more question...
I need to delete the empty line before the word "Status". I'm using the following statement but it does nothing... I do not understand why.

s/(?=Status)\n//m;

One reason your attempt to remove the blank line before the line containing 'Status' doesn't work is because the loop is reading one line at a time into the $_ variable. You want to look ahead within the contents of $_ but the next line has not yet been read into $_ so you can't see it at this time. And by the time you read the next record that contains 'Status' you have already read and rewritten the blank line to the file so it is too late to skip it. Since the in-place edit method reads a file one line at a time it can't skip lines based on what it hasn't read yet.

You can accomplish what you want in a different way by reading the entire file into one string variable, changing the contents of the string as you want, and then writing the output to a new file. Please try the following:

#!/usr/bin/perl
use strict;
use warnings;

#my $path_and_file = 'C:\Documents and Settings\Desktop\test.txt'; #vesnushka's file location
my $path_and_file = 'C:\Users\David\Programming\Perl\test.txt'; #d5e5's file location
my $path_and_file_out = substr($path_and_file, 0, -4) . '_edited.txt';
open (F, '<', $path_and_file) || die "Can not open $path_and_file: $!\n";
undef $/; # $/ usually contains \n to indicate end of record. 
my $string = <F>; # There's no value in $/ so entire file is read as one record.
$string =~ s/^Exec.*\n//gm; #delete all lines starting with Exec (g means global, m means multiline mode)
$string =~ s/^\s*\n(?=Status)//gm; #delete all (g for global) empty lines preceding line starting with Status
close F;

open (FOUT, '>', $path_and_file_out) || die "Can not open $path_and_file_out: $!\n";
print FOUT $string;
print "Finished processing $path_and_file.\n\nLook for output in $path_and_file_out\n\n";
close FOUT;

Note that it doesn't change the original test.txt file but creates a new output file called test_edited.txt. I did it this way so I could test without having to keep replacing the original file. (I make a lot of mistakes while testing.)
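The two substitutions can also be tried on a string in memory before running them against a file. This is only a sketch: clean_text is a made-up helper name and the sample text is invented, but the two regular expressions are the ones from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: apply both substitutions to text already in memory.
sub clean_text {
    my ($string) = @_;
    $string =~ s/^Exec.*\n//gm;         # delete every line starting with Exec
    $string =~ s/^\s*\n(?=Status)//gm;  # delete blank lines just before Status
    return $string;
}

# Invented sample: two records, with one and two blank lines before Status.
my $sample = "Exec job 1\nName: a\n\nStatus: ok\n"
           . "Exec job 2\nName: b\n\n\nStatus: failed\n";

print clean_text($sample);
```

Because \s also matches newline characters, a run of several blank lines before Status is removed in one match, not just the last one.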

.docx files are actually zip files, not sure if you knew that...
Inside of the zip (docx) files are multiple xml files. If you just want to extract the text, it's fairly easy. You just open the main xml file and strip out/replace all the xml tags. I've done it in PHP, I can give you that code if you want it. (Sorry, I'm just beginning to learn Perl and am not sure of how to do it yet.)

I'm sure Perl has a library to handle zip files, or one is available on the net.
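A minimal sketch of the same idea in Perl, assuming the IO::Uncompress::Unzip module that ships with recent Perl releases is available and that the document text lives in the word/document.xml member of the archive. The helper name docx_text is made up, and stripping tags with regular expressions only recovers rough plain text, not formatting.

```perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Hypothetical helper: return the rough plain text of a .docx file.
sub docx_text {
    my ($docx_path) = @_;
    my $xml;
    # Pull one named member out of the zip archive into $xml.
    unzip $docx_path => \$xml, Name => 'word/document.xml'
        or die "Cannot read $docx_path: $UnzipError\n";

    $xml =~ s{</w:p>}{\n}g;    # end of a paragraph becomes a newline
    $xml =~ s{<[^>]+>}{}g;     # drop every remaining tag

    # Decode the five predefined XML entities in a single pass.
    my %entity = (lt => '<', gt => '>', quot => '"', apos => "'", amp => '&');
    $xml =~ s/&(lt|gt|quot|apos|amp);/$entity{$1}/g;
    return $xml;
}

# Usage: perl script.pl Test.docx
print docx_text($ARGV[0]) if @ARGV;
```

The text comes back as raw UTF-8 bytes, so the output may need decoding before non-ASCII characters display correctly. Once the text is extracted this way, the line-filtering code discussed earlier can be applied to it like any other string.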

Thank you so much for the help!!
Yes, it will be very interesting to see how PHP works... please send me the file... and please tell me what environment I should install to use PHP.

The easiest way that I know of to use PHP is XAMPP (from here: http://www.apachefriends.org/en/xampp.html). It is very easy to setup (pretty much does everything for you).

I can give you a link to the file tomorrow, I need to get it from school. However, you may want to note that a) it is just something I wrote quickly and some time ago, and b) the text loses all but the basic formatting. (You could parse more of the XML and retain more of the formatting if you wanted to)

That said, it does what I needed it to well: extract plaintext from .docx files.

Sorry to take so long to post the file. Here it is:
http://max-land.org/docx.zip

(There are a bunch of things in the file, but they are extracted from the Word document. All you care about are the two PHP files.)

Thanks a lot, everyone, for the help!!!
