d5e5 109 Master Poster

Before the $filename = <STDIN>; statement you need the following statement: my $filename; to declare the $filename variable. That should stop the "Global symbol "$filename" requires explicit package name..." errors, which use strict reports for any variable that has not been declared.

d5e5 109 Master Poster

Test.docx is a large binary file. The following script opens it, loads it into the @file array and prints the elements of the array. Since this is not a text file, printing it is pointless except to show that you can open it and that it contains binary data that means nothing to me.

#!/usr/bin/perl -w
use strict;
my ( @file, @f, $f, $i );    #Declare variables before using them

#Use capital letters for filehandle. IN is a better filehandle name than 'file'.
open( IN, 'C:\Users\David\Programming\Perl\Test.docx' )#Changed path to my folder location
  || die "Can not open: $!\n";

@file = <IN>;

for ($i = 0; $i <= scalar(@file) - 1; $i++) #A C-style for loop separates its three parts with semicolons
    {
        #if ( $f[@file] =~ /^Exec.+\n/ ) { # $f has no value. What are you looking for?
        #    splice( @file, $f, 1 );
        #}
        print $file[$i]; #Prints lots of garbage and beeps.
    }

close(IN);

Does this solve your question?

d5e5 109 Master Poster

The tr/// operator replaces individual characters with individual characters. It cannot be used to replace one character with three characters. You can write a little Perl script to translate single character codes to triple character codes but tr/// does not do it. See this thread where vignesh31 asked the same question.

d5e5 109 Master Poster

In this line: if ( $f[@file] =~ /^Exec.+\n/ ) { nothing has been assigned to $f or to the @f array, so $f[@file] has no value. If you want to look at the first line in your input file, you would refer to it as $file[0] . The next record from the file is found in $file[1] and so on.

d5e5 109 Master Poster

See comments in the following where I modified your statements to avoid the errors and warnings you were getting. I still get a runtime error because the data file I use doesn't contain the same data as your file.

#!/usr/bin/perl -w
use strict;
my ( @file, @f, $f, $i );    #Declare variables before using them

#Use capital letters for filehandle. IN is a better filehandle name than 'file'.
open( IN, 'C:\Documents and Settings\soea\Desktop\Test.docx' )
  || die "Can not open: $!\n";

@file = <IN>;

for (
    $i = 0;                  #use semicolons between the three parts of a C-style for loop
    $i <= scalar(@file) - 1; $i++
  )
{
    if ( $f[@file] =~ /^Exec.+\n/ ) {
        splice( @file, $f, 1 );
    }
}

close(IN);

If you still encounter problems, can you attach your data file, or show us some test data?

d5e5 109 Master Poster

I think the problem is the match that seemed to work OK with our previous tests is not specific enough for the current lig_file, which has a couple of blank lines after the second bunch of ligand data. The following code: if ($row =~ m/^\s{2}(\s|\-)/){ says that a line has good ligand data if it begins with two spaces followed by either another space or a minus sign. This matches good data BUT it also matches lines that have nothing but two or more spaces and no data. Since we don't process the contents of @DataBunch until we read a line that DOESN'T match, we come to the end of the file and quit the program without having processed the second bunch of ligand data that is saved in @DataBunch . To try and fix this we can do two things:

  1. Improve the pattern in our match condition so it matches good data but does not match lines containing nothing but spaces.
  2. After reaching the end of our file, check if there remains any unprocessed data in @DataBunch, and if so, process it.

I would modify the match condition as follows to make it more specific for matching a data line that begins with spaces followed by an optional minus sign, followed by a number consisting of digits and a decimal point. That describes the first number in the good data lines and should be specific enough to distinguish from other lines. So in your program try …

d5e5 109 Master Poster

The following program attempts to calculate for each line of ligand data the distance to each of the protein atoms. It prints 2735 lines to the screen, so you may have to redirect the output to a file if you want to examine it all. Please try running it and let me know if it does at least some of what you want to do.

#!/usr/bin/perl -w
use strict;
my @BunchOfDataLines = ();
my $matched          = 0;
my $count;
#Read entire protein file into array
my $f = "pro_file";
open( IN, $f ) or die "Can't open $f : $!";
my @proteins = grep /^ATOM/, <IN>;#Reads all 30 ATOM lines into @proteins
close(IN);
    
#Read and process ligand file
$f = "lig_file";
open( IN, $f ) or die "Can't open $f : $!";
while ( my $row = <IN> ) {
    chomp($row);
    if ( $row =~ m/^\s{2}(\s|\-)\d\d\.\d\d\d\d/ ) {
        $matched = 1;
        push( @BunchOfDataLines, $row );
    }
    elsif ($matched) {
        DoSomethingWithBunchOfData(@BunchOfDataLines);
        @BunchOfDataLines = ();    #Empty the array so the next bunch starts fresh
        $matched = 0;
    }
}
close(IN);

sub DoSomethingWithBunchOfData {
    my @ligands = @_;    #Expect one array passed as parameter
    $count++;
    print "\nProcessing bunch number $count of data lines...\n";
    foreach my $ligand (@ligands) { #Each member of @ligands is a line of ligand data.
        my @ligfields = split (/\s+/, $ligand);
        my $ligid = "$ligfields[4]($ligfields[14])";
        my ($ligx, $ligy, $ligz) = @ligfields[1, 2, 3];
        foreach my $protein (@proteins) {
            my @profields = split (/\s+/, $protein);
            my $proid = "$profields[2]($profields[3], $profields[5])";
            my ($prox, $proy, $proz) = @profields[6, 7, 8]; …
d5e5 109 Master Poster

I started testing your program but so far there are more problems than I can fix today. One problem is that the protein subroutine is reading all 30 lines of your Pro_file data over and over again because it is called from a loop in the DoSomethingWithBunchOfData subroutine. This is probably not the problem that causes your program to go in an infinite loop. I haven't found the reason for that because I'm finding other problems like the protein sub.

sub DoSomethingWithBunchOfData {
    my @array = @_;    #Expect one array passed as parameter
    foreach (@array) { #This loop repeats for every member of @array
.....
.....
        protein(); #Calling this subroutine repeatedly for every member of @array

        #print "pdbpr : @pdbpr\n";
        calc( \@pdblig, \@ligatoms, \@pdbpr, \@pratoms );
    }
    @array = ();
    return @array;
}
sub protein {
    open( IN, "pro_file" ) or die "Could not open file!";
    my @B = grep /^ATOM/, <IN>;#Reads all 30 lines of data into @B
    close(IN);

    foreach my $ln (@B) {
        my @pr = ( split( /\s+/, $ln ) );

        # print ("@pr\n");
        my @arraypr = ( split( /\s+/, $ln ) )[ 6, 7, 8 ];

        # print ("@arraypr\n");
        push @pdbpr,   [@arraypr];
        push @pratoms, [@pr];
    }
    return ( \@pdbpr, \@pratoms );
}

There are more problems that I don't have time to list now, such as that the protein sub is returning array references that are not being saved or used by the statement that calls the sub. The program still needs more work.

d5e5 109 Master Poster

OK, I have your data files and will test later today or tomorrow when I have time. So far I see one problem in your program. One of your subroutines is returning array references (that's good!) BUT you call this subroutine in a void context, meaning you call it without assigning the returned list to any variables.

sub ligand {
    my @A = @_;
    foreach my $rw (@A) {
        my @lig = ( split( /\s+/, $rw ) );

        #print ("@lig \n");
        my @arraylig = ( split( /\s+/, $rw ) )[ 1, 2, 3 ];

        #print ("@arraylig\n");
        push @pdblig,   [@arraylig];
        push @ligatoms, [@lig];
    }
    return ( \@pdblig, \@ligatoms ); #ligand(@array) is returning array references, good.
}

BUT the statement that calls the ligand sub doesn't save or use the list of array references that it returns.

foreach (@array) {
        ##ligand(@array) is returning array references
        ##The following should save or use these references
        ligand(@array);#This statement calls the sub, but doesn't save the refs
        ##Better to have my ($aref1, $aref2) = ligand(@array);
        ##Then dereference the array refs to use arrays.

Here is an example of a program that calls a subroutine that returns a list of array refs. The calling program then saves the array refs, and dereferences them to use the arrays to which they refer.

#!/usr/bin/perl -w
use strict;
### The following line calls the subroutine AND saves the references.
my @arrayofrefs = colorsandtastes(); #Call sub which returns list of array refs.
my $colorsref = $arrayofrefs[0]; #Save first …
d5e5 109 Master Poster

Something like this?

>>> import os
>>> print os.listdir(".")
['DLLs', 'Doc', 'include', 'Lib', 'libs', 'LICENSE.txt', 'NEWS.txt', 'python.exe', 'pythonw.exe', 'README.txt', 'Scripts', 'tcl', 'Tools', 'unicows.dll', 'w9xpopen.exe']
>>> print os.path.getsize("readme.txt")
56189
>>> print (os.path.getsize("readme.txt")) < 10000000
True
>>>
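If it helps, the same check can be wrapped in a small function. This is only a sketch in Python 3 syntax: the function name is_smaller_than and the throwaway temporary file are my own assumptions so the example runs anywhere, not something from your program.

```python
import os
import tempfile

def is_smaller_than(path, limit_bytes):
    """Return True if the file at path is smaller than limit_bytes (hypothetical helper)."""
    return os.path.getsize(path) < limit_bytes

# Create a throwaway 1000-byte file so the example does not depend on readme.txt.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1000)
    sample_path = tmp.name

small_enough = is_smaller_than(sample_path, 10000000)
print(small_enough)

# Clean up the temporary file.
os.remove(sample_path)
```

In your own script you would pass the real file name instead of sample_path.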
d5e5 109 Master Poster
sub calc {
(@pdblig, @ligatoms)= &ligand; # Need to call subroutines ligand and protein for calculations... Is this correct??
(@pdbpr,@pratoms) = &protein;

The above way you are trying to get two arrays from a subroutine will not work, because Perl flattens everything the subroutine returns into one list and the first array on the left side takes all of it, leaving the second array empty. According to what I've read, it is recommended that your subroutine return references to the arrays it has created, which the calling statement can dereference in order to use the arrays. I haven't learned how to explain or do this very well, but you can read about it here in this article called "Pass by reference".

d5e5 109 Master Poster

Once you have a hash with triplet code values for all the letter keys you can translate any sequence from a string of letters into a string of triplet codes.

#!/usr/bin/perl -w
use strict;
my %hash = (
'a' => 'ala',
'r' => 'arg',
'n' => 'asn',
'd' => 'asp',
'c' => 'cys',
'e' => 'glu',
'q' => 'gln',
'g' => 'gly',
'h' => 'his',
'i' => 'ile',
'l' => 'leu',
'k' => 'lys',
'm' => 'met',
'f' => 'phe',
'p' => 'pro',
's' => 'ser',
't' => 'thr',
'w' => 'trp',
'y' => 'tyr',
'v' => 'val');

my $sequence = "verydamp"; #sequence of arbitrary letters to translate into triplet codes
my $out;
print "\n" . '$sequence = ' . "'$sequence' which translates into:\n";
while ($sequence =~ m/([a-z])/g) { #match each lowercase letter in $sequence string
    $out .= "$hash{$1}, "; # $1 contains a letter from $sequence which is a key in %hash
}
$out = substr($out,0,-2); #Remove final comma and space from end of string
print $out, "\n";
d5e5 109 Master Poster

I don't think it can be done with the tr/// operator. Can you use a hash table instead?

#!/usr/bin/perl -w
use strict;
my %hash = (
'a','ala',
'r','arg',
'n','asn',
'd','asp',
'c','cys',
'e','glu',
'q','gln',
'g','gly',
'h','his',
'i','ile',
'l','leu',
'k','lys',
'm','met',
'f','phe',
'p','pro',
's','ser',
't','thr',
'w','trp',
'y','tyr',
'v','val');
print "$hash{'h'}, $hash{'w'}, and so on...";
d5e5 109 Master Poster

Without the sdf and pdb files, I can't test your program. I have looked it over briefly for now. Here it is with no major changes. I added some indentation for readability plus a few comments attempting to answer some of your questions.

#!/usr/bin/perl -w
use strict;
my ( $x1, $x2, $y1, $y2, $z1, $z2 );
my @pdbpr    = ();
my @pdblig   = ();
my @pratoms  = ();
my @pdbpro   = ();
my @ligatoms = ();
#my @pratoms  = (); #No need to declare same variable twice
my @A        = ();
my @B        = ();

#sub ligand ();
#sub protein ();
#sub calc ();

open( IN, "A.sdf" ) or die "Could not open file!";
#my @A = grep /^   /, <IN>; #Remove the 'my', already declared above
@A = grep /^   /, <IN>;
close(IN);

sub ligand {
    foreach my $row (@A)
    {    # Need to get array @A which was read from a file. How??
        #Since you declared @A outside the subroutine it is globally available
        #everywhere, including this subroutine. This could work but it could
        #cause confusion. It would be better to pass as a parameter when
        #calling the subroutine, and in the subroutine assign values in @_
        #to a new array declared within your subroutine.
        my @lig = ( split( /\s+/, $row ) );
        print("@lig \n");
        my @arraylig = ( split( /\s+/, $row ) )[ 1, 2, 3 ];
        print("@arraylig\n");
        push @pdblig,   [@arraylig];
        push @ligatoms, [@lig];
    }
    return ( @pdblig, @ligatoms )    # I need this for …
d5e5 109 Master Poster

Call the findall method instead of search to get a list of all the matches.

def matchmyip(line):
    if ippattern.search(line):
        print ippattern.findall(line)

import re
ippattern = re.compile(r"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b")

if __name__ == "__main__":
    s = "1999-08-01 00:00:00 212.67.129.225 - W3SVC3 PROWWW01 194.128.73.195 GET /entrance/V0_1/default.asp bhcd2=933465599 302 0 497 396 15 80 HTTP/1.1 Mozilla/4.0+(compatible;+MSIE+4.01;+Windows+95;+FREESERVE_IE4) bhCookie=1;+ASPSESSIONIDGGGGGRDN=PPLCJMKDDJBMPIMEAEDPIJGF -"
    matchmyip(s)
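To make the difference between the two methods easier to see, here is a small side-by-side sketch in Python 3 syntax. The pattern simple_ip is a deliberately simplified assumption of mine (it does not check that each number is in the 0-255 range), and the log line is shortened.

```python
import re

# Simplified pattern for illustration: four dot-separated groups of 1-3 digits.
# Unlike the full pattern above, it does not validate the 0-255 range.
simple_ip = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

line = "1999-08-01 00:00:00 212.67.129.225 - W3SVC3 PROWWW01 194.128.73.195 GET /entrance"

first = simple_ip.search(line)      # match object describing the first hit only
all_hits = simple_ip.findall(line)  # list of every non-overlapping match

print(first.group(0))
print(all_hits)
```

Because the group in the pattern is non-capturing, findall returns the whole matched addresses rather than just the group contents.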
d5e5 109 Master Poster

I don't understand what your program is doing. It seems to be reading and trying to split and do calculations on every line in the file except the one with $$$$.

Actually, i need to get the first part of the info:
0.3230 -0.6380 -0.7700 C 0 0
1.1810 -1.2460 -1.7340 C 0 0
2.5950 -1.1640 -1.6010 C 0 0
3.1670 -0.4730 -0.5020 C 0 0
This has to be captured in an array to do some calculations. This first part of the file needs to be read only till '$$$$', which indicates the end of the array. Then, after the 1st set of calculations, go to the second set of data:
-1.3610 -1.4670 -0.0270 C 0 0
-1.4000 -2.8910 -0.0080 C 0 0
-0.1970 -3.6480 0.0080 C 0 0

I would approach this by doing the following:

  1. Read and skip the lines that do not contain useful data
  2. Save the first series of data lines into an array
  3. Do something with the array
  4. Empty the array
  5. Skip some more useless lines
  6. Save the next series of data lines into the array and, again, do something with it

To do this you need to tell your program how to know the difference between a useful data line and one that you want to skip. I would do this by testing each line that you read to see …

d5e5 109 Master Poster

Hi d5e5,

Thanks a ton for all the help!! But, i didnt quite understand the script from if statement. Could u kindly explain this? Thanks again !!!

The if statement tests to see if the first few characters of the current line match a regular expression representing a pattern that should match the data lines you want to print. The /x parameter after the regular expression lets me break the regex into several lines in order to add comments explaining what each part of the regex represents. Since the regex matches not only the good data that you want to print, but also some of the bad data found further on in the file, we assume that all the good data occurs in one unbroken series of lines, and so as soon as we encounter a line that is not good data we want to stop reading any more, so we exit the loop with the last statement.

Notice that the regex represents only the first few characters of the good data line instead of trying to specify a pattern for every character from start to end of line. It does this because it works. You often have to decide how specific you need to be to match the good lines and not match the bad lines. If you use the same regex on a different data file and find that it matches too much you may have to modify the regex to make it more specific. Also, if …

d5e5 109 Master Poster

Once you have the data in your arrays you can print them like this:

#!/usr/bin/perl -w
use strict;
my @numbers = (1, 2, 3);
my @names = ("Sachin", "Sehwag", "Yuvraj");

print "Cricketer's name and the corresponding number are\n";
while (@numbers) {
    my $num = shift(@numbers);
    my $name = shift(@names);
    print "$num $name\n";
}
d5e5 109 Master Poster
#!/usr/bin/perl -w
use strict;
my $printed = 0; #Boolean switch to indicate good data have/have not been printed
while( <DATA> ) {
    chomp;
    if (m/^\s{3}    #The line should begin with at least three spaces
            (?:     #Start of a non-capturing group (space or hyphen)
             \s     #either another space
             |      #or
             \-     #a hyphen
             )/x)   #end of group, end of pattern. x means allow comments
    {
        print "$_\n";
        $printed = 1;
    }
    elsif ($printed) {#good data already printed
        last; # so don't print any more
    }
}

__DATA__
k1082
  SciTegic08250908273D

 30 32  0  0  0  0            999 V2000
    0.3230   -0.6380   -0.7700 C   0  0
    1.1810   -1.2460   -1.7340 C   0  0
    2.5950   -1.1640   -1.6010 C   0  0
    3.1670   -0.4730   -0.5020 C   0  0
    2.2620    0.0980    0.4020 C   0  0
    0.8930    0.0350    0.2990 N   0  0
    4.8610   -0.3640   -0.3230 S   0  0
  1  2  2  0
  2  3  1  0
  3  4  2  0
  4  5  1  0
  5  6  2  0
  6  1  1  0
  4  7  1  0
  7  8  1  0
  8  9  2  0
  9  5  1  0
  8 10  1  0
M  END
> <Name>
k1082
> <AbsoluteEnergy>
127.9

> <ConfNumber>
1

$$$$
k1083
  SciTegic08250908273D

 39 42  0  0  0  0            999 V2000
   -1.3610   -1.4670   -0.0270 C   0  0
   -1.4000   -2.8910   -0.0080 C   0  0
   -0.1970   -3.6480    0.0080 C   0  0
    1.0590   -2.9880    0.0060 C   0  0
    1.0270   -1.5900   -0.0130 C   0  0
   -0.1270   -0.8360   -0.0290 N   0  0
    2.5020   -3.8850    0.0280 S   0  0
d5e5 109 Master Poster

Have you tried # -*- coding: utf-8 -*- ? According to this Unicode HOWTO page the UTF-8 encoding can handle any Unicode code point. As vernondcole says, when testing you can't rely on IDLE or DOS to display the characters correctly, no matter what encoding you use. To test, write the output to a text file and open it with a text editor that can handle UTF-8 encoded files. I use ActiveState's Komodo Edit.
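Here is a rough sketch of that "write it to a file" test, in Python 3 syntax. The sample text and the file name utf8_test.txt are assumptions of mine for illustration; substitute the characters you are actually having trouble with.

```python
import os
import tempfile

# Sample text with non-ASCII code points, written with escapes so the
# example does not depend on how this source file itself is encoded.
text = "na\u00efve caf\u00e9 \u03a9"

# Write to a temporary file with an explicit encoding...
path = os.path.join(tempfile.mkdtemp(), "utf8_test.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write(text)

# ...then read it back the same way and compare with the original.
with open(path, "r", encoding="utf-8") as src:
    round_tripped = src.read()

print(round_tripped == text)
```

If the comparison is true, the data survived the trip and any strange display is the fault of the console, not the encoding. You can also open the file in your editor to look at it.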

d5e5 109 Master Poster

Here is one way to read and print the first record from a file whose name has been passed as an argument from the command line or shell that invokes the Perl program.

#!/usr/bin/perl -w
use strict;
#readfile.pl
my $filename = shift @ARGV; #Returns the value of the first parameter and removes it from @ARGV
open FILE, $filename or die "Could not read from $filename, program halting.";
# read the record, and chomp off the newline
chomp(my $record = <FILE>);
close FILE;
print "\n$record\n";
d5e5 109 Master Poster
#!/usr/bin/perl -w
use strict;
my $string = "#324423asdd asd 'BecamePosters' ";
if($string =~ m/('[a-zA-Z]{1,40}')/) { #Put parentheses around what should be $1
print "ok!\n";
print $1;
} else {
print "not ok!\n";
}
d5e5 109 Master Poster

I realize your question is about object creation and use, about which I know little, but permit me to add an FYI about navigator.appName:

When I tested Airshow's code in my browser (Chrome) I noticed that the property you are testing, navigator.appName returns 'Netscape' when run in Chrome. I don't know why, but googled it and found that navigator.appName returns 'Netscape' in Safari and Firefox as well (see http://code.google.com/p/doctype/wiki/NavigatorAppNameProperty). You will have to test some property other than navigator.appName to detect browsers other than Microsoft Internet Explorer and Opera.

d5e5 109 Master Poster

One more thing...
I forgot to mention that you'll see a lot of examples where the first line is something like #!/usr/local/bin/perl -w where the -w turns on warnings. Warnings can tell you about unused filehandles or variables with no assigned value. Or you can turn on warnings by a use warnings; statement in your program, or perl -w from the command line.

d5e5 109 Master Poster

Also use strict; nags you until you declare all variables (by adding 'my' in foreach my $arg (@ARGV) for example) which seems like a nuisance at first but you soon get used to it.

d5e5 109 Master Poster

I don't know. I'm sure you've googled and I've googled and can't find any struct format argument that allows you to pack a unicode string. What you could do is encode the unicode string into an 8-bit string, which the 's' argument will accept and allow you to pack. Which encoding you use depends on what characters will be in your string. In the following I use the 'utf-8' encoding:

#!/usr/bin/env python
import struct
asc_time = "10:15 AM"
language_id = 2
#message = unicode("This is a test")
umessage = u'This is a test'
smessage = umessage.encode('utf-8')
data = struct.pack('<8sB40s',
                    asc_time,
                    language_id,
                    smessage)

print "Printing the packed data:"
print data

print "\nPrinting the unpacked data:"
print struct.unpack('<8sB40s', data)
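In case you move to Python 3, where text strings and byte strings are separate types, the same idea might look like the sketch below. The variable names are mine, and I am assuming the same '<8sB40s' layout as above.

```python
import struct

asc_time = b"10:15 AM"          # the 's' format needs bytes; this one is exactly 8 long
language_id = 2
message = "This is a test"      # ordinary (Unicode) text

# Encode the text before packing; struct pads the 40-byte field with NUL bytes.
packed = struct.pack("<8sB40s", asc_time, language_id, message.encode("utf-8"))

raw_time, lang, raw_message = struct.unpack("<8sB40s", packed)

# Strip the NUL padding before decoding the bytes back into text.
decoded = raw_message.rstrip(b"\x00").decode("utf-8")

print(len(packed), lang, decoded)
```

One thing to watch: if the encoded message is longer than 40 bytes, struct silently truncates it, which could cut a multi-byte UTF-8 character in half, so check the encoded length first.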
d5e5 109 Master Poster

You're welcome. I neglected to say the test worked for me only after I changed the '==' operator to 'eq' as ItecKid pointed out. Also, I recommend using the strict module in any Perl script -- just add use strict; -- unless you have a particular reason not to use it. It refuses to compile a script that uses undeclared variables, which catches misspelled variable names, etc.

d5e5 109 Master Poster

I don't have bash but when I tested your Perl script from the Windows command line (after copying an html file into the current directory) it worked -- until it died because I don't have a readFile subroutine. But the point is that I didn't get the "$0: no existing file available for parsing" message on my platform. It's a mystery.

Remember that naming variables in Perl is case-sensitive. I see you have variables named "$htmlFile" and "$htmlfile" in your program, but I don't see how that would cause this particular error.

d5e5 109 Master Poster

Say i searched for abasement, i would like the output to be:

abasement
abasements
abases

As woooee says, you don't have to use regex for this but if you do use it you can modify your first program as follows:

#!/usr/bin/env python
import re
f = "fsp.txt"
search_term = "abasement"
search_term = r'\b%s\b' % search_term

found = False
for line in open(f, 'r'):
    if not found:
        matchobj = re.match(search_term, line)
        if matchobj:
            found = True
    if found:
        print line,
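One extra precaution, in case the search term ever contains characters that mean something special in a regex (a dot or a question mark, say): you can pass it through re.escape first. Below is a sketch in Python 3 syntax. The words list is only a stand-in I made up for the lines of fsp.txt, and I collect the results in a list instead of printing them directly.

```python
import re

search_term = "abasement"
# re.escape makes any regex metacharacters in the term match literally.
pattern = re.compile(r"\b%s\b" % re.escape(search_term))

# Hypothetical stand-in for the lines of fsp.txt.
words = ["abase", "abasement", "abasements", "abases"]

found = False
printed = []
for word in words:
    if not found and pattern.match(word):
        found = True          # first matching line reached
    if found:
        printed.append(word)  # keep this line and everything after it

print("\n".join(printed))
```

As in the program above, everything from the first match onward is kept, so the output is only as short as your desired one if the file ends where you expect.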
d5e5 109 Master Poster

This program prints out whether or not each line in the file of protein names occurs in the data file.

#!/usr/bin/perl -w
use strict;
#FindTextInFile.pl
my ($names, $data) = ("ProteinNames.txt", "ProteinData.txt");
open (FILE1, $names) || die;
open (FILE2, $data) || die;
undef $/; #Enter "file-slurp mode" by emptying variable indicating end-of-record
my $string = <FILE2>; #Read entire file to be searched into a string variable
$/ = "\n"; #Restore default value to end-of-record variable

while (<FILE1>) {
    chomp; #remove new-line character from end of $_
    #Use quotemeta() to fix characters that could spoil syntax in search pattern
    my $qmname = quotemeta($_);
    if ($string =~m/$qmname/i) {
        print "***FOUND*** $_ in $data.\n";
    }
    else {
        print "***NOTFOUND*** $_ in $data\n";
    }
}
d5e5 109 Master Poster

The easiest way is to select the code to be wrapped, then click the code icon at the top of your message box. This places an opening code tag at the beginning of the selection and a closing code tag (square brackets containing a forward slash followed by the word code) at the end.

You can modify the opening code tag to specify what kind of code you are wrapping. The default for the Perl forum is [code=perl]. When posting an example of text data I like to wrap it in code tags opening with [code=text] so it will preserve the original spacing without coloring what it supposes to be Perl commands.

For example, [CODE=text]She said, "The name is '!&#', which is pronounced 'Bang Pound'."
ATOM 2 CA LYS A 257 -3.873 -29.331 -26.757 1.00 41.55 C

In case my explanation is confusing, try the Help with Code Tags link.

d5e5 109 Master Poster

To get rid of the warning about using uninitialized values in array elements change line 13 as follows:

#my @array = (split (/\s+/, $line))[0..12]; But there are not always 13 values!
my @array = (split (/\s+/, $line));

The [0..12] after the split function attempts to define an array having 13 elements (including element [0]) by taking a slice of the array resulting from the split. But in the data you have shown us there are not always 13 space-delimited values on each line, so some of the array elements in the slice specified by [0..12] have no values, which causes a warning, but not a fatal error.

d5e5 109 Master Poster

Or, putting aside my array of references suggestion (because trying it that way gave me a headache) and building on what ItecKid said, you could do it as follows:

#!/usr/bin/perl -w
use strict;
use Math::Complex; #Module names are case-sensitive
#CalcDistAtoms.pl
my (@atoms,@x,@y,@z);

# I appended the data to this program.
# If you open a file instead, @atoms = <whatever you called your file handle>
@atoms = <DATA>;
foreach my $row (@atoms){
    my @temp = split(/ /,$row);
    my $n = $temp[1];
    $x[$n] = $temp[6];
    $y[$n] = $temp[7];
    $z[$n] = $temp[8];
}
print "Distance between atom 1 and atom 2 is ", calcdist(1, 2);
sub calcdist {
    my ($i, $j) = @_;
    my $dist = sqrt(($x[$i] - $x[$j])**2 + ($y[$i] - $y[$j])**2 + ($z[$i] - $z[$j])**2);
    return $dist; #Return the distance explicitly
    }

__DATA__
ATOM 1 N LYS A 257 -5.036 -29.330 -27.709 1.0041.51 N
ATOM 2 CA LYS A 257 -3.873 -29.331 -26.757 1.00 41.55 C
d5e5 109 Master Poster

When the original poster says read all the data for all the atoms into an array it sounds like an array of arrays. There is no such thing in Perl but you can get the same result using an array of references to arrays. There are tutorials such as http://perldoc.perl.org/perlreftut.html which may help.

d5e5 109 Master Poster

Some people, when confronted with a problem, think “I know,
I'll use regular expressions.” Now they have two problems. ~Jamie Zawinski (famous Netscape engineer who didn't like Perl)

One thing I like about Perl is that it has regular expressions support built into it, so you don't have to use special modules and objects to use RegEx. When I used to work in IT, using regular expressions in a text editor saved me lots of time reformatting data. At the time I knew nothing about Perl and only a little about regular expressions, but my colleagues knew even less about them and thought I was performing miracles. Those were the days.

Your solution sounds like it works fine. Regular expressions are not the best tool for every situation but sometimes they are very handy.:)

d5e5 109 Master Poster

%s serves as a placeholder for a string value, which is supplied by the values placed after the last % character in the print statement. Likewise, %d serves as a placeholder for a signed integer decimal value. For example:

qtylist = [5, 7, 3, 11, 2]
unitlist = ['bottles', 'flocks', 'loaves', 'bags', 'cups']
itemlist = ['beer', 'geese', 'bread', 'flax', 'tea']

for i in range(5):
    print "Give me %d %s of %s" % (qtylist[i], unitlist[i], itemlist[i])

You can read more about it at http://docs.python.org/library/stdtypes.html#string-formatting-operations
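Here is one more small sketch, this time in Python 3 syntax, showing the same placeholders plus a width and padding. The variable names are only for illustration.

```python
qty, unit, item = 5, "bottles", "beer"

# %d takes an integer; %s takes any value and converts it with str().
sentence = "Give me %d %s of %s" % (qty, unit, item)
print(sentence)

# With more than one placeholder, the values after % must be in a tuple.
# %03d pads the number with zeros to 3 digits; %-6s left-justifies in 6 columns.
padded = "%03d|%-6s|" % (7, "tea")
print(padded)
```

The values are matched to the placeholders in order, so the tuple must have exactly as many items as there are placeholders.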

d5e5 109 Master Poster

I think I would start by combining all four arrays into one array. Then sort it and the highest priority element for each user will be at the top of each group of elements starting with that user.

Then I would loop through the sorted array, printing only the first element for each user.

#!/usr/bin/perl -w
use strict;
my @arr1 = qw(susan:susan@M1.domain.com ben:ben@M1.domain.com carol:carol@M1.domain.com carol:carol@M1.domain.com ben:ben@M1.domain.com);
my @arr2 = qw(carol:carol@M2.domain.com ben:ben@M2.domain.com ben:ben@M2.domain.com susan:susan@M2.domain.com ted:ted@M2.domain.com);
my @arr3 = qw(susan:susan@M3.domain.com ted:ted@M3.domain.com ben:ben@M3.domain.com susan:susan@M3.domain.com ben:ben@M3.domain.com);
my @arr4 = qw(susan:susan@M4.domain.com ben:ben@M4.domain.com alice:alice@M4.domain.com susan:susan@M4.domain.com alice:alice@M4.domain.com);
my @arrall = sort(@arr1, @arr2, @arr3, @arr4); #Combine all four arrays and sort the elements
my $wholestring;
my ($user,$saved) = ("","");
foreach (@arrall) {
    m/(^\w+):.*$/;
    $wholestring = $&;
    $user = $1;
    if ($user ne $saved) {
        print "$wholestring\n";
        $saved = $user;
    }
}
d5e5 109 Master Poster

Congratulations. That turned out to be harder than it looked at first.

The problem of printing one record consolidating all the authors and addresses for each $jas also turned out to be harder than it looked, at least for me. In fact, I never got that part working quite right. Anyway, you're done now, so that's good.

d5e5 109 Master Poster

it shud print it afterwards sumwhat like this:
1.1.2|1. Giorgio Brajnik 2. Marji Lines |1. Dipartimento di Matematica e InformaticaUniversit&agrave; di Udine Udine Italy Italy 2. Dipartimento di Scienze StatisticheUniversit&agrave; di Udine Udine Italy 33100 Italy |Udine|Italy|78|87

You want the output for ID #1.1.2 to look like the above, right? What I get in the output file looks as follows:

1.1.2|Giorgio Brajnik|Dipartimento di Matematica e InformaticaUniversit&agrave; di Udine	Udine	Italy	Italy 
|Udine|Italy|78|87

1.1.2|Marji Lines|Dipartimento di Scienze StatisticheUniversit&agrave; di Udine	Udine	Italy	33100	Italy 
|Udine|Italy|78|87

Is that why you say it doesn't work? Apart from that, can you show us another line in the address_out.txt file that should have matches in the city_lan.txt file which are not printed in the output?

Do you still need a program like this? If it's important we can probably improve the results, but how good the results are depends of course on the input data. For example: /to\?ky\?/ will not match "Tokyo". The quotemeta function added the backslashes before the question mark and that avoids the run-time error you were getting but the resulting pattern tries to match a literal '?' in the string so it will not match "Tokyo". Also "United States of America" in the city_lan.txt file will never match "United States" in the address_out.txt file. If you will need to process more data like these in the same format then maybe it would be worth trying to solve all these inconsistencies in the data with a program. Otherwise you may …

d5e5 109 Master Poster

Well the code isn't quite like that. It's in an if statement that checks a field in a form, if it comes up false, then it inserts a PUSH, then goes to the next if. And at the end if the array > 0, then it's supposed to spit out the array as a string.

I added push instances to the example found at http://www.w3schools.com/JS/tryit.asp?filename=tryjs_formvalidate and tested it. It seems to work OK. If your code is too complex or context-dependent to post here, can you invent a simplified example that demonstrates the problem?

<html>
<head>
<script type="text/javascript">
function validate()
{
var at=document.getElementById("email").value.indexOf("@");
var age=document.getElementById("age").value;
var fname=document.getElementById("fname").value;
var arr = new Array(0);
var msg = "";
submitOK="true";

if (fname.length>10)
 {
 msg = "The name may have no more than 10 characters";
 alert(msg);
 arr.push(msg);
 submitOK="false";
 }
if (isNaN(age)||age<1||age>100)
 {
 msg = "The age must be a number between 1 and 100";
 alert(msg);
 arr.push(msg);
 submitOK="false";
 }
if (at==-1) 
 {
 msg = "Not a valid e-mail!";
 alert(msg);
 arr.push(msg);
 submitOK="false";
 }
if (submitOK=="false")
 {
 alert("The following errors occurred: " + arr); 
 return false;
 }
}
</script>
</head>

<body>
<form action="tryjs_submitpage.htm" onsubmit="return validate()">
Name (max 10 characters): <input type="text" id="fname" size="20"><br />
Age (from 1 to 100): <input type="text" id="age" size="20"><br />
E-mail: <input type="text" id="email" size="20"><br />
<br />
<input type="submit" value="Submit"> 
</form>
</body>

</html>
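For the "spit out the array as a string" part, one option is to join the collected messages explicitly instead of relying on the array's default comma-separated conversion. Here is a minimal sketch, assuming a hypothetical collectErrors helper that takes plain values rather than reading the form (the helper and its parameter names are mine, not from the w3schools page):

```javascript
// Hypothetical helper: returns an array of error messages for the given values.
function collectErrors(fname, age, email) {
  var arr = [];
  if (fname.length > 10) {
    arr.push("The name may have no more than 10 characters");
  }
  if (isNaN(age) || age < 1 || age > 100) {
    arr.push("The age must be a number between 1 and 100");
  }
  if (email.indexOf("@") == -1) {
    arr.push("Not a valid e-mail!");
  }
  return arr;
}

// Sample values chosen so that every check fails.
var errors = collectErrors("Bartholomew Jones", "abc", "nobody.example.com");
if (errors.length > 0) {
  // join() builds one string with a separator of our choosing
  console.log("The following errors occurred:\n" + errors.join("\n"));
}
```

Checking errors.length > 0 also removes the need for a separate submitOK flag, though keeping the flag works just as well.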

d5e5 109 Master Poster

I'm no expert but I don't see why push can't be done based on the result of a condition such as an if statement. Here's an example:

<html>
<body>

<script type="text/javascript">

var arr = new Array(0);
var s = "";

while (s != "quit")
   {
    s = prompt("Enter a word", "quit");
    if (s != "don't push")
    {
      arr.push(s);
    }
    alert("Array now contains: " + arr);
   }
</script>

</body>
</html>

d5e5 109 Master Poster

Try modifying the first split statement to allow for the possibility of a space preceding the tab separating the city from country in your city_lan.txt input file. Add a space followed by ? to match one or fewer spaces followed by the \t. The statement will look like this:

#Modified the following to split on one or no space followed by tab
    my($city,$country,$lan,$lat) = (split(/ ?\t/))[0,1,2,3];

This should eliminate the trailing space from the city and country values which could be one cause for failed matches.

d5e5 109 Master Poster

Many of the city and country values in the city_lan.txt have a trailing space character, whereas the city and country values in your address_out.txt input file do not have trailing spaces and so don't match.

d5e5 109 Master Poster

In that regex for illegal characters I should not have put the /g modifier at the end. All we need to know is whether the string contains at least one illegal character, so the first match is enough. With the /g (global) modifier, each call to test() starts searching at the position where the previous match ended (the regex's lastIndex property) rather than at the start of the string. If the same regex object gets reused, as can happen with a regex literal in some older JavaScript engines, a later test can begin partway through the string and miss an illegal character, which is probably why we were getting inconsistent results.
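To see how a /g regex can give different answers for the same string, here is a minimal sketch. The variable names and the single-character class are mine, chosen only to keep the example short. It reuses one regex object on purpose, which is the situation that triggers the problem:

```javascript
// One global and one non-global regex, each created once and reused.
var globalRe = /[$]/g;
var plainRe = /[$]/;
var results = [];

results.push(globalRe.test("Java$cript")); // matches "$" at index 4; lastIndex becomes 5
results.push(globalRe.test("Java$cript")); // search resumes at index 5, finds nothing, lastIndex resets to 0
results.push(plainRe.test("Java$cript"));  // non-global: always searches from the start
results.push(plainRe.test("Java$cript"));

console.log(results.join(",")); // true,false,true,true
```

The non-global version gives the same answer every time, which is what a yes/no validity check needs.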

In your script, try replacing the regex like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd"
    >
<html lang="en">
<head>
    <title>Test regex punctuation filter</title>
    
<script type="text/javascript">
function checkString (strng) {
    var error = false;
//var illegalChars = /[\u0021-\u002f\u003a-\u0040\u005b-\u005e\u0060\u007b-\u007e]/g; // The /g (for global) is a goof.
    var illegalChars = /[\u0021-\u002f\u003a-\u0040\u005b-\u005e\u0060\u007b-\u007e]/; // NOT global.
    error = (illegalChars.test(strng));
return error;
}

// The following lines test the function with a string containing illegal chars.
var teststring = "Java$cript"; //Illegal character. Test should return 'true'
var errmsg = checkString(teststring);
document.writeln("Testing \"<b>" + teststring + "</b>\" results in <b>" + errmsg + "</b><p>");
var teststring = "Javascript"; //No illegal character. Test should return 'false'
var errmsg = checkString(teststring);
document.writeln("Testing \"<b>" + teststring + "</b>\" results in <b>" + errmsg + "</b><p>");
</script>

</head>
<body>

</body>
</html>

d5e5 109 Master Poster

In case it makes a difference, is your platform *nix, Windows or Mac?

I don't know much about http or curl but before using the system command make sure the $cmd variable has been built successfully. You may want to use Perl's quote operator qq().

#!/usr/bin/perl -w
use strict;
my $cmd = "c:\\curl\\curl -X PUT -d \"user[email]=email$_\" "
 + "-d \"user[password]=pass$_\" "
 + "http://localhost:3000/site/users/user$_";
#For now, comment out the system command and see if $cmd contains correct string value
##print "create user failed\n" if system($cmd);
print $cmd;

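Running the above gives a lot of error messages. $_ has not been assigned a value, and in Perl + is numeric addition; the string concatenation operator is the dot.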

C:\Users\David\Programming\Perl>test.pl
Use of uninitialized value $_ in concatenation (.) or string at C:\Users\David\Programming\Perl\test.pl line 3.
Use of uninitialized value $_ in concatenation (.) or string at C:\Users\David\Programming\Perl\test.pl line 3.
...etc.......

d5e5 109 Master Poster

I'm on Windows and don't have the scp utility so can't test your code but I notice you use the variable $currentTranStats without having assigned a value to it.

Also note that while a dot between two text variables acts as the concatenation operator, a dot within double quotes is just a dot. For example,

DB<1> $x = 'hello';
DB<2> $y = 'there';
DB<3> print "$x.$y";
hello.there

The values of the two text variables are interpolated into the resulting string, but the dot appears as a literal dot.

d5e5 109 Master Poster

I simplified my test script. It still seems to find illegal characters wherever they are in the test string.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd"
    >
<html lang="en">
<head>
    <title>Test regex punctuation filter</title>
    
<script type="text/javascript">
function checkString (strng) {
    var error = false;
    var illegalChars = /[\u0021-\u002f\u003a-\u0040\u005b-\u005e\u0060\u007b-\u007e]/g; // Don't allow any of these
    error = (illegalChars.test(strng));
return error;
}

// The following lines test the function with a string containing illegal chars.
var teststring = "Here is a str&ing with illegal characters";
var errmsg = checkString(teststring);
document.writeln("Testing \"<b>" + teststring + "</b>\" results in <b>" + errmsg + "</b><p>");
var teststring = "Here is a string without any illegal characters";
var errmsg = checkString(teststring);
document.writeln("Testing \"<b>" + teststring + "</b>\" results in <b>" + errmsg + "</b><p>");
</script>

</head>
<body>
</body>
</html>

d5e5 109 Master Poster

...the 'illegal' characters are only detected if they are the first or last letter. If it's anywhere between valid characters it goes undetected.

That's strange, I can't duplicate the problem. Here is the test script I'm using. Can you give me an example to put in the teststring variable that results in undetected illegal characters?

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd"
    >
<html lang="en">
<head>
    <title>Test regex punctuation filter</title>
    
<script type="text/javascript">
function checkName (strng) {
    var error = "";
    var illegalChars = /[\u0021-\u002f\u003a-\u0040\u005b-\u005e\u0060\u007b-\u007e]/g; // Don't allow any of these
    if (strng == "") {
   	error = "Please enter your name.\n";
    }
    else if((strng.length < 2)) {
    	error = "The name is the wrong length.\n";
    }
    else if (illegalChars.test(strng)) {
    	error = "The name contains illegal characters.\n";
    }
return error;
}

// The following lines test the function with a string containing illegal chars.
var teststring = "Here is a string with ill:egal ch@racters";
var errmsg = checkName(teststring);
//alert("Testing \'" + teststring + "\' results in \'" + errmsg + "\'");
document.writeln("Testing <p><b>" + teststring + "</b><p> results in <p><b>" + errmsg);
</script>

</head>
<body>

</body>
</html>

d5e5 109 Master Poster

Try the following character class:

var illegalChars = /[\u0021-\u002f\u003a-\u0040\u005b-\u005e\u0060\u007b-\u007e]/g; // Don't allow any of these

JavaScript supports specifying Unicode characters by hexadecimal escapes like \u0060 and ranges like \u007b-\u007e.
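If it helps to see exactly which characters that class covers, the sketch below walks the printable ASCII range and collects every character the class matches. The loop and the matched variable are mine, for illustration only. The regex is written without the /g flag so that each test() call is independent of the previous one:

```javascript
var illegalChars = /[\u0021-\u002f\u003a-\u0040\u005b-\u005e\u0060\u007b-\u007e]/;
var matched = "";

// Check every printable ASCII character from space (0x20) to tilde (0x7e).
for (var code = 0x20; code <= 0x7e; code++) {
  var ch = String.fromCharCode(code);
  if (illegalChars.test(ch)) {
    matched += ch;
  }
}

console.log(matched);                 // the punctuation characters the class rejects
console.log(illegalChars.test("_"));  // false: \u005f (underscore) is not in any of the ranges
```

The ranges skip \u005f, so the underscore is allowed, along with letters, digits and the space character.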

There is a fun website at http://hamstersoup.com/javascript/regexp_character_class_tester.html that gives you the unicode expressions for any character class you specify.

d5e5 109 Master Poster

final = reorder(a, b)

Look at line 18 of your first example, where you test your first function. Variables a and b have meaning only in your function. You need to pass the variables first_str and second_str when you call your function, not a and b, which are out of scope until control is passed to your function.

for i in range(len(reorder_str)):

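Also look at the for loop in your first function. It loops over range(len(reorder_str)), that is, from 0 up to one less than the length of reorder_str. The strings in variables a and b each have a length of 3, so their valid indexes are 0 to 2. What happens when variable i = 4? a[4] and b[4] do not exist, so Python raises an IndexError (in fact this already happens when i = 3).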