Hey,

Iam working with PDB files, and have some problems with extracting and calculating...

a PDB file look like thise for those how dont know:
see attached file

What I want is to read the whole file, find were there are lines with CA
atoms and calculate the distance between every two CA atom in a xzy-plane
and if the distance is over 4.3 it should give me the position of the two
CA atoms.

I am pretty stuck, but what I have done untill now is:

#!/usr/bin/perl -w


use strict;


################# Parameters ########################
#my $threshold_Å =  4.30;             # Save position with scores greater than $threshold_Å



###################### FILES ########################


#   PDB_input: input file
#
#
#
#The program is given a PDB file as input on the command line comments if there are erros



sub usage {
my ($msg) = @_;
print "$msg\n\n" if defined $msg;
print "Usage: project.pl <PDBfile.pdb>\n";
exit;
}
if (scalar @ARGV !=1){
&usage("Wrong number of arguments");
}


my $PDBfile = @ARGV ;


############## ALPHA CARBON SUBROUTINES ##################



sub get_alpha_carbon {
open(IN,'<',$PDBfile ) or die "Could not find file\n";
my @alpha_carbon = ();
my $line = '';
while (defined (my $line = <IN>)) {
chomp ($line);
if ($line =~ m/^ATOM\s+\d+\s+CA\s+\[A-Z]{3}\A\s\d+\s+\d+(\.\d+)\s+\d+(\.\d+)\s+\d+(\.\d+)$/) {
push my @alpha_carbon, my $alpha_carbon;
}
}



sub calc_chain_breaks {
my ($ax, $ay, $az, $bx, $by, $bz);


my $distance = sqrt(($ax - $bx) ** 2 + ($ay - $by) ** 2
+ ($az - $bz) ** 2);
if ($distance > 4.3){
print "....

I know I have to define my $ax, $ay, $az, $bx, $by, $bz
( which are coordinates to the two CA atoms) but I have no idea how!!!!

Thanks for your time...

Recommended Answers

All 11 Replies

here is the pdb file I was talking about

aside from any other problems, this line is wrong:

push my @alpha_carbon, my $alpha_carbon;

You can't declare your variables with "my" during the push() function. You have to predeclare them. Also, $alpha_carbon is not defind, not even initialized, in the code you posted.

I don't understand what you are trying to do here:

and calculate the distance between every two CA atom in a xzy-plane
and if the distance is over 4.3 it should give me the position of the two
CA atoms.

Give an example from the PDB file. Have you searched CPAN for any PDB modules?

Yeah I have done some researche but without any luck...

An exampel will be:
look first were there are a CA atom
in a line in the PDB file. it can look like that :

ATOM  345  CA LYS  A  24   129.78   32.123   23.98

then I want to find another line with CA atom in the file, it could be:

ATOM 746 CA  ARG  A  43.98   543.98  65.98

(I have done that by regular expression, but still have some problems )

the last 3 numbers are the xyz coordinate for the CA atoms. What I want now is to calculate the distance between the to given atoms by using eq. for distance between two points in a 3 D:
sqrt(($ax - $bx) ** 2 + ($ay - $by) ** 2+ ($az - $bz) ** 2)
if the distance is over 4.3 the program should give me a massege at witch atom possition the distance is over 4.3...

hope you understand

what I have done untill now :
CA atom is what I am looking for is is my alpha carbon

#!/usr/bin/perl -w
use strict;
open(IN,'<',"PDB.txt" ) or die "Could not find file\n";
my @all_alpha_carbon = (); #array
&get_alpha_carbon(\@all_alpha_carbon); #call sub through a reference

sub get_alpha_carbon {
my ($all_alpha_carbon) = @_;
my @alpha_carbon = ();
my $line = '';
while (defined (my $line = <IN>)) {
chomp ($line);
if ($line =~ m/^ATOM\s+(\d+)\s+CA\s+([A-Z]{3})\s+A\s+(\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)/){
push (@alpha_carbon, $1, $2, $3, $4, $5, $6);
push (@{$all_alpha_carbon},@alpha_carbon);
#print @alpha_carbon, "\n";
@alpha_carbon = ();
}
}


}


close IN;



&calc_chain_break(\@all_alpha_carbon);


sub calc_chain_break{
my ($all_alpha_carbon) = @_;
#for (my $i = 0.......)
my @array = @{$all_alpha_carbon};

}

It had finally succeed me to extract the importance information in a PDB file, but still have some problems with the calculations...

I will try to explain it in a more pedagogic manner

What I have now is a data set file contenting :

 2   CA       107.119   8.429   73.619
10   CA       106.868  11.078  76.312
28   CA       104.348  16.170  78.301
19   CA        104.598  13.956  75.243
59   CA          97.800  21.900  82.855
.
.
.
.
2   CA      107.119   8.429   73.619

(2 is the no of the atom CA indicate that its a carbon alpha atom and the 3 dicimale numbers indicate the x-coordinate, y-coordinate and z- cordinate)

I wanna now calculate the distance between every two alpha carbon in the given data set, meaning that calculate the distance between 2 CA and 10 CA and then between 10 CA AND 28 CA and so on....
using the distance between two point in a 3 D I cant calculate the distance. Now if the distance is over 4.3 it should give me a message at which atoms no. it is.

what I have done:

use strict;

my $distance;
open (IN, '<',"alpha_carbon.dat") or die "Could not find file\n";
my $line = '';

while (defined (my $line = <IN>)) {
   chomp $line;
   my @tmp =split(m/\t/,$line);
   my $distance =  sqrt(($tmp[3] - $bx) ** 2 + ($temp[4] - $by) ** 2
                    + ($tmp[5] - $bz) ** 2);
   if ($distance > 4.3){
   print "the distance $distance,\n"; 
    } 
 }
close IN;

have no idea how to define the next CA atom coordinate in the script I have call them $bx, $bz and $bz I kno it wrong.....

did you still need help?

did you still need help?

Yes...

I have a feeling there is a module or two that already does this type of stuff, search CPAN for 'matrix' and you may find some useful modules.

Not well tested:

use strict;
use warnings;
open (IN, '<',"alpha_carbon.dat") or die "Could not find file\n";
my $A = <IN>;#get first line of data
chomp($A);
while ($A && (my $B = <IN>)) {
   chomp $B;
   my @A = split(/\s+/,$A);
   my @B = split(/\s+/,$B);
   my $distance = sqrt( ($A[2] - $B[2]) ** 2 +
                        ($A[3] - $B[3]) ** 2 +
                        ($A[4] - $B[4]) ** 2
                       );
   if ($distance > 4.3){
      print "$A[0],$B[0] distance = $distance,\n";
   }
   $A = $B;
}
close IN;

BTW, is this school work?

No not really I'm trying to make my life easier using perl to handle a biological output. That would save me some time!!
Thanks for the help anyway:)

Did my suggestion work?

Yeah it works :) thanks

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.