Hey,

Iam working with PDB files, and have some problems with extracting and calculating...

a PDB file look like thise for those how dont know:
see attached file

What I want is to read the whole file, find were there are lines with CA
atoms and calculate the distance between every two CA atom in a xzy-plane
and if the distance is over 4.3 it should give me the position of the two
CA atoms.

I am pretty stuck, but what I have done untill now is:

#!/usr/bin/perl -w


use strict;


################# Parameters ########################
#my $threshold_Å =  4.30;             # Save position with scores greater than $threshold_Å



###################### FILES ########################


#   PDB_input: input file
#
#
#
#The program is given a PDB file as input on the command line comments if there are erros



sub usage {
my ($msg) = @_;
print "$msg\n\n" if defined $msg;
print "Usage: project.pl <PDBfile.pdb>\n";
exit;
}
if (scalar @ARGV !=1){
&usage("Wrong number of arguments");
}


my $PDBfile = @ARGV ;


############## ALPHA CARBON SUBROUTINES ##################



sub get_alpha_carbon {
open(IN,'<',$PDBfile ) or die "Could not find file\n";
my @alpha_carbon = ();
my $line = '';
while (defined (my $line = <IN>)) {
chomp ($line);
if ($line =~ m/^ATOM\s+\d+\s+CA\s+\[A-Z]{3}\A\s\d+\s+\d+(\.\d+)\s+\d+(\.\d+)\s+\d+(\.\d+)$/) {
push my @alpha_carbon, my $alpha_carbon;
}
}



sub calc_chain_breaks {
my ($ax, $ay, $az, $bx, $by, $bz);


my $distance = sqrt(($ax - $bx) ** 2 + ($ay - $by) ** 2
+ ($az - $bz) ** 2);
if ($distance > 4.3){
print "....

I know I have to define my $ax, $ay, $az, $bx, $by, $bz
( which are coordinates to the two CA atoms) but I have no idea how!!!!

Thanks for your time...

Edited 3 Years Ago by happygeek: fixed formatting

here is the pdb file I was talking about

Attachments
atom name            x-coord  y-coord  z-coord
ATOM   1568  CD  LYS A 164      67.691  41.641  76.406  1.00 35.28           C  
ATOM   1569  CE  LYS A 164      67.783  42.902  75.547  1.00 38.35           C  
ATOM   1570  NZ  LYS A 164      67.022  42.734  74.320  1.00 38.75           N  
ATOM   1571  H   LYS A 164      68.074  39.876  80.998  1.00  0.00           H  
ATOM   1572  HZ1 LYS A 164      66.030  42.519  74.552  1.00  0.00           H  
ATOM   1573  HZ2 LYS A 164      67.067  43.609  73.760  1.00  0.00           H  
ATOM   1574  HZ3 LYS A 164      67.427  41.949  73.773  1.00  0.00           H  
ATOM   1575  N   GLU A 165      66.192  38.270  78.770  1.00 16.15           N  
ATOM   1576  CA  GLU A 165      65.372  37.253  78.146  1.00 15.57           C  
ATOM   1577  C   GLU A 165      63.966  37.207  78.718  1.00 15.98           C  
ATOM   1578  O   GLU A 165      63.014  37.050  77.960  1.00 16.98           O  
ATOM   1579  CB  GLU A 165      66.052  35.890  78.316  1.00 17.14           C  
ATOM   1580  CG  GLU A 165      65.369  34.757  77.550  1.00 17.29           C  
ATOM   1581  CD  GLU A 165      65.279  34.938  76.037  1.00 18.64           C  
ATOM   1582  OE1 GLU A 165      66.018  35.713  75.430  1.00 22.37           O  
ATOM   1583  OE2 GLU A 165      64.442  34.290  75.440  1.00 22.54           O  
ATOM   1584  H   GLU A 165      66.970  37.994  79.294  1.00  0.00           H  
ATOM   1585  N   LEU A 166      63.838  37.357  80.044  1.00 17.02           N  
ATOM   1586  CA  LEU A 166      62.542  37.400  80.711  1.00 16.22           C  
ATOM   1587  C   LEU A 166      61.707  38.572  80.231  1.00 15.53           C  
ATOM   1588  O   LEU A 166      60.547  38.382  79.878  1.00 17.72           O  
ATOM   1589  CB  LEU A 166      62.735  37.494  82.228  1.00 14.19           C  
ATOM   1590  CG  LEU A 166      63.217  36.225  82.922  1.00 16.56           C  
ATOM   1591  CD1 LEU A 166      63.595  36.504  84.369  1.00 16.71           C  
ATOM   1592  CD2 LEU A 166      62.119  35.185  82.808  1.00 16.66           C  
ATOM   1593  H   LEU A 166      64.652  37.450  80.583  1.00  0.00           H  
ATOM   1594  N   ASN A 167      62.295  39.763  80.144  1.00 16.25           N  
ATOM   1595  CA  ASN A 167      61.600  40.953  79.678  1.00 20.12           C  
ATOM   1596  C   ASN A 167      61.204  40.859  78.210  1.00 23.67           C  
ATOM   1597  O   ASN A 167      60.113  41.282  77.816  1.00 22.63           O  
ATOM   1598  CB  ASN A 167      62.485  42.174  79.898  1.00 18.80           C  
ATOM   1599  CG  ASN A 167      62.845  42.445  81.359  1.00 21.79           C  
ATOM   1600  OD1 ASN A 167      63.766  43.216  81.628  1.00 25.03           O  
ATOM   1601  ND2 ASN A 167      62.194  41.866  82.372  1.00 16.65           N  
ATOM   1602  H   ASN A 167      63.239  39.835  80.390  1.00  0.00           H  
ATOM   1603 HD21 ASN A 167      62.503  42.076  83.276  1.00  0.00           H  
ATOM   1604 HD22 ASN A 167      61.450  41.258  82.187  1.00  0.00           H  
ATOM   1605  N   ASP A 168      62.076  40.255  77.395  1.00 23.85           N  
ATOM   1606  CA  ASP A 168      61.763  40.023  76.000  1.00 25.60           C  
ATOM   1607  C   ASP A 168      60.672  38.981  75.840  1.00 25.64           C  
ATOM   1608  O   ASP A 168      59.963  39.017  74.837  1.00 27.97           O  
ATOM   1609  CB  ASP A 168      63.014  39.571  75.236  1.00 29.47           C  
ATOM   1610  CG  ASP A 168      64.097  40.639  75.066  1.00 33.44           C  
ATOM   1611  OD1 ASP A 168      63.891  41.803  75.427  1.00 33.68           O  
ATOM   1612  OD2 ASP A 168      65.171  40.287  74.572  1.00 36.36           O  
ATOM   1613  H   ASP A 168      62.936  39.958  77.753  1.00  0.00           H

aside from any other problems, this line is wrong:

push my @alpha_carbon, my $alpha_carbon;

You can't declare your variables with "my" during the push() function. You have to predeclare them. Also, $alpha_carbon is not defind, not even initialized, in the code you posted.

I don't understand what you are trying to do here:

and calculate the distance between every two CA atom in a xzy-plane
and if the distance is over 4.3 it should give me the position of the two
CA atoms.

Give an example from the PDB file. Have you searched CPAN for any PDB modules?

Yeah I have done some researche but without any luck...

An exampel will be:
look first were there are a CA atom
in a line in the PDB file. it can look like that :

ATOM  345  CA LYS  A  24   129.78   32.123   23.98

then I want to find another line with CA atom in the file, it could be:

ATOM 746 CA  ARG  A  43.98   543.98  65.98

(I have done that by regular expression, but still have some problems )

the last 3 numbers are the xyz coordinate for the CA atoms. What I want now is to calculate the distance between the to given atoms by using eq. for distance between two points in a 3 D:
sqrt(($ax - $bx) ** 2 + ($ay - $by) ** 2+ ($az - $bz) ** 2)
if the distance is over 4.3 the program should give me a massege at witch atom possition the distance is over 4.3...

hope you understand

what I have done untill now :
CA atom is what I am looking for is is my alpha carbon

#!/usr/bin/perl -w
use strict;
open(IN,'<',"PDB.txt" ) or die "Could not find file\n";
my @all_alpha_carbon = (); #array
&get_alpha_carbon(\@all_alpha_carbon); #call sub through a reference

sub get_alpha_carbon {
my ($all_alpha_carbon) = @_;
my @alpha_carbon = ();
my $line = '';
while (defined (my $line = <IN>)) {
chomp ($line);
if ($line =~ m/^ATOM\s+(\d+)\s+CA\s+([A-Z]{3})\s+A\s+(\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)\s+(-?\d+\.\d+)/){
push (@alpha_carbon, $1, $2, $3, $4, $5, $6);
push (@{$all_alpha_carbon},@alpha_carbon);
#print @alpha_carbon, "\n";
@alpha_carbon = ();
}
}


}


close IN;



&calc_chain_break(\@all_alpha_carbon);


sub calc_chain_break{
my ($all_alpha_carbon) = @_;
#for (my $i = 0.......)
my @array = @{$all_alpha_carbon};

}

Edited 3 Years Ago by pyTony: fixed formatting

It had finally succeed me to extract the importance information in a PDB file, but still have some problems with the calculations...

I will try to explain it in a more pedagogic manner

What I have now is a data set file contenting :

 2   CA       107.119   8.429   73.619
10   CA       106.868  11.078  76.312
28   CA       104.348  16.170  78.301
19   CA        104.598  13.956  75.243
59   CA          97.800  21.900  82.855
.
.
.
.
2   CA      107.119   8.429   73.619

(2 is the no of the atom CA indicate that its a carbon alpha atom and the 3 dicimale numbers indicate the x-coordinate, y-coordinate and z- cordinate)

I wanna now calculate the distance between every two alpha carbon in the given data set, meaning that calculate the distance between 2 CA and 10 CA and then between 10 CA AND 28 CA and so on....
using the distance between two point in a 3 D I cant calculate the distance. Now if the distance is over 4.3 it should give me a message at which atoms no. it is.

what I have done:

use strict;

my $distance;
open (IN, '<',"alpha_carbon.dat") or die "Could not find file\n";
my $line = '';

while (defined (my $line = <IN>)) {
   chomp $line;
   my @tmp =split(m/\t/,$line);
   my $distance =  sqrt(($tmp[3] - $bx) ** 2 + ($temp[4] - $by) ** 2
                    + ($tmp[5] - $bz) ** 2);
   if ($distance > 4.3){
   print "the distance $distance,\n"; 
    } 
 }
close IN;

have no idea how to define the next CA atom coordinate in the script I have call them $bx, $bz and $bz I kno it wrong.....

Edited 3 Years Ago by pyTony: fixed formatting

I have a feeling there is a module or two that already does this type of stuff, search CPAN for 'matrix' and you may find some useful modules.

Not well tested:

use strict;
use warnings;
open (IN, '<',"alpha_carbon.dat") or die "Could not find file\n";
my $A = <IN>;#get first line of data
chomp($A);
while ($A && (my $B = <IN>)) {
   chomp $B;
   my @A = split(/\s+/,$A);
   my @B = split(/\s+/,$B);
   my $distance = sqrt( ($A[2] - $B[2]) ** 2 +
                        ($A[3] - $B[3]) ** 2 +
                        ($A[4] - $B[4]) ** 2
                       );
   if ($distance > 4.3){
      print "$A[0],$B[0] distance = $distance,\n";
   }
   $A = $B;
}
close IN;

No not really I'm trying to make my life easier using perl to handle a biological output. That would save me some time!!
Thanks for the help anyway:)

This article has been dead for over six months. Start a new discussion instead.