How do I calculate the RMSD (root mean square deviation) score for my PDB protein structures, atom by atom?

I wish to calculate it line by line, but I do not know how to go about it. I was thinking of using a for loop, but as you can see, I already have too many for loops.

PS: I'm reading my data from a text file and extracting the coordinates from it.

``````
//------------------------------Translation-------------------------------------

for (int a = 0; a < coordinates1.length; a++) {
    for (int b = 0; b < 3; b++) {
        for (int c = 0; c < 3; c++) {

            X = coordinates1[1][0] / coordinates2[1][0];
            Y = coordinates1[1][1] / coordinates2[1][1];
            Z = coordinates1[1][2] / coordinates2[1][2];

            double[][] Translation1 = { {X, 0.0, 0.0}, {0.0, Y, 0.0}, {0.0, 0.0, Z} };

            coordinates2T[a][b] = coordinates2[a][c] * Translation1[c][b];

            //-------------------------Rotation & RMSD Cal----------------------------------

            // Uses Leonhard Euler's angle of rotation theorem

            double[][] Rotz = { {Math.cos(0.087266463), Math.sin(-0.087266463), 0.0},
                                {Math.sin(0.087266463), Math.cos(0.087266463), 0.0},
                                {0.0, 0.0, 1.0} };

            double[][] RotX = { {1.0, 0.0, 0.0},
                                {0.0, Math.cos(0.087266463), Math.sin(-0.087266463)},
                                {0.0, Math.sin(0.087266463), Math.cos(0.087266463)} };

            double[][] RotZ = { {Math.cos(0.087266463), Math.sin(0.087266463), 0.0},
                                {Math.sin(-0.087266463), Math.cos(0.087266463), 0.0},
                                {0.0, 0.0, 1.0} };

            coordinatesRotz[a][b] = coordinates2T[a][c] * Rotz[c][b];
            coordinatesRotX[a][b] = coordinatesRotz[a][c] * RotX[c][b];
            coordinatesRotZ[a][b] = coordinatesRotX[a][c] * RotZ[c][b];

            for (a = 0; a < coordinates1.length; a++) {

                R1 = Math.sqrt((Math.pow((coordinates1[a][0] - coordinatesRotZ[a][0]), 2)
                        + Math.pow((coordinates1[a][1] - coordinatesRotZ[a][1]), 2)
                        + Math.pow((coordinates1[a][2] - coordinatesRotZ[a][2]), 2)));
            }

            if (R1 != 0.0)
                System.out.println("R1=" + fmt.format(R1));

        }
    }
}

}// end of main
}
``````
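One thing the snippet above runs into is that an assignment such as `coordinates2T[a][b] = coordinates2[a][c]*Translation1[c][b]` overwrites the result on every pass of the `c` loop instead of summing over `c`. A minimal sketch of the row-vector-times-matrix step, with the accumulation made explicit, might look like the following. The class name `MatrixApplySketch` and the method `apply` are hypothetical names chosen for illustration; they are not part of the original program.

```java
final class MatrixApplySketch {

    /**
     * Multiplies every row vector (x, y, z) by a 3x3 matrix:
     * out[a][b] = sum over c of coords[a][c] * m[c][b].
     */
    static double[][] apply(double[][] coords, double[][] m) {
        double[][] out = new double[coords.length][3];
        for (int a = 0; a < coords.length; a++) {
            for (int b = 0; b < 3; b++) {
                double sum = 0.0;               // accumulate instead of overwriting
                for (int c = 0; c < 3; c++) {
                    sum += coords[a][c] * m[c][b];
                }
                out[a][b] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double t = Math.toRadians(5.0);         // 5 degrees, as in the post
        // Same layout as the RotZ matrix in the posted code.
        double[][] rotZ = {
            { Math.cos(t), Math.sin(t), 0.0},
            {-Math.sin(t), Math.cos(t), 0.0},
            { 0.0,         0.0,         1.0}
        };
        double[][] atoms = { {-3.595, 15.273, 13.451} };
        double[][] rotated = apply(atoms, rotZ);
        System.out.println(rotated[0][0] + " " + rotated[0][1] + " " + rotated[0][2]);
    }
}
```

Because the matrices are constants, they can be built once outside the loops; only the multiplication needs the three nested loops.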

I wish to calculate it line by line.

Can you explain this?
Line by line implies a loop.

What does this loop have to do with the existing loops?
Can you do this part of the calculation in another part of the code away from the existing loops?

Lines 24-35: Matrices for the rotation
Lines 38-40: Implementing the rotation
Lines 43-48: Calculation of the RMSD (root mean square deviation: a measure of the average distance between the backbones of superimposed proteins)
Lines 51-52: Printing my RMSD score.

So I may use a for loop? Because as you can see, I'm using the variable 'a' which is the same variable, I'm afraid that it might affect it.

Calculation in another part of the code away from the existing loops:
Do you mean creating a separate class that does all the calculations and returns the value to my main class?
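One possible reading of "away from the existing loops" is a small static method that takes the two coordinate arrays and returns the score, so the rotation loops never share a loop variable with it. The sketch below assumes the conventional RMSD definition (atom i of the query paired with atom i of the target, then the square root of the mean squared distance) and that both arrays have the same length. `RmsdSketch` and `rmsd` are hypothetical names.

```java
final class RmsdSketch {

    /** Conventional RMSD between two equally sized sets of (x, y, z) rows, paired by index. */
    static double rmsd(double[][] target, double[][] query) {
        if (target.length == 0 || target.length != query.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sumSquared = 0.0;
        for (int i = 0; i < target.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = target[i][k] - query[i][k];
                sumSquared += d * d;            // squared distance, summed over all atoms
            }
        }
        return Math.sqrt(sumSquared / target.length);
    }

    public static void main(String[] args) {
        double[][] target = { {0.0, 0.0, 0.0}, {1.0, 0.0, 0.0} };
        double[][] query  = { {0.0, 0.0, 1.0}, {1.0, 0.0, 1.0} };
        System.out.println("RMSD = " + rmsd(target, query));
    }
}
```

A call such as `rmsd(coordinates1, coordinatesRotZ)` after each rotation could then stand in for the inner `for(a=0; ...)` loop in the posted code.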

Sorry, I don't understand the logic of your program.

I wish to calculate it line by line

Can you define what a 'line' is and how it relates to the data in your program?
And how that calculation relates to the ones the program is currently doing.

I'm using the variable 'a' which is the same variable, I'm afraid that it might affect it.

'a' is the for loop variable. You shouldn't change that inside the loops unless you want to skip over a value in the array.
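To illustrate the point about the loop variable, here is a small sketch that counts how many times an outer loop runs when an inner loop reuses `a`, compared with an inner loop that declares its own variable. The class and method names are hypothetical. In the posted code the reuse would also leave `a` equal to `coordinates1.length` after the inner loop, so the next `[a]` index would likely be out of range.

```java
final class LoopVariableSketch {

    /** Inner loop reuses 'a': after it finishes, a == n, so the outer loop stops early. */
    static int outerPassesWithSharedVariable(int n) {
        int passes = 0;
        for (int a = 0; a < n; a++) {
            passes++;
            for (a = 0; a < n; a++) {
                // inner work would go here
            }
        }
        return passes;
    }

    /** Inner loop has its own variable 'i': the outer loop runs n times as intended. */
    static int outerPassesWithOwnVariable(int n) {
        int passes = 0;
        for (int a = 0; a < n; a++) {
            passes++;
            for (int i = 0; i < n; i++) {
                // inner work would go here
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        System.out.println("shared variable: " + outerPassesWithSharedVariable(5));
        System.out.println("own variable:    " + outerPassesWithOwnVariable(5));
    }
}
```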

Alright, I am currently working on a bioinformatics algorithm: Superposition.
The Superposition algorithm measures how similar one protein is to another through translation, rotation and RMSD calculation.

It first translates the query protein so that it lies on top of the target protein. Next, the query protein undergoes a number of 'cycles' (rotation and RMSD calculation) until the score is ideal. (The lower the RMSD score, the better.)
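A minimal sketch of the translation step, assuming "on top of" means that the centroid (average position) of the query is moved onto the centroid of the target. That is a common choice for superposition but it is an assumption here, and it differs from the posted code, which divides coordinates and so scales rather than shifts. `CentroidTranslationSketch`, `centroid` and `translateOnto` are hypothetical names.

```java
final class CentroidTranslationSketch {

    /** Average x, y and z over all atoms. */
    static double[] centroid(double[][] coords) {
        double[] c = new double[3];
        for (double[] atom : coords) {
            for (int k = 0; k < 3; k++) {
                c[k] += atom[k];
            }
        }
        for (int k = 0; k < 3; k++) {
            c[k] /= coords.length;
        }
        return c;
    }

    /** Returns a shifted copy of query whose centroid coincides with the target's centroid. */
    static double[][] translateOnto(double[][] query, double[][] target) {
        double[] cq = centroid(query);
        double[] ct = centroid(target);
        double[][] out = new double[query.length][3];
        for (int i = 0; i < query.length; i++) {
            for (int k = 0; k < 3; k++) {
                out[i][k] = query[i][k] + (ct[k] - cq[k]);   // add the same offset to every atom
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] query  = { {0.0, 0.0, 0.0}, {2.0, 0.0, 0.0} };
        double[][] target = { {10.0, 5.0, 1.0}, {12.0, 5.0, 1.0} };
        double[] moved = centroid(translateOnto(query, target));
        System.out.println(moved[0] + " " + moved[1] + " " + moved[2]);
    }
}
```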

Now I'm working on the 'cycles' part. If you refer to the code above, I've got the formula for the rotation. I'm going to rotate by 5 degrees each time and calculate the RMSD score to find the lowest score.
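The 'cycles' idea can be sketched as a loop that rotates the query in 5-degree steps, scores each orientation, and keeps the lowest score. This is a simplified sketch that rotates about the z axis only (the posted code chains z, x and z rotations) and assumes the conventional index-paired RMSD. All names are hypothetical.

```java
final class RotationSearchSketch {

    /** Rotates every atom counterclockwise about the z axis by angleRad. */
    static double[][] rotateZ(double[][] coords, double angleRad) {
        double cos = Math.cos(angleRad);
        double sin = Math.sin(angleRad);
        double[][] out = new double[coords.length][3];
        for (int i = 0; i < coords.length; i++) {
            out[i][0] = coords[i][0] * cos - coords[i][1] * sin;
            out[i][1] = coords[i][0] * sin + coords[i][1] * cos;
            out[i][2] = coords[i][2];
        }
        return out;
    }

    /** Index-paired RMSD; both arrays are assumed to have the same, non-zero length. */
    static double rmsd(double[][] target, double[][] query) {
        double sumSquared = 0.0;
        for (int i = 0; i < target.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = target[i][k] - query[i][k];
                sumSquared += d * d;
            }
        }
        return Math.sqrt(sumSquared / target.length);
    }

    /** Tries 0, 5, 10, ... 355 degrees and returns {best angle in degrees, best RMSD}. */
    static double[] bestZRotation(double[][] target, double[][] query) {
        double bestAngle = 0.0;
        double bestRmsd = Double.POSITIVE_INFINITY;
        for (int step = 0; step < 72; step++) {
            double degrees = step * 5.0;
            double score = rmsd(target, rotateZ(query, Math.toRadians(degrees)));
            if (score < bestRmsd) {             // keep the lowest score seen so far
                bestRmsd = score;
                bestAngle = degrees;
            }
        }
        return new double[] {bestAngle, bestRmsd};
    }

    public static void main(String[] args) {
        double[][] query = { {1.0, 0.0, 0.0}, {0.0, 2.0, 0.0}, {1.0, 1.0, 3.0} };
        double[][] target = rotateZ(query, Math.toRadians(30.0));
        double[] best = bestZRotation(target, query);
        System.out.println("best angle = " + best[0] + " degrees, RMSD = " + best[1]);
    }
}
```

Each step rotates the original query by the full angle rather than rotating the previous result by another 5 degrees, which avoids accumulating rounding error.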

My data: Protein structure coordinates.
Here's an example in .txt:

ATOM 1 N VAL A 1 -3.595 15.273 13.451 1.00 38.34 N
ATOM 2 CA VAL A 1 -3.441 15.678 14.859 1.00 25.04 C
ATOM 3 C VAL A 1 -2.671 14.628 15.651 1.00 21.23 C
ATOM 4 O VAL A 1 -3.060 13.462 15.710 1.00 22.70 O
ATOM 5 CB VAL A 1 -4.828 15.869 15.483 1.00 34.86 C
ATOM 6 CG1 VAL A 1 -4.697 15.684 16.983 1.00 41.45 C
ATOM 7 CG2 VAL A 1 -5.400 17.237 15.157 1.00 41.70 C
Note: There are a few thousand 'lines' in each file; the above is just a small portion.

The red-coloured numbers (the three values after the residue number, e.g. -3.595 15.273 13.451 on the first line) are the coordinates of the atom: X, Y and Z respectively.

Thus, a 'line' is the set of coordinates (X, Y, Z) of one atom.
I have extracted those coordinates and stored them in a 2D array (a x 3): {X, Y, Z}
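A minimal sketch of that extraction step, assuming the whitespace-separated layout of the sample lines shown here, where X, Y and Z are the 7th, 8th and 9th fields. Real PDB files are fixed-column (X in columns 31-38, Y in 39-46, Z in 47-54), so splitting on whitespace can fail when fields run together; that caveat aside, the idea is the same. `PdbCoordinateSketch` and `parseCoordinates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class PdbCoordinateSketch {

    /** Returns one {x, y, z} row per ATOM line; other lines are skipped. */
    static double[][] parseCoordinates(List<String> lines) {
        List<double[]> atoms = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (!line.startsWith("ATOM")) {
                continue;                       // ignore non-ATOM lines
            }
            String[] tokens = line.split("\\s+");
            if (tokens.length < 9) {
                continue;                       // not enough fields to hold x, y, z
            }
            atoms.add(new double[] {
                Double.parseDouble(tokens[6]),
                Double.parseDouble(tokens[7]),
                Double.parseDouble(tokens[8])
            });
        }
        return atoms.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        List<String> sample = new ArrayList<>();
        sample.add("ATOM 1 N VAL A 1 -3.595 15.273 13.451 1.00 38.34 N");
        sample.add("ATOM 2 CA VAL A 1 -3.441 15.678 14.859 1.00 25.04 C");
        double[][] coords = parseCoordinates(sample);
        System.out.println(coords.length + " atoms, first x = " + coords[0][0]);
    }
}
```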

Thank you for offering your help once again.

Oh right, I just realized. The RMSD scoring I need to do is:
To find the distance from one atom to each of the rest.

E.g. distance of the X, Y, Z of ATOM1 (1st line) to ATOM2 (2nd line), ATOM3 (3rd line), ATOM4, ATOM5, etc.

After that, do the same for the remaining atoms and find the average score.

There's a typo; it should read: to find the distance from one atom to each of the rest of the atoms.

E.g. distance of the X, Y, Z of ATOM1 (1st line) to ATOM2 (2nd line), ATOM3 (3rd line), ATOM4, ATOM5, etc.

Distance of the X, Y, Z of ATOM2 (2nd line) to ATOM1 (1st line), ATOM3 (3rd line), ATOM4, ATOM5, etc.

I am now thinking about which is the best way to do this.

OH NOOO!! I just realized I explained it wrongly. Please ignore the RMSD part of the above 2 posts.

Here's the real one:

RMSD SCORING:

The distance from ATOM1 (x, y, z) of the QUERY file (coordinatesRotZ) to ALL the ATOMS in the TARGET file (coordinates1), then
from ATOM2 (x, y, z) of the QUERY file (coordinatesRotZ) to ALL the ATOMS in the TARGET file (coordinates1), and so on.

Then I need to find the average.

Here's the RMSD formula:

``RMSD = Math.sqrt(Math.pow(x1 - x2, 2) + Math.pow(y1 - y2, 2) + Math.pow(z1 - z2, 2));``
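Taken literally, the scoring described in this post applies the formula above to every query atom against every target atom and then averages the results. A minimal sketch of that reading follows, using two nested loops with their own variables. Note that this all-pairs average is not the conventional RMSD, which pairs atom i with atom i and takes the square root of the mean squared distance, so treat it as one interpretation of the post. The names are hypothetical.

```java
final class AllPairsDistanceSketch {

    /** Straight-line distance between two (x, y, z) points, as in the formula in the post. */
    static double distance(double[] p, double[] q) {
        double dx = p[0] - q[0];
        double dy = p[1] - q[1];
        double dz = p[2] - q[2];
        return Math.sqrt(dx * dx + dy * dy + dz * dz);
    }

    /** Average of the distances from every query atom to every target atom. */
    static double meanDistanceToAllTargets(double[][] query, double[][] target) {
        double total = 0.0;
        for (double[] q : query) {              // each atom of the query file
            for (double[] t : target) {         // against all atoms of the target file
                total += distance(q, t);
            }
        }
        return total / ((double) query.length * target.length);
    }

    public static void main(String[] args) {
        double[][] query  = { {0.0, 0.0, 0.0} };
        double[][] target = { {3.0, 4.0, 0.0}, {0.0, 0.0, 0.0} };
        System.out.println("mean distance = " + meanDistanceToAllTargets(query, target));
    }
}
```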