Data consolidation across an array a.k.a. Hashed to death... heeelp...

Question

BioTeq 0 Newbie Poster

17 Years Ago

First of all, hello!
I've been reading the forums for quite a while now, but this time I need real help.

To summarize: I'm building a parser which has to consolidate data based on values contained in an array.
The source file contains tab-separated values, which are parsed into an array containing
pdbID | resNum | resID | secstructID. These are then consolidated into a file which should contain:
pdbID | startRes | endRes | secstructID

After parsing a file, the source array holds the following data for consolidation:
1b6g 1 M \N
1b6g 2 V \N
1b6g 3 N \N
1b6g 4 N H
1b6g 5 N H
1b6g 6 N \N
3hba 7 W H
2cdg 8 N H
2cdg 9 V \N
2cdg 10 M \N
2cdg 11 A B
2cdg 12 M \N

The expected result after consolidation is:
1b6g 1 3 \N
1b6g 4 5 H
1b6g 6 6 \N
3hba 7 7 H
2cdg 8 8 H
2cdg 9 10 \N
2cdg 11 11 B
2cdg 12 12 \N

As you can see, each pdbID is assigned a secStructID sequentially, and any interruption in the secStructID should be treated as a point at which the assignment restarts.
Each pdbID can thus have multiple occurrences of, for example, \N in different places of the sequence, and they are differentiated by the startRes and endRes values, which are all derived from the resNum.
All is wonderful and I have working code which consolidates the data. Unfortunately, it doesn't recognize the occurrence of a new secstructID as the end of the previous one; instead it finds the last occurrence in the whole sequence for one pdbID and considers that the end.

and so my result is incorrectly displayed as:
1b6g 4 5 H
1b6g 1 6 \N ---- error here - this should be in fact two separate "entities" because 4 and 5 do not belong to \N
3hba 7 7 H
2cdg 8 8 H
2cdg 9 12 \N ---- same here (11 belongs to B, so this should be split into 9-10 and 12)
2cdg 11 11 B
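
The collapse can be reproduced in isolation. The following is a minimal sketch, assuming the rows are already in memory; the sub name group_by_structure and the three-column row layout are made up for illustration and are not part of the script. It collects residue numbers per pdbID and per secstructID, which is the same grouping the script's hash performs:

```perl
use strict;
use warnings;

# Hypothetical helper reproducing the grouping used in the script:
# residue numbers are collected per pdbID and per secstructID.
sub group_by_structure {
    my @rows = @_;                       # each row: [pdbID, resNum, secstructID]
    my %dane;
    for my $row (@rows) {
        my ($pdb, $num, $ss) = @$row;
        push @{ $dane{$pdb}{$ss} }, $num;
    }
    return \%dane;
}

my $dane = group_by_structure(
    ['1b6g', 1, '\N'], ['1b6g', 2, '\N'], ['1b6g', 3, '\N'],
    ['1b6g', 4, 'H'],  ['1b6g', 5, 'H'],  ['1b6g', 6, '\N'],
);

# Both \N runs (1-3 and 6) land in the same list, so taking the first
# and last element reports one span that swallows residues 4 and 5.
my $n = $dane->{'1b6g'}{'\N'};
print "residues stored under \\N: @$n\n";    # 1 2 3 6
print "first/last: $n->[0] $n->[-1]\n";      # 1 6
```

Once the residues are keyed only by pdbID and secstructID, the position where a run was interrupted is no longer recorded, so the first and last elements cannot recover it.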

And here's my code:

#!/usr/bin/perl -w
use strict;
use warnings;

#   --------------------------------------------------------------
# This script uses the residue.txt file generated by
# resTabmakerBatch.pl and creates a new file called
# SecStructList.txt
# each protein is described by secondary structures with a
# pdbID, 2ry structureID (char or '\N'), startResidue, endResidue
# Input: residue.txt (this file is the output of resTabmakerBatch.pl)
# Output: secStructList.txt
# usage: secStructList.txt to populate the SecStructure entity
#   --------------------------------------------------------------

#Read arguments, print error message if insufficient
if ($#ARGV<0)
{
 die("\n\nUsage:  sstruct.pl [residue_table_file.txt]\n\n");
}

my $filename = $ARGV[0];

#if either file not found return error message
if (! -e "$filename")
{
 die("\n\nresidue file $filename does not exist!\n\n");
}

 # Read residue.txt file, extracting the data of interest - only
 # pdb id, resNum, resID, secondaryStructID

#First read file, storing each line in an array 'dssplines' splitting the data
open (MYFILE,"$filename") or die ("\nERROR: Can't open $filename\n");
 my @dssplines= split(/\r/, <MYFILE>);
 my $arraySize=@dssplines;
close(MYFILE);
#read one line from the originally loaded array dssplines at a time and loop
#over it splitting the values using the tabs
 my @dsspdata;
 my $dsspdataSize=@dssplines;

 my $n=0;
 for (my $i=0; $i < $arraySize; $i++)
 {
  #each line from the array goes into a new dsspline variable
  my $dsspline = $dssplines[$i];
  for (my $j = 0; $j <=4; $j++)
  {
   #each time values inside are separated using the tabs
   my ($pdbID, $resNo, $resID, $phi, $psi, $chi1, $chi2, $secStruct,
$activesite) = split(/\t/, $dsspline);
   # now each value of interest is stored into a new array @dsspdata
   $dsspdata[$n][0] = $pdbID;
   $dsspdata[$n][1] = $resNo;
   $dsspdata[$n][2] = $resID;
   $dsspdata[$n][3] = $secStruct;
  }
  $n++;
 }

#my @dsspdata array is now perfect to reformat into a hash analyzing the
#value correlation

 #initialize the hash and counter
 my %dane;
 my $k=0;
 #loop around the dsspdata array
 for (my $i=0; $i < $dsspdataSize; $i++)
 {
  #split each cell in a row into variables for the hash
  for (my $k = 0; $k <=4; $k++)
  {
   my $pdb = $dsspdata[$i][0];
   my $residueNum = $dsspdata[$i][1];
   my $secStructure = $dsspdata[$i][3];
   push @{ $dane{$pdb}->{$secStructure} }, $residueNum;
  }
  $k++;
 }

 #now for each pdbID using the hash keys
 foreach my $pdbID ( keys %dane )
 {
  #check the secondary structure id with pdbID as a key (only if the pdbID
  #is the same will the values be stored)
  foreach my $secID ( keys %{ $dane{$pdbID} } )
  {
   #finally create an array of residue numbers
   my @resnums = ( $dane{$pdbID}->{$secID}->[0],
$dane{$pdbID}->{$secID}->[-1] );
   #create a new file with the secondary structures list
   open (SStruc, ">>secStructList.txt") || die "Can't open file: $!";
   #append each line to the new file with tab separated data
   print SStruc ("$pdbID \t @resnums \t $secID\n");
  }
 }
close(SStruc);

I have attached the source file (it is the already-processed output of another script, which wasn't nearly as complicated as this issue ;))
Run the program with residue.txt as its argument.

If anyone has an idea how to deal with this, I would be very grateful for suggestions. As you can see from my code, I am slightly Java-twisted.

Cheers,
Matt

perl

residue.txt (70.89 KB)

The attachment preview is chopped off after the first 10 KB. Please download the entire file.

1b6g	    1	M	 360.0	 152.2	77	178	\N	N
1b6g	    2	V	 -64.1	 136.6	159	999	\N	N
1b6g	    3	N	 -68.1	 110.3	-172	5	\N	N
1b6g	    4	A	-143.2	 156.8	\N	\N	E	N
1b6g	    5	I	-125.8	 164.7	59	-178	E	N
1b6g	    6	R	-125.2	 128.1	179	171	\N	N
1b6g	    7	T	 -66.7	 129.8	-53	999	\N	N
1b6g	    8	P	 -56.5	 138.9	-29	41	\N	N
1b6g	    9	D	 -62.4	 -27.3	-162	53	G	N
1b6g	   10	Q	 -60.1	 -24.3	-174	-81	G	N
1b6g	   11	R	 -62.1	 -17.4	-61	-41	G	N
1b6g	   12	F	-103.4	   4.9	-53	-78	G	N
1b6g	   13	S	 -60.2	 -48.1	-179	999	S	N
1b6g	   14	N	 -99.6	 103.7	-172	-4	S	N
1b6g	   15	L	-140.9	 139.8	-71	175	\N	N
1b6g	   16	D	 -90.7	 117.6	-174	-75	S	N
1b6g	   17	Q	  55.8	  49.1	-58	-72	S	N
1b6g	   18	Y	-132.8	  73.6	-170	85	\N	N
1b6g	   19	P	 -89.2	  16.6	37	-42	\N	N
1b6g	   20	F	 -88.6	 130.0	-45	-69	\N	N
1b6g	   21	S	 -75.9	 146.7	-68	999	\N	N
1b6g	   22	P	 -77.4	 146.0	26	-37	\N	N
1b6g	   23	N	-131.0	 148.2	-51	-87	E	N
1b6g	   24	Y	-133.6	 144.3	-59	-78	E	N
1b6g	   25	L	-119.7	 117.1	-56	-179	E	N
1b6g	   26	D	 -99.8	 138.6	-89	-64	E	N
1b6g	   27	D	  53.2	  44.3	-54	-57	S	N
1b6g	   28	L	 -52.0	 150.9	-74	162	\N	N
1b6g	   29	P	 -54.0	 126.2	-27	39	T	N
1b6g	   30	G	  94.7	  -8.5	\N	\N	T	N
1b6g	   31	Y	-133.0	  50.9	-60	-86	\N	N
1b6g	   32	P	 -66.1	 139.1	28	-40	T	N
1b6g	   33	G	  82.9	  -3.6	\N	\N	T	N
1b6g	   34	L	-109.5	 134.7	-63	170	\N	N
1b6g	   35	R	-106.5	 127.2	-171	174	E	N
1b6g	   36	A	-104.3	 137.3	\N	\N	E	N
1b6g	   37	H	 -95.3	 141.4	174	-107	E	N
1b6g	   38	Y	-149.6	 150.4	65	-75	E	N
1b6g	   39	L	 -85.7	 144.9	-56	-178	E	N
1b6g	   40	D	-142.1	  89.4	-173	-10	E	N
1b6g	   41	E	-113.6	 154.0	-68	-71	E	N
1b6g	   42	G	  82.3	-174.1	\N	\N	E	N
1b6g	   43	N	 -63.2	 115.9	176	11	\N	N
1b6g	   44	S	 -60.7	 -26.4	-60	999	T	N
1b6g	   45	D	 -96.2	  12.6	-66	-26	T	N
1b6g	   46	A	 -64.6	 149.6	\N	\N	\N	N
1b6g	   47	E	 -71.4	 -44.9	-52	-166	S	N
1b6g	   48	D	 -87.2	 136.8	-70	-77	\N	N
1b6g	   49	V	-127.6	 124.1	173	999	E	N
1b6g	   50	F	 -92.4	 110.0	-72	77	E	N
1b6g	   51	L	 -95.8	 107.9	177	60	E	N
1b6g	   52	C	-106.3	 111.1	-64	999	E	N
1b6g	   53	L	-108.5	 123.3	-55	-177	\N	N
1b6g	   54	H	 -86.4	 176.4	74	83	\N	N
1b6g	   55	G	-130.9	-150.7	\N	\N	\N	N
1b6g	   56	E	 -85.4	 142.1	171	73	T	N
1b6g	   57	P	 -99.7	  57.3	40	-32	T	N
1b6g	   58	T	-109.7	-159.7	69	999	\N	N
1b6g	   59	W	-163.8	-173.6	57	-67	\N	N
1b6g	   60	S	 -60.2	 -18.4	55	999	G	N
1b6g	   61	Y	 -50.3	 -35.3	-175	85	G	N
1b6g	   62	L	 -59.7	 -27.1	179	54	G	N
1b6g	   63	Y	 -88.9	   5.3	-73	-76	G	N
1b6g	   64	R	 -61.4	 -25.4	72	-160	T	N
1b6g	   65	K	 -84.8	 -22.5	-50	178	T	N
1b6g	   66	M	 -87.6	 -40.5	-56	-173	T	N
1b6g	   67	I	 -52.1	 -49.6	-66	161	H	N
1b6g	   68	P	 -60.3	 -35.8	-23	34	H	N
1b6g	   69	V	 -66.3	 -46.9	174	999	H	N
1b6g	   70	F	 -63.5	 -43.0	-66	-34	H	N
1b6g	   71	A	 -63.3	 -42.1	\N	\N	H	N
1b6g	   72	E	 -64.3	 -21.6	-127	-13	H	N
1b6g	   73	S	 -79.1	  -3.6	78	999	T	N
1b6g	   74	G	  98.5	   5.1	\N	\N	T	N
1b6g	   75	A	 -88.8	 162.7	\N	\N	\N	N
1b6g	   76	R	 -90.3	 148.8	-174	-179	E	N
1b6g	   77	V	-130.9	 123.8	-170	999	E	N
1b6g	   78	I	-114.3	 124.4	-61	-72	E	N
1b6g	   79	A	-135.0	  94.4	\N	\N	E	N
1b6g	   80	P	 -85.5	 153.6	28	-28	E	N
1b6g	   81	D	 -95.4	 125.9	-61	-21	\N	N
1b6g	   82	F	 -60.2	 149.7	-74	-67	\N	N
1b6g	   83	F	 -54.9	 136.3	-64	-59	T	N
1b6g	   84	G	  98.2	  -7.3	\N	\N	T	N
1b6g	   85	F	-139.2	 157.3	-62	84	S	N
1b6g	   86	G	  57.8	-132.4	\N	\N	T	N
1b6g	   87	K	 -97.5	   5.9	-58	-176	T	N
1b6g	   88	S	 -78.7	 157.4	-69	999	S	N
1b6g	   89	D	 -63.8	 158.9	-73	-9	\N	N
1b6g	   90	K	-133.0	  95.0	-58	-176	E	N
1b6g	   91	P	 -46.4	 137.5	-30	39	E	N
1b6g	   92	V	 -89.0	 -13.4	-64	999	S	N
1b6g	   93	D	-107.6	 129.2	-63	-21	\N	N
1b6g	   94	E	 -57.0	 -33.5	-69	-175	G	N
1b6g	   95	E	 -62.4	 -23.4	-63	64	G	N
1b6g	   96	D	 -79.0	 -14.3	-75	-44	G	N
1b6g	   97	Y	 -90.3	 119.4	-59	89	\N	N
1b6g	   98	T	-138.8	 167.9	68	999	\N	N
1b6g	   99	F	 -54.2	 -53.5	-172	86	H	N
1b6g	  100	E	 -70.6	 -27.7	-67	-158	H	N
1b6g	  101	F	 -55.2	 -48.6	171	85	H	N
1b6g	  102	H	 -72.5	 -41.8	-69	-59	H	N
1b6g	  103	R	 -62.1	 -45.4	-176	174	H	N
1b6g	  104	N	 -70.5	 -27.5	-73	-19	H	N
1b6g	  105	F	 -58.2	 -48.8	156	69	H	N
1b6g	  106	L	 -62.2	 -43.7	-62	167	H	N
1b6g	  107	L	 -59.9	 -44.1	-69	174	H	N
1b6g	  108	A	 -64.7	 -38.3	\N	\N	H	N
1b6g	  109	L	 -62.6	 -46.9	178	147	H	N
1b6g	  110	I	 -59.3	 -43.6	-69	158	H	N
1b6g	  111	E	 -69.2	 -40.8	-81	-145	H	N
1b6g	  112	R	 -63.5	 -45.2	156	78	H	N
1b6g	  113	L	 -82.8	  -7.7	-65	176	H	N
1b6g	  114	D	  48.9	  53.7	-162	10	T	N
1b6g	  115	L	 -76.1	 130.2	-59	-176	\N	N
1b6g	  116	R	-132.8	 170.5	-51	-159	\N	N
1b6g	  117	N	  51.4	  49.1	-61	-61	S	N
1b6g	  118	I	 -85.3	 131.9	-58	168	E	N
1b6g	  119	T	 -97.1	 108.5	-65	999	E	N
1b6g	  120	L	 -78.6	 127.1	176	59	E	N
1b6g	  121	V	-121.8	 118.7	-175	999	E	N
1b6g	  122	V	-139.7	 159.2	-57	999	E	N
1b6g	  123	Q	-166.4	 142.9	176	-97	\N	N
1b6g	  124	D	  51.3	-135.4	-168	72	T	N
1b6g	  125	W	 -62.9	 -25.6	-89	99	H	N
1b6g	  126	G	 -60.3	 -31.4	\N	\N	H	N
1b6g	  127	G	 -83.7	 -44.6	\N	\N	H	N
1b6g	  128	F	 -50.4	 -41.3	-53	-15	H	N
1b6g	  129	L	 -86.8	 -51.5	-70	155	H	N
1b6g	  130	G	 -61.6	 -35.1	\N	\N	H	N
1b6g	  131	L	 -65.7	 -13.5	-66	167	T	N
1b6g	  132	T	 -94.7	   1.6	74	999	T	N
1b6g	  133	L	 -83.7	 -43.7	-56	172	S	N
1b6g	  134	P	 -57.1	 -35.4	-29	41	G	N
1b6g	  135	M	 -65.3	 -22.2	68	-165	G	N
1b6g	  136	A	 -75.0	 -30.0	\N	\N	G	N
1b6g	  137	D	-150.6	  70.8	-150	-22	S	N
1b6g	  138	P	 -62.2	 -30.9	9	-16	G	N
1b6g	  139	S	 -68.9	 -14.3	69	999	G	N
1b6g	  140	R	 -80.5	 -15.3	-66	-177	G	N
1b6g	  141	F	-110.2	 128.4	-70	88	E	N
1b6g	  142	K	-112.9	 -39.6	176	-170	E	N
1b6g	  143	R	-146.4	 164.6	-42	-164	E	N
1b6g	  144	L	-134.3	 131.5	175	65	E	N
1b6g	  145	I	-112.3	 113.3	-59	173	E	N
1b6g	  146	I	-114.2	 118.8	178	171	E	N
1b6g	  147	M	-107.3	 162.2	-67	164	E	N
1b6g	  148	N	 -17.1	 101.0	-64	29	S	N
1b6g	  149	A	-169.2	 163.3	\N	\N	\N	N
1b6g	  150	X	-159.8	-163.7	\N	\N	\N	N
1b6g	  151	L	-117.0	 138.1	-72	160	\N	N
1b6g	  152	M	 -83.6	  52.6	-67	-66	\N	N
1b6g	  153	T	 -79.6	 177.2	-171	999	\N	N
1b6g	  154	D	 -76.8	 162.0	68	-79	\N	N
1b6g	  155	P	 -63.7	 -29.3	-15	22	T	N
1b6g	  156	V	 -72.9	 -42.2	178	999	T	N
1b6g	  157	T	 -62.8	 -48.2	-59	999	T	N
1b6g	  158	Q	-141.7	  90.4	-67	-62	\N	N
1b6g	  159	P	 -59.5	 -31.5	16	-29	T	N
1b6g	  160	A	 -60.8	 -27.9	\N	\N	H	N
1b6g	  161	F	 -88.2	   4.2	-55	-16	H	N
1b6g	  162	S	-124.0	 -27.7	-70	999	H	N
1b6g	  163	A	 -62.0	 -33.5	\N	\N	H	N
1b6g	  164	F	 -63.2	 -21.3	74	85	T	N
1b6g	  165	V	 -64.0	 -31.9	171	999	T	N
1b6g	  166	T	-113.4	 -17.9	55	999	T	N
1b6g	  167	Q	-139.5	 148.7	-68	-169	S	N
1b6g	  168	P	 -74.1	 163.0	30	-35	S	N
1b6g	  169	A	 -55.4	 -35.6	\N	\N	T	N
1b6g	  170	D	 -88.9	  11.2	69	-6	T	N
1b6g	  171	G	 -98.9	-123.8	\N	\N	T	N
1b6g	  172	F	 -54.0	 -39.0	-179	81	H	N
1b6g	  173	T	 -54.5	 -47.3	-68	999	H	N
1b6g	  174	A	 -67.0	 -46.0	\N	\N	H	N
1b6g	  175	W	 -51.9	 -53.6	164	-78	H	N
1b6g	  176	K	 -63.3	 -45.2	176	162	H	N
1b6g	  177	Y	 -54.0	 -46.8	174	79	H	N
1b6g	  178	D	 -59.9	 -32.5	-74	3	H	N
1b6g	  179	L	 -90.5	 -39.3	-69	159	H	N
1b6g	  180	V	 -98.1	 -15.1	-54	999	H	N
1b6g	  181	T	 -92.5	 -43.5	-52	999	S	N
1b6g	  182	P	 -64.0	 137.2	-15	27	\N	N
1b6g	  183	S	 -70.0	 -31.4	53	999	S	N
1b6g	  184	D	-118.8	  84.9	-170	71	S	N
1b6g	  185	L	 -76.7	 122.7	173	62	\N	N
1b6g	  186	R	-110.4	 103.3	-61	-165	\N	N
1b6g	  187	L	 -75.1	 -21.5	-74	-174	H	N
1b6g	  188	D	 -62.1	 -45.3	50	-28	H	N
1b6g	  189	Q	 -68.1	 -42.3	-69	-178	H	N
1b6g	  190	F	 -55.2	 -50.8	178	-85	H	N
1b6g	  191	M	 -68.0	 -35.9	-70	-62	H	N
1b6g	  192	K	 -62.7	 -34.5	-76	-172	H	N
1b6g	  193	R	 -68.1	 -50.5	167	174	H	N
1b6g	  194	W	 -88.7	 -11.9	-68	94	H	N
1b6g	  195	A	-134.9	  75.0	\N	\N	S	N
1b6g	  196	P	 -68.5	  -7.5	7	-10	T	N
1b6g	  197	T	 -85.6	  -5.4	62	999	T	N
1b6g	  198	L	 -62.4	 145.3	-64	-178	\N	N
1b6g	  199	T	 -76.3	 166.4	71	999	\N	N
1b6g	  200	E	 -55.8	 -45.2	-71	-75	H	N
1b6g	  201	A	 -64.1	 -42.0	\N	\N	H	N
1b6g	  202	E	 -60.8	 -45.3	-66	169	H	N
1b6g	  203	A	 -60.4	 -38.1	\N	\N	H	N
1b6g	  204	S	 -61.9	 -30.8	179	999	H	N
1b6g	  205	A	 -65.6	 -27.0	\N	\N	H	N
1b6g	  206	Y	 -79.1	 -21.5	-67	-68	H	N
1b6g	  207	A	 -82.0	 -26.4	\N	\N	H	N
1b6g	  208	A	 -48.9	 -42.7	\N	\N	T	N
1b6g	  209	P	 -64.0	 -13.4	-22	37	T	N
1b6g	  210	F	-126.2	  85.2	-56	-66	\N	N
1b6g	  211	P	 -57.7	 -36.2	-33	40	S	N
1b6g	  212	D	-156.2	-179.2	65	33	S	N
1b6g	  213	T	 -62.9	 -23.5	61	999	G	N
1b6g	  214	S	 -67.9	 -18.5	70	999	G	N
1b6g	  215	Y	 -89.5	  -0.6	-65	79	G	N
1b6g	  216	Q	-107.7	  16.1	-64	173	\N	N
1b6g	  217	A	 -63.8	 -39.0	\N	\N	H	N
1b6g	  218	G	 -66.8	 -37.5	\N	\N	H	N
1b6g	  219	V	 -60.0	 -44.9	173	999	H	N
1b6g	  220	R	 -68.7	 -33.0	-66	-170	H	N
1b6g	  221	K	 -78.1	 -32.6	-176	60	H	N
1b6g	  222	F	 -53.6	 -51.3	-99	-29	H	N
1b6g	  223	P	 -67.8	 -28.7	12	-22	H	N
1b6g	  224	K	 -61.7	 -35.1	-56	176	H	N
1b6g	  225	M	 -74.9	 -22.9	-67	-165	H	N
1b6g	  226	V	 -73.4	 -39.5	177	999	H	N
1b6g	  227	A	 -78.9	 -29.7	\N	\N	H	N
1b6g	  228	Q	-131.9	  97.4	-65	-65	S	N
1b6g	  229	R	 -96.3	 119.6	-58	169	\N	N
1b6g	  230	D	 -74.4	 170.3	64	16	\N	N
1b6g	  231	Q	 -66.6	 -37.2	-163	65	H	N
1b6g	  232	A	 -58.4	 -48.7	\N	\N	H	N
1b6g	  233	X	 -56.0	 -49.7	\N	\N	H	N
1b6g	  234	I	 -59.6	 -45.7	-67	167	H	N
1b6g	  235	D	 -67.3	 -45.8	-66	-8	H	N
1b6g	  236	I	 -61.3	 -43.5	-66	163	H	N
1b6g	  237	S	 -66.3	 -38.8	-62	999	H	N
1b6g	  238	T	 -67.4	 -40.2	-59	999	H	N
1b6g	  239	E	 -60.2	 -40.6	-70	159	H	N
1b6g	  240	A	 -63.3	 -36.6	\N	\N	H	N
1b6g	  241	I	 -55.2	 -46.2	-61	177	H	N
1b6g	  242	S	 -62.3	 -39.7	-179	999	H	N
1b6g	  243	F	 -55.3	 -52.2	173	83	H	N
1b6g	  244	W	 -67.1	 -34.7	-70	108	H	N
1b6g	  245	Q	 -76.0	 -40.1	-74	-169	H	N
1b6g	  246	N	-114.3	 -22.0	-57	-48	H	N
1b6g	  247	D	-113.0	 -31.3	-58	-32	T	N
1b6g	  248	W	 -71.3	 132.8	176	-116	\N	N
1b6g	  249	N	-135.3	  15.7	-73	-64	\N	N
1b6g	  250	G	 -85.8	 174.1	\N	\N	S	N
1b6g	  251	Q	 -73.4	 138.9	-60	164	E	N
1b6g	  252	T	-135.6	 141.0	-50	999	E	N
1b6g	  253	F	-137.9	 129.3	-176	78	E	N
1b6g	  254	M	-117.2	 139.3	177	175	E	N
1b6g	  255	A	-133.2	 132.8	\N	\N	E	N
1b6g	  256	I	-113.7	 124.8	-52	169	E	N
1b6g	  257	G	 -78.7	 104.1	\N	\N	E	N
1b6g	  258	M	 -69.7	 -19.0	-61	-64	T	N
1b6g	  259	K	 -90.3	   0.6	-67	-172	T	N
1b6g	  260	D	 -70.1	 126.1	-173	7	\N	N
1b6g	  261	K	 -83.5	  -8.1	-67	177	S	N
1b6g	  262	L	 -97.8	 -68.3	-76	63	S	N
1b6g	  263	L	-121.9	  46.5	-49	179	S	N
1b6g	  264	G	 -83.2	-154.8	\N	\N	S	N
1b6g	  26

KevinADC 192 Practically a Posting Shark

17 Years Ago

Is this school work? Do you have this question posted on other Perl forums? What's the purpose of including the resID if it's not being used in the output?

BioTeq 0 Newbie Poster · Answer 1 · 2007-05-24T06:58:42+00:00

Terribly helpful of you ;) but to ease your curiosity: it's not homework. I'm trying to rewrite a parser that I wrote in Java some time ago (originally for DSSP files, if that tells you anything). This particular script is part of a larger set of programs used to populate a database, and residue.txt is the output of a different script (in the final version it will be reintegrated into a batch without processing the additional file). It's not posted on other web forums; it is posted on Usenet, though. It's not finished, otherwise I wouldn't be posting questions about it. resID might turn out to be useful for me, as I am considering storing a whole sequence in a separate attribute of a DB entity; I haven't decided yet, as I am still modifying the DB schema.
Now, instead of answering a question with a question, do you think you could give me a hint on dealing with the hash for this rather unusual case :) ? I would be really grateful. I'm trying to learn Perl by writing it, but with hashes I seem to be stumbling in the dark; I'm more of a Java person, I think :).

BioTeq 0 Newbie Poster · Answer 2 · 2007-05-24T07:02:11+00:00

I've slightly simplified the code after a couple of suggestions that I got elsewhere and now it should be easier to follow:

#!/usr/bin/perl -w
use strict;
use warnings;
if ($#ARGV<0)
{ die("\n\nUsage:  sstruct.pl [residue_table_file.txt]\n\n"); }
my $filename = $ARGV[0];
if (! -e "$filename")    
{ die("\n\nresidue file $filename does not exist!\n\n"); }
open (MYFILE,"$filename") or die ("\nERROR: Can't open $filename\n");
    my @dssplines= split(/\r/, <MYFILE>);
    my $arraySize=@dssplines;
close(MYFILE);

my @dsspdata;
my $dsspdataSize=@dssplines;
    for (my $i=0; $i < $arraySize; $i++)
    {
      my $dsspline = $dssplines[$i];
      my ($pdbID, $resNo, $resID, $phi, $psi, $chi1, $chi2, $secStruct, $activesite) = split(/\t/, $dsspline);
      push( @dsspdata, [$pdbID,$resNo,$resID,$secStruct] );
    }

my %dane;
my $k=0;
for (my $i=0; $i < $dsspdataSize; $i++)
{
    for (my $k = 0; $k <=4; $k++)
    {
        my $pdb = $dsspdata[$i][0];
        my $residueNum = $dsspdata[$i][1];
        my $secStructure = $dsspdata[$i][3];
        push @{ $dane{$pdb}->{$secStructure} }, $residueNum;
    }
    $k++;
}
    
foreach my $pdbID ( keys %dane )
{
    foreach my $secID ( keys %{ $dane{$pdbID} } )
    {
        my @resnums = ( $dane{$pdbID}->{$secID}->[0], $dane{$pdbID}->{$secID}->[-1] );
        open (SStruc, ">>secStructList.txt") || die "Can't open file: $!";
        print SStruc ("$pdbID \t @resnums \t $secID\n");
    }
}
close(SStruc);
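
One detail of the read step worth illustrating: split(/\r/, <MYFILE>) only yields every line when the file uses bare CR line endings, because <MYFILE> is evaluated in scalar context there and returns a single record. A possible alternative is sketched below, assuming the nine-column layout used in the split above; the sub name parse_residue_text is hypothetical. It slurps the file and accepts CR, LF, or CRLF endings:

```perl
use strict;
use warnings;

# Hypothetical helper: split the raw file text into rows of
# [pdbID, resNum, resID, secStruct]. Accepts CR, LF or CRLF line endings.
sub parse_residue_text {
    my ($text) = @_;
    my @rows;
    for my $line (split /\r\n|\r|\n/, $text) {
        # Columns: pdbID resNo resID phi psi chi1 chi2 secStruct activesite
        my ($pdbID, $resNo, $resID, $secStruct) = (split /\t/, $line)[0, 1, 2, 7];
        next unless defined $secStruct;      # skip blank or truncated lines
        $resNo =~ s/^\s+|\s+$//g;            # resNo is space-padded in the sample
        push @rows, [$pdbID, $resNo, $resID, $secStruct];
    }
    return @rows;
}

if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!";
    my $text = do { local $/; <$fh> } // '';  # slurp the whole file
    close($fh);
    my @dsspdata = parse_residue_text($text);
    print scalar(@dsspdata), " rows parsed\n";
}
```

The lexical filehandle and three-argument open are a stylistic choice here, not something the original code requires.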

BioTeq 0 Newbie Poster · Answer 3 · 2007-05-24T08:09:46+00:00

Seems that I will be solving this problem using a response from a usenet user.

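# Prime the state with the first row; DATA is expected to hold the
# "pdbID resNum resID secstructID" rows after a __DATA__ marker.
my ( $x, $y, undef, $z ) = split ' ', <DATA>;
my ($last_x,$last_z,$min,$max)=($x,$z,$y,$y);

while (<DATA>) {
    my ( $x, $y, undef, $z ) = split;
    # A change of pdbID or secstructID closes the current run.
    if ($x ne $last_x or $z ne $last_z) {
       print "$last_x $min $max $last_z\n";
       ($last_x,$last_z,$min,$max)=($x,$z,$y,$y);
    }
    $max=$y;
}
# Flush the final run.
print "$last_x $min $max $last_z\n";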

It works flawlessly; however, it's not the way I want it solved, so I will still try to get it working with hashes.
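
A hash can still be used if each pdbID maps to an ordered list of runs instead of to one list per secstructID. This is only a sketch under stated assumptions: the sub name segments_by_pdb is hypothetical, rows arrive in file order, and all rows of one pdbID are contiguous in the file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns
#   ({ pdbID => [ [startRes, endRes, secStruct], ... ] },
#    [pdbIDs in order of first appearance])
# for rows of [pdbID, resNum, resID, secStruct] given in file order.
sub segments_by_pdb {
    my @rows = @_;
    my (%segments, @order);
    for my $row (@rows) {
        my ($pdb, $num, undef, $ss) = @$row;
        push @order, $pdb unless exists $segments{$pdb};
        my $runs = $segments{$pdb} ||= [];
        if (@$runs && $runs->[-1][2] eq $ss) {
            $runs->[-1][1] = $num;               # same structure: extend the run
        } else {
            push @$runs, [$num, $num, $ss];      # structure changed: new run
        }
    }
    return (\%segments, \@order);
}

my @rows = map { [split ' '] } (
    '1b6g 1 M \N', '1b6g 2 V \N',  '1b6g 3 N \N', '1b6g 4 N H',
    '1b6g 5 N H',  '1b6g 6 N \N',  '3hba 7 W H',  '2cdg 8 N H',
    '2cdg 9 V \N', '2cdg 10 M \N', '2cdg 11 A B', '2cdg 12 M \N',
);
my ($segments, $order) = segments_by_pdb(@rows);
for my $pdb (@$order) {
    print join(' ', $pdb, @$_), "\n" for @{ $segments->{$pdb} };
}
```

The separate @order array is there because Perl hashes do not preserve insertion order, so iterating over keys %segments alone would print the pdbIDs in an arbitrary order.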

KevinADC 192 Practically a Posting Shark · Answer 4 · 2007-05-24T09:26:07+00:00

I'm going to pass on helping you.