Hello there,
I have a CSV file with the following contents:

Chromosom_id    fstart  fstop   Count
1               105     1       14.5
1               105     1       14.5
1               105     1       14.5
1               813     797     4
1               813     797     22
1               813     797     4

Here fstart is where a match with the genome starts and fstop is where it stops (so the first match starts at 105 and ends at 1). Count is the number of similar matches, all of equal length, found within that region (1-105). Only regions whose count is greater than some arbitrary value (say 7) are to be taken into account. My code is below.

open(my $fh, '<', $file) or die "Cannot open the file '$file': $!\n";
my @hit_clusters = <$fh>;
close $fh;

my ($id, $fstart, $fstop, $count);
my ($cluster_start, $cluster_stop, $cluster_dist);
my $row_number = 0;

foreach my $file_line (@hit_clusters) {

    next if $file_line =~ m/^\s*$/;               # skip blank lines
    next if $file_line =~ m/^(Chromosom_id.+)$/;  # skip the header line

    if ($file_line =~ m/^(.+?)\t(\d+?)\t(\d+?)\t(\d+?)\b/) {
        ($id, $fstart, $fstop, $count) = ($1, $2, $3, $4);

        if ($count >= $mini_num_hits) {  # keep only counts that reach the threshold

            if (!$row_number) {
                # the smaller of fstart and fstop becomes cluster_start
                if ($fstart > $fstop) {
                    $cluster_start = $fstop;
                } else {
                    $cluster_start = $fstart;
                }

                # the larger of fstart and fstop becomes cluster_stop
                if ($fstop < $fstart) {
                    $cluster_stop = $fstart;
                } else {
                    $cluster_stop = $fstop;
                }

                ++$row_number;
            }
        }
    }
}

But the problem is that row_number is not incrementing, and the same values are printed every time:

1       105     1
1       105     1
1       105     1
1       105     1
1       105     1
1       105     1
1       105     1
1       105     1
1       105     1
1       105     1

What I have to do is this: take the first fstart in the file as $cluster_start. While reading through the file, if I find another fstop that is less than 250 away from that fstart, I have to add the counts together and extend the region from the first fstart to the current fstop. Then I reset cluster_start to the new fstart and continue.

Thanks in advance,

All 2 Replies

hi,

I cannot say what the problem is, because your code is incomplete. It may be that this line in your code is causing the problem: if ($count >= $mini_num_hits){ #to check the counts greater than the arbitrary value

You can use the split() function to get the fields you want from each line of the file.

use strict;
use warnings;

my ($cluster_start, $cluster_stop, $cluster_dist);
my $row_number = 0;

my $file_name = 'C:\Documents and Settings\kath\Desktop\input.txt';
open(my $fh, '<', $file_name) or die "Cannot open $file_name: $!\n";

while (my $line = <$fh>) {
    next if $line =~ /^\s*$/;          # skip blank lines
    next if $line =~ /^Chromosom_id/;  # skip the header line

    # split ' ' splits on runs of whitespace and ignores leading whitespace
    my ($id, $fstart, $fstop, $count) = split ' ', $line;

    if ($fstart > $fstop) {  # if fstart is greater than fstop, swap the two
        $cluster_start = $fstop;
        $cluster_stop  = $fstart;
    }
    else {
        $cluster_start = $fstart;
        $cluster_stop  = $fstop;
    }
    ++$row_number;
    print "ROW-$row_number: ID: $id, FSTART: $fstart, FSTOP: $fstop, COUNT: $count -- CLUSTERSTART: $cluster_start, CLUSTERSTOP: $cluster_stop\n";
}

close($fh);
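
The merging step described in the question could be sketched roughly as follows. This is only one reading of the description, not a definitive implementation: the sub name merge_clusters and its parameters are made up for illustration, and I am assuming that a row joins the current cluster when it has the same id and its fstop is less than 250 away from the fstart of the previous row that passed the threshold.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): merge rows into clusters.
# Each row is [id, fstart, fstop, count].
sub merge_clusters {
    my ($rows, $min_hits, $max_dist) = @_;
    my (@clusters, $current, $anchor);

    for my $row (@$rows) {
        my ($id, $fstart, $fstop, $count) = @$row;
        next if $count < $min_hits;    # ignore rows below the threshold

        # Normalise so that $lo <= $hi whatever the orientation of the match
        my ($lo, $hi) = $fstart <= $fstop ? ($fstart, $fstop) : ($fstop, $fstart);

        # Assumption: same id and fstop within $max_dist of the previous fstart
        if ($current && $current->{id} eq $id && abs($fstop - $anchor) < $max_dist) {
            $current->{start} = $lo if $lo < $current->{start};
            $current->{stop}  = $hi if $hi > $current->{stop};
            $current->{count} += $count;
        }
        else {
            $current = { id => $id, start => $lo, stop => $hi, count => $count };
            push @clusters, $current;
        }
        $anchor = $fstart;    # the next row is compared against this fstart
    }
    return @clusters;
}

# The sample rows from the question
my @rows = (
    [1, 105, 1, 14.5], [1, 105, 1, 14.5], [1, 105, 1, 14.5],
    [1, 813, 797, 4],  [1, 813, 797, 22], [1, 813, 797, 4],
);

for my $c (merge_clusters(\@rows, 7, 250)) {
    print join("\t", @$c{qw(id start stop count)}), "\n";
}
```

With the sample data and a threshold of 7 this prints two regions: 1 to 105 with a total count of 43.5, and 797 to 813 with a count of 22. If the rule is meant differently, only the condition in the if needs to change.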

kath.

Personally, I cannot understand the specification of what the program is supposed to do. Posting partial code is not much help: nothing in the posted code even prints anything, so there is no way to tell why it is not working properly. You also say you have a CSV file, but you are using tabs in the regexp to pull the data fields out of the lines.
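
To show what I mean about the delimiter, here is a small test of the regexp from the question against two made-up lines, one tab-separated and one comma-separated. The two sample strings are my own assumption about what the file might contain. It also shows that the last capture group, (\d+?)\b, stops at the decimal point, so a count of 14.5 is read as 14.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged
my $pattern = qr/^(.+?)\t(\d+?)\t(\d+?)\t(\d+?)\b/;

my $tab_line   = "1\t105\t1\t14.5";    # tab-separated, as the pattern expects
my $comma_line = "1,105,1,14.5";       # what a real CSV line would look like

for my $line ($tab_line, $comma_line) {
    if (my @fields = $line =~ $pattern) {
        print "matched: @fields\n";     # the count comes back as 14, not 14.5
    }
    else {
        print "no match: $line\n";
    }
}

# Splitting on a tab or a comma keeps the whole count
my @by_split = split /[\t,]/, $comma_line;
print "split: @by_split\n";
```

The tab-separated line matches with the count cut down to 14, the comma-separated line does not match at all, and split returns all four fields with 14.5 intact.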

You have to do better at explaining your program specs, and please start using code tags around your Perl code.
