When we update the code, is it possible to change line 76
next unless defined $g_other and $g_other eq 'gene'
to also run if $g_other eq ' '? There are additional queries that should be examined which have nothing in the $g_other column.
I'm not sure about that. I've been trying to modify the script to save the gene data in a more suitable data structure that would allow faster lookup of the genes for each scaffold when determining the nearest gene, but so far I'm not getting good results.
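For what it's worth, the kind of structure I have in mind is a hash of arrays keyed by scaffold, so that a lookup only scans the genes on the same scaffold. This is only a rough sketch, not what the script below does: the record layout (scaffold, start, end, name), the sample values, and the names %genes_by_scaffold and nearest_gene are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical gene records: [scaffold, start, end, name]. The real column
# layout of the gene file may differ, so these fields are assumptions.
my @genedata = (
    [ 'scaffold1', 100, 500,  'geneA' ],
    [ 'scaffold1', 900, 1200, 'geneB' ],
    [ 'scaffold2', 50,  300,  'geneC' ],
);

# Group genes by scaffold so a lookup only scans genes on the same scaffold.
my %genes_by_scaffold;
for my $gene (@genedata) {
    push @{ $genes_by_scaffold{ $gene->[0] } }, $gene;
}

# Return the name of the gene nearest to the interval [$start, $end] on
# $scaffold, or undef if the scaffold has no genes. Overlaps count as distance 0.
sub nearest_gene {
    my ($scaffold, $start, $end) = @_;
    my $genes = $genes_by_scaffold{$scaffold} or return;
    my ($best, $best_dist);
    for my $gene (@$genes) {
        my (undef, $g_start, $g_end, $g_name) = @$gene;
        my $dist = $g_start > $end   ? $g_start - $end
                 : $start   > $g_end ? $start - $g_end
                 :                     0;
        ($best, $best_dist) = ($g_name, $dist)
            if !defined $best_dist || $dist < $best_dist;
    }
    return $best;
}

print nearest_gene('scaffold1', 600, 700), "\n";
```

Each lookup still scans every gene on the scaffold, so if a single scaffold carries a very large number of genes, sorting each list by start position and using a binary search might be worth trying next.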
Maybe we would need to consider a radical change to the program design, such as reading the gene data into a database table. Then you could access it with SQL queries that could include or filter out rows according to values in the columns, and it might run faster. Do you have SQLite for Perl?
Meanwhile, here's the corrected version of the last script I posted. It skips blank lines in the hairpin file to avoid the error messages you were getting.
#!/usr/bin/perl
#comparepremiRNAtocoding-pulloverlap3.pl
use strict;
use warnings;
use List::Util qw[min max];
use Benchmark;
my $t0 = Benchmark->new;
#You can delete the following line. My testing setup works better without command-line arguments
#@ARGV = qw(cab-ALL.LU-premiRNAs.withLOCATION.Vmatch-Nvmr-EST.list bab-All.LuCoding.061511) if @ARGV == 0;
@ARGV = qw(dani-sampleHairpindata.txt dani-sampleGenedata.txt) if @ARGV == 0;
my $hairpin_filename = $ARGV[0];
my $gene_filename = $ARGV[1];
my @genedata;
$| = 1;#Flush print buffer
read_genedata();#Call subroutine to read selected records from file into array
my $t1 …