KevinADC 192 Practically a Posting Shark

I have no idea why you say perl is headed to the graveyard. Your premise is wrong so there is no use in discussing it. Perl may not be as popular as it was at one time but it is far from a dead language.

I don't know what perl does better than python or vice-versa.

KevinADC 192 Practically a Posting Shark

Emotional attachment? Perl is a good language, that is why people continue to use it.

People will stop using it when it no longer serves a purpose. That will not be for a long time I suspect.

KevinADC 192 Practically a Posting Shark

Did you have a question? If not, what is the purpose of posting that code?

KevinADC 192 Practically a Posting Shark

my bad, but can u help me?!

I will try and help you with specific problems but so far you have not provided that information.

KevinADC 192 Practically a Posting Shark

As said before: describe in detail the problems you are having.

KevinADC 192 Practically a Posting Shark

There is no urgent help here. All questions have the same priority: none.

If you want help you need to post your code, describe in detail the problems you are having and post any error messages you are getting.

Then if someone wants to help you they will.

jephthah commented: "There is no urgent help here. All questions have the same priority: none." LOL +2
KevinADC 192 Practically a Posting Shark

Post your code.

Ditto. Post your code.

KevinADC 192 Practically a Posting Shark

Use a hash of arrays:

my %HoA = ();
for (1..4) {
   push @{$HoA{$_}},$value;
}
KevinADC 192 Practically a Posting Shark

There are a lot of array handling modules that do this kind of work already. But you could use a hash of arrays and give the keys any value you wanted that would associate it with the respective array.

You could also use the grep() function because arrays can be processed super fast, and you can compare many arrays for various conditions very rapidly with grep.
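
A rough sketch of the grep() idea, assuming two small made-up arrays (@first and @second are hypothetical names, not from the original question). A lookup hash built from one array lets grep test each element of the other with a single hash access:

```perl
use strict;
use warnings;

# Hypothetical sample data; any two lists would do.
my @first  = qw(apple banana cherry date);
my @second = qw(banana date fig);

# Build a lookup hash from the second array so each test is one hash access.
my %in_second = map { $_ => 1 } @second;

# grep returns the elements for which the block is true.
my @common     = grep {  $in_second{$_} } @first;   # found in both arrays
my @only_first = grep { !$in_second{$_} } @first;   # found only in @first

print "common: @common\n";
print "only in first: @only_first\n";
```

The same pattern extends to more arrays by building one lookup hash per array, or by storing the arrays in a hash of arrays and looping over its keys.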

KevinADC 192 Practically a Posting Shark

Good. But I still think it would be better to count the words while building the initial data set. That would be less work, since perl would only need to parse the file once to get all the data, instead of parsing the file and then parsing the data again.

godevars commented: All posts have helped me understand Perl more. +1
KevinADC 192 Practically a Posting Shark
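OUTER: while(<DATA>){
   if(m#^/\*#){                     # start of a /* ... */ block
      while (<DATA>) {
         next OUTER if (m#^\*/# );  # end of the block: resume the outer loop
      }
   }
   print;                           # only lines outside a block get here
}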
__DATA__
foo bar
/*
if () {
....
...
}
*/
bash baz
KevinADC 192 Practically a Posting Shark

Look in the HTTP::Request module's documentation and see if it has some authentication method. Or look in the documentation of other modules your script might use, like LWP or LWP::Simple or whatever.

KevinADC 192 Practically a Posting Shark

Still learning on my end.
I am reviewing the code and am trying to understand this:

$cnter{$_}{$title}++ for @line;

I see this as a hash, %cnter, being populated . $_ is the word and the $title is the section. I see that the keys = $_ in this loop which are all the words. The value would then be the $title and the ++ is to count each individual word appearing in the section. Is the 'for @line' portion used for reading each line as it comes through?

I think this means this code already has a hash of all the words in each section: keys %cnter. I am trying to figure out how to identify the hash for each section.

Thanks-

%cnter is a two dimensional hash (a hash of hashes). $_ (the words) and $title (the section title) are both hash keys. The value of a hash key can be another hash (and more things besides). ++ is the count of each word per section.

'for @line' just loops through the @line array and assigns each element to $_, which is then used to build up the hash. It's a short way of writing:

for (@line) {
    $cnter{$_}{$title}++;
}

It does mean there is a hash with all the words counted per all sections.

%cnter = (
word1 => {
    title1 => count,
    title2 => count,
 },
word2 => {
    title1 => count,
    title2 => count,
 } …
KevinADC 192 Practically a Posting Shark

Hi Kavin,
the code does work with the push @t, (exists ${$cnter{$word}}[$index{$title}]) ? ${$cnter{$word}}[$index{$title}] : 0; line. I am not sure why; maybe you can tell, if you have an idea, because the code does include the use strict; use warnings; construct and it does not seem to throw any warnings or errors... I forgot to mention that point in the earlier thread.

katharnakh.

Then I guess the exists function does work for arrays as well as hashes. I will try some simple tests later and see what happens.
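
A simple test along those lines might look like this. It is only a sketch with a made-up @counts array; note that the documentation for newer versions of perl discourages calling exists on array elements, so a plain bounds check is shown next to it:

```perl
use strict;
use warnings;

my @counts = (3, 0, 7);   # hypothetical per-section counts

# exists is true for an element that has been assigned, even if its value is 0 ...
my $has_second = (exists $counts[1]) ? 1 : 0;

# ... and false for an index past the end of the array.
my $has_tenth = (exists $counts[9]) ? 1 : 0;

# A bounds check answers the same question without using exists on an array.
my $in_range = (9 < @counts) ? 1 : 0;

print "index 1 exists: $has_second\n";
print "index 9 exists: $has_tenth\n";
print "index 9 in range: $in_range\n";
```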

KevinADC 192 Practically a Posting Shark

Did you run the code? The first obvious problem is declaring the variable twice with "my". The code you posted looks like it will always display zero because of that. But even if you properly declare the variable it will not count the total of words per section. It looks like it will be the total of only each instance of a word for all the sections it is found in. Say the word were "foo" and it was found 3 times in section I and 2 times in section III the code will print 5, I think. I did not run the code to see but I can tell it is not totalling the words per section. I would do that while the data is being read in from the file, not after, while the data is being printed to the OUT file, although that is more than likely possible.

KevinADC 192 Practically a Posting Shark

Your code does not seem to work properly katharnakh. I did not try and determine why. I don't think the "exists" function works on arrays:

exists ${$cnter{$word}}[$index]

like it does on hash keys:

exists $cnter{$word}{$title}

so that might be a problem.

KevinADC 192 Practically a Posting Shark

A basic script. The output format is not very good, but that will be up to you to change to your needs:

use strict;
use warnings;
open(IN, "readme.txt") or die "ERROR: $!";
open(OUT, ">seeme.txt") or die "ERROR: $!";
my (%cnter, $title, @order);
while(<IN>) {
      next if (/^\s*$/);
      chomp;
      my @line = split(/\s+/);
      if($line[0] =~ /^=/) {
            $line[0] =~ tr/=//d; # remove all the "=" from the section title
            $title = "@line";
            push @order, $title;
      }
      else {
            tr/,.?!//d for @line; #remove some punctuation
            tr/A-Z/a-z/ for @line; #convert all text to lower case so 'Word' and 'word' are the same
            $cnter{$_}{$title}++ for @line; 
      }
}

print OUT join("\t",@order),"\n";
foreach my $word (sort keys %cnter){
      print OUT "$word : ";
      my @t = ();
      foreach my $title (@order) {
            push @t, (exists $cnter{$word}{$title}) ?  $cnter{$word}{$title} : 0;
      }
      print OUT join("\t", @t),"\n";
}
close(IN);
close(OUT);

This does not allow for a lot of data analysis in and of itself. It simply lists the data by word and its count per section. If you wanted to sort by highest word frequency per section (for example) you would need to build a more robust data structure or open the file this script creates and parse that file with another script. You could look at the output of the above script as your basic statistics from which you could perform more analysis of the data.

KevinADC 192 Practically a Posting Shark

I think the approach for the main data structure will have to be:

word => {
      section_title => count
      section_title => count
}
word => {
      section_title => count
      section_title => count
}

plus have a separate array that holds the names of each section in the order it was found in the file.

KevinADC 192 Practically a Posting Shark

It seems that the array approach would require you know all the sections beforehand.

It may require more than one process to get the final output. First build the hash that counts the words per section, then have another routine that builds the hash of arrays then prints the final output.

KevinADC 192 Practically a Posting Shark

It's probably being caused by a blank line in the file. It is also a warning, not an error. Errors terminate scripts; warnings alert you to possible problems but the script keeps running.

You may want to try and skip blank lines in the file:

while(<IN>) {
      next if (/^\s*$/);
      chomp;
KevinADC 192 Practically a Posting Shark

Borrowing from katharnakh's code I corrected a couple of things and expanded on the processing.

use strict;
use warnings;
open(IN, "readme.txt") || die "ERROR: $!";
open(OUT, ">seeme.txt") || die "ERROR: $!";

my (%cnter, $marker);
while(<IN>) {
      chomp;
      my @line = split(/\s+/);
      if($line[0] =~ /^=/) {
            $line[0] =~ tr/=//d; # remove all the "=" from the section title
            $marker = join(' ',@line); # rejoin the section title into a string
      }
      else {
            tr/,.?!//d for @line; #remove some punctuation
            tr/A-Z/a-z/ for @line; #convert all text to lower case so 'Word' and 'word' are the same
            $cnter{$marker}{$_}++ for @line;
      }
}

# sort words by count in descending order
foreach my $section (keys %cnter){
      print OUT "$section\n";
      foreach my $word (sort { $cnter{$section}{$b} <=> $cnter{$section}{$a} } keys %{$cnter{$section}}){
            print OUT "$word: $cnter{$section}{$word}\n";
      }
}
close(IN);
close(OUT);
KevinADC 192 Practically a Posting Shark

this might return false matches:

if($line[0] =~ /=/)

it would be better (I think judging by the sample data) to anchor it to the beginning of the string:

if($line[0] =~ /^=/)

at least that way you know it is not somewhere else in the string. Might be better to just use the index() function though to avoid the unnecessary overhead of a regexp.
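
For the index() suggestion, a minimal sketch (is_title is a hypothetical helper name, and the sample strings are made up):

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string begins with '='.
sub is_title {
    my ($field) = @_;
    # index() returns the position of the first '=', or -1 if there is none.
    # Position 0 means the string starts with '=', the same test as /^=/.
    return index($field, '=') == 0 ? 1 : 0;
}

print "$_: ", (is_title($_) ? "title" : "text"), "\n"
    for ('==Section', 'a=b', 'plain');
```

Checking for position 0 matters here: a test such as index(...) != -1 would behave like the unanchored /=/ and also accept 'a=b'.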

This is also not correct syntax:

elsif ($line[0] =! /=/)

should be:

elsif ($line[0] !~ /=/)
KevinADC 192 Practically a Posting Shark

the warnings pragma:

use warnings;

is better than the -w switch. It allows you to do this:

no warnings;

within a block of code that might cause warnings but you want the perl script to ignore them. The -w switch is global: it affects all modules that your program might use as well, and the only way to quiet it is to localize the $^W variable. The warnings pragma is lexically scoped, so it only applies to the file or block that loads it.
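
A small sketch of that scoping, assuming a deliberately undefined variable and a __WARN__ handler that only collects the messages so they can be counted:

```perl
use strict;
use warnings;

my @caught;                                  # collected warning messages
local $SIG{__WARN__} = sub { push @caught, @_ };

my $undefined;                               # deliberately left undefined

{
    no warnings 'uninitialized';             # silenced only inside this block
    my $quiet = "value: " . $undefined;
}

my $noisy = "value: " . $undefined;          # warnings are back on here

print scalar(@caught), " warning(s) caught\n";
```

Only the concatenation outside the block should be reported, because the effect of "no warnings" ends at the closing brace.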

KevinADC 192 Practically a Posting Shark

post your code and some sample data.

KevinADC 192 Practically a Posting Shark

post a new thread cute121180 and post the code you have written so far to try and solve your requirements.

KevinADC 192 Practically a Posting Shark

You have this question also on DEVSHED where it has a number of replies.

KevinADC 192 Practically a Posting Shark

Mank,

learn how to use hashes; this could be done pretty easily using a hash.

KevinADC 192 Practically a Posting Shark

Your explanation is not very clear:

fileX
name1 account1 123
name2 account2 324
name3 account3 345

fileY
name1 account1 123
name2 account4 324
name5 account3 345

So I want output file to be like
outputfile

name1 account1 123
name2 account2 324
name3 account3 345
name5 account3 345

The duplicate is "name2", but why did you keep "account2" instead of "account4"? In other words, when there is a duplicate of the first column, how do you know which line to keep and which one to not keep?
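
If the rule turns out to be "the first line seen for a name wins" (that is only an assumption, it is exactly what the question above is asking), a hash keyed on the first column would do it. A sketch using the sample lines from the post held in arrays instead of real files:

```perl
use strict;
use warnings;

# Sample lines copied from the post; a real script would read fileX and fileY.
my @file_x = ('name1 account1 123', 'name2 account2 324', 'name3 account3 345');
my @file_y = ('name1 account1 123', 'name2 account4 324', 'name5 account3 345');

my (%seen, @output);
for my $line (@file_x, @file_y) {
    my ($name) = split ' ', $line;   # the first column is the key
    next if $seen{$name}++;          # assumption: first line seen for a name wins
    push @output, $line;
}

print "$_\n" for @output;
```

A different rule (last line wins, or compare the account column too) would only change the key or the test inside the loop.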

KevinADC 192 Practically a Posting Shark

look in the error log. If there are multiple error logs look in the most recent one.

KevinADC 192 Practically a Posting Shark

I would find a BioPerl forum or mailing list and ask there.

KevinADC 192 Practically a Posting Shark

path to perl may have changed:

#!usr/bin/perl

Look in the server error log if you can, most hosts have an error log you can look at.

KevinADC 192 Practically a Posting Shark

You probably want to remove the newlines though.

KevinADC 192 Practically a Posting Shark

this creates an array, not a hash:

($temp1, $temp2) = split (/,/, $_);
$words[$temp1]= $temp2

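Actually, unless $temp1 is a number, everything ends up in element 0 of the array, because a non-numeric string used as an index is treated as 0 (with a warning if warnings are on).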

This would create a hash:

($temp1, $temp2) = split (/,/, $_);
$words{$temp1}= $temp2

Your code used [] (array indices) instead of {} (hash keys).

KevinADC 192 Practically a Posting Shark

If you want to define the IO files in the script look into the open() function or using @ARGV to pass in parameters to use as files.
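
A minimal sketch of the @ARGV approach, assuming the script is called with an input file name and an output file name (process_file is a hypothetical name and the loop body is just a placeholder copy):

```perl
use strict;
use warnings;

# Copy lines from $in_name to $out_name; the processing step is a placeholder.
sub process_file {
    my ($in_name, $out_name) = @_;
    open(my $in,  '<', $in_name)  or die "Cannot read $in_name: $!";
    open(my $out, '>', $out_name) or die "Cannot write $out_name: $!";
    while (my $line = <$in>) {
        print {$out} $line;          # real processing would go here
    }
    close($in);
    close($out) or die "Cannot close $out_name: $!";
    return;
}

# Hypothetical usage: perl script.pl input.txt output.txt
process_file(@ARGV) if @ARGV == 2;
```

To hard-code the files instead, replace @ARGV in the last line with the two file names.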

Please note, there is no urgent help here. Your question is no more important or less important than anyone else's. It will be answered by anyone that reads it when they read it and if they want to answer it.

KevinADC 192 Practically a Posting Shark
sub validateForm
{
    $failedFields = "";
    # must contain a digit, start with a letter, and be 4 to 8 characters long
    if ($username =~ /\d/ && $username =~ /^[a-zA-Z][a-zA-Z0-9]{3,7}$/) {
        return 1;
    }
    else {
        $failedFields .= "Username,";
        return 0;
    }
}

Looks like you're not using the "strict" pragma; you should. If you need to allow more characters/symbols in the part matched by {3,7}, just add them to the character class, or add \W, which is the opposite of \w, to allow all symbols.
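
As a sketch of adding symbols to the character class: the choice of & and ? below is purely for illustration, and valid_username is a hypothetical name, not something from your script.

```perl
use strict;
use warnings;

# Hypothetical rule: a letter first, then 3 to 7 characters that may be
# letters, digits, or one of the symbols & and ? (picked only as examples).
sub valid_username {
    my ($name) = @_;
    return ($name =~ /\d/ && $name =~ /^[a-zA-Z][a-zA-Z0-9&?]{3,7}$/) ? 1 : 0;
}

print "$_: ", (valid_username($_) ? "GOOD" : "BAD"), "\n"
    for ('abcd1&', 'abcd1?', '1abc', 'abcd');
```

Inside a character class, & and ? are plain characters and need no escaping.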

KevinADC 192 Practically a Posting Shark


Also in your regex and the one I used, abcd1& and abcd1? are rejected when they shouldn't be, so yes you're right mine doesn't work as it should.

You need to think about what your requirements are and post them clearly, I think we are all under the impression that the string can only contain digits and alphas (must start with an alpha) and can only be a certain length.

KevinADC 192 Practically a Posting Shark

okay, bradleykirby, i see the problem.

you did not put the caret ^ at the beginning, where it should be.

Ahh, you caught it. Very good. :)

KevinADC 192 Practically a Posting Shark

i dont know why you're seeing what you're seeing if it is somehow allowing it.

look closely and you will see why:

=~/[a-z]

that is the code he posted, not you.

KevinADC 192 Practically a Posting Shark

oops, dangit. you're right.

i thought that looked odd for some reason, but it was late, and i was in a hurry to go to bed.

that's two, now.

:embarrass:

hehehe.... sounds like something I would do. Watch out for those post-in-hurry-to-get-to-bed-it's-late replies. ;)

KevinADC 192 Practically a Posting Shark

what was wrong with my previous suggestion?

use strict;
use warnings;
while (<DATA>) {
   chomp;
   if (/\d/ &&  /^[a-zA-Z][a-zA-Z0-9]{3,7}$/) {
      print "$_ GOOD\n";
   }
   else {
      print "$_ BAD\n";
   }
}
__DATA__
a
ab
abc
abcd
abcdefghi
abcde1fgh
1abc
$abcd1
?abcd1
abcd1&
abcd1?
Abc1
a123
a1234567
KevinADC 192 Practically a Posting Shark


but i would rather

if (str =~ /^[a-z](\w{3,7})/ )
{
    if ($1 =! /\d/ || $1 =! /[A-Z]/ || $1 != /[a-z]/)
    { 
        die "must have one number, cap, and lowercase!\n";
    }

    ... stuff ...
}

Your second "if" condition contains some errors. "=!" is not a valid perl operator and "!=" is the wrong operator, they should all be "!~".

KevinADC 192 Practically a Posting Shark

Help you with what? You should not expect anyone here to debug your code; explain where you are having problems.

Please note there is no urgent help here. Your question is no more important or less important than any other question posted on the forum. If you need urgent help hire a programmer.

KevinADC 192 Practically a Posting Shark
if ($username =~ /\d/ &&  $username =~ /^[a-zA-Z][a-zA-Z0-9]{3,7}/) {
   # $username is good, do whatever you want
}
else {
   # $username is bad
}

But that might still not be good enough. The above will match strings like:

a1111111111111111111111111111111111111111111111111111111111111.....


if you need to match a specific length you have to add the end of string anchor ($) to the second regexp:

if ($username =~ /\d/ &&  $username =~ /^[a-zA-Z][a-zA-Z0-9]{3,7}$/) {
   # $username is good, do whatever you want
}
else {
   # $username is bad
}
KevinADC 192 Practically a Posting Shark

/^\D\w{3,7}\d+/;

the above means:

^\D starts with one non-digit character
\w{3,7} followed by 3 to 7 word characters, same as a-zA-Z0-9_
\d+ followed by one or more digits

what you probably want is two regexps:

/\d/ has at least one digit

/^[a-zA-Z][a-zA-Z0-9]{3,7}/;

KevinADC 192 Practically a Posting Shark

Guess he figured it out.

KevinADC 192 Practically a Posting Shark

ask on an OSX forum what the problem is. I don't know anything about OSX. After you get the read only problem fixed you should be able to get apache configured correctly. Sorry I can't be of more help.

KevinADC 192 Practically a Posting Shark

change the httpd.conf file from read only (I don't know how to do that on OSX) so you can edit it.

KevinADC 192 Practically a Posting Shark

Additionally, this condition will never be true:

if($page = "") $page = 1;

because you have used the assignment operator "=" instead of a comparison operator like "==" or "eq" to check the value of $page. The assignment sets $page to an empty string, and an empty string is false.
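
In Perl terms, a sketch of the comparison that was probably intended (default_page is a hypothetical helper name; "eq" is used because the value is being compared with a string):

```perl
use strict;
use warnings;

# Hypothetical helper: fall back to page 1 when no page was given.
sub default_page {
    my ($page) = @_;
    # eq compares strings and leaves $page untouched unless the test succeeds.
    $page = 1 if !defined $page || $page eq "";
    return $page;
}

print "empty -> ", default_page(""), "\n";
print "3 -> ", default_page(3), "\n";
```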

KevinADC 192 Practically a Posting Shark