katharnakh

Re: count characters in a string

  #11  
Apr 9th, 2008
Originally Posted by KevinADC
...

This is also not correct syntax:

elsif ($line[0] =! /=/)

should be:

elsif ($line[0] !~ /=/)

Thanks, Kevin.

KevinADC

Re: count characters in a string

  #12  
Apr 9th, 2008
Borrowing from katharnakh's code, I corrected a couple of things and expanded on the processing.

use strict;
use warnings;
open(IN, "readme.txt") || die "ERROR: $!";
open(OUT, ">seeme.txt") || die "ERROR: $!";

my (%cnter, $marker);
while(<IN>) {
      chomp;
      my @line = split(/\s+/);
      if($line[0] =~ /^=/) {
            $line[0] =~ tr/=//d; # remove all the "=" from the section title
            $marker = join(' ',@line); # rejoin the section title into a string
      }
      else {
            tr/,.?!//d for @line; #remove some punctuation
            tr/A-Z/a-z/ for @line; #convert all text to lower case so 'Word' and 'word' are the same
            $cnter{$marker}{$_}++ for @line;
      }
}

# sort words by count in descending order
foreach my $section (keys %cnter){
      print OUT "$section\n";
      foreach my $word (sort { $cnter{$section}{$b} <=> $cnter{$section}{$a} } keys %{$cnter{$section}}){
            print OUT "$word: $cnter{$section}{$word}\n";
      }
}
close(IN);
close(OUT);
Last edited by KevinADC : Apr 9th, 2008 at 3:43 am.

godevars

Re: count characters in a string

  #13  
Apr 9th, 2008
Thank you. I updated my script and the output looks good. I started using the 'warnings' and 'strict' pragmas as suggested to me in another thread. When I run the updated script I receive an 'uninitialized value in pattern match' error for this line:

if($line[0] =~ /^=/) {

My output is correct; I'm just not sure why I continue to receive this error. I tried to correct it by updating the line to read:

if(my $line[0] =~ /^=/) {

Didn't work. Also tried to add @line (as well as $line) to the line where 'my' variables are defined but no luck. Is this an error I would receive but could ignore? Thanks again for helping me understand.

KevinADC

Re: count characters in a string

  #14  
Apr 9th, 2008
It's probably being caused by a blank line in the file. It is also a warning, not an error. Errors terminate the script, while warnings alert you to possible problems but the script keeps running.

You may want to try and skip blank lines in the file:

while(<IN>) {
      next if (/^\s*$/); # skip blank or whitespace-only lines
      chomp;
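
To see why a blank line triggers the warning, here is a minimal sketch. The helper name warnings_for_line is made up for illustration and is not part of the script above; it repeats the same split and pattern match on a single string and collects any warnings instead of printing them.

```perl
use strict;
use warnings;

# Hypothetical helper: run the same split + pattern match the script uses
# on one line of text and return any warnings that were raised.
sub warnings_for_line {
    my ($text) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, $_[0] };

    my @line = split(/\s+/, $text);   # an empty string splits into an empty list
    my $is_title = ($line[0] =~ /^=/) ? 1 : 0;   # $line[0] is undef for a blank line

    return @caught;
}

my @blank  = warnings_for_line("");
my @normal = warnings_for_line("some words here");

print scalar(@blank),  " warning(s) for a blank line\n";
print scalar(@normal), " warning(s) for a normal line\n";
```

The blank line produces the "uninitialized value" warning because there is no first field to match against, which is why skipping such lines before the split makes it go away.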

godevars

Re: count characters in a string

  #15  
Apr 10th, 2008
I didn't realize the blank lines would create warnings. No more warnings now. In reviewing the script, it looks like a hash of hashes was made. I added the following (after the 'while' section and before the '# sort...' comment):

my ($key, $value);
while (($key, $value) = each %cnter) {
      print FILEOUT "$key => $value\n";
}

I wanted to see what the values in each hash were (I wanted to understand the logic behind each portion). My output looked something like this:

Retirement Plan Fundamentals Part I => HASH(0x18f0554)
Retirement Plan Fundamentals Part III => HASH(0x1876a74)
.....

I am looking to compare the words from each section in a way that is easier to review. For example, the output would look more like:

word/section Plan Part I : Plan Part II : Plan Part III

plans: 1 : 0 : 1
course: 0 : 1 : 1
.....

I know (keys %cnter) is the title part.
(keys %{$cnter{$section}}) is each word in the title hash. Not sure where to start. If I do this:

(keys %{$cnter{$section}}) eq (keys %{$cnter{$section}})

I am only comparing the words in the section. I believe I need to assign each section a unique hash name so that it would look something like this:

(keys %{$cnter1{$section}}) eq (keys %{$cnter2{$section}})

I would use one of the sections as the base list so that when I compare other sections, I would just add a word if it's not there already.

Also, please let me know if I should start another thread. I want to make sure I follow proper form here.

Thanks-
Last edited by godevars : Apr 10th, 2008 at 5:55 pm.

katharnakh

Re: count characters in a string

  #16  
Apr 11th, 2008
Originally Posted by godevars
...
I know (keys %cnter) is the title part.
(keys %{$cnter{$section}}) is each word in the title hash. Not sure where to start. If I do this:

(keys %{$cnter{$section}}) eq (keys %{$cnter{$section}})

I am only comparing the words in the section. I believe I need to assign each section a unique hash name so that it would look something like this:

(keys %{$cnter1{$section}}) eq (keys %{$cnter2{$section}})
...


Hi,
Having a separate hash for each section is fine if you have only two or three sections. But if the input file has more than that (maybe 5 or 10), a unique hash per section is not a good idea. Instead, we can change the way the data is stored in the hash so that it is easier to handle.

Right now the data is stored in the hash like this:
cnter{
	section1 => {
		word1 => count1,
		word2 => count2
	},
	section2 => {
		word1 => count1,
		word2 => count2
	},
	...
}

We will alter the hash above so that there is one hash covering all sections:
cnter{
	word1 => [sec1_count, sec2_count, sec3_count....],
	word2 => [sec1_count, sec2_count, sec3_count....],
	...
}
Now only the word is stored as the KEY, and the VALUE is an anonymous array holding the count of that word in each section.

Note: Each index in the array is dedicated to a particular section, so the word count should be incremented accordingly.
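
A minimal sketch of the word => [counts] layout, assuming each section gets the next array index in the order it appears. The sample data and the names @input, @sections and %count are made up for illustration and are not taken from the real input file.

```perl
use strict;
use warnings;

# Made-up sample input: each entry is a section title followed by its words.
my @input = (
    [ 'Part I',   qw(plans course plans) ],
    [ 'Part II',  qw(course) ],
    [ 'Part III', qw(plans course) ],
);

my (@sections, %count);
for my $entry (@input) {
    my ($title, @words) = @$entry;
    push @sections, $title;
    my $idx = $#sections;            # array index dedicated to this section
    $count{$_}[$idx]++ for @words;   # the anonymous array is created on first use
}

for my $word (sort keys %count) {
    # Slots that were never incremented are undef, so print them as 0.
    my @row = map { defined $count{$word}[$_] ? $count{$word}[$_] : 0 } 0 .. $#sections;
    print "$word: ", join(' : ', @row), "\n";
}
```

Because a word that is missing from a section leaves an undefined slot in its array, the printing step has to substitute 0 for those slots.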

katharnakh.
Last edited by katharnakh : Apr 11th, 2008 at 12:28 am.

KevinADC

Re: count characters in a string

  #17  
Apr 11th, 2008
It seems that the array approach would require you know all the sections beforehand.

It may require more than one process to get the final output. First build the hash that counts the words per section, then have another routine that builds the hash of arrays then prints the final output.

KevinADC

Re: count characters in a string

  #18  
Apr 11th, 2008
I think the approach for the main data structure will have to be:

word => {
      section_title => count
      section_title => count
}
word => {
      section_title => count
      section_title => count
}

plus have a separate array that holds the names of each section in the order it was found in the file.

KevinADC

Re: count characters in a string

  #19  
Apr 11th, 2008
Here is a basic script. The output format is not very good, but it is up to you to change it to suit your needs:

use strict;
use warnings;
open(IN, "readme.txt") or die "ERROR: $!";
open(OUT, ">seeme.txt") or die "ERROR: $!";
my (%cnter, $title, @order);
while(<IN>) {
      next if (/^\s*$/);
      chomp;
      my @line = split(/\s+/);
      if($line[0] =~ /^=/) {
            $line[0] =~ tr/=//d; # remove all the "=" from the section title
            $title = "@line";
            push @order, $title;
      }
      else {
            tr/,.?!//d for @line; #remove some punctuation
            tr/A-Z/a-z/ for @line; #convert all text to lower case so 'Word' and 'word' are the same
            $cnter{$_}{$title}++ for @line; 
      }
}

print OUT join("\t",@order),"\n";
foreach my $word (sort keys %cnter){
      print OUT "$word : ";
      my @t = ();
      foreach my $title (@order) {
            push @t, (exists $cnter{$word}{$title}) ?  $cnter{$word}{$title} : 0;
      }
      print OUT join("\t", @t),"\n";
}
close(IN);
close(OUT);

This does not allow for a lot of data analysis in and of itself. It simply lists the data by word and its count per section. If you wanted to sort by highest word frequency per section (for example) you would need to build a more robust data structure or open the file this script creates and parse that file with another script. You could look at the output of the above script as your basic statistics from which you could perform more analysis of the data.
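
As one possible follow-up, the sketch below sorts the words of each section by frequency directly from the word => { section => count } structure, without re-reading the output file. The counts are made-up sample data and the helper name words_by_frequency is hypothetical; treat it as a starting point under those assumptions rather than a finished analysis.

```perl
use strict;
use warnings;

# Made-up counts in the same word => { section => count } layout.
my %cnter = (
    plans  => { 'Part I' => 2, 'Part III' => 1 },
    course => { 'Part I' => 1, 'Part II' => 1, 'Part III' => 1 },
    fund   => { 'Part II' => 3 },
);
my @order = ('Part I', 'Part II', 'Part III');

# Hypothetical helper: the words seen in one section, highest count first,
# with ties broken alphabetically so the order is stable.
sub words_by_frequency {
    my ($counts, $title) = @_;
    my @words = grep { exists $counts->{$_}{$title} } keys %$counts;
    return sort {
        $counts->{$b}{$title} <=> $counts->{$a}{$title} or $a cmp $b
    } @words;
}

for my $title (@order) {
    print "$title\n";
    print "  $_: $cnter{$_}{$title}\n" for words_by_frequency(\%cnter, $title);
}
```

The grep step keeps only the words that actually occur in the section, so words with no entry for that title are left out instead of being listed with a count of 0.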
Last edited by KevinADC : Apr 11th, 2008 at 3:36 am.

katharnakh

Re: count characters in a string

  #20  
Apr 11th, 2008
Originally Posted by KevinADC
It seems that the array approach would require you know all the sections beforehand.
It may require more than one process to get the final output. First build the hash that counts the words per section, then have another routine that builds the hash of arrays then prints the final output.
Not really. The only thing we need to consider is the extra overhead (as I see it) of finding the index of the section in the array, so that the count of a word belonging to that section is incremented correctly.

Here is the implementation according to the data structure I mentioned earlier. The code below modifies Kevin's earlier posted code, just to show how the implementation goes. I have commented out some lines so that the creation and manipulation of the two data structures can be compared:
use strict;
use warnings;
open(IN, "readme.txt") or die "ERROR: $!";
open(OUT, ">seeme.txt") or die "ERROR: $!";
my (%cnter, $title, @order);
while(<IN>) {
      next if (/^\s*$/);
      chomp;
      my @line = split(/\s+/);
      if($line[0] =~ /^=/) {
            $line[0] =~ tr/=//d; # remove all the "=" from the section title
            $title = "@line";
            push @order, $title;
      }
      else {
            tr/,.?!//d for @line; #remove some punctuation
            tr/A-Z/a-z/ for @line; #convert all text to lower case so 'Word' and 'word' are the same

            # a local hash to store { section => its index in the array }
            my %index;
            @index{@order} = (0..$#order);

            #$cnter{$_}{$title}++ for @line;
            ${$cnter{$_}}[$index{$title}]++ for @line;
      }
}

print OUT join("\t",@order),"\n";
foreach my $word (sort keys %cnter){
      print OUT "$word :";
      my @t = ();
      my $index = 0;
      foreach my $title (@order) {
            #push @t, (exists $cnter{$word}{$title}) ? $cnter{$word}{$title} : 0;
            # a slot that was never incremented is undef, so print 0 for it
            push @t, (defined ${$cnter{$word}}[$index]) ? ${$cnter{$word}}[$index] : 0;
            $index++; # move on to the next section's slot
      }
      print OUT join("\t", @t),"\n";
}
close(IN);
close(OUT);

I appreciate Kevin's approach, which is simpler and cleaner, and the code is more readable.

katharnakh.
Last edited by katharnakh : Apr 11th, 2008 at 7:18 am.