I have been given an assignment to find the frequencies of all words in a large text file. I have written a program that does this for a sample string by loading the string into an array. But for a text file spanning many pages with thousands of words, won't that array eat up a lot of space? I have been asked to treat performance as a prime criterion.
Any suggestions would be awesome.

I think this would work reasonably fast:

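<?php
$filename = "/path/to/file.txt";
$handle = fopen($filename, "r");
if ($handle === false) {
  exit("Could not open $filename\n");
}
$results = array();
$word = "";
// Read one character at a time so the whole file is never held in memory.
while (false !== ($letter = fgetc($handle))) {
  if ($letter == ' ') {
    // A space ends the current word; skip empty words from repeated spaces.
    if ($word !== "") {
      $results[$word] = isset($results[$word]) ? $results[$word] + 1 : 1;
    }
    $word = "";
  }
  else {
    $word .= $letter;
  }
}
// Count the final word, which has no trailing space.
if ($word !== "") {
  $results[$word] = isset($results[$word]) ? $results[$word] + 1 : 1;
}
fclose($handle);
print_r($results);
?>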

Note: This assumes the file is in the format <word><space><word><space><word>.... etc.
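
If the file is not strictly space-separated (for example, it contains line breaks or tabs), a line-by-line variant may be more forgiving. The following is only a sketch, assuming words are separated by any run of whitespace; the function name count_words_in_file is made up for illustration. It reads with fgets, so only one line is held in memory at a time.

```php
<?php
// Hypothetical helper (not from the post above): counts whitespace-separated
// words while reading the file one line at a time.
function count_words_in_file($filename) {
  $handle = fopen($filename, "r");
  if ($handle === false) {
    return false; // caller decides how to report the error
  }
  $results = array();
  while (false !== ($line = fgets($handle))) {
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
      if (isset($results[$word])) {
        $results[$word]++;
      } else {
        $results[$word] = 1;
      }
    }
  }
  fclose($handle);
  return $results;
}
```

Calling print_r(count_words_in_file("/path/to/file.txt")) would then print the counts. Memory use grows with the number of distinct words rather than with the size of the file, which is probably what matters for a large text.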

You came here for a suggestion and got a complete solution. Lucky guy.

I just re-read the original post, and now I realise it was for an assignment. I suppose I shouldn't have provided the complete solution. Oh well :)

You could also do this:

$str = file_get_contents("text.txt"); // read the whole file into a string
if ($str === false) {
	exit("Could not read text.txt");
}
preg_match_all("/\b\w+(?:-\w+)*\b/", $str, $r); // collect words into $r[0], including hyphenated words
$c = array_count_values(array_map("strtolower", $r[0])); // case-insensitive count
foreach ($c as $key => $val) {
	echo $key . " [" . $val . "]<br />"; // output data
}

This will give you output for words made of ASCII characters. However, multibyte characters (â etc.) won't work; that needs something more sophisticated.

I bet somebody could write a better regex than I've used though.
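
For multibyte text, one possible approach is PCRE's Unicode mode. This is a sketch rather than a drop-in fix: it assumes the input is valid UTF-8 and that the mbstring extension is available for mb_strtolower, and count_words_utf8 is just an illustrative name. The /u modifier makes the pattern treat the input as UTF-8, while \p{L}, \p{M} and \p{N} match letters, combining marks and digits in any script.

```php
<?php
// Hypothetical helper: case-insensitive word counts for a UTF-8 string.
// Assumes valid UTF-8 input and the mbstring extension.
function count_words_utf8($str) {
  // A word is a run of letters/marks/digits, optionally joined by hyphens.
  $n = preg_match_all('/[\p{L}\p{M}\p{N}]+(?:-[\p{L}\p{M}\p{N}]+)*/u', $str, $r);
  if ($n === false) {
    return false; // e.g. the input was not valid UTF-8
  }
  $lower = array();
  foreach ($r[0] as $w) {
    $lower[] = mb_strtolower($w, "UTF-8");
  }
  return array_count_values($lower);
}
```

It could be fed from file_get_contents as in the snippet above. When echoing the result to a browser, sending header('Content-Type: text/html; charset=utf-8') first should let non-Latin scripts such as Arabic display as written.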

You came here for a suggestion and got a complete solution. Lucky guy.

What can I say, edwinhermann is really kind!

I have a text file in UTF-8 containing Arabic script. Please suggest what to do in that case: how do I output the text in the same Arabic script?

Khan - please start a new thread. This thread is solved and it's not appropriate to post additional messages here.
