My script merges 18 files and returns all numbers that occur >= 13 times in the merged result. I timed the script, and array_count_values is so slow that it accounts for 80% of the 2.35 sec run time. The files are large, 200,000 numbers per file, so the merged array holds well over 2 million elements.

Any ideas how I can kick out the array_count_values function or write it in a better way and still get a return of all numbers that occur >= 13 times in the merged array?

Note: I shortened code to reflect only 3 files out of 18 to be merged.

for($b=0; $b<1; $b++)
    echo $b."\n";
for($a=0; $a<10; $a++)

    for($i=0; $i<30; $i++)//30

    $holdpreset=explode(" ",$linespreset);
    $holdpreset=array_map("trim", $holdpreset);

$healthy = " ";
$yummy   = "_";
$print1= strtr($print1,$healthy,$yummy);
$print2= strtr($print2,$healthy,$yummy);
$print3= strtr($print3,$healthy,$yummy);

$resultround=$print1."\r\n".$print2."\r\n".$print3."\r\n".$print4."\r\n".$print5."\r\n".$print6."\r\n".$print7."\r\n".$print8."\r\n".$print9."\r\n".$print10."\r\n".$print11."\r\n".$print12."\r\n". $print13."\r\n".$print14."\r\n".$print15."\r\n".$print16."\r\n".$print17."\r\n".$print18;

$somearray = str_word_count($resultround, 1, '1234567890:@&_');

$frequency = array_count_values($somearray);

$result = array_filter($frequency, function ($x) { return $x >=13; });

//fwrite to print out $result array with numbers that occur >=13 times in the merged array




So 2.35 seconds? Is this on an SSD or an HDD?
In my experience the payback is good enough that a move to an SSD and more RAM is worth it. I don't see anything in the code that stands out as something that would make a big difference.


In addition, have you tried SplFixedArray? For integer-indexed data of a known size, it should be faster than standard arrays. Also, if you want to open files from the script, then use fopen() with fgets() or fread() instead of file_get_contents(), because the latter loads the entire file into memory before any processing starts, while the former reads in chunks so processing can start immediately.
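
To make the fopen() suggestion concrete, here is a minimal sketch that reads each file with fgets() and counts tokens as it goes, so the merged string and the intermediate array are never built. The function name countFrequentTokens and its parameters are hypothetical and not from the original script, and it assumes the numbers are separated by whitespace. Whether it actually beats array_count_values on the real data is untested and would need to be timed.

```php
<?php
// Hypothetical helper (not part of the original script): counts
// whitespace-separated tokens across several files in a single pass and
// returns only those that occur at least $threshold times.
function countFrequentTokens(array $paths, int $threshold): array
{
    $counts = [];
    foreach ($paths as $path) {
        $handle = fopen($path, 'r');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $path");
        }
        // fgets() returns one line per call, so only that line is in memory.
        while (($line = fgets($handle)) !== false) {
            // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty tokens.
            foreach (preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY) as $token) {
                $counts[$token] = ($counts[$token] ?? 0) + 1;
            }
        }
        fclose($handle);
    }
    // Keep only the tokens that reach the threshold; keys are preserved.
    return array_filter($counts, function ($n) use ($threshold) {
        return $n >= $threshold;
    });
}
```

It would be called as something like countFrequentTokens($listOfPaths, 13), where $listOfPaths is an assumed array holding the 18 file names. Two caveats: PHP converts numeric string keys such as "123" to integer keys, and if a file stores all 200,000 numbers on one line, fgets() still reads that whole line at once.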