I have written a dehasher, and the first 446,000 rows went into the MySQL database fine. Now, however, my script is no longer inserting the entries in order, and each insert takes about a minute. Does anybody know what is wrong with the following script?

<?php
set_time_limit(90);
$load=file_get_contents('/proc/loadavg');
$load=explode(' ',$load);
echo 'Load='.$load[0];
if ($load[0]<=0.25) {
mysql_connect('localhost','user','password');
mysql_select_db('database');
$char=array('','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','1','2','3','4','5','6','7','8','9','0','~','`','!','@','#','$','%','^','&','*','(',')','-','_','+','=','\\','|','{','}','[',']',';',':','"','\'',',','<','.','>','?','/',' ');
$bchar=array(''=>0,'a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'g'=>7,'h'=>8,'i'=>9,'j'=>10,'k'=>11,'l'=>12,'m'=>13,'n'=>14,'o'=>15,'p'=>16,'q'=>17,'r'=>18,'s'=>19,'t'=>20,'u'=>21,'v'=>22,'w'=>23,'x'=>24,'y'=>25,'z'=>26,'A'=>27,'B'=>28,'C'=>29,'D'=>30,'E'=>31,'F'=>32,'G'=>33,'H'=>34,'I'=>35,'J'=>36,'K'=>37,'L'=>38,'M'=>39,'N'=>40,'O'=>41,'P'=>42,'Q'=>43,'R'=>44,'S'=>45,'T'=>46,'U'=>47,'V'=>48,'W'=>49,'X'=>50,'Y'=>51,'Z'=>52,'1'=>53,'2'=>54,'3'=>55,'4'=>56,'5'=>57,'6'=>58,'7'=>59,'8'=>60,'9'=>61,'0'=>62,'~'=>63,'`'=>64,'!'=>65,'@'=>66,'#'=>67,'$'=>68,'%'=>69,'^'=>70,'&'=>71,'*'=>72,'('=>73,')'=>74,'-'=>75,'_'=>76,'+'=>77,'='=>78,'\\'=>79,'|'=>80,'{'=>81,'}'=>82,'['=>83,']'=>84,';'=>85,':'=>86,'"'=>87,'\''=>88,','=>89,'<'=>90,'.'=>91,'>'=>92,'?'=>93,'/'=>94,' '=>95,"\r"=>0);
$r=mysql_query('SELECT `id` FROM `hashes`');
$n=mysql_num_rows($r);
if ($n>0) {
    $re=mysql_query('SELECT uncompress(`id`) as `id` FROM `hashes` LIMIT '.($n-1).',1');
    $d=mysql_fetch_assoc($re);
    $d['id']=$d['id'];
    if (strlen($d['id'])<7) {
        while (strlen($d['id'])<7) {
              $d['id']="\r".$d['id'];
              }
        }
    $x=str_split($d['id'],1);
    $j[1]=$bchar[$x[0]];
    $j[2]=$bchar[$x[1]];
    $j[3]=$bchar[$x[2]];
    $j[4]=$bchar[$x[3]];
    $j[5]=$bchar[$x[4]];
    $j[6]=$bchar[$x[5]];
    $j[7]=(($bchar[$x[6]]+1)<95)?$bchar[$x[6]]+1:95;
    } else {
    $j[1]=0;
    $j[2]=0;
    $j[3]=0;
    $j[4]=1;
    $j[5]=1;
    $j[6]=1;
    $j[7]=1;
    }
unset($bchar);
$m=0;
$l=0;
$p=true;
$passgo=0;
$sleeper=0;
for ($i[1]=$j[1];$i[1]<15;$i[1]++) { //16.0655625GB Database
    for ($i[2]=$j[2];isset($char[$i[2]]);$i[2]++) {
        for ($i[3]=$j[3];isset($char[$i[3]]);$i[3]++) {
            for ($i[4]=$j[4];isset($char[$i[4]]);$i[4]++) {
                for ($i[5]=$j[5];isset($char[$i[5]]);$i[5]++) {
                    for ($i[6]=$j[6];isset($char[$i[6]]);$i[6]++) {
                        $m+=$l;
                        $l=0;
                        for ($i[7]=$j[7];isset($char[$i[7]]);$i[7]++) {
                            if ((!empty($i[6]) && empty($i[7])) ||  (!empty($i[5]) && (empty($i[6]) || empty($i[7]))) ||  (!empty($i[4]) && (empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[3]) && (empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[2]) && (empty($i[3]) || empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[1]) && (empty($i[2]) || empty($i[3]) || empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7])))) {
                            } else {
                            $v=$char[$i[1]].$char[$i[2]].$char[$i[3]].$char[$i[4]].$char[$i[5]].$char[$i[6]].$char[$i[7]];
                            $z=hash('sha1',$v);
                            $hash=mysql_real_escape_string(substr($z,0,4).hash('crc32',$z).hash('crc32b',$z));
                            $s=mysql_real_escape_string($v);
                            if (($l+$m)<8) {
                                $r=mysql_query('SELECT `id` FROM `hashes` WHERE `id`=compress("'.$s.'") AND `sha1`=compress("'.$hash.'")');
                                if (mysql_num_rows($r)==0) {
                                    mysql_query('INSERT INTO `hashes` SET `id`=compress("'.$s.'"), `crc32`=compress("'.mysql_real_escape_string(hash('crc32',$v)).'"), `crc32b`=compress("'.mysql_real_escape_string(hash('crc32b',$v)).'"), `sha1`=compress("'.$hash.'")');$l++;
                                    } else {
                                    if (($l+$m)>5) { $exits=true; break; }
                                    }
                                } else {
                                mysql_query('INSERT INTO `hashes` SET `id`=compress("'.$s.'"), `crc32`=compress("'.mysql_real_escape_string(hash('crc32',$v)).'"), `crc32b`=compress("'.mysql_real_escape_string(hash('crc32b',$v)).'"), `sha1`=compress("'.$hash.'")');$l++;
                                }
                            //if ($l==45){sleep(3);}
                            }
                            if(($l+$m)>10) {
                                $exits=true;
                                break;
                                }
                            $j[7]=0;
                            }
                        //sleep(4);
                        if ($exits==true) { break; }
                        $j[6]=0;
                        }
                    if ($exits==true) { break; }
                    $j[5]=0;
                    }
                if ($exits==true) { break; }
                $j[4]=0;
                }
            if ($exits==true) { break; }
            $j[3]=0;
            }
        if ($exits==true) { break; }
        $j[2]=0;
        }
    if ($exits==true) { break; }
    $j[1]=0;
    }
flush();
} //end server load if
?>

Please help. Thanks.

Could you explain what you're doing a bit please? The code is a bit hard to follow.

for ($i[1]=$j[1];$i[1]<15;$i[1]++) { //16.0655625GB Database
    for ($i[2]=$j[2];isset($char[$i[2]]);$i[2]++) {
        for ($i[3]=$j[3];isset($char[$i[3]]);$i[3]++) {
            for ($i[4]=$j[4];isset($char[$i[4]]);$i[4]++) {
                for ($i[5]=$j[5];isset($char[$i[5]]);$i[5]++) {
                    for ($i[6]=$j[6];isset($char[$i[6]]);$i[6]++) {

Do you really need that many nested loops?
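
One way to avoid the seven nested loops, sketched here as an assumption rather than a drop-in fix, is to treat each candidate string as a number and derive the n-th string directly from a counter. The helper name `nth_candidate` and the three-letter alphabet are hypothetical; the real script would pass its full character list without the leading empty entry.

```php
<?php
// Hypothetical helper (not from the original script): return the n-th
// candidate string for a 1-based counter $n, shortest strings first.
// This is bijective base-k numbering, so there is no "empty" digit to skip.
function nth_candidate($n, array $alphabet) {
    $k = count($alphabet);
    $s = '';
    while ($n > 0) {
        $n--;                           // make the current digit 0-based
        $r = $n % $k;                   // index of the rightmost character
        $s = $alphabet[$r] . $s;        // prepend it to the result
        $n = (int) (($n - $r) / $k);    // move on to the next digit
    }
    return $s;
}

$alphabet = array('a', 'b', 'c');       // illustrative alphabet only
for ($n = 1; $n <= 13; $n++) {
    echo $n, ' => ', nth_candidate($n, $alphabet), "\n";
}
```

Because the string follows from the counter alone, resuming after a restart only needs the last counter value (for example an integer row number) instead of decoding the last stored string.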

Are you running this from the shell, or through the web server?

Try monitoring the progress by having the script write some output now and then, such as memory usage, execution time, point in execution, etc.

http://us2.php.net/manual/en/function.memory-get-usage.php
http://us2.php.net/manual/en/function.microtime.php
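
A minimal sketch of that kind of progress output. The `progress_line` helper and the loop body are placeholders, not part of the original script.

```php
<?php
// Hypothetical helper: format elapsed time and memory use for a progress line.
function progress_line($label, $startTime) {
    $elapsed  = microtime(true) - $startTime;   // seconds, as a float
    $memoryKb = memory_get_usage() / 1024;      // bytes -> kilobytes
    return sprintf('%s: %.3f s elapsed, %.0f KB in use', $label, $elapsed, $memoryKb);
}

$start = microtime(true);
for ($i = 1; $i <= 30000; $i++) {
    $hash = sha1((string) $i);                  // stand-in for the real work
    if ($i % 10000 === 0) {
        echo progress_line("after $i entries", $start), "\n";
    }
}
```

If the time between two progress lines suddenly jumps, the statements executed in between (most likely a query) are the place to look.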

You can also try just a few entries, and use a tool like XDebug to profile it.
http://xdebug.org/

The thing is that it can spend minutes on what may be just a single MySQL query. I timed this script: it ran for over 8 minutes and counting, hogging all of my server resources. It is almost as if it's in an infinite loop, since it isn't doing anything visible during that time. But I don't see any infinite loop in the script, and I haven't got a clue why it would take so long or use so many resources.

Take a look at computational complexity theory: http://en.wikipedia.org/wiki/Computational_complexity_theory

The main issue is the number of nested loops. The other is that the MySQL query used to count rows selects every row. Use this instead:

select count(*) from table

This returns only the row count rather than every row, and the result is cached until there is an insert on that table.

How can the number of loops be a factor in the speed, given that I set all the loops to break after seven rounds? I also replaced mysql_num_rows with mysql_fetch_assoc on a count(*) query, and it now completes 8 rows in 7 minutes. However, my main problem is that when a row is inserted, it isn't appended at the end of the table. It lands at a seemingly random spot near the end, and this could cause problems. As for why I have so many loops: I need to end up with that many digits, in every possible combination.

So the current problems are speed and making the rows insert at the end of the table. My current script is as follows:

<?php
set_time_limit(90);
$load=file_get_contents('/proc/loadavg');
$load=explode(' ',$load);
echo 'Load='.$load[0];
if ($load[0]<=0.40) {
mysql_connect('localhost','username','password');
mysql_select_db('database');
$char=array('','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','1','2','3','4','5','6','7','8','9','0','~','`','!','@','#','$','%','^','&','*','(',')','-','_','+','=','\\','|','{','}','[',']',';',':','"','\'',',','<','.','>','?','/',' ');
$bchar=array(''=>0,'a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'g'=>7,'h'=>8,'i'=>9,'j'=>10,'k'=>11,'l'=>12,'m'=>13,'n'=>14,'o'=>15,'p'=>16,'q'=>17,'r'=>18,'s'=>19,'t'=>20,'u'=>21,'v'=>22,'w'=>23,'x'=>24,'y'=>25,'z'=>26,'A'=>27,'B'=>28,'C'=>29,'D'=>30,'E'=>31,'F'=>32,'G'=>33,'H'=>34,'I'=>35,'J'=>36,'K'=>37,'L'=>38,'M'=>39,'N'=>40,'O'=>41,'P'=>42,'Q'=>43,'R'=>44,'S'=>45,'T'=>46,'U'=>47,'V'=>48,'W'=>49,'X'=>50,'Y'=>51,'Z'=>52,'1'=>53,'2'=>54,'3'=>55,'4'=>56,'5'=>57,'6'=>58,'7'=>59,'8'=>60,'9'=>61,'0'=>62,'~'=>63,'`'=>64,'!'=>65,'@'=>66,'#'=>67,'$'=>68,'%'=>69,'^'=>70,'&'=>71,'*'=>72,'('=>73,')'=>74,'-'=>75,'_'=>76,'+'=>77,'='=>78,'\\'=>79,'|'=>80,'{'=>81,'}'=>82,'['=>83,']'=>84,';'=>85,':'=>86,'"'=>87,'\''=>88,','=>89,'<'=>90,'.'=>91,'>'=>92,'?'=>93,'/'=>94,' '=>95,"\r"=>0);
$r=mysql_query('SELECT count(*) FROM `hash`');
$rrr=mysql_fetch_assoc($r);
if ($rrr['count(*)']>0) {
    $re=mysql_query('SELECT uncompress(`id`) as `id` FROM `hash` LIMIT '.($rrr['count(*)']-1).',1');
    $d=mysql_fetch_assoc($re);
    $d['id']=$d['id'];
    if (strlen($d['id'])<7) {
        while (strlen($d['id'])<7) {
              $d['id']="\r".$d['id'];
              }
        }
    $x=str_split($d['id'],1);
    $j[1]=$bchar[$x[0]];
    $j[2]=$bchar[$x[1]];
    $j[3]=$bchar[$x[2]];
    $j[4]=$bchar[$x[3]];
    $j[5]=$bchar[$x[4]];
    $j[6]=$bchar[$x[5]];
    $j[7]=$bchar[$x[6]];
    $j[7]+=1;
    if ($j[7]>95) {
        $j[6]+=1;
        $j[7]=0;
        }
    } else {
    $j[1]=0;
    $j[2]=0;
    $j[3]=0;
    $j[4]=0;
    $j[5]=0;
    $j[6]=0;
    $j[7]=0;
    }
unset($bchar);
$m=0;
$l=0;
$p=true;
$passgo=0;
$sleeper=0;
for ($i[1]=$j[1];$i[1]<15;$i[1]++) { //16.0655625GB Database
    for ($i[2]=$j[2];isset($char[$i[2]]);$i[2]++) {
        for ($i[3]=$j[3];isset($char[$i[3]]);$i[3]++) {
            for ($i[4]=$j[4];isset($char[$i[4]]);$i[4]++) {
                for ($i[5]=$j[5];isset($char[$i[5]]);$i[5]++) {
                    for ($i[6]=$j[6];isset($char[$i[6]]);$i[6]++) {
                        $m+=$l;
                        $l=0;
                        for ($i[7]=$j[7];isset($char[$i[7]]);$i[7]++) {
                            if ((!empty($i[6]) && empty($i[7])) ||  (!empty($i[5]) && (empty($i[6]) || empty($i[7]))) ||  (!empty($i[4]) && (empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[3]) && (empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[2]) && (empty($i[3]) || empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[1]) && (empty($i[2]) || empty($i[3]) || empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7])))) {
                            } else {
                            $v=$char[$i[1]].$char[$i[2]].$char[$i[3]].$char[$i[4]].$char[$i[5]].$char[$i[6]].$char[$i[7]];
                            $z=hash('sha1',$v);
                            $hash=mysql_real_escape_string(substr($z,0,4).hash('crc32',$z).hash('crc32b',$z));
                            $s=mysql_real_escape_string($v);
                            if (($l+$m)<8) {
                                $r=mysql_query('SELECT count(*) FROM `hash` WHERE `id`=compress("'.$s.'") AND `sha1`=compress("'.$hash.'")');
                                $rrrr=mysql_fetch_assoc($r);
                                if ($rrrr['count(*)']==0) {
                                    mysql_query('INSERT INTO `hash` SET `id`=compress("'.$s.'"), `crc32`=compress("'.mysql_real_escape_string(hash('crc32',$v)).'"), `crc32b`=compress("'.mysql_real_escape_string(hash('crc32b',$v)).'"), `sha1`=compress("'.$hash.'")');$l++;
                                    } else {
                                    if (($l+$m)>5) { $exits=true; break; }
                                    }
                                } else {
                                mysql_query('INSERT INTO `hash` SET `id`=compress("'.$s.'"), `crc32`=compress("'.mysql_real_escape_string(hash('crc32',$v)).'"), `crc32b`=compress("'.mysql_real_escape_string(hash('crc32b',$v)).'"), `sha1`=compress("'.$hash.'")');$l++;
                                usleep(250000);
                                    //1000000
                                }
                            //if ($l==45){sleep(3);}
                            }
                            if(($l+$m)>7) {
                                $exits=true;
                                break;
                                }
                            $j[7]=0;
                            }
                        //sleep(4);
                        if ($exits==true) { break; }
                        $j[6]=0;
                        }
                    if ($exits==true) { break; }
                    $j[5]=0;
                    }
                if ($exits==true) { break; }
                $j[4]=0;
                }
            if ($exits==true) { break; }
            $j[3]=0;
            }
        if ($exits==true) { break; }
        $j[2]=0;
        }
    if ($exits==true) { break; }
    $j[1]=0;
    }
flush();
} //end server load if
?>

As for why I have so many loops: I need to end up with that many digits, in every possible combination.

Do you have only one instance of this script running, or many?

There really isn't such a thing as the bottom of a table in SQL. When you view an SQL table (select * from table), you're just seeing whatever order the database happens to return, which often follows the primary index. There is no guaranteed order based on insert time unless you add an ORDER BY.

I only have one instance of the script running at any given time. When I try to sort the 446,600 rows in the table in ascending order, it takes ages to display the last couple of rows using the limit parameter. It never displayed; I killed the display script after about 7 minutes. So is there any way around this, like there was with count(*)? It seems MySQL is having trouble handling huge tables (43MB compressed).

What are you sorting by? That makes a lot of difference.

You need an index on whatever column you plan to sort by. Also index any column you use in a conditional expression.

Eg:

select * from table where column = 'example';

In the example you'd want an index on "column".

http://www.informit.com/articles/article.aspx?p=377652

Also, are you sure compression is what you need? If you're compressing things such as SHA1 hashes, then most likely you're not saving much, and you may even be adding bytes by compressing.

Try working with a small subset of your data, say 10 000 rows. Then benchmark each query. See if indexing outweighs the storage increase, etc. See if compression actually makes a difference.

For example, to evaluate compression, store both the compressed and the uncompressed values, then compare the average length of each to see how much space you saved:

select (AVG(LENGTH(compressed)) - AVG(LENGTH(uncompressed))) as space_saved FROM table;

For speed, run the same queries against the compressed and the uncompressed columns, benchmark each, and compare the difference.

That way you can weigh your options better.

MySQL optimization:
http://dev.mysql.com/doc/refman/5.0/en/optimization.html

I did the following mysql query and got 11.9401

SELECT (
AVG( LENGTH( `sha1` ) ) - AVG( LENGTH( uncompress( `sha1` ) ) ) 
) AS space_saved
FROM `hash`

Currently my database structure is a column for the original string and three hash columns. Should I add another column that's an integer to sort by asc and uncompress the id column? To help explain, the following is my database now:

id    -blob-compressed
crc32 -blob-compressed
crc32b-blob-compressed
sha1  -blob-compressed

And should it be converted to:
row   -int //sort by to display entire database or get last row
id    -text //for dehasher generator to check if entry exists
crc32 -blob-compressed
crc32b-blob-compressed
sha1  -blob-compressed

Thanks for the replies.

If you add a primary ID column and make it auto increment, the database will automatically assign the value for each row.

From your query:

SELECT (
AVG( LENGTH( `sha1` ) ) - AVG( LENGTH( uncompress( `sha1` ) ) ) 
) AS space_saved
FROM `hash`

And the result:

11.9401

That means you're adding about 12 chars each time you compress.
So compression is actually taking up more space. This is because SHA1 output hardly has any repeating patterns and thus does not gain from compression.

The same would go for CRC functions.
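
The same effect can be checked from PHP. This is only a sketch: it assumes the zlib extension is available and uses `gzcompress()`, which produces the same kind of zlib stream that MySQL's COMPRESS() stores behind a 4-byte length prefix. The sample value is built the way the script builds its `sha1` column.

```php
<?php
// Build a 20-character hex value the same way the script does for `sha1`.
$z = sha1('test');
$value = substr($z, 0, 4) . hash('crc32', $z) . hash('crc32b', $z);

$plainLength = strlen($value);                  // 20 hex characters
$zlibLength  = strlen(gzcompress($value));      // zlib header + data + checksum

echo "plain: $plainLength bytes, zlib: $zlibLength bytes\n";
```

With so little input and no repeated sequences, the fixed zlib overhead outweighs anything the compressor can save.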

Should I add another column that's an integer to sort by asc and uncompress the id column?

Given that compression isn't working, you should not compress anything.

Then I shall adjust the code with no compression and add the new field to sort by and see how it goes. Hopefully this will be faster and I will see if I can eventually make a custom php compression like I did for the sha1 column. I will let you know how the results go but may take some time to populate the database.

I managed to get it working, generating 30,000 entries per minute. But is there any way to convert numbers and letters into symbols to make the length shorter? It only needs to be a one-way encryption, though.

I thought I would let you know that I am working on an algorithm that will convert a string to a unique number.

I managed to invent a compression algorithm using my own formulas. Below is my current script:

<?php
set_time_limit(90);
$load=file_get_contents('/proc/loadavg');
$load=explode(' ',$load);
echo 'Load='.$load[0];
if ($load[0]<=0.60) {
mysql_connect('localhost','user','pass');
mysql_select_db('database');
function compress_string($string) {
    $charconvert=array('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'g'=>7,'h'=>8,'i'=>9, 'j'=>10,'k'=>11,'l'=>12,'m'=>13,'n'=>14,'o'=>15,'p'=>16,'q'=>17, 'r'=>18,'s'=>19,'t'=>20,'u'=>21,'v'=>22,'w'=>23,'x'=>24,'y'=>25, 'z'=>26,'1'=>27,'2'=>28,'3'=>29,'4'=>30,'5'=>31,'6'=>32,'7'=>33, '8'=>34,'9'=>35,'0'=>36);
    $num=1;
    for ($i=0;isset($string[$i]);$i++){
        $num*=$charconvert[$string[$i]];
        $num+=$charconvert[$string[$i]];
        }
    return str_replace(array('10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57','58','59','60','61','62','63','64','65','66','67','68','69'),array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','~','`','!','@','#','$','%','^','&','*','(',')','-','_','=','+','\\','|','[',']','{','}',';',':','"','\'','<',',','>','.','/','?',' '),$num);
    }

$char=array('','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','1','2','3','4','5','6','7','8','9','0','~','`','!','@','#','$','%','^','&','*','(',')','-','_','+','=','\\','|','{','}','[',']',';',':','"','\'',',','<','.','>','?','/',' ');
$bchar=array(''=>0,'a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'g'=>7,'h'=>8,'i'=>9,'j'=>10,'k'=>11,'l'=>12,'m'=>13,'n'=>14,'o'=>15,'p'=>16,'q'=>17,'r'=>18,'s'=>19,'t'=>20,'u'=>21,'v'=>22,'w'=>23,'x'=>24,'y'=>25,'z'=>26,'A'=>27,'B'=>28,'C'=>29,'D'=>30,'E'=>31,'F'=>32,'G'=>33,'H'=>34,'I'=>35,'J'=>36,'K'=>37,'L'=>38,'M'=>39,'N'=>40,'O'=>41,'P'=>42,'Q'=>43,'R'=>44,'S'=>45,'T'=>46,'U'=>47,'V'=>48,'W'=>49,'X'=>50,'Y'=>51,'Z'=>52,'1'=>53,'2'=>54,'3'=>55,'4'=>56,'5'=>57,'6'=>58,'7'=>59,'8'=>60,'9'=>61,'0'=>62,'~'=>63,'`'=>64,'!'=>65,'@'=>66,'#'=>67,'$'=>68,'%'=>69,'^'=>70,'&'=>71,'*'=>72,'('=>73,')'=>74,'-'=>75,'_'=>76,'+'=>77,'='=>78,'\\'=>79,'|'=>80,'{'=>81,'}'=>82,'['=>83,']'=>84,';'=>85,':'=>86,'"'=>87,'\''=>88,','=>89,'<'=>90,'.'=>91,'>'=>92,'?'=>93,'/'=>94,' '=>95,"\r"=>0);
$r=mysql_query('SELECT count(*) FROM `hash`');
$rrr=mysql_fetch_assoc($r);
if ($rrr['count(*)']>0) {
    $re=mysql_query('SELECT `id`, `row` FROM `hash` ORDER BY `row` DESC LIMIT 1') or die(mysql_error());
    $d=mysql_fetch_assoc($re);
    $rownum=$d['row'];
    ($d['row']);
    $d['id']=$d['id'];
    if (strlen($d['id'])<7) {
        while (strlen($d['id'])<7) {
              $d['id']="\r".$d['id'];
              }
        }
    $x=str_split($d['id'],1);
    $j[1]=$bchar[$x[0]];
    $j[2]=$bchar[$x[1]];
    $j[3]=$bchar[$x[2]];
    $j[4]=$bchar[$x[3]];
    $j[5]=$bchar[$x[4]];
    $j[6]=$bchar[$x[5]];
    $j[7]=$bchar[$x[6]];
    $j[7]+=1;
    if ($j[7]>95) {
        $j[6]+=1;
        $j[7]=0;
        }
    } else {
    $rownum=0;
    $j[1]=0;
    $j[2]=0;
    $j[3]=0;
    $j[4]=0;
    $j[5]=0;
    $j[6]=0;
    $j[7]=0;
    }
unset($bchar);
$m=0;
$l=0;
$p=true;
$passgo=0;
$sleeper=0;
for ($i[1]=$j[1];$i[1]<15;$i[1]++) {
    for ($i[2]=$j[2];isset($char[$i[2]]);$i[2]++) {
        for ($i[3]=$j[3];isset($char[$i[3]]);$i[3]++) {
            for ($i[4]=$j[4];isset($char[$i[4]]);$i[4]++) {
                for ($i[5]=$j[5];isset($char[$i[5]]);$i[5]++) {
                    for ($i[6]=$j[6];isset($char[$i[6]]);$i[6]++) {
                        $m+=$l;
                        $l=0;
                        for ($i[7]=$j[7];isset($char[$i[7]]);$i[7]++) {
                            if ((!empty($i[6]) && empty($i[7])) ||  (!empty($i[5]) && (empty($i[6]) || empty($i[7]))) ||  (!empty($i[4]) && (empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[3]) && (empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[2]) && (empty($i[3]) || empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7]))) ||  (!empty($i[1]) && (empty($i[2]) || empty($i[3]) || empty($i[4]) || empty($i[5]) || empty($i[6]) || empty($i[7])))) {
                            } else {
                            $v=$char[$i[1]].$char[$i[2]].$char[$i[3]].$char[$i[4]].$char[$i[5]].$char[$i[6]].$char[$i[7]];
                            $z=hash('sha1',$v);
                            $hash=mysql_real_escape_string(compress_string(substr($z,0,4).hash('crc32',$z).hash('crc32b',$z)));
                            $s=mysql_real_escape_string($v);
                            if (($l+$m)<8) {
                                $r=mysql_query('SELECT count(*) FROM `hash` WHERE `id`="'.$s.'" AND `sha1`="'.$hash.'"');
                                $rrrr=mysql_fetch_assoc($r);
                                if ($rrrr['count(*)']==0) {
                                    $rownum++;
                                    mysql_query('INSERT INTO `hash` SET `row`='.$rownum.', `id`="'.$s.'", `crc32`="'.mysql_real_escape_string(compress_string(hash('crc32',$v))).'", `crc32b`="'.mysql_real_escape_string(compress_string(hash('crc32b',$v))).'", `sha1`="'.$hash.'"');$l++;
                                    } else {
                                    if (($l+$m)>5) { $exits=true; break; }
                                    }
                                } else {
                                $rownum++;
                                mysql_query('INSERT INTO `hash` SET `row`='.$rownum.', `id`="'.$s.'", `crc32`="'.mysql_real_escape_string(compress_string(hash('crc32',$v))).'", `crc32b`="'.mysql_real_escape_string(compress_string(hash('crc32b',$v))).'", `sha1`="'.$hash.'"');$l++;
                                if (($rownum/3)==round($rownum/3)) {
                                    usleep(400);
                                     //1000000
                                  }
                                }
                            //if ($l==45){sleep(3);}
                            }
                            if(($l+$m)>50000) {
                                $exits=true;
                                break;
                                }
                            $j[7]=0;
                            }
                        //sleep(4);
                        if ($exits==true) { break; }
                        $j[6]=0;
                        }
                    if ($exits==true) { break; }
                    $j[5]=0;
                    }
                if ($exits==true) { break; }
                $j[4]=0;
                }
            if ($exits==true) { break; }
            $j[3]=0;
            }
        if ($exits==true) { break; }
        $j[2]=0;
        }
    if ($exits==true) { break; }
    $j[1]=0;
    }
flush();
} //end server load if
?>

And my compression algorithm, which only works for strings of numbers and letters:

function compress_string($string) {
    $charconvert=array('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'g'=>7,'h'=>8,'i'=>9, 'j'=>10,'k'=>11,'l'=>12,'m'=>13,'n'=>14,'o'=>15,'p'=>16,'q'=>17, 'r'=>18,'s'=>19,'t'=>20,'u'=>21,'v'=>22,'w'=>23,'x'=>24,'y'=>25, 'z'=>26,'1'=>27,'2'=>28,'3'=>29,'4'=>30,'5'=>31,'6'=>32,'7'=>33, '8'=>34,'9'=>35,'0'=>36);
    $num=1;
    for ($i=0;isset($string[$i]);$i++){
        $num*=$charconvert[$string[$i]];
        $num+=$charconvert[$string[$i]];
        }
    return str_replace(array('10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25', '26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41', '42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57', '58','59','60','61','62','63','64','65','66','67','68','69'),array('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x', 'y','z','~','`','!','@','#','$','%','^','&','*','(',')','-','_','=','+','\\','|','[',']', '{','}',';',':','"','\'','<',',','>','.','/','?',' '),$num);
    }

At the moment it seems to be working, but can anybody see any flaws in that algorithm that may cause multiple matches? I can't see any. Just thought I would ask.
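
One check worth running, offered as a sketch rather than a full analysis: the numeric step of the function (before the `str_replace`) can give the same number for different inputs. `compress_number` below is a cut-down, hypothetical copy of that step, restricted to the letters a to f.

```php
<?php
// Cut-down copy of the numeric step of compress_string(), for illustration.
function compress_number($string) {
    $charconvert = array('a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5, 'f' => 6);
    $num = 1;
    for ($i = 0; isset($string[$i]); $i++) {
        $num *= $charconvert[$string[$i]];
        $num += $charconvert[$string[$i]];
    }
    return $num;
}

// "ae": 1*1+1 = 2, then 2*5+5 = 15
// "bc": 1*2+2 = 4, then 4*3+3 = 15
var_dump(compress_number('ae'), compress_number('bc'));
```

Both inputs give 15, so two different hashes can end up with the same stored value. Long inputs can also push `$num` past the range PHP integers hold exactly, at which point distinct values may round to the same float.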

Why not just use crc32() ?

Example:

echo base_convert(crc32(sha1('test')), 10, 36); // rn213k

base_convert() `compresses` it by using 36 chars as the base/radix instead of 10. You could use all the characters in the ASCII table as the base if you wanted, making the string much shorter.

You can check if you have duplicates with an SQL query. Something like: select count(distinct(column)) = count(column) from table; If you get a false or 0, then there are duplicates in "column".

I can't see how base_convert can convert a selected range of characters to an even larger range of characters with a shorter length. For example, I tried the following and it just removed a few characters:

echo base_convert('a1a', 10, 36); //displays   1
echo base_convert('a1b', 10, 36);  //displays   1

Is there something that I'm doing wrong, or is that how it works?

You need a valid base. 10 is not a valid base for a string containing a-f.

With sha1() the string is hex, i.e. base 16: 0-9a-f

echo base_convert(sha1('test'), 16, 36);
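
A small sketch of the point about bases, using an 8-digit hex value such as a crc32 hash. The round trip is only there to show that nothing is lost at this size; longer inputs such as a full SHA1 may lose precision in `base_convert()`, because it works with floats internally.

```php
<?php
$hex = '0dfb4f4a';                            // example crc32 hash (hex, base 16)

$short = base_convert($hex, 16, 36);          // same number written in base 36
$back  = base_convert($short, 36, 16);        // and converted back to hex

echo "$hex -> $short -> $back\n";             // leading zeros are not preserved
```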

Here is a function that can convert a decimal number to a base up to 255, basically using all the ASCII characters as digits.

/**
 * Convert Decimal to a base up to 255
 *
 * @param Int $num
 * @param Int $base (2-255)
 * @return ASCII String
 */
function ascii_base($num, $base = 255) {
	if ($num < 0) $num = -$num;
	$ret = array();
	// >= so that a value equal to the base is split into two digits
	while($num >= $base) {
		$rem = $num%$base;
		$num = floor($num/$base);
		$ret[] = chr($rem);
	}
	$ret[] = chr($num);
	return implode('', array_reverse($ret));
}

Tests:

$sha = sha1('test');
$dec = base_convert($sha, 16, 10);
$crc = crc32(sha1('test'));

var_dump($sha);
echo "SHA1\n";
var_dump($dec, base_convert($dec, 10, 36), ascii_base($dec));
echo "CRC\n";
var_dump($crc, base_convert($crc, 10, 36), ascii_base($crc));

Results:

string(40) "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3"
SHA1
string(48) "966482230667555200682260428404046286406804464248"
string(31) "jrwjerxielwcg8wo8kswgk08s4c08w8"
string(20) "¶[ãˆY1Y
CRC
int(-1671312656)
string(6) "rn213k"
string(4) "dÊ?G"

Using just ascii_base() you can get a sha1('test') from 40 chars to 20 chars.

Running crc32() on the string first gets it down to 4 chars. However, crc32 cannot be reversed (it is lossy), and if you're worried about uniqueness, sha1 produces far fewer collisions than crc32.
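
Another route to the same 20 bytes, sketched here as an alternative to `ascii_base()` and not something the thread itself used: a hex string packs two digits into each byte, and both `sha1()` and `hash()` can return raw binary directly. The result is binary data, so it would need a binary-safe column type (an assumption about the table, which the thread does not settle).

```php
<?php
$hex = sha1('test');                    // 40 hex characters
$raw = pack('H*', $hex);                // 20 raw bytes, two hex digits per byte

// sha1() and hash() can also return the raw bytes directly.
$rawDirect = sha1('test', true);
$crcRaw    = hash('crc32b', 'test', true);   // 4 raw bytes

echo strlen($hex), ' -> ', strlen($raw), " bytes\n";
echo bin2hex($raw) === $hex ? "reversible\n" : "mismatch\n";
```

Unlike the decimal route through `base_convert()`, this never turns the hash into a number, so the size of the value does not matter.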

I just tried your code, and although it is excellent on the small scale, I tried using it as follows and got a memory limit error:

function ascii_base($num, $base = 255) {
	if ($num < 0) $num = -$num;
	$ret = array();
	while($num > $base) {
		$rem = $num%$base;
		$num = floor($num/$base);
		$ret[] = chr($rem);
	}
	$ret[] = chr($num);
	return implode('', array_reverse($ret));
}
echo '<hr>';
ascii_base('0dfb4f4a'); //crc32
ascii_base('140e363f'); //crc32b
$sha1='05a79f06cf3f67f726dae68d18a2290f6c9a50c9';
ascii_base(substr($sha1,0,4).hash('crc32',$sha1).hash('crc32b',$sha1));

Also, this function is meant to be used on 50,000 MySQL queries for 3 fields, which means the function is called 150,000 times in total. So is there some way to free up the memory so it can be called 150,000 times per page execution? At the moment it doesn't even work for 1 MySQL query. Thanks for the replies.

I just discovered that when using ascii_base() on the crc32 hashes of 9 and 0, both end up as blank strings, which is a bit of a bug. For now I might try to refine my compression function. I can't really do much to edit your function because it has so many elements I haven't seen before.

You'll get blank-looking strings in some cases. However, if you check the string length, you'll notice it does contain characters. Not all characters in the ASCII table are visible, but the bytes are still in the string.

However, I've noticed that the function does not work for very large integers, because PHP cannot do exact arithmetic on them.

There are some workarounds for this in the comments on:
http://www.php.net/manual/en/function.base-convert.php

If you have bcmath enabled, you can rely on it to do the arithmetic correctly.

Below is the function modified to use BCMath.

if (!function_exists('bcdiv')) {
	//echo "No BC Math\n";
	function bcdiv($dividend, $divisor) {
		$quotient = floor($dividend/$divisor);
		return $quotient;
	}
	function bcmod($dividend, $modulo) {
		$remainder = $dividend%$modulo;
		return $remainder;
	}
} else {
	//echo "Using BC Math\n";
}

/**
 * Convert Decimal to a base less than 255 comprised of ASCII chars
 *
 * @param Int $num
 * @param Int $base (2-255)
 * @return ASCII String
 */
function base255($num, $base = 255) {
	if ($num < 0) $num = -$num;
	$ret = array();
	while($num >= $base) {
		$rem = bcmod($num, $base);
		$num = bcdiv($num, $base);
		$ret[] = chr($rem);
	}
	$ret[] = chr($num);
	return implode('', array_reverse($ret));
}

I renamed it to base255 so it makes more sense. It should now give you correct values if you have bcmath.
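
If it helps, here is a rough sketch of the same idea together with a decoder, so you can check that nothing is lost on the way back. The names encode_base_bytes() and decode_base_bytes() are made up for this example, and it assumes the number fits in a native PHP integer (so no BCMath) and that you are on PHP 7 or later for intdiv(). Note that 0 comes out as a single chr(0) byte, which prints as nothing but still has a length of 1.

```php
<?php
// Hypothetical helpers for illustration; not part of the thread's code.

// Encode a non-negative integer as a string of bytes, one byte per digit.
function encode_base_bytes(int $num, int $base = 255): string {
    if ($num < 0 || $base < 2 || $base > 256) {
        throw new InvalidArgumentException('expects num >= 0 and 2 <= base <= 256');
    }
    $out = '';
    do {
        // the remainder is the lowest digit; prepend it so the order ends up correct
        $out = chr($num % $base) . $out;
        $num = intdiv($num, $base);
    } while ($num > 0);
    return $out;
}

// Decode the byte string back into the integer it came from.
function decode_base_bytes(string $bytes, int $base = 255): int {
    $num = 0;
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $num = $num * $base + ord($bytes[$i]);
    }
    return $num;
}
```

A call such as decode_base_bytes(encode_base_bytes(hexdec('0dfb4f4a'))) should give back the original number, which is an easy way to test any variant of the function.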

I just profiled the function. It doesn't seem to use much memory at all, just around 260 KB at the most. I tested both with and without BCMath. Are you sure it isn't something else?

I can't really do much to edit your function because it has so many elements I haven't seen before.

It is actually very basic.

The only odd operations used are:

% - modulo or remainder
chr() - return the character represented by a number in ASCII table
floor() - round down the float to an int

The modulo returns the remainder after dividing

eg: 5%2 = 1
ie: 5/2 = 2 remainder 1

chr(97) = a
The letter a is represented by the number 97 in ASCII

Something like:

$chr = array(97=>'a', 98=>'b' ... 255);
so chr(97) = $chr[97];

And floor just removes everything after the decimal point.

eg: 5/2 = 2.5
floor(2.5) = 2
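
A few lines you can paste into a scratch file to see these operations for yourself. The variable names are only for this example. Note that lowercase a is code 97 in ASCII (96 is the backtick), that floor() hands back a float, and that intdiv() (PHP 7+) is the integer version.

```php
<?php
$remainder = 5 % 2;         // remainder after dividing 5 by 2
$floored   = floor(5 / 2);  // 2.5 rounded down, returned as a float
$quotient  = intdiv(5, 2);  // integer division, returned as an int
$letter    = chr(97);       // character for ASCII code 97
$code      = ord('a');      // ord() is the reverse of chr()
$backtick  = chr(96);       // code 96 is the backtick, not a letter

var_dump($remainder, $floored, $quotient, $letter, $code, $backtick);
```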


Here is the function with comments:

/**
 * Convert Decimal to a base less than 255 comprised of ASCII chars
 *
 * @param Int $num
 * @param Int $base (2-255)
 * @return ASCII String
 */
function base255($num, $base = 255) {
 	// remove the negative sign by multiplying by -1 if $num is negative
	if ($num < 0) $num = -$num;
	// an array to hold the digits of the new number
	$ret = array();
	// while the number is at least as large as our base, we keep dividing it by the base
	while($num >= $base) {
		// get the remainder after dividing by the base
		$rem = bcmod($num, $base);
		// divide by the base to move up one unit
		$num = bcdiv($num, $base);
		// the remainders of each division, make up the new number
		// we save the character the remainder represents in ASCII so we only have to save one character, instead of the number
		$ret[] = chr($rem);
	}
	// since the number is less than the base, it is the remainder itself
	$ret[] = chr($num);
	// we reverse the order of chars, since we started calculating remainders from the smallest unit
	return implode('', array_reverse($ret));
}

I think it's simplest to look at it by converting base 10 to base 10.

123 would be:

123/10 = 12 R 3
12/10 = 1 R 2
1
---------------
1 R2 R3 or 123

In order to do the first line with PHP: 123/10 = 12 R 3 We need to do:

$number = floor(123/10); // 12
$remainder = 123%10; // 3
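
To see the whole trace rather than just the first line, something like the following would do. division_steps() is a made-up name for this example; it simply records the quotient and remainder of every step until the quotient reaches 0, using intdiv() from PHP 7+ instead of floor().

```php
<?php
// Hypothetical helper: returns one [quotient, remainder] pair per division step.
function division_steps(int $num, int $base): array {
    $steps = [];
    do {
        $quotient  = intdiv($num, $base);
        $remainder = $num % $base;
        $steps[]   = [$quotient, $remainder];
        $num       = $quotient;  // carry the quotient into the next step
    } while ($num > 0);
    return $steps;
}

// Print the steps in the same "R" notation used in this post.
foreach (division_steps(123, 10) as $step) {
    echo $step[0] . ' R ' . $step[1] . "\n";
}
```

Reading the remainders from the last step back to the first gives the digits of the converted number.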

I hope that helps.

I managed to make a better function which doesn't have the gap symbols and is as follows:

function compress_string($string) {
    $str=array();
    $charconvert=array('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'1'=>7,'2'=>8,'3'=>9,'4'=>10,'5'=>11,'6'=>12,'7'=>13,'8'=>14,'9'=>15,'0'=>16);
    $arr=str_split($string,2);
    while (!empty($arr)) {
        for ($i=0;isset($arr[$i]);$i++) {
            $char=str_split($arr[$i],1);
            unset($arr[$i]);
            $v=($charconvert[$char[0]]*$charconvert[$char[1]])+32;
            if ($v<256) {
                $str[]=chr($v);
                } else {
                $str[]=chr($charconvert[$char[0]]);
                $arr[$i]=$char[1];
                $arr=implode('',$arr);
                }
            }
        }
    return implode('',$str);
    }

The above function I made compresses it to half the size and skips the first 32 characters on the ascii table which are useless to me. I will try this function for a few days and see how it works and hopefully this will be the function.

Previous post Edit:
I discovered my function had a few memory leaks and fixed it to end up being the following:

function compress_string($string) {
    $str=array();
    $charconvert=array('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'1'=>7,
    '2'=>8,'3'=>9,'4'=>10,'5'=>11,'6'=>12,'7'=>13,'8'=>14,'9'=>15,'0'=>16);
    $arr=str_split($string,2);
    while (!empty($arr[0]) || $arr[0]===0) {
        for ($i=0;isset($arr[$i]);$i++) {
            $char=str_split($arr[$i],1);
            $arr[$i]='';
            $v=($charconvert[$char[0]]*$charconvert[$char[1]])+32;
            if ($v<256) {
                $str[]=chr($v);
                } else {
                $str[]=$char[0];
                $arr[$i]=$char[1];
                $arrs=implode('',$arr);
                unset($arr);
                $arr=str_split($arrs,2);
                unset($arrs);
                }
            unset($v);
            }
        }
    unset($arr,$char,$charconvert);
    return implode('',$str);
    }

Multiplication is commutative, so you'll get numerous collisions.

$v=($charconvert[$char[0]]*$charconvert[$char[1]])+32;

eg:

compress_string('42'); // p
compress_string('24'); // p

I'm not sure what you're after, so the alternatives I've given are generalizations.
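
As one possible illustration (a sketch, not necessarily what you need): the snippet below counts how many different values the product mapping can produce for the 256 possible pairs of hex digits, and compares that with PHP's built-in hex2bin(), which also stores two hex digits per byte but can be reversed with bin2hex(). The catch is that hex2bin() output uses every byte value, including the first 32 control characters, so I am assuming the column could be a binary type such as BINARY or VARBINARY.

```php
<?php
// Same digit values as $charconvert in the posted function.
$charconvert = ['a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'1'=>7,'2'=>8,
                '3'=>9,'4'=>10,'5'=>11,'6'=>12,'7'=>13,'8'=>14,'9'=>15,'0'=>16];

// Collect every value the product mapping can produce for a pair of hex digits.
$products = [];
foreach ($charconvert as $x) {
    foreach ($charconvert as $y) {
        $products[$x * $y + 32] = true;
    }
}
$distinctProducts = count($products);  // fewer than 256 means pairs collide

// hex2bin() packs each pair of hex digits into exactly one byte, reversibly.
$packed42  = hex2bin('42');
$packed24  = hex2bin('24');
$roundTrip = bin2hex(hex2bin('0dfb4f4a'));

echo "distinct product values: $distinctProducts of 256 pairs\n";
echo "round trip: $roundTrip\n";
```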

Thanks for pointing that out, but I should be able to code a reader that can filter out the incorrect matches. So, as you pointed out, the string "aecd" would have the same result as "eadc", but "aecd" wouldn't match "aced". Also, the normal workaround, if there were enough symbols, is the following line.

$v=($charconvert[$char[0]]*$charconvert[$char[1]])+32-$tmp;

However, I have another workaround: when pulling the data from the database, rehash the original data and check whether it matches what was requested. An example is as follows:

$_GET['q']=trim($_GET['q']);
        function compress_string($string) {
            $str=array();
            $charconvert=array('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'1'=>7,'2'=>8,'3'=>9,'4'=>10,'5'=>11,'6'=>12,'7'=>13,'8'=>14,'9'=>15,'0'=>16);
            $arr=str_split($string,2);
            while (!empty($arr[0]) || $arr[0]===0) {
                for ($i=0;isset($arr[$i]);$i++) {
                    $char=str_split($arr[$i],1);
                    $arr[$i]='';
                    if (empty($charconvert[$char[1]])) {
                    $tmp=1; } else {
                    $tmp=$charconvert[$char[1]];
                    }
                    
                    $v=($charconvert[$char[0]]*$tmp)+32;
                    if ($v<256) {
                        $str[]=chr($v);
                        } else {
                        $str[]=$char[0];
                        $arr[$i]=$char[1];
                        $arrs=implode('',$arr);
                        unset($arr);
                        $arr=str_split($arrs,2);
                        unset($arrs);
                        }
                    unset($v,$tmp);
                    }
                }
            unset($arr,$char,$charconvert);
            return implode('',$str);
            }
        if ($_GET['hash']>0) {
            $r=mysql_query('SELECT `id` FROM `hash` WHERE `'.$hash.'`="'.mysql_real_escape_string(compress_string($_GET['q'])).'"');
            } else {
            $r=mysql_query('SELECT `id` FROM `hash` WHERE `sha1`="'.mysql_real_escape_string(compress_string(substr($_GET['q'],0,4).hash('crc32',$_GET['q']).hash('crc32b',$_GET['q']))).'"');
            }
        if (mysql_num_rows($r)==0) {
            echo '<table border=0 cellpadding=3 cellspacing=0 bgcolor="#D0D0D0"><tr bgcolor="#D0D0D0"><td bgcolor="#D0D0D0"><b>No Results found for '.htmlentities($_GET['q'],ENT_QUOTES).'</b></td></tr></table>'."\r\n";
            } else {
            echo '<table border=1 cellpadding=2 cellspacing=0 style="border-top:1px; border-top-color:#FFFFFF"><tr bgcolor="#D0D0D0" style="font-family:arial; font-weight:bolder; border-top:1px; border-top-color:#FFFFFF"><td bgcolor="#D0D0D0">Translation</td><td bgcolor="#D0D0D0">SHA1</td><td bgcolor="#D0D0D0">Crc32</td><td bgcolor="#D0D0D0">Crc32b</td></tr>'."\r\n";
            while ($data=mysql_fetch_assoc($r)) {
                if ($_GET['q']==hash($hash,$data['id'])) {
                    echo '<tr><td bgcolor="#D0FFFF"><textarea style="width:'.((strlen($data['id'])*10)).'px; height:16px; overflow-y:hidden;" scrolling=no>'.$data['id'].'</textarea></td><td>'.hash('sha1',$data['id']).'</td><td>'.hash('crc32',$data['id']).'</td><td>'.hash('crc32b',$data['id'])."</td></tr>\r\n";
                    }
                }
            echo "</table>\r\n";
            }

Also, I did a test and for some reason my script does not always suffer from that bug, or at least not in my test. But I did alter the function to the following:

function compress_string($string) {
            $str=array();
            $charconvert=array('a'=>1,'b'=>2,'c'=>3,'d'=>4,'e'=>5,'f'=>6,'1'=>7,'2'=>8,'3'=>9,'4'=>10,'5'=>11,'6'=>12,'7'=>13,'8'=>14,'9'=>15,'0'=>16);
            $arr=str_split($string,2);
            while (!empty($arr[0]) || $arr[0]===0) {
                for ($i=0;isset($arr[$i]);$i++) {
                    $char=str_split($arr[$i],1);
                    $arr[$i]='';
                    if (empty($charconvert[$char[1]])) {
                    $tmp=1; } else {
                    $tmp=$charconvert[$char[1]];
                    }
                    
                    $v=($charconvert[$char[0]]*$tmp)+32;
                    if ($v<256) {
                        $str[]=chr($v);
                        } else {
                        $str[]=$char[0];
                        $arr[$i]=$char[1];
                        $arrs=implode('',$arr);
                        unset($arr);
                        $arr=str_split($arrs,2);
                        unset($arrs);
                        }
                    unset($v,$tmp);
                    }
                }
            unset($arr,$char,$charconvert);
            return implode('',$str);
            }

So I guess I'm just lucky that the bug doesn't happen for me all the time, but I will still add that second validator. I also calculated that in 13 days I can fill 30 GB of dehashing data, calculating to at least 5 digits. So the script works for my needs even with its bug of reversed characters producing the same match. Because MySQL can still filter the results from millions of rows down to a few dozen, it should still do the job.
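
The second validator could be kept separate from the database code, roughly like this. It is only a sketch: verify_candidates() is a made-up name, and it assumes the candidate plain texts have already been fetched into an array by whatever query is used.

```php
<?php
// Hypothetical helper: keep only the candidates whose real hash equals the requested one.
function verify_candidates(array $candidates, string $algo, string $wanted): array {
    // hash() returns lowercase hex, so normalise the requested value the same way
    $wanted  = strtolower(trim($wanted));
    $matches = [];
    foreach ($candidates as $plain) {
        if (hash($algo, $plain) === $wanted) {
            $matches[] = $plain;
        }
    }
    return $matches;
}

// Usage: rows returned by the compressed lookup are filtered down to true matches.
$wanted = hash('crc32b', 'abc');
print_r(verify_candidates(['abc', 'cba', 'xyz'], 'crc32b', $wanted));
```

Because the comparison is done on the full hash of the stored text, any false matches caused by the compressed column are dropped before anything is shown.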

The reason you don't get duplicates is that SHA1 hashes are very unlikely to collide. Thus, even with the redundancy introduced by the function, the result still does not collide with others.

It is the same as just cutting the SHA1 in half and keeping the first half. You will have a low probability of collisions.

So a function like:

function compress($sha) {
   $parts = str_split($sha, 20);
   return $parts[0];
}

would achieve the same.
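
To put a rough number on "low probability", here is a back-of-the-envelope sketch. Both function names are made up for this example. It uses the usual birthday approximation p ≈ n² / (2 · 2^bits), which assumes the kept hex digits behave like uniformly random bits and is only meaningful while the result is much smaller than 1.

```php
<?php
// Hypothetical helper: keep the first $hexChars hex digits of a hash (same as the str_split version).
function truncate_hash(string $sha, int $hexChars = 20): string {
    return substr($sha, 0, $hexChars);
}

// Hypothetical helper: approximate chance of at least one collision among $rows stored values.
function approx_collision_probability(float $rows, int $hexChars): float {
    $bits = 4 * $hexChars;  // each hex digit carries 4 bits
    return ($rows * $rows) / (2.0 * pow(2.0, $bits));
}

$short = truncate_hash('05a79f06cf3f67f726dae68d18a2290f6c9a50c9');
$p     = approx_collision_probability(1e9, 20);  // a billion rows, 80 bits kept

echo $short . "\n";
printf("%.2e\n", $p);
```

Under those assumptions, even a billion rows with 20 hex digits kept gives a chance well below one in a million.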