Good day,

I want to print the contents of files that a user uploads through an <input type="file"> field. The files are sent to the backend with the POST method.
I have done some research, but unfortunately I still haven't found a solution that satisfies me. The closest answer I have found is http://be.php.net/manual/vote-note.php?id=91051&page=function.mb-detect-encoding&vote=up
I can detect the encoding of the files and decode them accordingly, but in the end content in other languages, such as Chinese characters, is still not displayed correctly.

Hopefully someone who has run into the same problem can lend me a hand. Thanks in advance.

diafol

Show your display page. Have you set a meta tag or a PHP header to UTF-8?

Below is the code I wrote to dump the content:

if(isset($_FILES['files'])){
    header('Content-Type: text/html; charset=utf-8');
    // The Unicode BOM is U+FEFF; once encoded, it appears as one of the byte sequences below.
    define ('UTF32_BIG_ENDIAN_BOM'   , chr(0x00) . chr(0x00) . chr(0xFE) . chr(0xFF));
    define ('UTF32_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE) . chr(0x00) . chr(0x00));
    define ('UTF16_BIG_ENDIAN_BOM'   , chr(0xFE) . chr(0xFF));
    define ('UTF16_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE));
    define ('UTF8_BOM'               , chr(0xEF) . chr(0xBB) . chr(0xBF));

    $file = ($_FILES['files']);

    $filename = ($file['tmp_name']);
    $text = file_get_contents($filename);
    echo "<pre>";
    $first2 = substr($text, 0, 2);
    $first3 = substr($text, 0, 3);
    $first4 = substr($text, 0, 4); // a UTF-32 BOM is 4 bytes long

    if ($first3 == UTF8_BOM){
        echo str_replace(UTF8_BOM, "", $text);
        $code = "UTF-8";
    }elseif ($first4 == UTF32_BIG_ENDIAN_BOM){
        echo str_replace(UTF32_BIG_ENDIAN_BOM, "", $text);
        $code = "UTF-32";
    }elseif ($first4 == UTF32_LITTLE_ENDIAN_BOM){
        echo str_replace(UTF32_LITTLE_ENDIAN_BOM, "", $text);
        $code = "UTF-32";
    }elseif ($first2 == UTF16_BIG_ENDIAN_BOM){
        echo str_replace(UTF16_BIG_ENDIAN_BOM, "", $text);
        $code = "UTF-16";
    }elseif ($first2 == UTF16_LITTLE_ENDIAN_BOM){
        echo str_replace(UTF16_LITTLE_ENDIAN_BOM, "", $text);
        $code = "UTF-16";
    }else{
        echo $text;
    }
    echo "</pre>";
}
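
One thing the code above does not do is convert the UTF-16 or UTF-32 bytes to UTF-8 before echoing them, even though the page is declared as UTF-8. Also, str_replace removes every occurrence of the BOM bytes, not only the leading one. The following is a minimal sketch of one possible approach, not a confirmed fix for this case. It assumes the mbstring extension is available, and bom_to_utf8 is a hypothetical helper name that is not part of the original code.

```php
<?php
// Hypothetical helper (not from the original post): detect a leading BOM,
// strip it, and return the remaining text converted to UTF-8.
function bom_to_utf8(string $text): string
{
    // Longer BOMs are listed first, because the UTF-32 LE BOM starts with
    // the same two bytes as the UTF-16 LE BOM.
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];

    foreach ($boms as $bom => $encoding) {
        // strncmp is binary-safe and only looks at the start of the string.
        if (strncmp($text, $bom, strlen($bom)) === 0) {
            $body = substr($text, strlen($bom)); // drop only the leading BOM
            if ($encoding === 'UTF-8') {
                return $body;
            }
            // Requires the mbstring extension.
            return mb_convert_encoding($body, 'UTF-8', $encoding);
        }
    }

    // Assumption for this sketch: text without a BOM is already UTF-8.
    return $text;
}
```

It could then be used as echo htmlspecialchars(bom_to_utf8($text), ENT_QUOTES, 'UTF-8'); in place of the if/elseif chain. Files without a BOM (for example GB2312 or Big5 text) would still need their encoding determined some other way.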

diafol

I have little experience with Chinese text, although I've spent a lot of time on other encoding issues. Could you link to a typical file so that we can try to replicate the problem? There's no point posting the contents, as that may not include the BOM, if there is one.
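
To check whether an uploaded file actually starts with a BOM, the leading bytes can be printed as hex. This is only a rough sketch; leading_bytes_hex is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper: return the first $count bytes of $data as
// space-separated uppercase hex pairs.
function leading_bytes_hex(string $data, int $count = 4): string
{
    $hex = bin2hex(substr($data, 0, $count));
    return strtoupper(implode(' ', str_split($hex, 2)));
}
```

For example, echo leading_bytes_hex(file_get_contents($filename)); on a UTF-8 file with a BOM would start with EF BB BF, while FF FE or FE FF at the start would point to UTF-16.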