Good day,

I want to print the contents of files that users upload through a file-type input. The file is sent to the backend using the POST method.
I have done some research, but unfortunately I still haven't found a solution that satisfies me. The closest answer I have found is http://be.php.net/manual/vote-note.php?id=91051&page=function.mb-detect-encoding&vote=up
I can detect the encoding of a file and decode it accordingly, but in the end content in other languages, such as Chinese characters, is still not displayed correctly.

Hopefully someone who has run into the same problem can lend me a hand. Thanks in advance.

Edited by lps


Below is the code I wrote to dump out the content:

    header('Content-Type: text/html; charset=utf-8');
    // The Unicode BOM is U+FEFF; once encoded, it appears as the byte sequences below.
    define ('UTF32_BIG_ENDIAN_BOM'   , chr(0x00) . chr(0x00) . chr(0xFE) . chr(0xFF));
    define ('UTF32_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE) . chr(0x00) . chr(0x00));
    define ('UTF16_BIG_ENDIAN_BOM'   , chr(0xFE) . chr(0xFF));
    define ('UTF16_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE));
    define ('UTF8_BOM'               , chr(0xEF) . chr(0xBB) . chr(0xBF));

    $file = $_FILES['files'];

    $filename = $file['tmp_name'];
    $text = file_get_contents($filename);
    echo "<pre>";
    // Leading bytes of the file, one slice per possible BOM length.
    $first2 = substr($text, 0, 2);
    $first3 = substr($text, 0, 3);
    $first4 = substr($text, 0, 4);

    // UTF-32 is checked before UTF-16 because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
    // Each branch strips the leading BOM and prints the remaining bytes as-is.
    if ($first3 === UTF8_BOM){
        echo substr($text, 3);
        $code = "UTF-8";
    }elseif ($first4 === UTF32_BIG_ENDIAN_BOM){
        echo substr($text, 4);
        $code = "UTF-32";
    }elseif ($first4 === UTF32_LITTLE_ENDIAN_BOM){
        echo substr($text, 4);
        $code = "UTF-32";
    }elseif ($first2 === UTF16_BIG_ENDIAN_BOM){
        echo substr($text, 2);
        $code = "UTF-16";
    }elseif ($first2 === UTF16_LITTLE_ENDIAN_BOM){
        echo substr($text, 2);
        $code = "UTF-16";
    }else{
        // No BOM found: print the content unchanged.
        echo $text;
    }
    echo "</pre>";

I have little experience with Chinese text, although I've spent a lot of time on other encoding issues. Could you link to a typical file so that we can try to replicate the problem? There is no point in posting the contents here, as that may not include the BOM, if there is one.
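
In the meantime, one thing that may be worth checking: the page is declared as UTF-8, but for UTF-16 and UTF-32 files the bytes appear to be echoed without being converted, which would garble anything outside ASCII. Below is a minimal sketch of detecting the BOM and converting to UTF-8 before output. The function name `bom_to_utf8` is made up for illustration, it assumes the mbstring extension is loaded, and it assumes that a file without a BOM is already UTF-8, which may not hold for your uploads (for example GBK or Big5 files).

```php
<?php
// Hypothetical helper (not from the original post): strips a leading BOM
// and converts the remaining bytes to UTF-8. Requires the mbstring extension.
function bom_to_utf8(string $text): string
{
    // Longer BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];

    foreach ($boms as $bom => $encoding) {
        if (strncmp($text, $bom, strlen($bom)) === 0) {
            // Drop only the leading BOM, not every occurrence of those bytes.
            $body = substr($text, strlen($bom));
            return $encoding === 'UTF-8'
                ? $body
                : mb_convert_encoding($body, 'UTF-8', $encoding);
        }
    }

    // Assumption: no BOM means the content is already UTF-8.
    return $text;
}
```

With that in place, the output step could become `echo htmlspecialchars(bom_to_utf8($text), ENT_QUOTES, 'UTF-8');` so that any markup inside the uploaded file is shown as text rather than interpreted by the browser. If the sample file turns out to have no BOM at all, this will not help and the encoding will have to be determined some other way.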
