
Display encoded file uploaded

lps
Posting Whiz in Training
208 posts since Jul 2011
Reputation Points: 3 [?]
Q&As Helped to Solve: 43 [?]
Skill Endorsements: 2 [?]
 

Good day,

I want to print the contents of files that a user uploads through an input of type file. The file is sent to the backend using the POST method.
I have done some research, but unfortunately I still haven't found a solution that satisfies me. The closest answer to what I am looking for is http://be.php.net/manual/vote-note.php?id=91051&page=function.mb-detect-encoding&vote=up
I can detect the encoding of the file and decode it accordingly, but in the end content in other languages, such as Chinese characters, is not displayed correctly.

Hopefully someone who has run into the same problem before can lend a hand. Thanks in advance.

diafol
Where are my eyes?
12,963 posts since Oct 2006
Reputation Points: 1,821 [?]
Q&As Helped to Solve: 1,847 [?]
Skill Endorsements: 92 [?]
Moderator
Featured
Sponsor
 

Show your display page. Have you set a meta tag or a PHP header to UTF-8?

lps

Below is the code I wrote to dump out the content:

if (isset($_FILES['files'])) {
    header('Content-Type: text/html; charset=utf-8');
    // The Unicode BOM is U+FEFF; these are its byte sequences in each encoding.
    define('UTF32_BIG_ENDIAN_BOM',    chr(0x00) . chr(0x00) . chr(0xFE) . chr(0xFF));
    define('UTF32_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE) . chr(0x00) . chr(0x00));
    define('UTF16_BIG_ENDIAN_BOM',    chr(0xFE) . chr(0xFF));
    define('UTF16_LITTLE_ENDIAN_BOM', chr(0xFF) . chr(0xFE));
    define('UTF8_BOM',                chr(0xEF) . chr(0xBB) . chr(0xBF));

    $file = $_FILES['files'];

    $filename = $file['tmp_name'];
    $text = file_get_contents($filename);
    echo "<pre>";
    $first2 = substr($text, 0, 2);
    $first3 = substr($text, 0, 3);
    $first4 = substr($text, 0, 4); // UTF-32 BOMs are four bytes long

    // UTF-32 is tested before UTF-16 because the UTF-32 LE BOM
    // begins with the same two bytes as the UTF-16 LE BOM.
    // substr() strips only the leading BOM; the rest is echoed as-is.
    if ($first3 === UTF8_BOM) {
        echo substr($text, 3);
        $code = "UTF-8";
    } elseif ($first4 === UTF32_BIG_ENDIAN_BOM) {
        echo substr($text, 4);
        $code = "UTF-32";
    } elseif ($first4 === UTF32_LITTLE_ENDIAN_BOM) {
        echo substr($text, 4);
        $code = "UTF-32";
    } elseif ($first2 === UTF16_BIG_ENDIAN_BOM) {
        echo substr($text, 2);
        $code = "UTF-16";
    } elseif ($first2 === UTF16_LITTLE_ENDIAN_BOM) {
        echo substr($text, 2);
        $code = "UTF-16";
    } else {
        // No BOM found: output the bytes unchanged.
        echo $text;
    }
    echo "</pre>";
}
diafol

I have little experience with Chinese text, although I've spent a lot of time on other encoding issues. Could you link to a typical file so that we can try to replicate the problem? There is no point in posting the contents here, as that may not include the BOM, if there is one.

lps

The example file I use for testing is this: https://dl.dropbox.com/u/95553471/little%20endian%2016.txt
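
Going by its name, the sample file is UTF-16 little endian. The script posted earlier removes the BOM but still sends the remaining bytes under a UTF-8 header, which is one plausible reason the Chinese text comes out garbled. The following is only a minimal sketch of converting to UTF-8 before output, not a confirmed fix for this thread. It assumes the mbstring extension is loaded and PHP 7 or later; `bom_to_utf8` and its `$fallback` parameter are hypothetical names introduced here for illustration.

```php
<?php
// Hypothetical helper (not from the thread): detect a leading BOM,
// strip it, and convert the rest of the bytes to UTF-8.
function bom_to_utf8(string $bytes, string $fallback = 'UTF-8'): string
{
    // Four-byte BOMs come first: the UTF-32LE BOM starts with
    // the same two bytes as the UTF-16LE BOM.
    $boms = [
        'UTF-32BE' => "\x00\x00\xFE\xFF",
        'UTF-32LE' => "\xFF\xFE\x00\x00",
        'UTF-8'    => "\xEF\xBB\xBF",
        'UTF-16BE' => "\xFE\xFF",
        'UTF-16LE' => "\xFF\xFE",
    ];

    foreach ($boms as $encoding => $bom) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            // Drop only the leading BOM, then transcode the payload.
            $body = substr($bytes, strlen($bom));
            return mb_convert_encoding($body, 'UTF-8', $encoding);
        }
    }

    // No BOM: assume the fallback encoding (an assumption, not a detection).
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}
```

On the display page this could be used as `echo htmlspecialchars(bom_to_utf8(file_get_contents($file['tmp_name'])));` between the `<pre>` tags, so that the bytes sent actually match the declared `charset=utf-8`. Files without a BOM still need their encoding guessed or supplied by the user, which this sketch does not attempt.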
