header : [u'ID', u'Hindi', u'Telugu']

1 .[u'hhd1', u'\u0924\u093e\u091c\u093e\JJ \u0938\u093e\u0901\u0938\u0947\u0902\N_NN \u0914\u0930\CC_CCD \u091a\u092e\u091a\u092e\u093e\u0924\u0947\JJ \u0926\u093e\u0901\u0924\N_NN \u0906\u092a\u0915\u0947\PR_PRP \u0935\u094d\u092f\u0915\u094d\u0924\u093f\u0924\u094d\u0935\N_NN \u0915\u094b\PSP \u0928\u093f\u0916\u093e\u0930\u0924\u0947\V_VM \u0939\u0948\u0902\V_VAUX \u0964\RD_PUNC', u'\u0c24\u0c3e\u0c1c\u0c3e\u0c36\u0c4d\u0c35\u0c3e\u0c38\JJ \u0c2e\u0c30\u0c3f\u0c2f\u0c41\CC_CCD \u0c2e\u0c3f\u0c32\u0c2e\u0c3f\u0c32\JJ \u0c2e\u0c46\u0c30\u0c3f\u0c38\u0c47\V_VM_VNF \u0c26\u0c66\u0c24\u0c3e\u0c32\u0c41\N_NN \u0c2e\u0c40\PR_PRP \u0c35\u0c4d\u0c2f\u0c15\u0c4d\u0c24\u0c3f\u0c24\u0c4d\u0c35\u0c3e\u0c28\u0c4d\u0c28\u0c3f\N_NN \u0c35\u0c3f\u0c15\u0c38\u0c3f\u0c66\u0c2a\u0c1c\u0c47\u0c38\u0c4d\u0c24\u0c3e\u0c2f\u0c3f\V_VM_VF .\RD_PUNC']

Data explanation

The first column is the ID of the sentence.

The second column is the sentence in Hindi.

The third column is the sentence in Telugu.

Can someone help me convert this Unicode into the respective language?

Thank you

Edited 1 Year Ago by jamesjohnson25

I think your data is not valid Python code. For example, to read the Hindi part, I had to replace every \N with \\N (double backslash); otherwise Python would not read the unicode string, because \N inside a unicode literal starts a named-character escape of the form \N{...}.
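
Instead of doubling the backslashes by hand, another option is to treat the data as plain text and decode only the \uXXXX sequences, leaving tags such as \JJ and \N_NN alone. The following is only a rough sketch in Python 3: it assumes the escapes are literally present in the text (backslash, u, four hex digits), and the helper name decode_u_escapes is made up for this example.

```python
import re

# A shortened piece of the Hindi column, kept as raw text so that the
# backslashes are literal characters and not escape sequences.
raw = r"\u0924\u093e\u091c\u093e\JJ \u0938\u093e\u0901\u0938\u0947\u0902\N_NN"


def decode_u_escapes(text):
    """Replace each literal \\uXXXX with its character.

    Hypothetical helper: other backslash sequences (the POS tags such as
    \\JJ or \\N_NN) are left exactly as they are.
    """
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


decoded = decode_u_escapes(raw)
print(decoded)
```

Because only the \uXXXX pattern is matched, the \N in \N_NN never reaches Python's string-literal parser, so no manual escaping is needed.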

Here is what I get when I print the hindi part:

ताजा\JJ साँसें\N_NN और\CC_CCD चमचमाते\JJ दाँत\N_NN आपके\PR_PRP व्यक्तित्व\N_NN को\PSP निखारते\V_VM हैं\V_VAUX ।\R

Does it have something to do with what you want?

Your question doesn't mean much as asked. The unicode does not have to be converted to the target language; it is already in that language. What you probably want is to display the unicode with the language's glyphs, which is something else. For this, the print function should work. For example (I don't know Hindi):

>>> y = u'\u0924\u093e\u091c\u093e'
>>> print(y)
ताजा
>>> 
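
To illustrate that the escaped form and the glyphs are the same string, here is a small sketch in Python 3 that compares the two spellings and looks up each code point with the standard unicodedata module. The variable names are only for illustration.

```python
import unicodedata

# The same four code points as in the session above, written with escapes.
y = "\u0924\u093e\u091c\u093e"

# Writing the glyphs directly produces an equal string; the \uXXXX form
# is just another way of spelling the same characters in source code.
same = (y == "ताजा")
print(same)

# Look up the official Unicode name of each code point.
names = [unicodedata.name(ch) for ch in y]
for ch, name in zip(y, names):
    print("U+%04X %s" % (ord(ch), name))
```

Whether the glyphs actually render correctly then depends on the terminal or editor and its font, not on the string itself.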

It would be great if you could attach a small file containing your exact data to a post (for example a CSV file in UTF-8; you can zip the file to attach it).
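
In case it helps, this is roughly how such a UTF-8 CSV file could be written and read back in Python 3. It is only a sketch: the file name sample.csv, the temporary directory, and the shortened one-word sample values are assumptions made for the example, not your real data.

```python
import csv
import os
import tempfile

# Shortened sample rows: the header from the post plus one record holding
# only the first word of each sentence, without the POS tags.
rows = [
    ["ID", "Hindi", "Telugu"],
    ["hhd1", "\u0924\u093e\u091c\u093e", "\u0c24\u0c3e\u0c1c\u0c3e"],
]

# Hypothetical location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# Write the rows as UTF-8; newline="" is what the csv module expects.
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read the file back with the same encoding to check the round trip.
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back[1][1])
```

Stating the encoding explicitly on both open calls avoids depending on the platform default, which is a common source of garbled Hindi or Telugu text.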

Edit: it's too bad, we still have this bug with the unicode characters in the forum!

Edited 1 Year Ago by Gribouillis

Attachments term.png 87.66 KB

I first started using Unicode back around 1989-1990, before it became a standard, in order to support our Korean customers and provide help and other system messages in Hangul. Not simple! We contracted with another company that had already written the code to deal with this stuff - "wide" characters in C/C++ didn't exist yet at that time! In the end, it worked out well, as the company we contracted with had a lot of experience with Far Eastern languages such as Chinese and Korean. It was still a difficult process to provide real internationalization of text.
