Need help in unicode

Question

jamesjohnson25 0 Newbie Poster

10 Years Ago

header : [u'ID', u'Hindi', u'Telugu']

1 .[u'hhd1', u'\u0924\u093e\u091c\u093e\JJ \u0938\u093e\u0901\u0938\u0947\u0902\N_NN \u0914\u0930\CC_CCD \u091a\u092e\u091a\u092e\u093e\u0924\u0947\JJ \u0926\u093e\u0901\u0924\N_NN \u0906\u092a\u0915\u0947\PR_PRP \u0935\u094d\u092f\u0915\u094d\u0924\u093f\u0924\u094d\u0935\N_NN \u0915\u094b\PSP \u0928\u093f\u0916\u093e\u0930\u0924\u0947\V_VM \u0939\u0948\u0902\V_VAUX \u0964\RD_PUNC', u'\u0c24\u0c3e\u0c1c\u0c3e\u0c36\u0c4d\u0c35\u0c3e\u0c38\JJ \u0c2e\u0c30\u0c3f\u0c2f\u0c41\CC_CCD \u0c2e\u0c3f\u0c32\u0c2e\u0c3f\u0c32\JJ \u0c2e\u0c46\u0c30\u0c3f\u0c38\u0c47\V_VM_VNF \u0c26\u0c66\u0c24\u0c3e\u0c32\u0c41\N_NN \u0c2e\u0c40\PR_PRP \u0c35\u0c4d\u0c2f\u0c15\u0c4d\u0c24\u0c3f\u0c24\u0c4d\u0c35\u0c3e\u0c28\u0c4d\u0c28\u0c3f\N_NN \u0c35\u0c3f\u0c15\u0c38\u0c3f\u0c66\u0c2a\u0c1c\u0c47\u0c38\u0c4d\u0c24\u0c3e\u0c2f\u0c3f\V_VM_VF .\RD_PUNC']

Data explanation

The first column is the ID of the sentence

The second column is the sentence in Hindi

The third column is the sentence in Telugu

Can someone help me convert this unicode into the respective languages?

Thank you

python

Edited 10 Years Ago by jamesjohnson25

All 3 Replies

Gribouillis 1,391 Programming Explorer

10 Years Ago

I think your data is not valid Python code. For example, to read the Hindi part I had to replace every \N with \\N (double backslash); otherwise Python would not read the unicode string, because \N in a unicode literal starts a named-character escape such as \N{BULLET}.

Here is what I get when I print the Hindi part:

ताजा\JJ साँसें\N_NN और\CC_CCD चमचमाते\JJ दाँत\N_NN आपके\PR_PRP व्यक्तित्व\N_NN को\PSP निखारते\V_VM हैं\V_VAUX ।\R

Does it have something to do with what you want?
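If the data arrives as raw text (for example, read from a file) rather than as a Python literal, the escaping problem with \N can be sidestepped. The following is only a minimal sketch, assuming Python 3 and assuming the text contains literal \uXXXX sequences mixed with tags such as \JJ and \N_NN. The helper name decode_u_escapes is made up for this illustration; it is not a standard library function.

```python
import re

# Raw text as it might appear in the data: a raw string keeps every
# backslash literal, so \N_NN is not parsed as a \N{...} named escape.
raw = r'\u0924\u093e\u091c\u093e\JJ \u0938\u093e\u0901\u0938\u0947\u0902\N_NN'

def decode_u_escapes(text):
    """Replace each literal backslash-u escape with the character it names."""
    return re.sub(r'\\u([0-9a-fA-F]{4})',
                  lambda m: chr(int(m.group(1), 16)),
                  text)

# Only the backslash-u sequences are converted; tags like \JJ are left alone.
decoded = decode_u_escapes(raw)
print(decoded)  # prints: ताजा\JJ साँसें\N_NN
```

Because only backslash-u sequences followed by four hex digits are touched, the part-of-speech tags keep their single backslash and no manual doubling is needed.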

Your question doesn't mean much as asked. The unicode does not have to be converted to the target language: it is already in that language. What you probably want is to display the unicode string with the language's own glyphs, which is something else. For this, the print function should work. For example (I don't know Hindi, but):

>>> y = u'\u0924\u093e\u091c\u093e'
>>> print(y)
ताजा
>>>
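To illustrate the point that nothing needs converting, here is a small sketch (assuming Python 3) showing that the escaped form and the typed glyphs are the same string; only the encoded byte representation differs. The variable names are chosen for this example only.

```python
# The same four code points, written once as escapes and once as glyphs.
escaped = '\u0924\u093e\u091c\u093e'
literal = 'ताजा'

same = (escaped == literal)            # identical strings
utf8_bytes = escaped.encode('utf-8')   # bytes as stored in a UTF-8 file

print(same)             # True
print(len(escaped))     # 4 code points
print(len(utf8_bytes))  # 12 bytes, since each of these characters takes 3 bytes in UTF-8
```

The \uXXXX notation is just how Python shows the string in its repr (for instance, inside a printed list); printing the string itself shows the glyphs, provided the terminal and font support them.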

It would be great if you could attach to a post a small file containing your exact data (for example, a CSV file in UTF-8; you can zip the file to attach it).
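For reference, a file like that could be produced roughly as follows. This is only a sketch, assuming Python 3; the file name sample_utf8.csv is hypothetical, and the row is shortened to the first word of each sentence from the question.

```python
import csv
import os
import tempfile

header = ['ID', 'Hindi', 'Telugu']
# Shortened sample row: first word of each sentence, tag kept with one backslash.
row = ['hhd1',
       '\u0924\u093e\u091c\u093e\\JJ',
       '\u0c24\u0c3e\u0c1c\u0c3e\u0c36\u0c4d\u0c35\u0c3e\u0c38\\JJ']

# Hypothetical file name, placed in a temporary directory for the example.
path = os.path.join(tempfile.mkdtemp(), 'sample_utf8.csv')

# Write the data as UTF-8; newline='' is what the csv module expects.
with open(path, 'w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(row)

# Read it back to check that the text round-trips unchanged.
with open(path, encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))

print(rows[1][1])
```

Opening the file with an explicit encoding avoids depending on the platform default, which is a common source of garbled non-ASCII text.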

Edit: it's too bad, we still have this bug with the unicode characters in the forum!

Edited 10 Years Ago by Gribouillis

jamesjohnson25 0 Newbie Poster · Answer 1 · 2015-04-20T17:06:41+00:00

The output you have shown in the screenshot is exactly what I wanted.

Thank you

rubberman 1,355 Nearly a Posting Virtuoso Featured Poster · Answer 2 · 2015-04-23T04:04:05+00:00

I first started using Unicode back around 1989-1990, before it became a standard, in order to support our Korean customers and provide help and other system messages in Hangul. Not simple! We contracted with another company that had already created the code to deal with this stuff; "wide" characters in C/C++ didn't yet exist at that time! In the end it worked out well, as the company we contracted with had a lot of experience with Far Eastern languages such as Chinese and Korean. It was still a difficult process to provide real internationalization of text.
