Need help with decoding a text file

Part of my script reads a large text file (about 120 lines) line by line and needs to pick out the title, which it does by selecting everything between the two quotes. It works for most of the file, and then it doesn't. It seems to be giving me Unicode.

Here is my open statement:

f = codecs.open(f_str, encoding='utf-8')

and this is what I get when I look at the line at the prompt:

>>> line
u'\u201cGlobalHUB: A Virtual Community For Global Engineering Education, Research, And Collaboration,"'

But when I print the line:
print line
“GlobalHUB: A Virtual Community For Global Engineering Education, Research, And Collaboration,"

it shows up correctly.

I am not sure what the problem is. I have tried encoding and decoding till the cows come home. I have also tried a plain open() call, but then I was left with what I think was hex code.
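The prompt output, the printed output and the "hex code" are most likely one and the same character shown three ways. A minimal Python 3 sketch under that assumption (the thread itself uses Python 2, where the prompt's repr() escapes non-ASCII characters; Python 3's repr() keeps printable ones, so ascii() is used here to get the escaped form). The variable name `line` just mirrors the post.

```python
import unicodedata

# The line from the question, rebuilt as a Python 3 str.
line = "\u201cGlobalHUB: A Virtual Community For Global Engineering Education, Research, And Collaboration,\""

# Escaped form, like the Python 2 prompt shows it.
print(ascii(line[:10]))              # '\u201cGlobalHUB'

# "\u201c" is ONE character, not six, and it is not the ASCII quote.
print(len(line[0]))                  # 1
print(unicodedata.name(line[0]))     # LEFT DOUBLE QUOTATION MARK
print(line[0] == '"')                # False

# Its UTF-8 bytes: what a plain byte-oriented open() hands back.
print(line[0].encode("utf-8"))       # b'\xe2\x80\x9c'
```

So decoding with 'utf-8' already worked. The remaining difference is that the opening quote on these lines is not the plain " character.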

Any suggestions?

danimal132
Newbie Poster
18 posts since Jul 2009

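u'\u201c' is just how Python displays the Unicode character U+201C, the left double "smart" (curly) quote, which is why it slants when printed. It is in the file because of how the file was originally written, and it won't change no matter how you read it. Your title search breaks on those lines because it looks for two plain " characters and the opening one isn't plain. You can replace the smart quotes (U+201C and its closing partner U+201D) with regular quotes using the string's replace() method before you search.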
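A minimal Python 3 sketch of the replace-then-extract idea. The names `SMART_QUOTES`, `normalize_quotes`, `extract_title` and `titles_from_file` are hypothetical, the regex is one possible way to take "everything between the two quotes" (the poster's actual selection code isn't shown), and it assumes each title sits on a single line.

```python
import re

# Hypothetical table: curly ("smart") double quotes -> plain ASCII double quote.
SMART_QUOTES = {
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
}


def normalize_quotes(text):
    """Replace smart double quotes with plain ones using str.replace."""
    for smart, plain in SMART_QUOTES.items():
        text = text.replace(smart, plain)
    return text


def extract_title(line):
    """Return the text between the first pair of double quotes, or None."""
    match = re.search(r'"([^"]*)"', normalize_quotes(line))
    return match.group(1) if match else None


def titles_from_file(path):
    """Yield every title found in a UTF-8 text file, reading line by line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            title = extract_title(line)
            if title is not None:
                yield title


line = "\u201cGlobalHUB: A Virtual Community For Global Engineering Education, Research, And Collaboration,\""
print(extract_title(line))
```

Normalizing first keeps the search itself simple: one pattern handles straight quotes, curly quotes, and a mix of the two on the same line, as in the example above.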

burgercho
Light Poster
47 posts since Jul 2008
