
Part of my script reads a large text file (about 120 lines) line by line and needs to pick out the title, which it does by selecting everything between the two quotes. This works for most of the file, and then it stops working. It seems to be giving me Unicode.

Here is what I get. First, my open statement:

import codecs
f = codecs.open(f_str, encoding='utf-8')

When I inspect the variable at the interactive prompt:
>>> line
u'\u201cGlobalHUB: A Virtual Community For Global Engineering Education, Research, And Collaboration,"'

But when I print the line:
>>> print line
“GlobalHUB: A Virtual Community For Global Engineering Education, Research, And Collaboration,"

it displays correctly.
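For reference, the difference between those two outputs can be reproduced with a short sketch. It assumes Python 3 and uses the string copied from the output above. Python 3's repr() shows the curly quote as-is, so ascii() is used here to reproduce the \u201c escape that the Python 2 prompt displayed:

```python
# The title line from the question: a curly opening quote (U+201C)
# followed by a plain ASCII closing quote.
line = '\u201cGlobalHUB: A Virtual Community For Global Engineering Education, Research, And Collaboration,"'

# ascii() escapes non-ASCII characters, which mirrors what the Python 2
# interactive prompt showed when echoing a unicode object.
escaped = ascii(line)
print(escaped)

# print() writes the actual character, so the line looks normal.
print(line)

# Both views describe the same string: its first character is U+201C.
print(hex(ord(line[0])))  # prints 0x201c
```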

I am not sure what the problem is. I have tried encoding and decoding until the cows come home. I have also tried a plain open() call, but then I was left with what I think were hex codes.

Any suggestions?

Last Post by burgercho

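\u201c is the Unicode code point for the left double quotation mark, a "smart" (curly) quote, which is why it slants. It comes out like that because of how the file was originally written, and it won't change no matter how you read the file. The interactive prompt shows it as \u201c because it displays the string's repr(), while print shows the character itself, so the decoding is actually working. Note that in your line only the opening quote is curly; the closing one is a plain ", so code that looks for two plain quotes will not find the title on lines like this. You can replace the smart quotes with regular quotes using the string's replace() method before extracting the title.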
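A minimal sketch of that approach follows. It assumes Python 3, a UTF-8 file, and that each title sits between the first pair of double quotes on its line. The helper names (normalize_quotes, extract_title, titles_from_file) are hypothetical. The mapping only covers U+201C and U+201D, so other quote-like characters would need their own entries:

```python
import io
import re

# Hypothetical mapping: curly double quotes (U+201C left, U+201D right)
# to the plain ASCII double quote.
SMART_QUOTES = {'\u201c': '"', '\u201d': '"'}

def normalize_quotes(text):
    """Replace curly double quotes with plain ASCII double quotes."""
    for smart, plain in SMART_QUOTES.items():
        text = text.replace(smart, plain)
    return text

# Non-greedy match of everything between the first pair of double quotes.
TITLE_RE = re.compile(r'"(.*?)"')

def extract_title(line):
    """Return the text between the first two double quotes, or None."""
    match = TITLE_RE.search(normalize_quotes(line))
    return match.group(1) if match else None

def titles_from_file(path):
    """Yield the title from each line of a UTF-8 text file that has one."""
    with io.open(path, encoding='utf-8') as f:
        for line in f:
            title = extract_title(line)
            if title is not None:
                yield title

# Usage with the line from the question (curly open, plain close).
sample = '\u201cGlobalHUB: A Virtual Community For Global Engineering Education, Research, And Collaboration,"'
print(extract_title(sample))
```

Calling list(titles_from_file(f_str)) would then return one title per matching line, whether that line used curly or straight quotes.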
