Whenever I open a page with urllib or urllib2 (file = urllib.urlopen(urllinkhere)) and print the result, I get this:


See all the square boxes? Unknown characters or something...

Well, that's what they are, and they represent the returns (new lines). If I save this to a text file, all the boxes are still there. If I then delete them, it looks as if they are gone and the problem is fixed, but once I save the file again, all the new lines, returns, etc. are gone and everything is clumped together on one line.

All I need is one solution :)

Solution #1: This is the easy way; all I need to do is var = line.replace("boxcharacter", "\n").
All I need to know is what character that little box is :P

Solution #2: Any other solution that works!!!!! :P

Thanks for any help!!!!! Ask me if you need more details. The source is simply:

import urllib2

f = urllib2.urlopen(anyurlhere)
print f.read()

Thanks for any/all help!
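For Solution #2, one option is Python's built-in str.splitlines(), which treats "\r", "\n" and "\r\n" all as line boundaries, so you don't need to know which one the page uses. A minimal sketch, where the page string is hypothetical sample data standing in for f.read():

```python
# Hypothetical page text mixing bare "\r" (old Mac style), "\r\n"
# (Windows style) and "\n" (Unix style) line endings.
page = "line one\rline two\r\nline three\nline four"

# splitlines() breaks on "\r", "\n" and "\r\n" alike and drops the
# line-ending characters themselves. (Python 3 str also splits on a few
# other separators, such as form feed.)
lines = page.splitlines()

# Re-join with plain "\n" so every line ending is the same.
fixed = "\n".join(lines)
print(fixed)
```

The joined text no longer contains any "\r", so saving it keeps one line per line.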



It's probably a \r character. In any event, you can determine its value like this:

snip = data[:100]  # slice long enough to include at least one offending character
for char in snip:
    print repr(char), ord(char)  # repr() makes invisible characters like '\r' visible

That'll give you the ASCII value of your character, and then you can replace it with:

data = data.replace(chr(bad_char_value), "\n")  # replace() returns a new string, so assign it
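If the page turns out to contain "\r\n" pairs as well as bare "\r", the order of the replacements matters. A minimal sketch (normalize_newlines is a hypothetical helper name, not something from the thread) that handles both:

```python
def normalize_newlines(data):
    r"""Convert Windows ("\r\n") and old Mac ("\r") line endings to "\n"."""
    # Replace the two-character "\r\n" pair first; replacing "\r" first
    # would turn each "\r\n" into "\n\n" and add a blank line.
    return data.replace("\r\n", "\n").replace("\r", "\n")


sample = "first\r\nsecond\rthird\n"
print(repr(normalize_newlines(sample)))
```

Doing only data.replace("\r", "\n") is fine when the page uses bare "\r" exclusively, which is what the ord() check above tells you.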



Thank you very much, Jeff; that really helped me. Now my only concern is remembering to replace the "\r".

Thanks again!
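To avoid having to remember the "\r" replacement, Python's io module can do the translation while reading: with newline=None ("universal newlines" mode), io.TextIOWrapper converts "\r", "\r\n" and "\n" to "\n". A sketch, assuming the raw bytes below stand in for what f.read() returns and that the page is UTF-8 (use whatever encoding the page actually declares):

```python
import io

# Hypothetical raw bytes standing in for urlopen(...).read(),
# using a mix of bare "\r" and "\r\n" line endings.
raw = b"alpha\rbeta\r\ngamma\n"

# newline=None enables universal newlines: on reading, "\r", "\r\n"
# and "\n" are all translated to "\n". The UTF-8 encoding is an assumption.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8", newline=None)
text = reader.read()
print(repr(text))
```

The result is decoded text rather than raw bytes, with every line ending already normalized.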

