Whenever I open a page with urllib or urllib2 (file = urllib.urlopen(urllinkhere)) and print it, I get this:

Screenshot: http://i121.photobucket.com/albums/o229/Shadow14l/boxes1.gif

See all the square boxes? Unknown characters or something...

Well, they are, and they stand in for the returns (newlines). If I save the output to a text file, all the boxes are still there. If I then delete them, it looks like they are gone and the problem is fixed, but once I save the file again, all the newlines, returns, etc. are gone and everything is clumped together on one line.
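A minimal Python 3 illustration of the "clumped together on one line" symptom, assuming the boxes are bare carriage returns (the reply below turns out to confirm this). The sample string `page` is made up for the example; it is not the poster's actual page.

```python
# Made-up page text that uses bare carriage returns (\r) as line breaks.
page = "line one\rline two\rline three"

# Anything that only looks for "\n" sees a single long line.
lines_by_lf = page.split("\n")
print(len(lines_by_lf))

# str.splitlines() treats "\r", "\n" and "\r\n" all as line boundaries.
print(page.splitlines())
```

The first print shows 1; the second shows the three separate lines.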

All I need is one solution :)

Solution #1: This is the easy way; all I need to do is var = line.replace("boxcharacter", "\n")
All I need to know is which character that lil box is :P

Solution #2: Any other solution that works!!!!! :P

Thanks for any help!!!!! Ask me if you need more details. The source is simply:

import urllib2

f = urllib2.urlopen(anyurlhere)
print f.read()
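For anyone reading this on Python 3: urllib2 was merged into urllib.request, and read() returns bytes rather than str. A rough equivalent of the lines above might look like this; the function name `fetch_text` and the UTF-8 default are assumptions, so check the page's declared charset if the text comes out garbled.

```python
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Download url and return its body as str (rough Python 3 port).

    The UTF-8 default is an assumption; use the page's real charset.
    """
    with urlopen(url) as f:
        raw = f.read()  # bytes, exactly as the server sent them
    # errors="replace" keeps going if a few bytes don't fit the encoding
    return raw.decode(encoding, errors="replace")
```

Printing `repr(fetch_text(url)[:200])` instead of the text itself makes any carriage returns show up as `\r` rather than as boxes.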

Thanks for any/all help!

~Shadow14l

It's probably a \r (carriage return) character. In any event, you can determine its value like this:

snip = data[:100]  # use a number high enough to include at least one offending character
for char in snip:
    print repr(char), ord(char)  # repr() shows invisible characters like \r as '\r'
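Jeff's loop is Python 2. Under Python 3 the data from urlopen(...).read() is bytes, and iterating over bytes yields integers, so calling ord() on each item raises TypeError. A sketch that accepts either str or bytes follows; `char_codes` and `control_codes` are hypothetical helper names, and the second one scans the whole text for control characters (codes below 32) instead of relying on the first 100 characters.

```python
def char_codes(data, limit=100):
    """Return (shown, code) pairs for the first `limit` items of str or bytes."""
    pairs = []
    for item in data[:limit]:
        # Iterating over bytes yields ints; iterating over str yields 1-char strings.
        code = item if isinstance(item, int) else ord(item)
        # repr() makes invisible characters such as \r visible as '\r'.
        pairs.append((repr(chr(code)), code))
    return pairs


def control_codes(data):
    """Sorted list of distinct control-character codes (below 32) in data."""
    codes = data if isinstance(data, bytes) else (ord(ch) for ch in data)
    return sorted({code for code in codes if code < 32})


for shown, code in char_codes("ab\rcd", limit=3):
    print(shown, code)
```

If `control_codes` returns only [13], the page most likely uses bare carriage returns; [10, 13] would suggest \r\n endings or a mix.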

That'll give you the ASCII value of your character, and then you can replace it with:

data = data.replace(chr(bad_char_value), "\n")  # str.replace returns a new string, so assign it back
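One caveat: if the page uses Windows-style \r\n line endings rather than bare \r, replacing only \r turns every \r\n into \n\n and double-spaces the text. Replacing the pair first avoids that. A minimal sketch, where `normalize_newlines` is a hypothetical name and the sample data is made up:

```python
def normalize_newlines(text):
    r"""Convert \r\n (Windows) and bare \r (classic Mac) line endings to \n."""
    # Order matters: handle the two-character pair before the lone \r.
    return text.replace("\r\n", "\n").replace("\r", "\n")


data = "one\r\ntwo\rthree\n"
naive = data.replace("\r", "\n")  # "one\n\ntwo\nthree\n": note the blank line
data = normalize_newlines(data)   # str.replace returns a new string; reassign it
print(repr(data))
```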

Jeff

Thank you very much, Jeff, that really helped me. Now the only thing I have to watch out for is remembering to replace the "\r".
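If remembering the replace is the worry, another option is to let Python 3's I/O layer do it: io.TextIOWrapper with newline=None (its default) reads in universal-newlines mode and translates \r and \r\n to \n as it decodes. A sketch under those assumptions; `read_text`, `fetch_normalized` and the UTF-8 default are made up for the example:

```python
import io
from urllib.request import urlopen


def read_text(binary_stream, encoding="utf-8"):
    r"""Decode a binary stream, turning \r and \r\n into \n while reading."""
    wrapper = io.TextIOWrapper(binary_stream, encoding=encoding, newline=None)
    return wrapper.read()


def fetch_normalized(url, encoding="utf-8"):
    """Fetch url and return its text with universal-newline translation."""
    with urlopen(url) as response:
        return read_text(response, encoding)


print(repr(read_text(io.BytesIO(b"one\rtwo\r\nthree"))))
```

Calling `fetch_normalized(url)` should then give text whose line breaks are always \n, whatever the server sent, so there is nothing left to remember.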

Thanks again!

-Shadow14l