Shadow14l

urllib problems

  #1  
May 11th, 2008
Whenever I open a page with urllib or urllib2 (file = urllib.urlopen(urllinkhere)) and print its contents, I get this:

[IMG]http://i121.photobucket.com/albums/o229/Shadow14l/boxes1.gif[/IMG]

See all the square boxes? They look like unknown characters.

They turn out to represent the returns (new lines). If I save this to a text file, all the boxes are still there. If I delete them, the file looks fixed, but once I save it again, all the new lines are gone and everything is clumped together on one line.

All I need is one solution.

Solution #1: The easy way. All I need to do is var = line.replace("boxcharacter", "\n"), so I just need to know which character that little box is.

Solution #2: Any other solution that works!!!!!

Thanks for any help!!!!! Ask me if you need more details. The source is simply:

import urllib2

f = urllib2.urlopen(anyurlhere)
print f.read()

Thanks for any/all help!

~Shadow14l
jrcagle

Re: urllib problems

  #2  
May 12th, 2008
It's probably a \r character. In any event, you can determine its value like this:

snip = data[:100]  # high enough number to include at least one offending character
for char in snip:
    print char, ord(char)

That'll give you the ASCII value of your character, and then you can replace it with

data = data.replace(chr(bad_char_value), "\n")
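
For anyone reading this on Python 3, here is a rough sketch of the same two steps. The helper names find_control_chars and normalize_newlines are made up for illustration, and the sample string is invented. One thing to watch: if the page uses "\r\n" line endings, swapping every "\r" for "\n" would double the line breaks, so the sketch handles the pair first.

```python
def find_control_chars(data, limit=100):
    """Return sorted (repr, code point) pairs for control characters
    found in the first `limit` characters, ignoring newline and tab."""
    found = set()
    for char in data[:limit]:
        if ord(char) < 32 and char not in "\n\t":
            found.add((repr(char), ord(char)))
    return sorted(found)


def normalize_newlines(data):
    """Convert CRLF pairs and lone CR characters to a single LF."""
    # Handle the two-character sequence first so CRLF does not become two LFs.
    return data.replace("\r\n", "\n").replace("\r", "\n")


# Invented sample mixing CRLF, lone CR, and LF line endings.
sample = "line one\r\nline two\rline three\n"
print(find_control_chars(sample))
print(normalize_newlines(sample))
```

str.replace returns a new string rather than changing the original, so the result has to be assigned or returned, as it is here.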

Jeff
Shadow14l

Re: urllib problems

  #3  
May 12th, 2008
Thank you very much Jeff, that really helped me. Now the only thing I am concerned with is remembering to replace the "\r".

Thanks again!

-Shadow14l
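
One way to avoid having to remember the "\r" replacement is to wrap the read in a small helper. The following is only a sketch, assuming Python 3 and UTF-8 encoded pages (neither comes from the thread); read_text is a hypothetical name, and an in-memory io.BytesIO object stands in for the object returned by urlopen so the example runs without a network connection.

```python
import io


def read_text(response, encoding="utf-8"):
    """Read all bytes from a file-like object, decode them, and
    normalize CRLF and lone CR line endings to LF."""
    text = response.read().decode(encoding)
    # Replace the CRLF pair first so it does not turn into two newlines.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Offline stand-in for an HTTP response (invented data).
fake_response = io.BytesIO(b"first\r\nsecond\rthird\n")
print(read_text(fake_response))

# With a real page, the same helper could be used like this:
# from urllib.request import urlopen
# with urlopen("http://example.com/") as response:
#     print(read_text(response))
```

The encoding default is a guess; a real page may declare a different charset in its headers.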