So, this is my "hello world" program for Python, and I have been able to solve all the problems except one.

When I grab the page data from WoWArmory in Python, non-ASCII characters like "ø" and "é" print as garbled character sequences instead of the characters themselves.

So, I went over to WoWArmory and viewed the page source in Firefox. I copied it all and pasted it as an XML file on my own server.

I then used urllib2 to grab the XML file on my site and it printed perfectly.

I don't understand why the same XML file at WoWArmory comes back with strange characters, while the copy/pasted version on my own server comes back just fine.

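import urllib2

# Fetch the guild page from WoWArmory, sending a browser-like User-Agent
url = 'http://www.wowarmory.com/guild-info.xml?r=Dethecus&gn=Delegated+Authority'
header = {'User-Agent': 'Mozilla/5.0 Gecko/20070219 Firefox/2.0.0.2'}
# data=None keeps this a GET request; passing '' would turn it into a POST
req = urllib2.Request(url, None, header)

print urllib2.urlopen(req).read()

import urllib2

# Fetch the pasted copy of the same XML from my own server (no headers)
url = 'http://delegatedauth.com/test.xml'
req = urllib2.Request(url)

print urllib2.urlopen(req).read()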

The first snippet grabs the page from WoWArmory and the second grabs the copy from my site.
The only difference is that my site will not accept any header, so I had to take that out.

I am out of ideas =/

All 2 Replies

Looks like it depends on what encoding the program displaying your output is set to. Your server or your editor may do the encoding for you.
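To illustrate how the same bytes can look right in one place and wrong in another, here is a minimal sketch. It assumes the server sends UTF-8 and that the misbehaving console interprets bytes as Latin-1; neither is confirmed in this thread, and `as_seen_by` is a made-up helper name. It is written to run under both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def as_seen_by(raw_bytes, console_encoding):
    """Decode raw bytes the way a console using console_encoding would."""
    return raw_bytes.decode(console_encoding)


# The two characters from the question, encoded as UTF-8
# (assumption: this is what the server sends).
raw = u"\u00f8\u00e9".encode("utf-8")

correct = as_seen_by(raw, "utf-8")    # console that expects UTF-8
garbled = as_seen_by(raw, "latin-1")  # console that expects Latin-1 (assumed)

# Each character occupies two bytes in UTF-8, so the wrong decoding
# shows two characters where there should be one.
print(len(correct), len(garbled))  # prints: 2 4
```

The bytes never change between the two calls; only the interpretation does, which would be consistent with one IDE showing the output correctly and another garbling it.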

When I use your 'grab the page data from WoWArmory' code with the DrPython IDE, I get the proper special characters you want in its output window.

My other standby IDE ConText screws the special characters up.
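If the output has to look the same in every IDE, one possible workaround is to decode the fetched bytes explicitly and re-encode them for whatever the console reports. This is only a sketch: `for_console` is a hypothetical helper, and the UTF-8 source encoding is an assumption about what the server sends.

```python
import sys


def for_console(raw_bytes, source_encoding="utf-8", console_encoding=None):
    """Re-encode fetched bytes for the console that will display them."""
    if console_encoding is None:
        # sys.stdout.encoding can be None when output is piped or when
        # an IDE does not report one; fall back to UTF-8 in that case.
        console_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    text = raw_bytes.decode(source_encoding)
    # "replace" substitutes "?" for characters the console cannot show
    return text.encode(console_encoding, "replace")
```

With the code from the question, this would be used as `print for_console(urllib2.urlopen(req).read())`. It only helps when the IDE reports its output encoding accurately; a console that reports one encoding and renders another will still show the wrong characters.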

Perfect!

I was using the NetBeans Python IDE. I tried it in IDLE and it printed just fine.

Thanks a bunch.
