So, this is my hello world program for Python, and I have been able to solve all the problems except one.

When I grab the page data from WoWArmory in Python, non-ASCII characters like "ø" are printing as "Ã¸" and "é" as "Ã©", and so on.

So, I went over to WoWArmory and viewed the page source in Firefox. I copied it all and pasted it as an XML file on my own server.

I then used urllib2 to grab the XML file on my site and it printed perfectly.

I don't understand why the same XML file at WoWArmory is returning with strange characters, while the copy/paste on my own server returns just fine.

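import urllib2

# Fetch the guild page from WoWArmory, sending a browser-like User-Agent.
url = 'http://www.wowarmory.com/guild-info.xml?r=Dethecus&gn=Delegated+Authority'
header = {'User-Agent': 'Mozilla/5.0 Gecko/20070219 Firefox/'}
# Pass the headers by keyword: an empty-string data argument would turn this into a POST.
req = urllib2.Request(url, headers=header)

print urllib2.urlopen(req).read()

import urllib2

# Fetch the copy of the same XML hosted on my own server (no extra headers).
url = 'http://delegatedauth.com/test.xml'
req = urllib2.Request(url)

print urllib2.urlopen(req).read()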

The top script grabs the page from WoWArmory and the bottom one grabs the copy from my site.
The only difference is that my site will not accept any header, so I had to take that out.

I am out of ideas =/


It looks like this depends on which encoding your output is set to. Your server or your editor may be doing the encoding for you.
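To illustrate what an encoding mismatch does, here is a minimal sketch. It assumes the page is sent as UTF-8 and that the output window reads the bytes as Latin-1. Both are assumptions on my part, but they would produce exactly this kind of two-character garbage, with "ø" showing up as "Ã¸". The sample string is made up for the demonstration and nothing is fetched from the network.

```python
# "ø" and "é" written with escapes so this file itself stays plain ASCII.
original = u"\u00f8 \u00e9"

# What a UTF-8 server would put on the wire: two bytes per accented character.
utf8_bytes = original.encode("utf-8")

# Wrong guess: reading those bytes as Latin-1 yields two characters for each one.
garbled = utf8_bytes.decode("latin-1")

# Right guess: decoding with the encoding the bytes were written in.
restored = utf8_bytes.decode("utf-8")

# repr() is used so the result does not depend on the console's own encoding.
print(repr(garbled))
print(repr(restored))
```

The bytes are identical in both cases; only the interpretation changes, which is why the same script can look right in one output window and wrong in another.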

When I use your 'grab the page data from WoWArmory' code with the DrPython IDE, I get the proper special characters you want in its output window.

My other standby IDE, ConText, screws the special characters up.
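If you would rather not depend on which IDE you happen to run in, one possible approach is to decode the downloaded bytes yourself and then encode them for whatever the output window reports. This is only a sketch: `decode_page` and `encode_for_console` are hypothetical helper names, the UTF-8 default is an assumption about the server, and the sample bytes stand in for what `urlopen(req).read()` returns.

```python
import sys


def decode_page(raw_bytes, encoding="utf-8"):
    """Turn downloaded bytes into text, assuming the given source encoding."""
    return raw_bytes.decode(encoding)


def encode_for_console(text, console_encoding=None):
    """Encode text for the output window, replacing characters it cannot show."""
    # Fall back to what stdout reports, then to plain ASCII if it reports nothing.
    target = console_encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(target, "replace")


# Made-up sample standing in for the bytes returned by urlopen(req).read().
raw = u"Sk\u00f8ll".encode("utf-8")
text = decode_page(raw)

# The same text encoded for two different kinds of output window.
print(repr(encode_for_console(text, "latin-1")))
print(repr(encode_for_console(text, "ascii")))
```

With an ASCII-only target the accented letter degrades to "?" instead of turning into two wrong characters, which at least makes the mismatch obvious.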



I was using the NetBeans Python IDE. I tried it in IDLE and it printed just fine.

Thanks a bunch.
