Python help, accessing wikipedia pages
Thread Solved
Join Date: Jul 2008
Posts: 5
Reputation:
Solved Threads: 0
Hi,
I've been trying to write a program that fetches Wikipedia pages and lists all the links found in the page source. I'm using the urllib.urlopen() method for this, and unfortunately I've run into a little problem. Instead of getting the actual page (the main page, say, or any article), I get an error page, which can be summarized by:
"<p>Our servers are currently experiencing a technical problem. This is probably temporary and should be fixed soon. Please <a href="http://en.wikipedia.org/wiki/Leg_spin" onclick="RefreshPage(); return false">try again</a> in a few minutes.</p>
<p>You may be able to get further information in the <a href="irc://chat.freenode.net/wikipedia">#wikipedia</a> channel on the <a href="http://www.freenode.net">Freenode IRC network</a>.</p>"
Unfortunately, this keeps happening regardless of the time or the article. Is there any way I can fix this?
Last edited by Tommy_101; Sep 17th, 2009 at 8:08 pm. Reason: grammar mistakes
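For the link-listing half of the question, here is a minimal sketch, assuming Python 3 and only the standard library. The names `LinkCollector` and `extract_links` are made up for illustration and are not from the original post. It collects the `href` of every `<a>` tag in a string of HTML, so it works on whatever page source you manage to download.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser.

    Hypothetical helper for illustration; not part of the original post.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and passes attributes
        # as a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href values found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Calling `extract_links(page_source)` on the error page quoted above would list the three links inside it, which is a quick way to confirm you are parsing the error page rather than the article.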
Join Date: Jul 2008
Posts: 5
Reputation:
Solved Threads: 0
Alright guys, thanks to anyone who even looked at my post to see if they could help. I used urllib2 (well, urllib.request in Python 3.0, essentially the same thing) and it raised an error with the message "HTTP Error 403: Forbidden", or something along those lines. I did some research and realized that some websites don't want you to access their content without a browser, which leads to my next problem: having to simulate a browser for Wikipedia.
Thanks guys
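One common way to get past a 403 like this is to send a User-Agent header with the request instead of the Python default. The sketch below assumes Python 3 and `urllib.request`, and it assumes the 403 really is caused by the default User-Agent, which the thread does not confirm. The `USER_AGENT` string and the helper names `build_request` and `fetch_page` are placeholders made up for this example.

```python
import urllib.request

# Hypothetical identifier for illustration; replace it with something
# that describes your own script and how to contact you.
USER_AGENT = "LinkListerExample/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request for url that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_page(url, timeout=10):
    """Download url and return its body as text.

    Raises urllib.error.HTTPError if the server still answers with an
    error status such as 403.
    """
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Something like `fetch_page("http://en.wikipedia.org/wiki/Leg_spin")` would then return the page source as a string. If you plan to fetch many pages, it may be worth checking whether the site offers an API or has a policy on automated access before scraping the HTML.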