Python 3.1.1: can't assign function result to variable

Thread Solved

#1, Kezoor (Newbie Poster), Oct 28th, 2009
Hello,

I am a beginner in Python, and I have the following problem: I retrieve an excerpt of an HTML page from the web, and then want the result to be held in a variable (before it is processed by a regexp).

The function does get the HTML source, but when I assign the function's result to the variable t_main_page, the interpreter says the variable is of type None.

Here is the code:

    #!/usr/bin/env python


    # Script to fetch and parse the specific web page of PPI for Manufactured Goods
    # on http://www.stats.gov.cn/english/ .

    import urllib.request, re
    from html.parser import HTMLParser


    def fetch_main_page():
        """
        Open the web page and retrieve the HTML code.

        Returns: string UTF-8
        """

        main_page = ''

        try:
            main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            return main_page

    t_main_page = fetch_main_page()

    print(t_main_page)

    """
    relevant_links = re.findall('<a href=(.*?)>PPI of Main Manufactured Goods.*?</a>', t_main_page)

    for link in relevant_links:
        print(link)

    """

Can someone tell me how to put the string returned by a function into a variable that the regexp can use?

Thanks!

#2, snippsat (Junior Poster), Oct 28th, 2009
    >>> import urllib.request, re
    >>> from html.parser import HTMLParser
    >>> main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
    Traceback (most recent call last):
      File "<pyshell#2>", line 1, in <module>
        main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
    UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 1-2: illegal multibyte sequence

It returns None because of the error raised by decode('gb2312').
So the Chinese data in GB2312 encoding is causing some problems.
Search Google for: python decode('gb2312')

    >>> main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000)
    >>> main_page
    <now it works>

I am not sure whether this gives you what you need.
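
A decode failure of this kind can be reproduced without the network. The following is only a minimal sketch: it assumes the bytes were cut in the middle of a two-byte GB2312 character, which is one way a fixed-size read(20000) could break decoding; the thread does not confirm that this is the cause here. The sample text and the variable names are made up for illustration.

```python
# Sample data (hypothetical): two Chinese characters encoded as GB2312,
# then cut one byte short, as a fixed-size read() might do.
raw = '中文'.encode('gb2312')
truncated = raw[:-1]

# Strict decoding (the default) raises UnicodeDecodeError on the broken tail.
try:
    truncated.decode('gb2312')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace' substitutes a replacement character for the
# undecodable bytes instead of raising.
lenient = truncated.decode('gb2312', errors='replace')

print(strict_failed)
print(len(lenient))
```

Whether a lossy decode is acceptable depends on what the regexp needs to find; for ASCII-only link text it may be good enough.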

#3, Kezoor (Newbie Poster), Oct 29th, 2009
Thanks for the reply!

But read() without decode() returns only a bytes object, and I can't parse that with the regexp.
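
As a side note, the re module can also search a bytes object, provided the pattern itself is a bytes literal. A minimal sketch using the pattern from the first post; the HTML fragment and the link target in it are invented for illustration.

```python
import re

# Hypothetical fragment standing in for the bytes returned by read().
page_bytes = b'<p><a href="/english/ppi.htm">PPI of Main Manufactured Goods (2009)</a></p>'

# A bytes pattern (note the b prefix) can search a bytes object directly.
links = re.findall(b'<a href=(.*?)>PPI of Main Manufactured Goods.*?</a>', page_bytes)

for link in links:
    print(link)
```

The matches are bytes too, so they would still need decoding before being mixed with ordinary strings.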

I wrote a try-except-else function because urlopen can't fetch the page content immediately; I have to recurse in the handler for UnicodeDecodeError and URLError, just to force the function to retrieve the content by trial and error (sometimes it takes 5 minutes, but I always get it in the end).

Then, when I put a print() instead of the return, it works.

This works:
    def fetch_main_page():
        """
        Open the web page and retrieve the HTML code.

        Returns: string UTF-8
        """

        main_page = ''

        try:
            main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            print(main_page)

    fetch_main_page()

How can I get the result of a function back into the main flow of the script? If I put a return statement in my function, shouldn't I be able to do this?

    variable = function()
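
Assigning the result this way does work; what matters is that every path through the function reaches a return. In the function from the first post, the except branch calls fetch_main_page() again but discards the value, so whenever the first attempt fails, the outer call falls off the end of the function and returns None. The print() version appears to work because the innermost successful call prints by itself. A minimal sketch without the network; flaky_fetch and the attempts counter are hypothetical stand-ins for the real download.

```python
attempts = {'count': 0}

def flaky_fetch():
    """Hypothetical stand-in for the download: fails twice, then succeeds."""
    attempts['count'] += 1
    if attempts['count'] < 3:
        raise ValueError('simulated failure')
    return 'page content'

def fetch_without_return():
    try:
        page = flaky_fetch()
    except ValueError:
        fetch_without_return()         # result of the retry is thrown away
    else:
        return page

def fetch_with_return():
    try:
        page = flaky_fetch()
    except ValueError:
        return fetch_with_return()     # pass the retry's result up to the caller
    else:
        return page

broken = fetch_without_return()
attempts['count'] = 0                  # reset the simulated failures
fixed = fetch_with_return()

print(broken)
print(fixed)
```

Applied to the original function, the change would be to write return fetch_main_page() in the except branch.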
Last edited by Kezoor; Oct 29th, 2009 at 4:06 am.

#4, Kezoor (Newbie Poster), Oct 29th, 2009: Inelegant solution?
Okay, I have the function result in a global variable ...

I find the solution "not very pythonic" though: I commented the steps.

    #!/usr/bin/env python


    # Script to fetch and parse the specific web page of PPI for Manufactured Goods
    # on http://www.stats.gov.cn/english/ .

    import urllib.request, re
    from html.parser import HTMLParser

    # Declare an empty global variable to act as a container
    t_main_page = ''

    def fetch_main_page():
        """
        Open the web page and retrieve the HTML code.

        Returns: string UTF-8
        """

        main_page = ''

        try:
            main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/")\
                        .read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            global t_main_page       # declare the global variable (it can't be
            t_main_page = main_page  # assigned in the same statement)
            return t_main_page       # assign first, THEN return

    fetch_main_page()
    # now t_main_page contains the string

So it is solved, but is there another way to do it?
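
One alternative to the global variable is to retry in a loop and return the decoded text directly. This is only a sketch under stated assumptions: the max_attempts limit and the opener parameter are additions for illustration (the opener lets the function be tried without a network connection), and fake_opener is a made-up test double, not part of urllib.

```python
import io
import urllib.error
import urllib.request

def fetch_main_page(url="http://www.stats.gov.cn/english/",
                    max_attempts=5, opener=urllib.request.urlopen):
    """Return the decoded page, or None if every attempt fails."""
    for _ in range(max_attempts):
        try:
            return opener(url).read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError):
            continue                   # try again, up to max_attempts times
    return None

# Usage with a hypothetical fake opener that fails once, then succeeds.
calls = []

def fake_opener(url):
    calls.append(url)
    if len(calls) == 1:
        raise urllib.error.URLError('simulated network error')
    return io.BytesIO('PPI 中文'.encode('gb2312'))

t_main_page = fetch_main_page(opener=fake_opener)
print(len(calls))
```

Bounding the number of attempts also avoids recursing without limit when the site stays unreachable.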

#5, willygstyle (Junior Poster in Training), Oct 29th, 2009
Unless I'm missing something, all you are doing with the variable t_main_page is copying the contents of the variable main_page, which already has the information you want. Just return main_page.

    #!/usr/bin/env python


    # Script to fetch and parse the specific web page of PPI for Manufactured Goods
    # on http://www.stats.gov.cn/english/ .

    import urllib.request, re
    from html.parser import HTMLParser

    def fetch_main_page():
        """
        Open the web page and retrieve the HTML code.

        Returns: string UTF-8
        """

        main_page = ''

        try:
            main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/")\
                        .read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            return main_page  # return

    t_main_page = fetch_main_page()
    # now t_main_page contains the string

#6, Kezoor (Newbie Poster), Oct 29th, 2009
That is what I did, but when I do it, it returns a NoneType object!

I am running the script from IDLE, and I can't get anything that way.

Just to be sure, I added a print(t_main_page) after the function call and ran your code, and here is what the interpreter shows:

******
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
None
>>> type(t_main_page)
<class 'NoneType'>
>>>
******
Is Python 3 "doing the right thing" or what?

Is that a gotcha, or is something going wrong with Python? (Honestly, I think the problem is probably on my side, but then why is everything on the web coded the way you do it?)

Last edited by Kezoor; Oct 29th, 2009 at 9:57 pm.

#7, Kezoor (Newbie Poster), Nov 3rd, 2009: Fall back on Python 2.6
I installed Python 2.6.4 to try the original code.

It works INSTANTLY (i.e. no 5-minute wait to retrieve the HTML source of the page), and without the fuzzy global variable.

So I won't code in Python 3 anymore, which should save a lot of headaches.

This works perfectly in Python 2.6.4, and IS what is said to be the right way to write it.

    #!/usr/bin/env python


    # Script to fetch and parse the specific web page of PPI for Manufactured Goods
    # on http://www.stats.gov.cn/english/ .

    import urllib, re


    def fetch_main_page():
        """
        Open the web page and retrieve the HTML code.

        Returns: string
        """

        try:
            main_page = urllib.urlopen("http://www.stats.gov.cn/english/")\
                        .read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            # Note: Python 2's urllib module has no `error` attribute; this
            # tuple is only evaluated if an exception actually occurs.
            fetch_main_page()
        else:
            return main_page  # return

    t_main_page = fetch_main_page()

Now I will listen when they say that we had better wait before programming for Python 3K...