Python3.1.1 can't assign function result to variable
Thread Solved
Join Date: Oct 2008
Posts: 15
Solved Threads: 0
Hello,
I am a beginner in Python, and I have the following problem: I retrieve an excerpt of an HTML page from the web, and then want to hold the result in a variable (before processing it with a regexp).
The function does get the HTML source, but when I assign the function's result to the variable t_main_page, the interpreter tells me the variable is of type None.
Here is the code:
    #/usr/bin/env/ python
    # Script to fetch and parse the specific web page of PPI for Manufactured Goods
    # on http://www.stats.gov.cn/english/ .

    import urllib.request, re
    from html.parser import HTMLParser

    def fetch_main_page():
        """ Open the web page and retrieve the HTML code.
        Returns: string UTF-8
        """
        main_page = ''
        try:
            main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            return main_page

    t_main_page = fetch_main_page()
    print(t_main_page)

    """
    relevant_links = re.findall('<a href=(.*?)>PPI of Main Manufactured Goods.*?</a>', t_main_page)
    for link in relevant_links:
        print(link)
    """
Can someone tell me how to put the string returned by a function into a variable that the regexp can use?
Thanks!
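A note on the likely cause, offered as a reading of the code above rather than something the thread itself states: in the `except` branch the function calls itself but discards the result, so whenever the first attempt fails, the outermost call reaches the end of the function and returns `None` implicitly. The sketch below reproduces this without any network access; `flaky_fetch`, `broken` and `fixed` are hypothetical names, and the simulated failure count is an assumption for illustration.

```python
# Hypothetical stand-in for the network call: fails twice, then succeeds.
attempts = {"count": 0}

def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise UnicodeDecodeError("gb2312", b"\x8b\x08", 0, 2,
                                 "illegal multibyte sequence")
    return "<html>page</html>"

def broken():
    try:
        page = flaky_fetch()
    except UnicodeDecodeError:
        broken()            # the result of the retry is thrown away
    else:
        return page         # only reached by the innermost, successful call

def fixed():
    try:
        page = flaky_fetch()
    except UnicodeDecodeError:
        return fixed()      # pass the retry's result back up the call chain
    else:
        return page

broken_result = broken()    # the outer call falls off the end of the function
attempts["count"] = 0       # reset the simulated failures
fixed_result = fixed()
print(broken_result, fixed_result)
```

With `print()` in place of `return`, the innermost successful call prints the page itself, which is why that variant appears to work even though nothing is handed back to the caller.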
#2 Oct 28th, 2009
    >>> import urllib.request, re
    >>> from html.parser import HTMLParser
    >>> main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
    Traceback (most recent call last):
      File "<pyshell#2>", line 1, in <module>
        main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
    UnicodeDecodeError: 'gb2312' codec can't decode bytes in position 1-2: illegal multibyte sequence
So the Chinese data encoded in GB2312 format gives some problems.
Search Google for: python decode('gb2312')
    >>> main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000)
    >>> main_page
    <now it works>
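Raw bytes can still be turned into text for later processing by telling `decode()` how to handle bytes it cannot map. The sketch below is illustrative only: it assumes the failure comes from a few undecodable bytes (for example a multibyte character cut in half by `read(20000)`), which the thread does not confirm, and it uses a made-up byte string instead of the real page.

```python
# Made-up sample: two GB2312-encoded characters, truncated mid-character
# to imitate a read() that stops in the middle of a multibyte sequence.
raw = "中文".encode("gb2312")[:3]

try:
    raw.decode("gb2312")                 # strict mode (the default) raises
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
text = raw.decode("gb2312", errors="replace")
print(strict_failed, repr(text))
```

Replacing bad bytes loses information, so this only suits cases where the text of interest (here, ASCII links) is unaffected.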
Join Date: Oct 2008
Posts: 15
Solved Threads: 0
#3 Oct 29th, 2009
Thanks for the reply!
But read() without decode() returns only bytes, and I can't parse that with the regexp.
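For what it is worth, the `re` module can also search a `bytes` object directly, as long as the pattern is itself `bytes`. A minimal sketch, using a made-up HTML fragment (the link target is hypothetical) and the pattern from the original script:

```python
import re

# Made-up fragment standing in for the bytes returned by read().
raw = b'<li><a href="/english/ppi.htm">PPI of Main Manufactured Goods (2009)</a></li>'

# A bytes pattern (note the b prefix) matches against bytes input.
links = re.findall(rb'<a href=(.*?)>PPI of Main Manufactured Goods.*?</a>', raw)

# Each match is bytes; decode it once a str is needed.
decoded = [link.decode("ascii") for link in links]
print(decoded)
```

Mixing a `str` pattern with `bytes` input (or the reverse) raises a `TypeError`, which may be what the regexp attempt ran into.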
I wrote a try-except-else function because urlopen can't fetch the page content immediately; I have to recurse in the exception handler for UnicodeDecodeError and URLError, just to force the function to retrieve the content by trial and error (sometimes it takes 5 minutes, but I always get it in the end).
Then, when I put a print() instead of the return, it works.
This works:
    def fetch_main_page():
        """ Open the web page and retrieve the HTML code.
        Returns: string UTF-8
        """
        main_page = ''
        try:
            main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/").read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            print(main_page)

    fetch_main_page()
How can I get the result of a function back into the script's main flow? If I put a return statement in my function, shouldn't I be able to do this?
    variable = function()
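An assignment like this works as long as every path through the function ends in a `return`. One way to guarantee that is to replace the recursion with a loop. The following is a sketch under stated assumptions, not the thread's solution: `fetch_with_retries`, `max_attempts` and `delay` are hypothetical names, and the fetching callable is passed in as a parameter so the example can run without network access.

```python
import time
import urllib.error

def fetch_with_retries(fetch, max_attempts=5, delay=0.0):
    """Call fetch() until it succeeds or max_attempts is reached.

    fetch is any zero-argument callable returning the page text
    (hypothetical parameter, e.g. a function wrapping urlopen/read/decode).
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return fetch()                  # success: the value reaches the caller
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            last_error = e                  # remember why this attempt failed
            time.sleep(delay)               # optional pause before retrying
    raise RuntimeError("all attempts failed") from last_error

# Simulated fetch that fails once with URLError, then returns text.
calls = []

def fake_fetch():
    calls.append(1)
    if len(calls) < 2:
        raise urllib.error.URLError("temporary failure")
    return "<html>ok</html>"

t_main_page = fetch_with_retries(fake_fetch)
print(t_main_page)
```

A bounded loop also avoids hitting Python's recursion limit if the site stays unreachable for a long time.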
Last edited by Kezoor; Oct 29th, 2009 at 4:06 am.
Join Date: Oct 2008
Posts: 15
Solved Threads: 0
Okay, I have the function result in a global variable ...
I find the solution "not very pythonic" though. I commented the steps:
    #/usr/bin/env/python
    # Script to fetch and parse the specific web page of PPI for Manufactured Goods
    # on http://www.stats.gov.cn/english/ .

    import urllib.request, re
    from html.parser import HTMLParser

    # Declare an empty global variable to act as a container
    t_main_page = ''

    def fetch_main_page():
        """ Open the web page and retrieve the HTML code.
        Returns: string UTF-8
        """
        main_page = ''
        try:
            main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/")\
                        .read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            global t_main_page          # call the global variable (can't
            t_main_page = main_page     # assign it on the same statement)
            return t_main_page          # THEN assign THEN return

    fetch_main_page()   # now t_main_page is containing the string
So it is solved, but is there another way to do it?
Join Date: Aug 2009
Posts: 56
Solved Threads: 6
#5 Oct 29th, 2009
Unless I'm missing something, all you are doing with the variable t_main_page is copying the contents of the variable main_page, which already has the information you want. Just return main_page.
    #/usr/bin/env/python
    # Script to fetch and parse the specific web page of PPI for Manufactured Goods
    # on http://www.stats.gov.cn/english/ .

    import urllib.request, re
    from html.parser import HTMLParser

    def fetch_main_page():
        """ Open the web page and retrieve the HTML code.
        Returns: string UTF-8
        """
        main_page = ''
        try:
            main_page = urllib.request.urlopen("http://www.stats.gov.cn/english/")\
                        .read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            return main_page    # return

    t_main_page = fetch_main_page()   # now t_main_page is containing the string
Join Date: Oct 2008
Posts: 15
Solved Threads: 0
#6 Oct 29th, 2009
That is what I have done, but when I do it, it returns a NoneType object!
I am running the script from IDLE, and I can't obtain anything if I do it that way.
Just to be sure, I added a print(t_main_page) after the function call and ran your code, and here is what the interpreter shows:
******
Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
None
>>> type(t_main_page)
<class 'NoneType'>
>>>
******
Is Python 3 "doing the right thing" or what?
Is that a gotcha, or is something going wrong with Python? (Sincerely, I think the problem must be with me, but why is everything on the web coded the way you do it?)
Last edited by Kezoor; Oct 29th, 2009 at 9:57 pm.
Join Date: Oct 2008
Posts: 15
Solved Threads: 0
I installed python 2.6.4 to try the original code.
It works INSTANTLY (i.e., no 5-minute wait to retrieve the HTML source of the page), and without the fuzzy global variable.
So I won't code in Python 3 anymore, which should save me a lot of headaches.
This works perfectly in Python 2.6.4, and IS what is said to be the right way of writing it.
    #/usr/bin/env/python
    # Script to fetch and parse the specific web page of PPI for Manufactured Goods
    # on http://www.stats.gov.cn/english/ .

    import urllib, re

    def fetch_main_page():
        """ Open the web page and retrieve the HTML code.
        Returns: string
        """
        try:
            main_page = urllib.urlopen("http://www.stats.gov.cn/english/")\
                        .read(20000).decode('gb2312')
        except (UnicodeDecodeError, urllib.error.URLError) as e:
            fetch_main_page()
        else:
            return main_page    # THEN assign THEN return

    t_main_page = fetch_main_page()
Now I will listen when they say that we had better wait before programming for Python3K ...
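A possible explanation for the difference, offered as an assumption rather than a finding of the thread: if the first request succeeds, the `except` branch (and its unreturned recursive call) never runs, so the `else` branch returns the page normally. The expression in an `except` clause is only evaluated when an exception actually occurs, which would also explain why `urllib.error.URLError`, which Python 2's `urllib` module does not provide, caused no error here. The sketch below shows that behaviour in Python 3 with a hypothetical `fake_urllib` namespace standing in for a module that lacks the attribute.

```python
import types

# Hypothetical stand-in for a module without an `error` attribute.
fake_urllib = types.SimpleNamespace()

def fetch(fail):
    try:
        if fail:
            raise ValueError("simulated failure")
        page = "<html>ok</html>"
    except (UnicodeDecodeError, fake_urllib.error.URLError):
        return "retry"
    else:
        return page

ok_result = fetch(fail=False)       # the except expression is never evaluated

try:
    fetch(fail=True)                # now the except expression is evaluated...
    outcome = "no error"
except AttributeError:
    outcome = "AttributeError"      # ...and the missing attribute surfaces
print(ok_result, outcome)
```

On this reading the function behaves the same in both Python versions: it returns the page when the first attempt succeeds and falls through to `None` after a retry.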