Search DaniWeb - urljoin

What is the usage of urljoin? 10 Years Ago by Niloofar24
… example: >>> from urlparse import urljoin >>> url = urljoin('http://python.org/','about.html') >>> url… take the href part which is `/about/` here and use `urljoin` to join this string (of course with `.html`) to the…

Re: What is the usage of urljoin? 10 Years Ago by vegaseat
…/jobs/"] # to create a full url use ... new_url = urlparse.urljoin(base, path_list[2]) print(new_url) # https://www.python.org/community…

Re: What is the usage of urljoin? 10 Years Ago by Anders_2
It's a function from the standard module urlparse in Python 2. You will find more info and syntax in the Python docs on the official site, Python.org. Follow link -> [Click Here](https://docs.python.org/2/library/urlparse.html)

Re: What is the usage of urljoin? 10 Years Ago by Niloofar24
Thank you @Anders 2. Thank you @Vegaseat.

How to pass the parameters to a script? 17 Years Ago by balance
…('a') if link.string in (arch, 'all')] archlink = urlparse.urljoin(url,archlinks[0]['href']) mirrorpage = urllib.urlopen(archlink).read() mirrorsoup…dt.img['src'] == _depgif] for link in deplinks: get_debs(urlparse.urljoin(url,link['href']), packages=packages) return packages if __name__ == '__main__…

Data Mining 15 Years Ago by ccandillo
…import urllib2 from urllib2 import HTTPError from urlparse import urljoin, urlsplit import httplib from optparse import OptionParser import re… proper URL ''' paths = [] for link in links: paths.append(urljoin(self.url, link)) self.validate_links(paths) def validate_links(self, links…

Parse HTML to get text from webpages 16 Years Ago by pocnib
….urls=[] for u in local_list: try: split_url= urlparse.urlsplit(urlparse.urljoin(ur,u)) if split_url.scheme == "http": u = urlparse…

help with web crawler 16 Years Ago by leegeorg07
…) for link in find_links(html): #Handle relative links link = urlparse.urljoin(url, link) self.log("Checking: " +url) #make sure…

Variable scope 15 Years Ago by Aeronobe
…: for attribute in attrs: if 'href' in attribute : link = urlparse.urljoin(base,attribute[1]) print link Everything works fine, but…

Having problem Fetching hyperlinks from url due to proxy (i am new to Pyth) 9 Years Ago by Shailang
… tag in soup.findAll('a', href=True): tag['href'] = urlparse.urljoin(url, tag['href']) print tag['href'] file.write('\n') file…

web scraping with beautiful soup 8 Years Ago by Geethu_2
… tag in soup.findAll('a',href=True): tag['href']=urlparse.urljoin(url,tag['href']) if url in tag['href'] and tag…

Re: How to pass the parameters to a script? 17 Years Ago by G-Do
Hi balance, How are you running it? The error occurs because the "downloader" module, urllib, has a urlopen() function which doesn't recognize the URL being passed in. If you step backwards, you see that the error is in line 15 of UbuntuPackageGrabber.py, the main script. That line says: `source = urllib.urlopen(url).read()`…

Re: How to pass the parameters to a script? 17 Years Ago by balance
First of all, thank you for your kind reply! Unfortunately I don't have the documentation of the script! You're right, I should pass a URL to the script. For example, it should be possible to pass a URL like this: http://packages.ubuntu.com/gutsy/base/adduser My problem …

Re: How to pass the parameters to a script? 17 Years Ago by G-Do
Hi balance, You should open a command prompt, navigate to the directory where the script is, and invoke it by saying: `python UbuntuPackageGrabber.py URL_GOES_HERE` Substitute the package URL you have in mind. But I still think it will misfire unless you change sys.argv[0] to sys.argv[1]. Try it both ways and see what happens.

Re: Parse HTML to get text from webpages 16 Years Ago by woooee
First, if this is just a one-time thing for you, you can use Links to download and save the page as text only: http://www.jikos.cz/~mikulas/links/download/binaries/ Also, I assume you know about BeautifulSoup and that is more than you want. To answer your questions 1. Stops after 10 iterations (as the saying goes, this is too …

Re: help with web crawler 16 Years Ago by lllllIllIlllI
You could just do a quick count function to see if enough relevant words are in your page. So for example: body = "The Royal Air Force (RAF) is the United Kingdom's air force, the oldest independent air force in the world.[2] Formed on 1 April 1918,[3] the RAF has taken a significant role in British military history ever since, …

Re: help with web crawler 16 Years Ago by leegeorg07
Thanks, I'll give it a go when I get home!!!

Re: help with web crawler 16 Years Ago by leegeorg07
Just wondering... how could I use it with the code I posted above?

Re: help with web crawler 16 Years Ago by lllllIllIlllI
What it does is count the number of occurrences of the words RAF, Air Force, and History. If a word occurs 1 or more times, the count() function returns the number of occurrences. So if you had a bit of text with only RAF and History, you wouldn't get a match, because count() returns 0 for a word that does not occur (it is find(), not count(), that returns -1). So you could make it more…

Re: Variable scope 15 Years Ago by Aeronobe
(this post can be deleted)

Re: Variable scope 15 Years Ago by TrustyTony
That is one value for the whole class; maybe you would be better off only putting `self.url=url` in your `__init__`? Nice coding with inheritance, good job!

Re: Variable scope 15 Years Ago by Aeronobe
Aah, I see, it should indeed be better the way you say :) Thanks!

Re: Having problem Fetching hyperlinks from url due to proxy (i am new to Pyth) 9 Years Ago by rproffitt
https://docs.python.org/2/library/urllib.html notes more work for you when proxies are involved.
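
To illustrate the extra proxy work rproffitt mentions, here is a minimal Python 3 sketch (the thread is Python 2, where `ProxyHandler` lived in `urllib2`; in Python 3 it is in `urllib.request`). `ProxyHandler` takes a dict mapping each URL scheme to a proxy URL, and `build_opener` returns an opener that sends requests through it. The address `proxy.example.com:8080` and the helper name `build_proxy_opener` are placeholders assumed for the example, not taken from the thread.

```python
import urllib.request

# Placeholder proxy address (an assumption): substitute your own host and port.
PROXIES = {
    "http": "http://proxy.example.com:8080",
    "https": "http://proxy.example.com:8080",
}


def build_proxy_opener(proxies):
    """Return an OpenerDirector whose requests go through the given proxies."""
    # ProxyHandler maps each URL scheme to the proxy that should handle it.
    handler = urllib.request.ProxyHandler(proxies)
    # build_opener adds the default handlers (HTTP, HTTPS, redirects, ...)
    # but keeps this ProxyHandler instead of the default environment-based one.
    return urllib.request.build_opener(handler)


opener = build_proxy_opener(PROXIES)
# With a reachable proxy, this fetches a page the same way urlopen() would:
# html = opener.open("https://www.python.org/").read()
```

Calling `opener.open(url)` then behaves like `urlopen(url)`, but through the proxy. Requests, suggested in the next reply, accepts a dict of the same shape through the `proxies` argument of `requests.get`.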

Re: Having problem Fetching hyperlinks from url due to proxy (i am new to Pyth) 9 Years Ago by snippsat
> A must read

Or a better [read](http://docs.python-requests.org/en/latest/user/advanced/#proxies). You should really look into and use [Requests](http://docs.python-requests.org/en/latest/), Shailang.

Re: Having problem Fetching hyperlinks from url due to proxy (i am new to Pyth) 9 Years Ago by Shailang
Thanks guys, I will take a look @snippit @rproffit
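
Most of the snippets above share one pattern: take the `href` of each `<a>` tag and resolve it against the page's URL with `urljoin`, sometimes keeping only http links, as pocnib's snippet does with `split_url.scheme`. The following is a minimal Python 3 sketch of that pattern under a few assumptions: in Python 3, `urlparse.urljoin` and `urlparse.urlsplit` live in `urllib.parse`; the standard-library `HTMLParser` stands in for BeautifulSoup so the example needs no extra packages; the sample page and base URL are invented; and `LinkCollector` and `http_links` are hypothetical helper names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against base_url."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # urljoin turns relative hrefs into absolute URLs;
                # absolute hrefs come back unchanged.
                self.links.append(urljoin(self.base_url, value))


def http_links(html, base_url):
    """Return the absolute http/https links found in html."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    # Keep only web links (drops mailto:, javascript:, ftp:, ...).
    return [u for u in parser.links if urlsplit(u).scheme in ("http", "https")]


# Sample page and base URL, invented for this example.
page = """
<a href="/about/">About</a>
<a href="jobs/">Jobs</a>
<a href="https://docs.python.org/3/">Docs</a>
<a href="mailto:someone@example.com">Mail</a>
"""

for link in http_links(page, "https://www.python.org/community/"):
    print(link)
```

Run as-is, it prints `https://www.python.org/about/`, `https://www.python.org/community/jobs/` and `https://docs.python.org/3/`. The root-relative `/about/` replaces the whole path, the relative `jobs/` is resolved under the base directory, the absolute link passes through unchanged, and the `mailto:` link is dropped by the scheme filter. The same call reproduces Niloofar24's example: `urljoin('http://python.org/', 'about.html')` returns `'http://python.org/about.html'`.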