Well, I still have a problem with this program: what should I type on line 11 for the update function?!
All my code that should be updated every second is in the build function of the CountdownApp class.
That's ok @AleMonteiro.
Unfortunately, I didn't find the answer in those 3 links. So I have to use a host instead of my localhost. Thank you anyway.
@iJunkie22, can you explain your last post please, about str.istitle()? It would be good if you could give me a little example.
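Meanwhile, from what I read in the Python docs, str.istitle() seems to check whether a string is title-cased, i.e. every word starts with exactly one uppercase letter; for example:
>>> 'Hello World'.istitle()
True
>>> 'hello World'.istitle()
False
>>> 'HELLO'.istitle()
False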
Well, it seems my question is basically wrong.
Forget that question, sorry!
I wanted to post this question as a new discussion, but as it is related to this one I will ask here. My code:
from bs4 import BeautifulSoup
import urllib2

mylist = []
url = 'http://www.niloofar3d.ir/try.html'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
tag_li = soup.find_all('li')
for tag in tag_li:
    if tag.text.startswith('A'):
        mylist.append(tag.text)
if 'A' in mylist[0]:
    if 'A' in mylist[1]:
        if 'A' in mylist[2]:
            print mylist
else:
    'sorry!'
The output should be the else message, but it prints this output:
[u'Apple', u'Age', u'Am']
What is the problem? I want the script to check whether the first 3 items of mylist start with the letter 'A': if they do, print the list, and if not, print 'sorry!'. But as you can see here, it has printed the list anyway!
And one more question: how can I remove those u letters that get printed in the output?
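For reference, here is a minimal sketch of what I think should work (untested): check the first three items explicitly with all(), use a single if/else pair, and print the items one by one so the u'...' representation disappears.
from bs4 import BeautifulSoup
import urllib2

url = 'http://www.niloofar3d.ir/try.html'
soup = BeautifulSoup(urllib2.urlopen(url).read())
mylist = [tag.text for tag in soup.find_all('li')]

if all(item.startswith('A') for item in mylist[:3]):
    for item in mylist:
        print item.encode('utf-8')   # printing items one by one drops the u'...' repr
else:
    print 'sorry!'                   # print is needed; a bare 'sorry!' does nothing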
Thank you @snippsat. Your example was exactly what I was looking for.
And thank you @Slavi for your answer and explanation.
I tried this for testing:
>>> import urllib2
>>> import re
>>> html = 'https://www.daniweb.com/software-development/python/threads/492669/how-to-print-only-the-content-of-all-tags-from-a-url-page'
>>> re.findall(r'<p>(.+),/p>', html)
But the output was:
[]
I tried other tags too, but all the outputs were []. What's the problem?
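My own guess at the problem, as an untested sketch: html above holds only the url string, not the page source, so the page has to be downloaded first; the regex also has ,/p> where </p> was probably meant:
>>> import urllib2
>>> import re
>>> url = 'https://www.daniweb.com/software-development/python/threads/492669/how-to-print-only-the-content-of-all-tags-from-a-url-page'
>>> html = urllib2.urlopen(url).read()   # download the page source first
>>> re.findall(r'<p>(.+?)</p>', html)    # non-greedy match between <p> and </p>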
Hi.
How can I ask my crawler to print only the text of all <li></li> tags on a url page?
I want to save the text of all <li></li> tags in a text file (without the <li></li> tags themselves).
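Here is a minimal untested sketch of what I mean, reusing the try.html page from my other post (the file name li-text.txt is made up): tag.text should give the text without the surrounding <li></li> markup.
from bs4 import BeautifulSoup
import urllib2

url = 'http://www.niloofar3d.ir/try.html'
soup = BeautifulSoup(urllib2.urlopen(url).read())

with open('li-text.txt', 'w') as f:
    for tag in soup.find_all('li'):
        f.write(tag.text.encode('utf-8') + '\n')   # one li text per line, no tags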
Thank you @DaniUserJS for introducing those websites. Of course I can't use paid sites because of payment system problems, so I have to look for a free tutorial. I'm familiar with WordPress, as I'm using it for my personal 3D website, but I want to learn how to create my own CMS. I know HTML, CSS and JavaScript too.
Hello.
I'm trying to create a simple CMS and I'm looking for a good tutorial. Any ideas?!
Well, OK, I will try this way, but I think there should be another, simpler way that we haven't found yet.
Well, @David W, how can I check every word on that page to see whether the words that start with the letter "A" are the names of singers or just any other words starting with "A"??!!
How should my crawler recognize human names among all the other words that start with the letter "A"?!
What do you mean by "look it up in a dictionary of just names"? Which dictionary do you mean?
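My own guess at what "a dictionary of just names" could mean, as an untested sketch with a hypothetical names.txt file that holds one known name per line:
with open('names.txt') as f:
    known_names = set(line.strip() for line in f)   # all known names, for fast lookup

word = 'Adele'
if word.startswith('A') and word in known_names:
    print '%s seems to be a singer name' % word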
Hello.
I have a homework assignment. I have been asked to create a web crawler that can enter a music website and, as the first step, collect the names of the singers whose names start with the letter "A".
Now I need a little help with this step. How should my crawler understand which words on that page are the singers' names?! The crawler should find their names in a special tag, correct?! But what kind of tag?! Their names could be in any tag, like <h4></h4> for example, or in a single <p></p> tag, or in a <b></b> or <ul></ul> or any other tag!
So I just need a hint to find the way. Any ideas?!
So why does it work with 'http://www.python.org' as the main url I give the program, but when I tried it with another url the result was what I posted in my previous post?!
Hello my friends.
Look at this please:
>>> from bs4 import BeautifulSoup
>>> import urllib2
>>> url = urllib2.urlopen('https://duckduckgo.com/?q=3D&t=canonical&ia=meanings')
>>> soup = BeautifulSoup(url)
>>> links = soup('a')
>>> print links
[<a class="header__logo-wrap" href="/?t=canonical" tabindex="-1"><span class="header__logo">DuckDuckGo</span></a>, <a class="search__dropdown" href="javascript:;" id="search_dropdown" tabindex="4"></a>, <a href="https://duckduckgo.com/html/?q=3D">here</a>]
>>>
I used https://duckduckgo.com/?q=3D&t=canonical&ia=meanings as the url. I thought the code above would find all the links on that page of the internet, but you can see the result! As there are many links to different websites on that page, why didn't it print the url of each website in the output?!
Thank you @Anders 2.
Thank you @Vegaseat.
That was completely clear, thank you @Slyte!
Hello @Slyte, thank you for your explanation and your other ideas!
Well, let me ask you some more questions.
And let me also ask you to make some parts clearer for me: my English is not very good, so sometimes I need a clearer explanation, and I will be happy if you help me understand the parts I didn't get well! Thank you in advance :)
The second paragraph (record text found....): can you explain it more please? What kind of words should I record when I visit a webpage, for example? What do you mean by "in a dictionary with individual words as keys and values"? And what is its usage?
And about your other ideas:
Can you explain the first idea more clearly please? I didn't understand your purpose exactly, but it seems an interesting idea to me.
And the second idea: I didn't understand it. What do you mean?!
Your explanation and ideas made me start thinking about some other new ideas :)
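If I understood the idea right, "a dictionary with individual words as keys and values" could be a word-count table like this (my own untested guess):
counts = {}
for word in 'some words some words words'.split():
    counts[word] = counts.get(word, 0) + 1   # word -> how many times it was seen
print counts   # {'some': 2, 'words': 3}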
Thank you @Grebouillis.
Thank you @snippsat.
Hello.
I'm trying to create a web crawler. I've read about a web crawler's duties, about how it works and what it does.
But I just need more information. Could you please tell me what a web crawler can do? What kind of duties can I define for my web crawler? What can I ask it to do?
@Slyte, what is that dt in line 10 for?
Hi everybody.
What is the usage of urljoin?
An example:
>>> from urlparse import urljoin
>>> url = urljoin('http://python.org/','about.html')
>>> url
'http://python.org/about.html'
I think the answer is that when we take a link from 'http://www.python.org/' for example, it looks like this: <a href="/about/">about</a>.
So I take the href part, which is /about/ here, and use urljoin to join this string (of course with .html added) to the main url, which is 'http://python.org/' here. Correct?!
Of course, I should delete those / characters from /about/ first.
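If I try it, though, urljoin seems to handle the slashes by itself, so maybe deleting them is not needed; for example:
>>> from urlparse import urljoin
>>> urljoin('http://python.org/', '/about/')
'http://python.org/about/'
>>> urljoin('http://python.org/docs/', 'about.html')
'http://python.org/docs/about.html'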
Hello, me again :)
With this code:
>>> from BeautifulSoup import BeautifulSoup
>>> import urllib2
>>> url = urllib2.urlopen('http://www.python.org').read()
>>> soup = BeautifulSoup(url)
>>> links = soup('a')
>>> print links
A list of links was printed in the terminal. I want to send the list to a text file, so I tried this:
>>> with open('python-links.txt.', 'w') as f:
...     f.write(links)
But there was an error:
File "<stdin>", line 2, in <module>
TypeError: expected a character buffer object
And one more question: as that list looks like this (I will copy only a small part of the list):
[<a href="#content" title="Skip to content">Skip to content</a>, <a id="close-python-network" class="jump-link" href="#python-network" aria-hidden="true">
<span aria-hidden="true" class="icon-arrow-down"><span>▼</span></span> Close
</a>, <a href="/" title="The Python Programming Language" class="current_item selectedcurrent_branch selected">Python</a>, <a href="/psf-landing/" title="The Python Software Foundation">PSF</a>,
So how can I put each link on a new line?
I tried this:
>>> text = '\n'.join(links)
But I got this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected string, Tag found
How can i do that?
I've tested the code again. I gave it a url and then the output was:
**********************************************************************
Scanning depth 1 web
**********************************************************************
**********************************************************************
Scanning depth 2 web
**********************************************************************
**********************************************************************
Scanning depth 3 web
**********************************************************************
**********************************************************************
Scanning depth 4 web
**********************************************************************
**********************************************************************
Scanning depth 5 web
**********************************************************************
****************************************
RESULTS
****************************************
http:// the main url i gave to the program in line 47 / was found 1 time.
But there were many links on that page of the website, so why weren't any of them printed in the terminal??!! Is it a web crawler?!
I thought a web crawler enters the page whose url we give it, finds all the links on that page, prints those links, and then enters each link and does it all again; but here I got a different result.
Oops! I forgot to send the code!
# -*- coding: utf-8 -*-
from HTMLParser import HTMLParser
from urllib2 import urlopen

class Spider(HTMLParser):
    def __init__(self, starting_url, depth, max_span):
        HTMLParser.__init__(self)
        self.url = starting_url
        self.db = {self.url: 1}    # visit counts, keyed by url
        self.node = [self.url]     # urls to scan at the current depth
        self.depth = depth         # recursion depth max
        self.max_span = max_span   # max links obtained per url
        self.links_found = 0

    def handle_starttag(self, tag, attrs):
        # called by HTMLParser for every opening tag of the page fed to feed()
        if self.links_found < self.max_span and tag == 'a' and attrs:
            link = attrs[0][1]
            if link[:4] != "http":   # relative link: glue it onto the site root
                link = '/'.join(self.url.split('/')[:3]) + ('/' + link).replace('//', '/')
            if link not in self.db:
                print "new link ---> %s" % link
                self.links_found += 1
                self.node.append(link)   # remember it for the next depth level
            self.db[link] = (self.db.get(link) or 0) + 1

    def crawl(self):
        for depth in xrange(self.depth):
            print "*"*70 + ("\nScanning depth %d web\n" % (depth + 1)) + "*"*70
            context_node = self.node[:]
            self.node = []
            for self.url in context_node:
                self.links_found = 0
                try:
                    req = urlopen(self.url)
                    res = req.read()
                    self.feed(res)   # parse the page; triggers handle_starttag()
                except:
                    self.reset()
        print "*"*40 + "\nRESULTS\n" + "*"*40
        zorted = [(v, k) for (k, v) in self.db.items()]
        zorted.sort(reverse=True)    # most frequently seen urls first
        return zorted

if __name__ == "__main__":
    spidey = Spider(starting_url='http://www.python.org', depth=5, max_span=10)
    result = spidey.crawl()
    for (n, link) in result:
        print "%s was found %d time%s." % (link, n, "s" if n != 1 else "")
Hello.
I was looking for a tutorial or any example of creating a web crawler, and I found this code somewhere and copied and pasted it to test it:
First, it is a web crawler, right? Because when I gave it the url of a website, the output was some links printed in the terminal.
Second, if you test it yourself, you will see that the links are divided into parts with the title Scanning depth 1 web and so on (the number changes). What is that for? What does it mean? What does "depth number web" mean?
Third, I want to send exactly everything I see printed in the terminal to a text file, so where should I put this code:
with open('file.txt', 'w') as f:
    f.write()
And what should I type in the ( )?
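One possible way, as my own untested sketch: temporarily point sys.stdout at the file, so every print statement of the crawl lands in file.txt instead of the terminal:
import sys

with open('file.txt', 'w') as f:
    old_stdout = sys.stdout
    sys.stdout = f                # print statements now write into file.txt
    result = spidey.crawl()       # spidey as created in the code above
    sys.stdout = old_stdout       # restore normal printing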
And finally, I have a request.
Could you explain each line of the code for me please, if you are familiar with it? Even a few lines of explanation would be really helpful, because I don't understand it clearly and I want to learn it well. It's only a request, and I will be happy if you help me understand it.
Thank you in advance :)
Thank you @Schol-R-LEA.
Thank you @Slyte, but unfortunately when I checked the link, I got a 403 error :(
Thank you @vegaseat, that was helpful.
Thank you @Andrae.
Thank you @snippsat.
Hi friends!
I want to create a countdown program. Here is my code:
from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.label import Label
import datetime

class CountdownApp(App):
    def build(self):
        delta = datetime.datetime(2015, 3, 21, 2, 15, 11) - datetime.datetime.now()
        days = delta.days
        days = str(days)
        self.label_days = Label(text=days + " days")
        hour_string = str(delta).split(', ')[1]
        hours = hour_string.split(':')[0]
        self.label_hours = Label(text=hours + " hours")
        minuts = hour_string.split(':')[1]
        self.label_minuts = Label(text=minuts + " minuts")
        seconds = hour_string.split(':')[2]
        self.label_seconds = Label(text=seconds + " seconds")
        b = BoxLayout(orientation="vertical")
        b.add_widget(self.label_days)
        b.add_widget(self.label_hours)
        b.add_widget(self.label_minuts)
        b.add_widget(self.label_seconds)
        return b

if __name__ == "__main__":
    CountdownApp().run()
I want the program to update itself every second, so that the label which shows the seconds is refreshed every second....
How can I do that?
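A minimal untested sketch of what I think kivy.clock.Clock is for here (one label instead of four, to keep it short):
from kivy.app import App
from kivy.uix.label import Label
from kivy.clock import Clock
import datetime

class CountdownApp(App):
    def build(self):
        self.label = Label(text="counting down...")
        Clock.schedule_interval(self.update, 1.0)   # call update() every second
        return self.label

    def update(self, dt):   # dt is the time elapsed since the last call
        delta = datetime.datetime(2015, 3, 21, 2, 15, 11) - datetime.datetime.now()
        self.label.text = str(delta).split('.')[0]  # drop the microseconds part

if __name__ == "__main__":
    CountdownApp().run()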
Yes it works, thank you @vegaseat.
Hi again.
I want to create a robot, spider or crawler with Python urllib. I still couldn't find any good tutorial. Any suggestions?!
Hi friends!
import urllib
url = 'http://www.python.org'
text = urllib.urlopen(url).read()
I typed the code above in the terminal, and on the next line, with print text, an html page was printed there.
I want to send it to a text file. How can I do that?
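A minimal untested sketch (the file name python.html is made up): since text is a plain str here, it can be written straight into a file.
import urllib

url = 'http://www.python.org'
text = urllib.urlopen(url).read()
with open('python.html', 'w') as f:
    f.write(text)   # text is a plain str, so it can be written directly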
@AleMonteiro, I didn't ask you to google for me.
I know how to install sqlite3 for PHP, but the problem is that when I want to connect to it with PDO I still get the error "can't find Driver".
When I have a problem, I always search for it on the net first; then, if I can't find my answer, I ask for help here on DANIWEB, since maybe other users know the answer. I just asked if you knew how I can install the driver on my OS; I didn't ask you to google it for me.
Anyway, thank you for all the answers.
Well, I'm not sure whether I installed it correctly or not.
How can I install Python urllib on my Linux Ubuntu?
Hello.
I want to learn Python urllib. I have installed it and now I'm looking for a good tutorial. Any suggestions?
Well, thank you @vegaseat, it works.
But I still don't know how to arrange my code, which has several different labels. With your code I can print only one label with return label, but I have some values that I need to put into different labels.
@AleMonteiro, as we can see in the Requirements part of the page you gave the link to, the PDO Driver for SQLite 3.x is needed.
Well, can you tell me how I can install that driver please?! I don't know what to do. I'm using Linux Ubuntu.
I checked and it was an int. I changed it into a str, changed a few parts and removed some extra code; it's clearer now:
from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.label import Label
import datetime   # needed for the datetime calls below

def timer(self):
    delta = datetime.datetime(2015, 3, 21, 2, 15, 11) - datetime.datetime.now()
    days = delta.days
    new_days = str(days)
    self.l_days.text = new_days

class CountdownApp(App):
    def build(self):
        b = BoxLayout()
        self.l_days = Label(text = "days")
        b.add_widget(self.l_days)
        return b

if __name__ == "__main__":
    CountdownApp().run()
As you can see here:
new_days = str(days)
self.l_days.text = new_days
I can't set the label text with a variable.
I want to exchange the text set on line 18 with the label text on line 23, but line 18 doesn't work in my code. Any ideas?!
from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.uix.label import Label
import datetime   # needed for the datetime calls below

def timer():
    delta = datetime.datetime(2015, 3, 21, 2, 15, 11) - datetime.datetime.now()
    days = delta.days
    hour_string = str(delta).split(', ')[1]
    hours = hour_string.split(':')[0]
    minuts = hour_string.split(':')[1]
    seconds = hour_string.split(':')[2]
    seconds_1 = hour_string.split(':')[2].split('.')[0]
    #print ("%s days" % days)
    #print ("%s hours" % hours)
    #print ("%s minuts" % minuts)
    #print ("%s seconds" % seconds)
    self.l_days.text = days

class CountdownApp(App):
    def build(self):
        b = BoxLayout()
        l_days = Label(text = "days")
        l_hours = Label(text = "hours")
        l_minuts = Label(text = "minuts")
        l_seconds = Label(text = "seconds")
        b.add_widget(l_days)
        b.add_widget(l_hours)
        b.add_widget(l_minuts)
        b.add_widget(l_seconds)
        return b

if __name__ == "__main__":
    CountdownApp().run()
I wanted to create the db with sqlite3, which I always use. Can I connect to it with PDO?!
Hi friends.
With datetime.datetime.now() or datetime.datetime.today() I can get the current date (Gregorian calendar) for my program, but what if I want to get the current date from the Persian calendar for my program: what should I do? As my pc's os date is set to the Gregorian calendar, what can I do?
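One possible approach I found, as an untested sketch; the assumption here is the third-party jdatetime package (pip install jdatetime), which mirrors the datetime API using the Jalali/Persian calendar:
import jdatetime
print jdatetime.date.today()      # today's date in the Persian calendar
print jdatetime.datetime.now()    # current Persian date and time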
Hello.
I want to create a database on my pc's localhost and then use PDO to connect to that database to create tables and so on...
What should I do?
I understood, thank you @fonzali.
Hello.
I have copied these two code files from a website.
main.py:
from kivy.app import App
from kivy.uix.label import Label
from kivy.uix.boxlayout import BoxLayout
from kivy.clock import Clock
from kivy.properties import StringProperty
import datetime

class Counter_Timer(BoxLayout):
    def update(self, dt):
        delta = datetime.datetime(2015, 9, 13, 3, 5) - datetime.datetime.now()
        self.days = delta.days
        hour_string = str(delta).split(',')[1]
        self.hours = hours_string.split(':')[0]
        self.minuts = hours_string.split(':')[1]
        self.seconds = hours_string.split(':')[2].split('.')[0]
        return days, hours, minuts, seconds

class Counter(App):
    def build(self):
        counter = Counter_Timer
        Clock.schedule_interval(counter.update, 1.0)
        days = StringProperty()
        hours = StringProperty()
        minutes = StringProperty()
        seconds = StringProperty()
        return

if __name__ == "__main__":
    Counter().run()
It said:
"Let's add them to the Counter_timer class:"
days = StringProperty()
hours = StringProperty()
minutes = StringProperty()
seconds = StringProperty()
So I added them into the build function. That is the correct place, right?
And I added this line myself, but I'm not sure whether it's correct or not: from kivy.properties import StringProperty
And here is the next file, counter.kv:
<Counter_Timer>:
    orientation: 'vertical'
    Label:
        text: 'Vacation starts in:'
        font_size: '46dp'
    Label:
        text: root.days + ' Days'
        font_size: '46dp'
    Label:
        text: root.hours + ' Hours'
        font_size: '38dp'
    Label:
        text: root.minuts + ' Minuts'
        font_size: '30dp'
    Label:
        text: root.seconds + ' Seconds'
        font_size: '22dp'
When I run it, I get this error:
[INFO ] Kivy v1.8.0
Purge log fired. Analysing...
Purge 4 log files
Purge finished !
[INFO ] [Logger ] Record log in /home/niloofar/.kivy/logs/kivy_15-02-24_5.txt
[INFO ] [Factory ] 157 symbols loaded
[DEBUG ] [Cache ] register <kv.lang> with …
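For comparison, here is a minimal untested sketch of how I think the tutorial means it: the four StringProperty lines belong at class level of Counter_Timer (not inside build), update() assigns strings to them, and build() must create an instance (note the parentheses) and return it:
from kivy.app import App
from kivy.uix.boxlayout import BoxLayout
from kivy.clock import Clock
from kivy.properties import StringProperty
import datetime

class Counter_Timer(BoxLayout):
    days = StringProperty()      # class-level properties, not inside build()
    hours = StringProperty()
    minuts = StringProperty()    # spelled like root.minuts in counter.kv
    seconds = StringProperty()

    def update(self, dt):
        delta = datetime.datetime(2015, 9, 13, 3, 5) - datetime.datetime.now()
        self.days = str(delta.days)
        hour_string = str(delta).split(', ')[1]
        self.hours = hour_string.split(':')[0]
        self.minuts = hour_string.split(':')[1]
        self.seconds = hour_string.split(':')[2].split('.')[0]

class Counter(App):
    def build(self):
        counter = Counter_Timer()                 # note the (): create an instance
        counter.update(0)                         # fill the labels right away
        Clock.schedule_interval(counter.update, 1.0)
        return counter                            # return the widget so it is shown

if __name__ == "__main__":
    Counter().run()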
And with this:
hours = hour_string.split(':')[0]
print hours
I get this: 18
But with this:
hours = hour_string.split(':')
print hours
I get this: ['18', '19', '45.495552']
So it means: divide everything we have in the string 200 days, 18:37:08.889568 into list items according to the : sign, i.e. split the string wherever a : appears, correct?!
@vegaseat, I have a question.
With this:
delta = datetime.datetime(2015, 9, 13, 3, 5) - datetime.datetime.now()
print delta
I will get this: 200 days, 18:37:08.889568
And then with this:
hour_string_2 = str(delta).split(', ')
print hour_string_2
I will get this: ['200 days', '18:37:08.889568']
I tried this too:
hour_string_3 = str(delta).split()
print hour_string_3
And I got this: ['200', 'days,', '18:37:08.889568']
Why? What exactly does the , do in .split(', ')?
Why do I get 3 items in the list with .split() but 2 items in the list with .split(', ')?
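If I understand right, the difference looks like this: with no argument, split() cuts on any whitespace, while split(', ') cuts only on the exact two-character ', ' separator:
>>> s = '200 days, 18:37:08.889568'
>>> s.split()          # no argument: split on any whitespace
['200', 'days,', '18:37:08.889568']
>>> s.split(', ')      # split only on the exact ', ' separator
['200 days', '18:37:08.889568']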
Hi @fonzali, and thank you for the explanation.
So .format(differ.seconds) counts only the 18:53:17.230488 part and pays no attention to the days, correct? But differ.total_seconds() counts the days too, right?
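A small check of my understanding, with a made-up timedelta:
import datetime
differ = datetime.timedelta(days=2, hours=3)
print differ.seconds          # 10800, only the hours/minutes/seconds part
print differ.total_seconds()  # 183600.0, the 2 days are converted too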
Thank you @vegaseat. Of course the code you typed calculates the seconds, but it doesn't count down; that's OK though, I can do that part myself. The code was helpful.