Hello!
How can I ask my script to print every word on a web page that starts with the letter "A"?
This is my code:

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url).read()
soup = BeautifulSoup(html)
for word in soup.text:
    if soup.text.startswith('A'):
        print soup.text

But it doesn't print any output.

Well, I changed it to this:

for word in soup.text:
    if word.startswith('A'):
        print word

But now the output is this (many "A" letters):

A
A
A
A
A
.
.

What is the type of soup.text? Try

print(type(soup.text))

if it is a string (type str), you could try

import re
for word in re.findall(r'\b\w+\b', soup.text):
    ...
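For example, here is a minimal sketch of that approach on a hard-coded sample string (just an illustration; with your page you would pass soup.text instead):

```python
import re

text = "Avril Lavigne and Axl Rose are singers; so is Barry White."

# \b\w+\b matches each run of word characters as a separate word
words = re.findall(r'\b\w+\b', text)

# keep only the words that begin with a capital A
a_words = [word for word in words if word.startswith('A')]
print(a_words)  # -> ['Avril', 'Axl']
```

This splits the text into whole words first, so the loop iterates over words rather than over single characters, which is what went wrong in your second attempt.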

I want the script to find all the names of singers that start with the letter "A" on a web page. First I asked my script to check the content of all <li> and <td> tags, but now I want to change it.
I want the script to check the URL I gave it and then find and save all the names of singers on that page that start with the letter "A". It's my homework and I don't know how to do that. Friends suggested using a dictionary to check words against, to find out whether they are human names or not, but the homework asks me not to do that.
So now I want my script to find every word that starts with "A" on that page and print it for me. Then I should find a way to make my crawler save only those words starting with "A" that are singers' names.
Very difficult!!!

Edited 1 Year Ago by Niloofar24

Go to Google and search for "google python class babynames"; they have a solution presented as well. The idea is to extract people's names from a table on a web page. It seems like much the same thing, so check it out.

Edited 1 Year Ago by Slavi

So now I want my script to find every word that starts with "A" on that page and print it for me. Then I should find a way to make my crawler save only those words starting with "A" that are singers' names.
Very difficult!!!

That would be a nightmare, and you would have to clean up a lot of rubbish text.
Ajax.Request( is just one "word" that starts with "A" that I saw when I quickly looked at the source you get from that page.

You have to find a tag that gives you the info you want.
The <img> tag with its title attribute will give you a fine list.

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url) #Do not use read()
soup = BeautifulSoup(html)
link_a = soup.find_all('img')
for link in link_a:
    try:
        print link['title']
    except KeyError:
        pass

"""Ouptput--> here just 3 names befor it changes to B
Ashlee Simpson
Avril Ramona Lavigne
Axl Rose
Barbra Streisand
Barry Manilow
Barry White
"""

Edited 1 Year Ago by snippsat

Thank you @snippsat. I have 2 questions:

About print link['title']: what is title? Is it the alt of the <img> tag?

About except KeyError: what does KeyError mean here? What is the keyword KeyError for?

Edited 1 Year Ago by Niloofar24

About print link['title']: what is title? Is it the alt of the <img> tag?

Have you looked at Firebug or Chrome DevTools?
I gave you links to them in an earlier post.
With those it is easy to see what the title of an <img> is.

About except KeyError: what does KeyError mean here? What is the keyword KeyError for?

Not all <img> tags on this page have a title attribute, so indexing them throws a KeyError.
This is the source code that gets called in BeautifulSoup:

def __getitem__(self, key):
    """tag[key] returns the value of the 'key' attribute for the tag,
       and throws an exception if it's not there."""
    return self.attrs[key]
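The same lookup behaviour can be reproduced with a plain dictionary, since (as the source above shows) a tag's attributes are stored in a dict underneath. This is just a toy illustration with made-up data:

```python
# two <img> tags modeled as attribute dicts: one with a title, one without
img_with_title = {'src': 'avril.jpg', 'title': 'Avril Ramona Lavigne'}
img_without_title = {'src': 'spacer.gif'}

for img in (img_with_title, img_without_title):
    try:
        print(img['title'])  # raises KeyError when 'title' is missing
    except KeyError:
        pass                 # skip images that have no title
```

Only "Avril Ramona Lavigne" is printed; the second image is silently skipped instead of crashing the loop.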

With except KeyError: pass, we just ignore those images.
We can be more specific with class="col-md-6",
so it only searches for names in the images we need.
Then it will not throw an error.

from bs4 import BeautifulSoup
import urllib2

url = 'http://www.thefamouspeople.com/singers.php'
html = urllib2.urlopen(url) #Do not use read()
soup = BeautifulSoup(html)
tag_row = soup.find_all('div', {'class':'col-md-6'})
for item in tag_row:
    print item.find('img')['title']
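To get back to the original question, the titles collected that way can then be filtered down to the ones starting with "A". Sketched here with a hard-coded list standing in for the scraped titles:

```python
# stand-in for the titles scraped from the page above
titles = ['Ashlee Simpson', 'Avril Ramona Lavigne', 'Axl Rose',
          'Barbra Streisand', 'Barry Manilow', 'Barry White']

# keep only the singers whose name starts with "A"
a_singers = [name for name in titles if name.startswith('A')]
print(a_singers)  # -> ['Ashlee Simpson', 'Avril Ramona Lavigne', 'Axl Rose']
```

Since the page's title attributes already hold clean singer names, filtering them is much more reliable than filtering every raw word in soup.text.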

Edited 1 Year Ago by snippsat
