
First post, so thanks in advance for any help. I am trying to pull the anchor text from a series of links in some HTML. I am doing this with find() and rfind():

linkend=users.find("</a>:")
linkstart=users.rfind(">",0,linkend)

My question is: once I have found the first link, how do I continue searching from that point? If I ran this in a for loop 50 times, it would just give me the first link 50 times rather than finding all 50 links on the page, which is what I'm after. Thanks for your help, everybody.



Supply a starting index for str.find() and update it each iteration. Example:

users = '''Users
<a>User 1</a>
<a>User 2</a>
<a>User 3</a>
<a>User 4</a>
<a>User 5</a>
'''

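userList = []
idx = 0
while True:
    # Look for the next opening tag, starting where the previous link ended
    linkstart = users.find("<a>", idx)
    if linkstart == -1:
        break
    # Look for the closing tag only after the opening tag just found
    linkend = users.find("</a>", linkstart)
    if linkend == -1:
        break
    userList.append(users[linkstart + len("<a>"):linkend])
    # Resume the search just past this link's closing tag
    idx = linkend + len("</a>")

print(userList)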

Output:

>>> ['User 1', 'User 2', 'User 3', 'User 4', 'User 5']
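
The same index-advancing idea should also work with the find()/rfind() pair from the question, which helps when the opening tags carry attributes such as href. Below is a rough sketch, not a definitive solution: the function name anchor_texts and the sample string are made up for illustration, and it assumes the anchor text contains no nested tags (a nested tag would make rfind() land on the wrong ">").

```python
def anchor_texts(html):
    """Collect the text between each '>' and the '</a>' that follows it."""
    texts = []
    idx = 0
    while True:
        # Find the next closing tag at or after the current position
        linkend = html.find("</a>", idx)
        if linkend == -1:
            break
        # Search backwards, but no further back than idx, for the '>'
        # that ends the opening tag
        linkstart = html.rfind(">", idx, linkend)
        if linkstart != -1:
            texts.append(html[linkstart + 1:linkend])
        # Move past this closing tag so the next pass finds the next link
        idx = linkend + len("</a>")
    return texts


# Hypothetical sample input, not taken from the original page
sample = '<a href="/u/1">Alice</a>: hi <a href="/u/2">Bob</a>: yo'
print(anchor_texts(sample))
```

Passing idx as the lower bound to rfind() keeps it from reaching back into a link that has already been collected.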

Is this for a homework assignment? I just want to know if you're allowed to use any Python tool you want, or if you're limited to what you've covered in class.

I'm sure there is a way to do it with find, but why don't you check out this tutorial on HTMLParser? It includes an example of how to do what you're trying to do.
http://cis.poly.edu/cs912/parsing.txt
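
For a rough idea of what the HTMLParser route could look like, here is a minimal sketch. It is not taken from the linked tutorial; the class name AnchorTextParser and the helper extract_anchor_texts are made up for illustration. It uses the Python 3 module name html.parser (in Python 2 the module is called HTMLParser).

```python
from html.parser import HTMLParser  # in Python 2: from HTMLParser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Hypothetical helper: collects the text inside each <a>...</a> pair."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Start a new, empty entry for this link
            self.in_anchor = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so append to the current entry
        if self.in_anchor:
            self.texts[-1] += data


def extract_anchor_texts(html):
    parser = AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


# Hypothetical sample input
print(extract_anchor_texts('<a href="/u/1">User 1</a>: hi <a href="/u/2">User 2</a>'))
```

Unlike the find() approach, this does not depend on the exact spelling of the opening tag, so attributes and nested formatting tags inside the link should not trip it up.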
