First post so thanks in advance for any help. I am looking to pull the anchor text from a series of links in some html. I am doing this with find() and rfind():


My question is: once I have found the first link, how do I continue on from that point? If I ran this in a for loop 50 times, it would just give me the first link 50 times rather than finding all 50 links on the page, which is what I'm after. Thanks for your help, everybody.

Supply a starting index for str.find() and update it each iteration. Example:

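users = '''Users
<a>User 1</a>
<a>User 2</a>
<a>User 3</a>
<a>User 4</a>
<a>User 5</a>'''

userList = []
idx = 0
while True:
    # Search from idx so each pass picks up the next link, not the first one again
    linkstart = users.find("<a>", idx)
    linkend = users.find("</a>", idx)
    if -1 in [linkstart, linkend]:
        break  # no more complete links
    # Skip the 3 characters of "<a>" to get just the anchor text
    userList.append(users[linkstart+3:linkend])
    # Resume just past the 4 characters of "</a>"
    idx = linkend + 4

print(userList)

>>> ['User 1', 'User 2', 'User 3', 'User 4', 'User 5']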
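
Links on a real page usually carry attributes such as href, so searching for the literal "<a>" would miss them. Here is a rough sketch of the same start-index idea that allows for attributes. The helper name anchor_texts and the sample string are made up for illustration, and it assumes lowercase, well-formed, non-nested tags:

```python
def anchor_texts(html):
    """Collect the text between each <a ...> and </a> using str.find()."""
    texts = []
    idx = 0
    while True:
        # Assumption: "<a" only ever starts a link; it would also match
        # tags such as <abbr>, which this sketch does not guard against.
        tag_start = html.find("<a", idx)
        if tag_start == -1:
            break
        text_start = html.find(">", tag_start)    # end of the opening tag
        if text_start == -1:
            break
        link_end = html.find("</a>", text_start)  # start of the closing tag
        if link_end == -1:
            break
        texts.append(html[text_start + 1:link_end])
        idx = link_end + len("</a>")              # resume after this link
    return texts


# Hypothetical sample input
sample = '<p><a href="/u/1">User 1</a> and <a href="/u/2">User 2</a></p>'
print(anchor_texts(sample))
```

Each find() call starts where the previous match ended, which is what keeps the loop from returning the first link over and over.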

Is this for a homework assignment? I just want to know if you're allowed to use any Python tool you want, or if you're limited to what you've covered in class.

I'm sure there is a way to do it with find, but why don't you check out this tutorial on HTMLParser? It includes an example of how to do what you're trying to do.
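
In case it helps, here is a minimal sketch of the HTMLParser approach. It is not taken from the tutorial; the class name AnchorTextParser and the sample markup are just made up for illustration, and it assumes Python 3, where the module lives at html.parser (in Python 2 it is the HTMLParser module):

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collects the text found inside each <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.texts.append("")  # start a new anchor text

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Text may arrive in pieces (e.g. around nested tags), so accumulate it
        if self.in_link:
            self.texts[-1] += data


# Hypothetical sample input
parser = AnchorTextParser()
parser.feed('<a href="/u/1">User 1</a><a href="/u/2">User <b>2</b></a>')
parser.close()
print(parser.texts)
```

The parser keeps track of where it is in the document for you, so there is no index to update by hand, and attributes or nested tags inside a link don't need special handling.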