First post, so thanks in advance for any help. I am looking to pull the anchor text from a series of links in some HTML. I am doing this with find() and rfind():

linkend = users.find("</a>:")
linkstart = users.rfind(">", 0, linkend)

My question: once I have found the first link, how do I continue searching from that point? If I run this in a for loop 50 times, it just gives me the first link 50 times instead of finding all 50 links on the page, which is what I'm after. Thanks for your help, everybody.
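The loop keeps returning the first link because find() starts at index 0 on every pass. Below is a minimal sketch of the same find()/rfind() idea with a moving start position. It assumes, as the snippet above does, that each link's closing tag is followed by `</a>:`, and that the anchor text itself contains no `>`. The function name `anchor_texts` and the sample HTML are hypothetical, for illustration only.

```python
def anchor_texts(users):
    """Collect the text between the last '>' and each '</a>:' marker (hypothetical helper)."""
    texts = []
    pos = 0  # where the next search starts
    while True:
        linkend = users.find("</a>:", pos)
        if linkend == -1:
            break  # no more links after pos
        # Search backwards from linkend for the '>' that closes the opening <a ...> tag,
        # but never before pos, so we stay inside the current link.
        linkstart = users.rfind(">", pos, linkend)
        if linkstart != -1:
            texts.append(users[linkstart + 1:linkend])
        # Move past this marker so the next find() sees the next link.
        pos = linkend + len("</a>:")
    return texts


# Hypothetical sample input in the "</a>:" format assumed above.
sample = '<a href="/u/1">alice</a>: hi\n<a href="/u/2">bob</a>: hello\n'
print(anchor_texts(sample))
```

The key change from a plain for loop is that `pos` is fed back into find() and rfind() as the start argument, so each pass picks up where the previous one stopped.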


Supply a starting index for str.find() and update it each iteration. Example:

users = '''Users
<a>User 1</a>
<a>User 2</a>
<a>User 3</a>
<a>User 4</a>
<a>User 5</a>
'''

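userList = []
idx = 0
while True:
    linkstart = users.find("<a>", idx)
    if linkstart == -1:
        break
    # Look for the closing tag only after this opening tag
    linkend = users.find("</a>", linkstart)
    if linkend == -1:
        break
    userList.append(users[linkstart + len("<a>"):linkend])
    # Resume searching just past this closing tag
    idx = linkend + len("</a>")

print(userList)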

Output:

['User 1', 'User 2', 'User 3', 'User 4', 'User 5']
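The same loop can be wrapped in a generator so the tags are parameters and the offsets come from len() rather than hard-coded 3 and 4. This is a sketch, not part of the original answer; the name `between` and its default arguments are hypothetical. Like the loop above, it assumes plain `<a>` tags with no attributes unless you pass different markers.

```python
def between(text, open_tag="<a>", close_tag="</a>"):
    """Yield every substring found between open_tag and the next close_tag (hypothetical helper)."""
    idx = 0
    while True:
        start = text.find(open_tag, idx)
        if start == -1:
            return
        start += len(open_tag)             # skip past the opening tag itself
        end = text.find(close_tag, start)  # only look for the close after the open
        if end == -1:
            return
        yield text[start:end]
        idx = end + len(close_tag)         # resume after this closing tag


users = '''Users
<a>User 1</a>
<a>User 2</a>
<a>User 3</a>
'''
print(list(between(users)))
```

Because it is a generator, a caller can stop early (for example, after the first 50 links) without scanning the rest of the page.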

Genius! Thanks so much.

Is this for a homework assignment? I just want to know if you're allowed to use any Python tool you want, or if you're limited to what you've covered in class.

I'm sure there is a way to do it with find(), but why not check out this tutorial on HTMLParser? It includes an example of how to do what you're trying to do.
http://cis.poly.edu/cs912/parsing.txt
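For comparison, here is a rough sketch of the parser approach using the standard library. In Python 3 the module is `html.parser` (in Python 2 it was named `HTMLParser`). The class name `AnchorTextParser` and the sample markup are hypothetical, and this is not taken from the linked tutorial. It collects all text inside each `<a>` element, including text inside nested tags such as `<b>`, and does not handle nested `<a>` elements.

```python
from html.parser import HTMLParser


class AnchorTextParser(HTMLParser):
    """Collect the text content of every <a> element (hypothetical example class)."""

    def __init__(self):
        super().__init__()
        self.in_anchor = False  # True while between <a ...> and </a>
        self.current = []       # text pieces of the anchor being read
        self.texts = []         # finished anchor texts

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "a" and self.in_anchor:
            self.texts.append("".join(self.current))
            self.in_anchor = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> still belongs to the open anchor.
        if self.in_anchor:
            self.current.append(data)


parser = AnchorTextParser()
parser.feed('<p><a href="/u/1">User 1</a>: hi <a href="/u/2">User <b>2</b></a></p>')
parser.close()
print(parser.texts)
```

Unlike the find() loops, this copes with attributes on the `<a>` tag and with markup inside the link text, at the cost of a little more code.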
