Is there a way to search HTML code for an <img tag and then pull out the src and alt values from that <img line? I can get the <img line out, but I don't understand how to use find to get the start index and end index. For example, I use blah.find('src="') to get the start index, but how do I get the end index when it could be different on every website?

All 3 Replies

Search for the next " after the one in src="

As a side note, you would be better off using regex or an HTML parser.
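
To illustrate the regex suggestion, here is a minimal sketch using Python's re module. The sample HTML, the pattern names, and the img_attributes helper are made up for illustration, and the patterns assume attribute values are wrapped in double quotes, so treat it as a starting point rather than a robust solution.

```python
import re

# Hypothetical sample input; real pages will vary.
html = '<p><img alt="A cat" src="http://example.com/cat.png" /></p>'

# Matches a whole <img ...> tag (assumes no ">" inside attribute values).
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
# Matches src="..." or alt="..."; the lookbehind skips names like data-src.
ATTR = re.compile(r'(?<![\w-])(src|alt)\s*=\s*"([^"]*)"', re.IGNORECASE)

def img_attributes(page):
    """Return one dict per <img> tag holding whichever of src/alt it has."""
    results = []
    for tag in IMG_TAG.findall(page):
        attrs = {name.lower(): value for name, value in ATTR.findall(tag)}
        results.append(attrs)
    return results

print(img_attributes(html))
```

Because each tag is matched first and the attributes are searched only inside it, the order of src and alt within the tag does not matter.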

I was reading through the find documentation, but I didn't understand how you can find the start and end index.
I wrote x = s.find("<img","/>") and it gave me an error saying TypeError: slice indices must be integers or None or have an __index__ method. How do I get the index of the end? I can get the start, just not the end.

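After you perform x = s.find("<img") you will receive an index. Feed this index into find as the second argument (the position to start searching from) when looking for your closing quote, and it will give you the ending index. That second argument has to be an integer, which is why passing "/>" there raised the TypeError. You can then use the two indexes to slice out the image source URL. Here's an example (off the cuff):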

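img_tag_idx = s.find("<img")
# Look for src=" from the tag onward, so another attribute's quotes are not picked up
src_idx = s.find('src="', img_tag_idx)
start_idx = src_idx + len('src="')  # index of the first character of the URL
end_idx = s.find('"', start_idx)    # closing quote of the src value
url = s[start_idx:end_idx]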
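
Since the original question also asked about alt, the same find-and-slice idea can be wrapped in a small helper. This is only a sketch: get_attr and the sample string are hypothetical, it assumes double-quoted values, and it would also match a name such as data-src, so it shares the fragility of the approach above.

```python
def get_attr(tag, name):
    """Return the double-quoted value of attribute `name` in `tag`, or None."""
    marker = name + '="'
    start = tag.find(marker)
    if start == -1:
        return None                # attribute not present
    start += len(marker)           # first character of the value
    end = tag.find('"', start)     # closing quote, searched from `start`
    if end == -1:
        return None                # malformed tag: no closing quote
    return tag[start:end]

# Hypothetical sample input.
s = '<p>text</p><img alt="Logo" src="http://example.com/logo.png" />'

tag_start = s.find("<img")
if tag_start != -1:
    tag_end = s.find(">", tag_start)   # end of the <img ...> tag
    tag = s[tag_start:tag_end + 1]
    print(get_attr(tag, "src"))
    print(get_attr(tag, "alt"))
```

Note that find returns -1 when nothing is found, so each result is checked before it is used in a slice.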

But really this is not a very good method. You should use a regex or an HTML parsing module instead, as the many variations in how people code HTML could easily throw this off.
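
For the HTML parsing module route, one option is html.parser from the Python 3 standard library. The sketch below is an illustration only; the ImgCollector class and collect_images function are hypothetical names, not something from this thread.

```python
from html.parser import HTMLParser

class ImgCollector(HTMLParser):
    """Collects (src, alt) pairs from every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names; attrs is a list
        # of (name, value) pairs. Self-closing <img /> tags end up here too.
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append((attr_map.get("src"), attr_map.get("alt")))

def collect_images(page):
    """Return a list of (src, alt) tuples; a missing attribute is None."""
    parser = ImgCollector()
    parser.feed(page)
    parser.close()
    return parser.images

# Hypothetical sample input with mixed quoting and casing.
page = "<IMG SRC='a.png' alt=\"First\"><p><img src=\"b.png\" /></p>"
for src, alt in collect_images(page):
    print(src, alt)
```

The parser deals with single or double quotes, attribute order, and upper-case tags, which are exactly the variations that trip up the find approach.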
