
Is there a way to find something in HTML code, like <img, and then get the src and alt of the image from that <img line? I can get the <img line out, but I don't understand how to use find to get the start index and end index. For example, I use blah.find('src="') to get the start index, but how do I get the end index when it could be different on every website?

Last Post by jlm699
0

Search for the next " after the one in src="
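
A minimal sketch of that idea, assuming the <img tag has already been pulled out into a string and that attribute values are wrapped in double quotes. The helper name get_attr and the sample tag are made up for illustration:

```python
def get_attr(tag, name):
    """Return the double-quoted value of attribute `name` in `tag`, or None."""
    marker = name + '="'
    start = tag.find(marker)        # index where e.g. src=" begins
    if start == -1:
        return None                 # attribute not present
    start += len(marker)            # step past the opening quote
    end = tag.find('"', start)      # next quote after the value starts
    if end == -1:
        return None                 # no closing quote found
    return tag[start:end]


# Hypothetical sample tag, not taken from any real site
tag = '<img alt="A cat" src="http://example.com/cat.png" />'
print(get_attr(tag, "src"))
print(get_attr(tag, "alt"))
```

The end index never has to be known in advance: it is simply wherever the next " turns up after the start. Note that this plain substring search would also match an attribute such as data-src, and it misses single-quoted values.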

As a side note, you would be better off using regex or an HTML parser.
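
For the regex route, something along these lines might do, again assuming double-quoted attribute values. The function name img_attr and the sample HTML are only illustrative:

```python
import re


def img_attr(html, name):
    """Return attribute `name` of the first <img> tag that has it, or None."""
    # <img, then anything up to the attribute (but not past the closing >),
    # then name="value" with the value captured in group 1.
    pattern = r'<img\b[^>]*?\s' + re.escape(name) + r'\s*=\s*"([^"]*)"'
    match = re.search(pattern, html, re.IGNORECASE)
    return match.group(1) if match else None


# Hypothetical sample markup
html = '<p><img alt="A cat" src="http://example.com/cat.png" /></p>'
print(img_attr(html, "src"))
print(img_attr(html, "alt"))
```

This copes with attributes in any order, but it is still pattern matching on text, so unusual markup (single quotes, a > inside an attribute value) can defeat it.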

0

I was reading through the find documentation, but I didn't understand how you can find the start and end index.
I wrote x = s.find("<img","/>") and it gave me an error: TypeError: slice indices must be integers or None or have an __index__ method. How do I get the index of the end? I can get the start, just not the end.

0

The second argument to find is a start index, not another string, which is why s.find("<img","/>") raised that TypeError. After you perform x = s.find("<img") you will receive an index. Feed this index into find as the start position when looking for the quotes around the URL, and it will give you the ending index. You can then use the two indexes to slice out the image source URL. Here's an example (off the cuff):

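# Locate the tag, then the src attribute inside it (works even if alt comes first)
img_tag_idx = s.find("<img")
src_idx = s.find('src="', img_tag_idx)
start_idx = src_idx + len('src="')   # first character of the URL
end_idx = s.find('"', start_idx)     # closing quote of the src value
url = s[start_idx:end_idx]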

But really, this is not a very good method. You should use a regex or an HTML parsing module, since the many variations in how people write HTML could easily throw this off.
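
As a rough sketch of the parser route, assuming Python 3 and its standard-library html.parser module (older Python 2 installs ship it as HTMLParser instead). The class name ImgCollector and the sample page are invented for the example:

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect (src, alt) pairs for every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names and strips the quotes
        if tag == "img":
            attr_map = dict(attrs)
            self.images.append((attr_map.get("src"), attr_map.get("alt")))


# Hypothetical page with mixed quoting, casing, and attribute order
page = """<p>Text <img src='a.png' alt="First">
<IMG ALT="Second" SRC="b.png" /></p>"""

collector = ImgCollector()
collector.feed(page)
print(collector.images)
```

The parser deals with single or double quotes, upper or lower case, and attribute order, so no start or end index has to be worked out by hand. A missing attribute comes back as None.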
