I have a file (newfolder.html) and I want to preprocess its content: tokenization, removing stop words, and counting the number of words. I know how to do these operations on a text file (.txt), but now I have to do them on an HTML file.
How can I do it?
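One common approach is to extract the visible text from the HTML first and then run the same pipeline you would use on a .txt file. The sketch below assumes Python and uses only the standard library's `html.parser.HTMLParser` to collect text while skipping `<script>` and `<style>` contents. The helper names (`TextExtractor`, `html_to_text`, `preprocess`, `preprocess_file`) and the tiny `STOP_WORDS` set are hypothetical placeholders for illustration; in practice you would substitute a proper stop-word list and tokenizer.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes from HTML, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the document's text content as one plain string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return " ".join(parser.parts)


# Hypothetical, deliberately tiny stop-word list; replace with a real one.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def preprocess(html):
    """Tokenize the HTML's text, drop stop words, and count the remaining words."""
    text = html_to_text(html)
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # simple regex tokenizer
    filtered = [t for t in tokens if t not in STOP_WORDS]
    return filtered, Counter(filtered)


def preprocess_file(path):
    """Read an HTML file (assumed UTF-8) and preprocess its text."""
    with open(path, encoding="utf-8") as f:
        return preprocess(f.read())
```

Calling `preprocess_file("newfolder.html")` would return the filtered tokens (their `len()` is the word count) and a `Counter` of word frequencies. Once `html_to_text` has produced a plain string, any tokenizer or stop-word list you already use for .txt files can be dropped in unchanged. If third-party packages are an option, BeautifulSoup's `get_text()` is a popular alternative for the extraction step.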