ohnomis 36 Light Poster

I need to write a servlet that takes a web page URL, cleans up the HTML, and returns an XML file, which I then need to make conform to a schema.

This is a class assignment, and I have no idea how to go about it. I am supposed to use tagsoup1.2.jar inside the servlet to clean up the HTML I pull in and convert it to XML so that I can work with it.

Can someone point me in the right direction? I have no idea where to start. So far, all I've done is create an empty servlet.