I need to make a servlet that takes in a web page URL, cleans up the HTML, and spits back out an XML file, which I then need to make conform to a schema.
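For the "conform to a schema" part, a minimal sketch, assuming the schema is a W3C XML Schema (XSD) file, is to check the generated XML with the JDK's built-in `javax.xml.validation` API, which needs no extra jars. The class name `SchemaCheck` and the method `conforms` are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper: checks an XML string against an XSD string.
class SchemaCheck {
    public static boolean conforms(String xml, String xsd) throws SAXException, IOException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        // A SAXException thrown here means the XSD itself is malformed.
        Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors are thrown as SAXExceptions.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the document does not conform to the schema
        }
    }
}
```

In the servlet you would probably load the assignment's XSD once (for example with `factory.newSchema(File)`) and reuse the compiled `Schema`, since `Schema` objects are thread-safe while `Validator` objects are not.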
This is a class assignment, and I have no idea how to go about doing this. I am supposed to use tagsoup1.2.jar to clean up the HTML that I pull in and convert it to XML inside the servlet so I can do stuff with it.
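One common way to get XML out of a SAX parser is an identity transform: wrap the parser and the input in a `SAXSource` and run it through a `Transformer` created without a stylesheet, which copies the parse events straight back out as serialized XML. TagSoup's `org.ccil.cowan.tagsoup.Parser` implements the standard `org.xml.sax.XMLReader` interface, so it can be plugged in as that reader. The sketch below takes any `XMLReader` so it compiles without the TagSoup jar; the class name `HtmlToXml` and method `toXml` are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: parse markup with a SAX reader and re-serialize it as XML.
class HtmlToXml {
    // With TagSoup on the classpath, pass new org.ccil.cowan.tagsoup.Parser()
    // as the reader so that messy HTML is repaired while it is being parsed.
    public static String toXml(XMLReader reader, String markup) throws TransformerException {
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        // newTransformer() with no stylesheet is an identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

In the servlet, one option (an assumption about how the assignment is set up) is to read the page address from a request parameter, open it with `java.net.URL#openStream()`, wrap that stream in an `InputSource` instead of the `StringReader`, and write the result to `response.getWriter()` with an XML content type.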
Can someone point me in the right direction? I have no idea where to start. So far, all I've done is create an empty servlet.