Hey guys. I am new to programming and would appreciate any help I can get. I want to make a program that reads the HTML code of a web page and writes a specific line into a document. For example, I want my code to read the source of the www.daniweb.com homepage and write what's between the <title></title> tags into a Notepad text file, which should be "DaniWeb - Technology Publication Meets Social Media." The code below works, but it returns the entire HTML source code of the page.

import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;

public class UrlReadPageDemo {
    public static void main(String[] args) {
        try {
            URL url = new URL("http://www.daniweb.com");

            BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
            BufferedWriter writer = new BufferedWriter(new FileWriter("data1.txt"));

            String line;
            while ((line = reader.readLine()) != "<title>") {
                System.out.println(line);
                writer.write(line);
                writer.newLine();
            }
            reader.close();
            writer.close();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        }  catch (IOException e) {
            e.printStackTrace();
        }
    }
}


Try using this:

 while ((line = reader.readLine()) != null && !line.equals("<title>"))

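I'm not sure your logic will give you the result you want, though. This loop writes every line that comes before the "<title>" line rather than the title itself, and it only stops on a line that is exactly "<title>". On most pages the tag shares a line with its text (for example "<title>DaniWeb - Technology Publication Meets Social Media.</title>"), so that exact match may never happen. Try thinking it through again.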
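Why the original "!=" test never stops the loop: for objects, "!=" compares references rather than text, and readLine() hands back a newly built String, so it is not the same object as the "<title>" literal even when the characters match. A minimal demo, where new String(...) stands in for a line read from the stream (the class name StringCompareDemo is just for illustration):

```java
public class StringCompareDemo {
    public static void main(String[] args) {
        String literal = "<title>";
        // Stands in for a line returned by readLine(): same text, distinct object.
        String fromStream = new String("<title>");

        // != compares object references, so this is true even though the text matches.
        System.out.println(fromStream != literal);       // true
        // equals() compares the characters, which is what the loop condition needs.
        System.out.println(fromStream.equals(literal));  // true
    }
}
```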

Edited 4 Years Ago by anuj_sharma

A simple way would be to read the input stream until the starting tag is found and then save what is read until the ending tag is found.

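String line;
String outLine;

// Drop-in replacement for the while loop in the question; reader and writer are unchanged.
while ((line = reader.readLine()) != null) {
    int start = line.indexOf("<title>");
    int end = line.indexOf("</title>");
    // Only handles a title whose opening and closing tags are on the same line.
    if (start >= 0 && end > start) {
        outLine = line.substring(start + "<title>".length(), end);
        writer.write(outLine);
        writer.newLine();
        System.out.println(outLine);
        break; // stop reading once the first title has been written
    }
}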
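The snippet above only works when <title> and </title> sit on the same line. One way to follow the "save what is read until the ending tag is found" idea when the title is split across lines is to read the whole page into one string first and then search it. This is a sketch, not a definitive implementation: the class and method names (TitleExtractor, readAll, extractTitle) are hypothetical, and it assumes lowercase tags without attributes and a UTF-8 page.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class TitleExtractor {

    // Reads every line into one string, keeping the line breaks.
    public static String readAll(BufferedReader reader) throws IOException {
        StringBuilder page = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            page.append(line).append('\n');
        }
        return page.toString();
    }

    // Returns the text between the first "<title>" and the next "</title>",
    // trimmed, or null if either tag is missing. Assumes lowercase tags
    // without attributes; "<TITLE>" or "<title lang=...>" will not match.
    public static String extractTitle(String html) {
        String open = "<title>";
        String close = "</title>";
        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end).trim();
    }

    public static void main(String[] args) throws IOException {
        // URL and output file name are taken from the question.
        URL url = new URL("http://www.daniweb.com");
        String html;
        // Assumes the page is UTF-8; try-with-resources closes the stream.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            html = readAll(reader);
        }
        String title = extractTitle(html);
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("data1.txt"))) {
            writer.write(title != null ? title : "");
            writer.newLine();
        }
        System.out.println(title);
    }
}
```

Reading the whole page first keeps the search simple at the cost of holding the page in memory, which is fine for a single HTML page.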