Hi,

I am trying my hand at Java regex. My program is below, along with a description of what I need it to do.

MyKeyWord may occur multiple times in a file.
My program works for a file like this:

(\\S+)<tab>MyKeyWord<tab>(\\S+)<tab>(\\S+)
(\\S+)<tab>MyKeyWord<tab>(\\S+)<tab>(\\S+)

But with a file like this:

(\\S+)<tab>MyKeyWord<tab>(\\S+)<tab>(\\S+)
(\\S+)<tab>OtherKeyWord<tab>(\\S+)<tab>(\\S+)

it does not work at all and throws a runtime error:

Exception in thread "main" java.lang.IllegalStateException: No match found
         at java.util.regex.Matcher.group(Matcher.java:485)

The exception is thrown at this line: out.write(m.group(1)+"\t"+m.group(2)+"\t.....)

Thanks

import java.util.regex.*;
import java.io.*;

public class DataMine {
    public static void main(String[] args) throws Exception {
        File fin = new File(args[0]);
        File fout = new File(args[1]);
        FileInputStream fis = new FileInputStream(fin);
        FileOutputStream fos = new FileOutputStream(fout);

        BufferedReader in = new BufferedReader(
                new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(fos));

        Pattern p = Pattern.compile("(\\S+)\tMyKeyWord\t(\\d+)\t(\\S+)");
        /* Other lines may look like (\\S+)\tSomeOtherKeyWord\t(\\d+)\t(\\S+).
           When such a line comes up, my program gives a runtime error.
           It compiles OK. The file contains several types of lines, but I
           want to mine only the ones I am interested in. */
        String aLine = null;
        while ((aLine = in.readLine()) != null) {
            Matcher m = p.matcher(aLine);
            m.find();
            // I want only lines of this form to be printed in my output file.
            out.write(m.group(1) + "\tMyKeyWord\t" + m.group(2) + "\t" + m.group(3));
            out.newLine();
        }
        in.close();
        out.close();
    }
}

Hi, I made it work :) by putting the out.write call inside an if block:
if(m.find(i)) { out.write(m.group(1)+"\t"+m.group(2)+..... }
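
For anyone reading later, here is a minimal sketch of that guard. The class name GuardedMatch and the helper extract are made up for illustration, and the sketch uses the no-argument find() because the variable i is not shown in the post. The pattern is the one from the program above, compiled once rather than per line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GuardedMatch {
    // Same pattern as in the original program; compiled once and reused.
    private static final Pattern P =
            Pattern.compile("(\\S+)\tMyKeyWord\t(\\d+)\t(\\S+)");

    // Hypothetical helper: returns the rebuilt line, or null when the
    // line does not contain the MyKeyWord pattern.
    static String extract(String line) {
        Matcher m = P.matcher(line);
        if (m.find()) {
            // group() may only be called after a successful find()
            return m.group(1) + "\tMyKeyWord\t" + m.group(2) + "\t" + m.group(3);
        }
        return null;
    }

    public static void main(String[] args) {
        String[] lines = { "a\tMyKeyWord\t42\tb", "a\tOtherKeyWord\t42\tb" };
        for (String line : lines) {
            String out = extract(line);
            if (out != null) {
                System.out.println(out);
            }
        }
    }
}
```

Calling group() without a successful find() is what raised the IllegalStateException, so lines with another keyword are now simply skipped.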

Now it works, but the other, bigger problem is that it is very slow :(

How can I make it faster?

Thanks

That's regex, but it shouldn't be that slow.

If you want to make it faster, search the Strings yourself, rather than using regex. That is, however, much more complicated.
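
A rough sketch of what searching the Strings yourself could look like, using only String.indexOf and charAt. It assumes each line holds exactly the four tab-separated fields and does not check the first and last fields for stray whitespace, so it is looser than the regex. The names ManualSearch and extract are hypothetical.

```java
class ManualSearch {
    private static final String KEY = "\tMyKeyWord\t";

    // Hypothetical helper: returns {first, number, last}, or null when the
    // line does not look like <field>\tMyKeyWord\t<digits>\t<field>.
    static String[] extract(String line) {
        int k = line.indexOf(KEY);
        if (k <= 0) {
            return null;                 // keyword missing, or empty first field
        }
        int numStart = k + KEY.length();
        int tab = line.indexOf('\t', numStart);
        if (tab <= numStart || tab == line.length() - 1) {
            return null;                 // empty number field or empty last field
        }
        for (int i = numStart; i < tab; i++) {
            char c = line.charAt(i);
            if (c < '0' || c > '9') {
                return null;             // second field must be all digits
            }
        }
        return new String[] {
            line.substring(0, k),
            line.substring(numStart, tab),
            line.substring(tab + 1)
        };
    }

    public static void main(String[] args) {
        String[] fields = extract("a\tMyKeyWord\t42\tb");
        if (fields != null) {
            System.out.println(String.join("\t", fields[0], "MyKeyWord", fields[1], fields[2]));
        }
    }
}
```

Whether this actually beats a precompiled Pattern depends on the data, so it is worth timing both on a real file.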

Edit: Even faster would be to not create the Strings at all until you have found what you're looking for, and simply search the byte values as they are read (using a Stream, not a Reader).
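
A sketch of the byte-level idea, under some assumptions: the file is ASCII or UTF-8, lines end in \n or \r\n, and it only filters lines containing the keyword, so the digit check would still have to be done on the lines that pass. ByteSearch, containsKey and matchingLines are made-up names. A String is built only for lines where the keyword bytes were found.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ByteSearch {
    private static final byte[] KEY =
            "\tMyKeyWord\t".getBytes(StandardCharsets.US_ASCII);

    // Naive scan: do the first len bytes of buf contain KEY?
    static boolean containsKey(byte[] buf, int len) {
        outer:
        for (int i = 0; i + KEY.length <= len; i++) {
            for (int j = 0; j < KEY.length; j++) {
                if (buf[i + j] != KEY[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    // Reads the stream byte by byte and returns only the lines containing KEY.
    // Pass a BufferedInputStream so that read() does not hit the disk each time.
    static List<String> matchingLines(InputStream in) throws IOException {
        List<String> result = new ArrayList<>();
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                keepIfMatch(line, result);
            } else if (b != '\r') {
                line.write(b);
            }
        }
        keepIfMatch(line, result);   // the last line may have no trailing newline
        return result;
    }

    private static void keepIfMatch(ByteArrayOutputStream line, List<String> result) {
        byte[] buf = line.toByteArray();
        if (containsKey(buf, buf.length)) {
            // Only now is a String created
            result.add(new String(buf, StandardCharsets.UTF_8));
        }
        line.reset();
    }
}
```

In the original program this would replace the BufferedReader loop, for example with matchingLines(new BufferedInputStream(new FileInputStream(fin))).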
