import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class RegexTest {
	public static void main(String[] args) {
		String regex = "([a-zA-Z'-]*[ ]*)*";
		Pattern p = Pattern.compile(regex);
		String input = "Adarsh has been testing this issue for a very long time now!#!";
		Matcher m = p.matcher(input);
		System.out.println("Called");
        if(m.matches())
          System.out.println("Matched");
        else
        	System.out.println("Not Matched");
        if(input.matches(regex))
        	System.out.println("Matched");
        else
        	System.out.println("Not Matched");
        
	}

}

I use the above code in a form field and validation is done in the java class.(The above example is just an equivalent for the actual one) The maximum length for that textbox set is 80 chars. However i face a strange issue. The system validates a correct textbox entry. However when a wrong value is entered for strings of length greater than approximately 50. The

Pattern.compile()

takes a long time and also the matcher.match() does not execute . I also tried the default match() in String. But still have the same issue. Is there any problem with the regex library.? Or am i missing something? Note : CPU Usage has gone to 100%

Recommended Answers

All 6 Replies

What do you wnat to validate? I.e what's the validation?
Perhaps match() takes a long time due to the generic nature of the regEx you're using.
Pattern.compile() however shouldn't take any longer/shorter as it's not connected to the input at all.

Can you enhance the code you've posted so that I can reproduce the 100% CPU usage scenario with just this RegexTest class?
Have you done step-by-step debug already? Which function-call takes up 100% CPU?

You can see the 100% CPU usage scenario by providing an invalid string for the above example itself. I did get a work around for this issue by changing the regular expression to

([a-zA-Z\\'\\-\\ ])*

. It works fine here. But i'm curious why the others fail , when they are expressions for the same thing. It fails to provide an output when the string length is greater than 50. Following regular expressions fail to give an output

[ ]*([a-z]*[A-Z]*[']*[-]*[ ]*)*[ ]* 
(([ ][a-z][A-Z]['][-][ ])*)*

I'm just curious why it happens this way. Any help would be appreciated. Thanks.

Interesting..
Checking the code now. I suspected maybe it takes time due to it's effort in identifying the "capturing-groups". But doesn't seem to be the case as "(?:[a-zA-Z'-]*[ ]*)*" also takes long enough.

Modified to code a bit to make it more testable.

Would let you know if I find something more.. :)

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class RegexTest {
	public static void main(String[] args) {
		String input = "Adarsh has been testing this###aaaaaaaa";
		if(args.length == 2) {
			testPattern(args[0], args[1]);
			return;
		}

		testPattern("([a-zA-Z'-]*[ ]*)*", input);
		testPattern("(?:[a-zA-Z'-]*[ ]*)*", input);
	}
	
	public static void testPattern( String patt, String input) {
		System.out.println("\n\nTest string --> \"" + input + "\"");
		System.out.println("pattern = \"" + patt + "\"\n");
		long start = System.nanoTime();
		Pattern p1 = Pattern.compile(patt);
		System.out.println("Compile time:\t" + (System.nanoTime() - start) + " nanosecs");
		start = System.nanoTime();
		Matcher m1 = p1.matcher(input);
		if(m1.matches())
		  System.out.println("Matched");
		else
			System.out.println("Not Matched");
		System.out.println("Match time:\t" + (System.nanoTime() - start) + " nanosecs");
	}

}

Solution: Use "^...$" kinda pattern.

So I think I figured out what's wrong with your pattern.

Problem is that given the way you've grouped your pattern, matcher would try to match it with strings starting and ending with each char of your input string.
This blows up the number of combinations to match with almost exponentially. As you can see below the time taken goes down by almost half as I remove one char from the input string.

What I mean by starting and ending with each char is that if your input string is say "ab cd" then, matcher would have to check all following combinations against the pattern:

- 'ab cd'

- 'a'
- 'ab'
- 'ab '
- 'ab c'

- 'b cd'
- ' cd'
- 'cd'
- 'd'

- 'b c'
- 'b '
- 'b'

- ' cd'
- ' c'
- ' '
- 'cd'
- 'c'

And as number of chars increase number of such combinations go exponentially higher.

[B]root@forssa $[/B] java com.kash.TestMain "abcdefghikklmnopqrstuvwxyz1234567890"

Test string --> "abcdefghikklmnopqrstuvwxyz1234567890"

pattern = "^[a-zA-Z'-]*[ ]*$"
Match time:     206451 nanosecs

pattern = "([a-zA-Z'-]*[ ]*)*"
Match time:     17731438213 nanosecs

[B]root@forssa $[/B] java com.kash.TestMain "bcdefghikklmnopqrstuvwxyz1234567890"

Test string --> "bcdefghikklmnopqrstuvwxyz1234567890"

pattern = "^[a-zA-Z'-]*[ ]*$"
Match time:     191923 nanosecs

pattern = "([a-zA-Z'-]*[ ]*)*"
Match time:     8909746071 nanosecs

[B]root@forssa $[/B] java com.kash.TestMain "cdefghikklmnopqrstuvwxyz1234567890"

Test string --> "cdefghikklmnopqrstuvwxyz1234567890"

pattern = "^[a-zA-Z'-]*[ ]*$"
Match time:     190248 nanosecs

pattern = "([a-zA-Z'-]*[ ]*)*"
Match time:     4450727397 nanosecs

[B]root@forssa $[/B] java com.kash.TestMain "defghikklmnopqrstuvwxyz1234567890"

Test string --> "defghikklmnopqrstuvwxyz1234567890"

pattern = "^[a-zA-Z'-]*[ ]*$"
Match time:     189410 nanosecs

pattern = "([a-zA-Z'-]*[ ]*)*"
Match time:     2252398838 nanosecs

[B]root@forssa $[/B]

Also see the profiling results attached.

commented: JProfiler +8

Well .! Thanks a lot for that . I had my doubts that the number of combinations may be too high. But just thought that there may be some silly fault of mine. Anyways thanks for those pointers. They were helpful.

Out of curiosity. Did you use JProfiler for the profiling .?

Thanks.

Sure..
Profiler was YourKit

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.