Write a parser to turn the input into a regular expression and run that on the dataset.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
The regexp package is part of the core JRE distribution since 1.4.0 so there is no additional installation or download required.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
why can't you use regexp? It's the natural choice.
If not that you're going to have to use some form of system using indexOf() and keeping track of all the numbers that's returning from repeated calls.
Could probably be written quite efficiently using recursion.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
shoot the project lead for his decision making ;)
indexOf will tell you where a substring resides (if anywhere). Call repeatedly on the parts of the string that aren't wildcards and check whether the returned values are in order.
Should be quite efficient, leaving the biggest trick the splitting of the pattern into chunks to feed into your search function.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
say your pattern is something like "AB*cd*F*", you'd start by checking whether the string begins with "AB".
If it does, check whether in the substring you get when removing AB from the front of the data you have "cd" at some point.
If it does, remove everything up to and including that "cd" and check for an "F" in the rest.
If that's found, return true. If at any point indexOf() returns -1 (not found) return false.
Something like
public boolean matcher(String data, String[] pattern, boolean startsWith) {
int idx = data.indexOf(pattern[0]);
if (idx == -1 || (idx > 0 && startsWith)) {
return false;
}
String newdata = data.substring(idx);
String[] newPat = new String[pattern.length()-1];
for (int i=1; i<pattern.length; i++) {
newPat[i-1] = pattern[i];
}
return matcher(newdata, newPat, false);
}
should do the trick.
Mind I've not tested this, and I've excluded the needed bounds checking for brevity.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
yes, I check whether the pattern matches the string at all, not how many different matches there are.
That would mean taking the total number of permutations of the parts of input defined by the wildcards.
With 1 character that's 1 permutation, with 2 you're up to 2, with 3 it's 6, so with n you're up to n! options.
With long input that can quickly get very large indeed.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
What do you mean "With 1 character that's 1 permutation, with 2 you're up to 2, with 3 it's 6, so with n you're up to n! options."? I can not understand it. Could you please show your idea with some specific samples please?
Oh dear,
The are good reasons why using a dataset, rather then a brute force algorithm would be the optimal chose here.With 1 character that's 1 permutation, with 2 you're up to 2, with 3 it's 6, so with n you're up to n! options.
With long input that can quickly get very large indeed.
Jwenting has pointed out the reason why above.
Here's a quick example.
Say you have a string AB
the perms are:AB, and BA which is 2.
For ABC, there are 6.ABC
ACB
BAC
BCA
CBA
CAB
In maths the number of perms can be determined quickly, using the simple rule.
If the length of the string is N then the number of perms is given by:Nx (N-1) X (N-2) etc.
So that for a word of length four the number of perms is:4x3x2x1
The rate of growth is a problem, and in most cases, a quicker more efficient algorithm should be sort after.
Unfortunately, your project lead has other ideas. Sack him :eek:
iamthwee
Posting Expert
5,950 posts since Aug 2005
Reputation Points: 1,543
Solved Threads: 439
well, it's a rather simple means. Determining in advance what the rules are and finding a simple calculation will do is far easier than trying all possible options and seeing if they fit and coming to the same result ;)
Can't get faster too than that in this case.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
No, if the input string is ABC and the pattern is A*, you have 2 characters left over which means there are 2! = 1*2 = 2 possible strings that match.
If you have a string ABCDEF and you have a pattern *CD* you have 4! ((2+2)!) matches.
That's assuming that the matches need to be made up of the characters in the input string with the fixed ones as defined in the pattern.
If the input string can be anything as long as the pattern is in it the number of possible options is always infinity or zero depending on whether the fixed part(s) of the pattern exist or not.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
It should be n!, not (n!)^n.
If each of the n characters can be used in any of the positions irrespective of whether it was already used at another position it would be (m!)^m where m is the number of distinct characters in the total set of n characters, but I assume you have n distinct characters each of which has to be used only once.
To be completely accurate you have to calculate m here as well by removing duplicates.
To determine n, you only need the length of the input string minus the length of the fixed part(s).
Determining m is harder, as you'll have to take the non-fixed part of the input and determine the number of distinct characters in it.
Not hard, but takes a little work. Using a HashMap of Character objects will do nicely. m will be the size of the HashMap.
[quote]What means "If the input string can be anything as long as the pattern is in it"? Could you show an example please?[\quote]
If the pattern is ABC*, both ABC1234 and ABC123456 are valid.
Now if your input were ABC1234 you'd have 4! options if the valid string length is restricted to the length of the input string.
If however it is not restricted to that (7 long in this case), the longer string is also valid, which would yield a total number of 6!+5!+4! options for just that length and any length at least as long as ABC1234.
Logically then the number of options would be infinity as the maximum length of the valid resulting string would be infinity (of course the computer would be unable to handle such a long string, but we're talking mathematics here).
Your number of results would be the SIGMA(n!) where n between 4 and infinity inclusive.
jwenting
duckman
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337