Hello

I have a program I am trying to make which takes in a word pattern (eg; d?ll, h?l?o, go?dby? etc) & searches an 'array containing different words' to find any words in the array that match the input word pattern.

So for example; if I input 'd?ll', the program would output

dell
dill
doll
etc....

Or if I input 'ap??e':

apace
apple
etc.......

My Problem: I am having alot of trouble thinking up an algorithm (way) to search the array to find all the words that match the word pattern.

I know how to determine whether a word has the same 'word pattern' as the input word - see my function below

I am supposed to use a binary search - see my other function, which I know how to do, but could I get some help (algorithms) on how do I mesh the 2 functions to search an array & return all the words that match the input word pattern?

Any suggestions would be greatly appreciated :D

Compare words Function:

bool wordComparision(string s1, string s2) {

			 if(s1.length() != s2.length())
		return false;

	if(!s1[0])
		return false;

	if(s1 == s2) return true;

	for(int i = 0; i < (int)s1.length(); i++)
	{
		// check if all the letters that occur in s1 are
		// also present in s2 & visa versa
	if(s1[i]!='?')  {

		if(s1.find(s2[i]) == string::npos)
			return false;
		if(s2.find(s1[i]) == string::npos)
			return false;
	}
				}

	return true;
}

My Binary Search Function:

bool binarySearch(string array[], string word, int begin, int end) {

	int mid = end;

	while (begin <= end) {
		mid = (begin + end) / 2;

		if ( word == array[mid] ) {	 // if the loop gets to this point
			return true;				// then we have found the word we're looking for
		}
		else if (word > array[mid]) {
			begin = mid + 1;
		}
		else if (word < array[mid]) {
			end = mid - 1;
		}
	}

	return false;					   // if we get to here the word is not in the dictionary

}

In this scenario binary search will not do the job, simple go for linear search. Here is algorithm:

1. take pattern as patternString.
2. take one string from string array as strToCompare
3. If patternString and strToCompare are matching then print strToCompare.
4. go for next string in array and go to step 2.

I think you can just do the following :

get the user input. Find '?' character. Replace '?' character with
some other letter. Check for valid word. save it to result list.

I think you have to try use Regular expressions.
Info

So there is algorithm:
1. Get string
2. Get first word with '?' symbols
3. Get all words from the list, which have the same first letter and length
4. Through regular expressions does compare a word from a list with a word with character "?"
5. If it matches save word from list to a list of valid results

A lack of this method is the use of plenty of regular expressions.

You could still use a letter frequency counter algo here.

For example

d?ll

could be:

dlla
dllb
dllc
dlld
dlle
dllf
dllg
dllh
dlli
dllj
dllk
dlll
dllm
dlln
dllo
dllp
dllq
dllr
dlls
dllt
dllu
dllv
dllw
dllx
dlly
dllz

Of course the worse case scenario is when you get something like ????????? which would have a poor time complexity. You could improve the algo.

I am supposed to use a binary search ...

Binary Search would come into picture only iff you have a sorted input like dell, bell etc and corresponding sorted array to search for. Your examples don't have a sorted string in most of the cases.

But if we had to go without binary search, then the function would be something like this:

//indicator being the missing character '?' or '*' or something else 
//s1 being an element of the string array and s2 is the string 
//with indicator in it
bool wordComparison(string s1, string s2, char indicator) {
      if(s1.length() != s2.length())
          return false;
      if(!s1[0])
         return false;
      if(s1 == s2)
          return true;

      for (int i = 0; i < s1.length(); i++) {
          if (s2[i] != indicator &&  (s1[i] != s2[i]))
              return false;

      }
      return true;
 }

I think that we might basically sort your array according to the length of the word.

Then The Binary Search Would get you to one of the bounds of the SearchList containing the same length,

Then I guess You can take in the corresponding letters and compare to their equivallents, if all of them match. then its probably the word.

Or Else another alternate method of finding matches would be to first use the Binary Search to make a mini list of all the words with the probable length. And sort the mini array.

Then With the first known character....... for ex :: GIVEN WORD ?ar?

You could use the Binary Search to go through all the words to find the words with 'a' in the second position. Out of that mini list, you will next go for the next character until all the characters are exhausted.

This is something like a sieve. But far more complex.

Thanks for your help guys

I've got the major parts sorted out, identify if a word fits the word pattern, I just dont know how to search the Array of words efficiently(timewise).

I have an array that is 130,000 words, yes thats right, 130,000, & the words are all in alphabetical order. So would you guys suggest a binary search is the way to go, or is there a different way to search the array I dont know of.

Could anyone help me alter my search Function from a selection sort to a Binary search?

Below is my code which uses a selection sort/search method:

void search(string array[], int a_size, string w) {

// array[] = an array of words in alphabetical order
// a_size = array's size which is over 130,000

	for (int i=0; i<a_size; i++) {
		if (array[i].length() == w.length()) {
			if (wordPattern(w,array[i]))
				cout << array[i] << "\n";
		}
	}
}

bool wordPattern(string w1, string w2) {

	for (int i=0; i<(int)w1.size();i++)
	{
		if (w1[i]!='?') {
			size_t  val= w2.find(w1[i]);
			if(val==string::npos) { return false; }
			else                  { w2.erase(val,1); }
		}
	}
	return true;

}

Entire Program

#include <iostream>
#include <string>

using namespace std;

void to_lower(string& a);
void search(string array[], int a_size, string w);
bool wordPattern(string w1, string w2);

int main() {
	string word;
	string a[] = {"feig", "frab", "frae", "fram", "frap", "frat", "fray", "freedom", "fret", "fribalis"};

	cout << "Enter a word pattern: " << flush;
	cin >> word;

	to_lower(word);
	search(a,10,word);

	return 0;
}

void to_lower(string& s) {

	for (int i=0; i<(int)s.length(); i++) {
		s[i] = tolower(s[i]);
	}
}

void search(string array[], int a_size, string w) {

	for (int i=0; i<a_size; i++) {
		if (array[i].length() == w.length()) {
			if (wordPattern(w,array[i]))
				cout << array[i] << "\n";
		}
	}
}

bool wordPattern(string w1, string w2) {

	for (int i=0; i<(int)w1.size();i++)
	{
		if (w1[i]!='?') {
			size_t  val= w2.find(w1[i]);
			if(val==string::npos) { return false; }
			else                  { w2.erase(val,1); }
		}
	}
	return true;

}

If you are searching for a word, the fact that the array is alphabetically sorted would be useful and a binary search would be the way to go. But you're not searching for a word, you're searching for a regular expression within the word. You're also searching for ALL words that contain that regular expression, so it seems to me that you have to check every single word, whether the array is sorted or not. So don't bother sorting them and don't bother with a binary search. Or if you sort them, sort them by length, like Sky Diploma suggested. That might be of some use because you can immediately skip words that are too short. But I don't see any place for binary search.

From first post:

I am supposed to use a binary search

Why?

This article has been dead for over six months. Start a new discussion instead.