Hi,
I want to implement a search program that can search a given set of documents for a given string. I've heard that indexed searches are the best bet in these cases (if anyone knows of something better, I would appreciate it). If I had a set of index keywords for each document, and all the keywords for the whole set of documents, the search itself would be quite simple. The problem is creating this index: how do I scan a document and list out its keywords?

For example, in the sentence below:

The Velociraptor had a height of 6 feet.

Here, words like 'velociraptor', 'height' and '6 feet' would be keywords.
So far I have thought of simply removing the filler words like 'the', 'a', etc.
Is there a better method? Otherwise, does anyone have any ideas on how I can refine the above method?
Thanks in advance.
Piyush

You could create a dictionary of ALL words. That would include 'the' 'had' 'a' and the like. If someone types in a search for "velociraptor run" that would exclude your sentence above because 'run' is not there, even though you might not consider 'run' a keyword.
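
The "dictionary of ALL words" idea is usually called an inverted index. Below is a minimal sketch of it, not a definitive implementation. The names tokenize, InvertedIndex, add_document and search_all are made up for this example, and doc_id is assumed to be an integer label you assign to each document yourself. A query matches a document only if every query word occurs in it.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Split text into lowercase words; anything that is not a letter or digit
// acts as a separator.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char ch : text) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}

// Inverted index: word -> ids of the documents that contain it.
using InvertedIndex = std::map<std::string, std::set<int>>;

// Add every word of one document to the index (doc_id is a hypothetical
// integer label chosen by the caller).
void add_document(InvertedIndex& index, int doc_id, const std::string& text) {
    for (const std::string& word : tokenize(text)) {
        index[word].insert(doc_id);
    }
}

// Return the ids of the documents that contain ALL words of the query.
std::set<int> search_all(const InvertedIndex& index, const std::string& query) {
    std::set<int> result;
    bool first = true;
    for (const std::string& word : tokenize(query)) {
        auto it = index.find(word);
        if (it == index.end()) return {};  // a missing word excludes everything
        if (first) {
            result = it->second;
            first = false;
        } else {
            std::set<int> kept;  // intersection of result and this word's documents
            for (int id : result) {
                if (it->second.count(id)) kept.insert(id);
            }
            result = kept;
        }
    }
    return result;
}
```

With the Velociraptor sentence indexed as document 1, search_all(index, "velociraptor height") would return {1}, while search_all(index, "velociraptor run") would return an empty set, as described above.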

Alternatively you could make a list of common words you want to exclude, like "the", "a", "and", "had", "have", etc.
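
Here is a rough sketch of the exclusion-list approach. The list kStopWords and the functions normalize and extract_keywords are hypothetical names invented for this example, and the word list is far shorter than a real one would be. Note that it treats '6' and 'feet' as two separate keywords; keeping '6 feet' together would need some extra phrase handling that is not shown here.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stop-word list; a real one would be much longer.
const std::set<std::string> kStopWords = {
    "the", "a", "an", "and", "of", "had", "have"};

// Lowercase a word and drop every character that is not a letter or digit.
std::string normalize(const std::string& raw) {
    std::string word;
    for (unsigned char ch : raw) {
        if (std::isalnum(ch)) word += static_cast<char>(std::tolower(ch));
    }
    return word;
}

// Return the words of text that are not in the stop-word list, in order.
std::vector<std::string> extract_keywords(const std::string& text) {
    std::vector<std::string> keywords;
    std::istringstream in(text);
    std::string raw;
    while (in >> raw) {  // >> splits on white space
        std::string word = normalize(raw);
        if (!word.empty() && kStopWords.count(word) == 0) {
            keywords.push_back(word);
        }
    }
    return keywords;
}
```

For the sentence "The Velociraptor had a height of 6 feet." this yields velociraptor, height, 6 and feet.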

By the way, see "Advanced string technique" in the C++ tutorial:
The C++ standard library provides a class called string. It's been used in several programs so far. To use it you need

#include <string>
using namespace std;

Note the (tiny) difference from the corresponding statement for C-style strings, where you say
#include <string.h>.
There is also the using namespace std; line. The components of the C++ standard library belong to a namespace called std, and the using statement allows the code to refer to them without having to write std:: in front, as in std::string x.

You can create a string with

string s = "this is what I want";
You can navigate around a string with the array-indexing notation s[i], with the first character being s[0]. Despite appearances, C++ strings are neither arrays nor pointers! The fact that you can do this is due to a C++ feature known as operator overloading. Suffice it to say that the class is defined in such a way as to support the array-indexing operation. There is also a navigation device called an iterator, functionally like a pointer, which we won't be going into.

string is a class, and a function or method m of the class is applied to an object x of the class with the x.m(...) syntax of object-oriented programming. So in reading the following tables of functions, bear this in mind.

For finding the size of a string x and searching in it, you need:

(size_type below is short for string::size_type, an unsigned integer type)

size_type x.size()                        size of string
size_type x.find(char c)                  search for the character c, start at beginning
size_type x.find(string s)                search for string s, start at beginning
size_type x.find(char c, size_type pos)   search for the character c, start at pos
size_type x.find(string s, size_type pos) search for s, start at pos
string::npos                              value returned when search fails

The return values of the search functions are unsigned integers (type string::size_type), often called offsets. If x.find(...) returns offset i, then x[i] is the character where the match begins. You can also give an offset to specify the start of the search. Where the C-style search functions relied on the NULL pointer for exceptional situations, the C++ search functions rely on the cryptic (no position?):

string::npos.
Here is an example of searching:

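#include <iostream>
#include <string>
using namespace std;

int main() {
    string s = "aabbaa";

    /* to count how often a char occurs */
    int char_count = 0;
    string::size_type offset = 0;  // find() returns string::size_type, not int
    while ((offset = s.find('a', offset)) != string::npos) {
        char_count++;
        offset++;  // resume just past this match
    }

    /* to count how often a string occurs (non-overlapping matches) */
    int str_count = 0;
    offset = 0;
    while ((offset = s.find("aa", offset)) != string::npos) {
        str_count++;
        offset = offset + 2;  // skip the whole match
    }

    cout << char_count << " " << str_count << endl;  // prints "4 2"
    return 0;
}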

There are lots of string-creating and string-updating methods; the most frequently used are given in the table below. They can be used without any of the qualms or headaches of the corresponding operations on C-style strings. These methods generally have at least one and sometimes two integer arguments. The first argument is a start position o for the operation, and the second argument specifies the size l of the operation to be carried out.

string(const char *s)            create string from char pointer
const char * x.c_str()           make (read-only) char pointer from string
string x.substr(int o)           return substring from o onwards
string x.substr(int o, int l)    return substring starting at o, of length l
x.erase(int o, int l)            remove l chars from o onwards
x.insert(int o, string s)        insert s at o
s1 + s2                          give string representing concatenation of s1 and s2

Note that in the above only erase and insert change the object that they are invoked on. substr returns a string object generated out of the unchanged string object it was invoked on.
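
As a small illustration of the methods in the table, the sketch below combines find, erase, insert and substr. The helper names replace_first and first_word are invented for this example and are not part of the string class.

```cpp
#include <string>

// Replace the first occurrence of `from` in `text` with `to`, using only
// find, erase and insert. `text` is passed by value, so the caller's
// string is left unchanged and the modified copy is returned.
std::string replace_first(std::string text, const std::string& from,
                          const std::string& to) {
    std::string::size_type pos = text.find(from);
    if (pos == std::string::npos || from.empty()) return text;
    text.erase(pos, from.size());  // erase changes text in place
    text.insert(pos, to);          // insert changes text in place
    return text;
}

// Return everything before the first space, using substr.
// substr builds a new string and leaves `text` untouched.
std::string first_word(const std::string& text) {
    std::string::size_type space = text.find(' ');
    if (space == std::string::npos) return text;
    return text.substr(0, space);
}
```

For instance, replace_first("this is what I want", "what", "all") gives "this is all I want", and first_word("this is what I want") gives "this".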
I will give a substantial example of string processing in tutorial after vector has been introduced.

A final important facility provided by string is a function to obtain a whole line of input from a stream. Where file is an fstream and line a string, you can say:


getline(file,line)
Recall that the >> mechanism stops at white space. This getline function reads everything up to a new-line character. It does not put the new-line character into the line but it does take it out of the stream. It can be treated as returning false when the end of the file is reached.
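
A typical way to use this is a loop that keeps calling getline until it fails. The sketch below assumes you want all the lines collected in a vector; read_lines and split_lines are names made up for this example. Because read_lines takes a std::istream&, it works with an fstream as well as with an in-memory string stream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read every line of a stream into a vector. The loop ends when getline
// fails, which happens once the end of the input is reached.
std::vector<std::string> read_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);  // the new-line character is not stored
    }
    return lines;
}

// Convenience wrapper for text that is already in memory.
std::vector<std::string> split_lines(const std::string& text) {
    std::istringstream in(text);
    return read_lines(in);
}
```

Each line could then be passed on to whatever does the keyword extraction or searching.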
