
searching for keywords in multiple text files

Thread Solved
Posts: 7
Solved Threads: 0
serhannn
Newbie Poster


  #1  
Jan 7th, 2009
Hello,
I'm working on a program for my college project. The goal is to find keywords, and extract the sentences that contain them, from many text files that I have already downloaded from the Internet with another program. These text files are actually the source code of various websites, and my program needs to search them for certain keywords, which I store in another text file called "keywords.txt". It should search for every keyword in every text file, so I tried to do this with some while loops. Although I get some results, my code only searches for the first keyword, the one on the top line of "keywords.txt"; the other keywords are never searched. I can search for this one keyword in all the text files, whose locations and names I read from a file called "addresses.txt". Could you please look at my code and tell me what is wrong with it, and what I should do so that it searches for all the keywords in "keywords.txt"?

I would also appreciate some hints about how I could manage this process with the vector class. Arrays are not really appropriate here, because I have to change their sizes manually every time I add new addresses or keywords. I have some knowledge of vectors, but I couldn't work them into my code. Here is the code I have written:
--------------------------------------------------------------------------------
#include <iostream>
#include <fstream>
#include <string>
#include <sstream> // string stream class
#include <vector>
using namespace std;

void Search_Keyword(void)
{
	int j = 0;
	int i = 0;
	size_t position = 0;
	string line;
	string keyword;
	string keyword_Array[10];
	string name_Array[13];
	string link_Array[13];
	string link;
	string name;
	ifstream addresses("addresses.txt");
	ifstream keywords("keywords.txt");
	keyword = "";

	while(keywords>>keyword) 
	{
		keyword_Array[j] = keyword;
		string search_Str = keyword_Array[j];
		
		while(addresses>>link>>name)
			{
			name_Array[i] = "URLS/";
			name_Array[i] += name;
			name_Array[i] += ".txt";
			link_Array[i] = link;
			ifstream url_txt(name_Array[i].c_str());
			while(getline(url_txt,line)) 
				{
					string word;
					istringstream myObj(line); 
					while(myObj>>word)
					{
						if ((word == search_Str))
							{
		cout<<"found "<<search_Str<<" in "<<name_Array[i]<<endl;
							}
					}
				}
			url_txt.close();
			if(i < 12)
			i++;
			}
		
		j++;
	}
addresses.close();
keywords.close();
}

int main()
{
	Search_Keyword();
	return 0;
}
--------------------------------------------------------------------------------

Thanks for your help
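On the vector question: instead of fixed-size arrays, both input files can be read into std::vector containers that grow as needed. (With the arrays in the posted code, an 11th keyword would be written past the end of keyword_Array[10], because j is never bounded.) A minimal sketch, assuming keywords.txt holds whitespace-separated keywords and each entry of addresses.txt is a link followed by a name, as the `addresses>>link>>name` loop implies. The names `read_keywords`, `read_addresses`, `Address` and `file_path` are hypothetical helpers; they take `std::istream&` so they work with an `ifstream` as well as with in-memory text.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding in-memory text in tests
#include <string>
#include <vector>

// One entry of addresses.txt: a link followed by a short name (assumed layout).
struct Address {
    std::string link;
    std::string name;
};

// Reads every whitespace-separated keyword. The vector grows as needed,
// so adding keywords to the file never requires resizing an array by hand.
std::vector<std::string> read_keywords(std::istream& in)
{
    std::vector<std::string> keywords;
    std::string word;
    while (in >> word) {
        keywords.push_back(word);
    }
    return keywords;
}

// Reads (link, name) pairs until the stream runs out.
std::vector<Address> read_addresses(std::istream& in)
{
    std::vector<Address> addresses;
    Address a;
    while (in >> a.link >> a.name) {
        addresses.push_back(a);
    }
    return addresses;
}

// Builds the path of a downloaded page, following the "URLS/<name>.txt"
// layout used in the original code.
std::string file_path(const Address& a)
{
    return "URLS/" + a.name + ".txt";
}
```

In the real program this would be `std::ifstream kw("keywords.txt");` followed by `std::vector<std::string> keywords = read_keywords(kw);` (and likewise for addresses.txt), after which the loops run up to `keywords.size()` instead of a hard-coded array bound.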
Posts: 79
Solved Threads: 12
MatEpp
Junior Poster in Training

Re: searching for keywords in multiple text files

  #2  
Jan 7th, 2009
I'm not sure if this will work, but try

while(keywords>>keyword != NULL)
Posts: 7
Solved Threads: 0
serhannn
Newbie Poster

Re: searching for keywords in multiple text files

  #3  
Jan 9th, 2009
Thanks for the reply, but it doesn't work.
Posts: 5,133
Solved Threads: 634
Colleague
Salem
Void main'ers are DOOMed

Re: searching for keywords in multiple text files

  #4  
Jan 9th, 2009
> while(addresses>>link>>name)
Having got to the end of the addresses file once, where do you think you'll start off from with the second word from the keywords file?

Even for a short program, your indentation is misleading and needs work.
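Salem's hint, made concrete: the inner `while(addresses>>link>>name)` loop reads addresses.txt to the end while handling the first keyword. That leaves the stream at end-of-file with its eofbit and failbit set, so for every later keyword the inner loop body never runs. One fix is to clear the error flags and seek back to the start before each pass. The sketch below keeps only the loop structure; `pair_keywords_with_names` is a hypothetical name, and it records each (keyword, file path) combination instead of opening the files, so the effect of the rewind is easy to see. Reading addresses.txt into a vector once and looping over that vector would avoid rereading the file altogether.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, for trying the function on in-memory text
#include <string>
#include <utility>
#include <vector>

// For every keyword, walks the whole addresses stream and records
// which file ("URLS/<name>.txt") would be searched for that keyword.
std::vector<std::pair<std::string, std::string>>
pair_keywords_with_names(std::istream& keywords, std::istream& addresses)
{
    std::vector<std::pair<std::string, std::string>> pairs;
    std::string keyword, link, name;

    while (keywords >> keyword) {
        // The previous pass left addresses at EOF with eofbit/failbit set.
        // clear() resets the flags (seekg does nothing while failbit is set);
        // seekg(0) then moves back to the first entry.
        addresses.clear();
        addresses.seekg(0);

        while (addresses >> link >> name) {
            pairs.emplace_back(keyword, "URLS/" + name + ".txt");
        }
    }
    return pairs;
}
```

In the posted program the equivalent change is to call `addresses.clear();` and `addresses.seekg(0);` at the top of the outer keyword loop, just before the inner `while`.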
Posts: 391
Solved Threads: 38
skatamatic
Posting Whiz

Re: searching for keywords in multiple text files

  #5  
Jan 9th, 2009
Originally Posted by Salem
> while(addresses>>link>>name)
Having got to the end of the addresses file once, where do you think you'll start off from with the second word from the keywords file?

Even for a short program, your indentation is misleading and needs work.


If you are using VS2005 or later, press Ctrl+A, then Ctrl+K, Ctrl+F. This runs the built-in Format Selection command on the whole file and fixes all of your indentation =)
Posts: 391
Solved Threads: 38
skatamatic
Posting Whiz

Re: searching for keywords in multiple text files

  #6  
Jan 9th, 2009
As for the actual problem, I would go with a simpler method. Push all the keywords into a vector of strings, then load the file to check against (a .txt file) into another vector of strings. Then filter out the keywords manually, or get a little more elegant and use a function such as std::set_intersection() (from <algorithm>; both ranges must be sorted first) to collect the words the two vectors have in common into a third vector. You can then use the resulting vector for whatever further processing you need (adding the URL info, etc.).

This will cut your code down by about 70%, and will get you brownie points with your teacher =)
Last edited by skatamatic : Jan 9th, 2009 at 1:37 pm.
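A minimal sketch of the std::set_intersection approach, assuming the keywords and the page's words have already been read into vectors of strings (for example with a getline loop, then split into words). `matching_keywords` is a hypothetical name. std::set_intersection only works on sorted ranges, so both vectors are taken by value, sorted, and de-duplicated first; the result lists each keyword found on the page once, in sorted order.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Returns the keywords that also occur in `words`, each listed once, sorted.
// Both parameters are copies, so the caller's vectors are left untouched.
std::vector<std::string> matching_keywords(std::vector<std::string> keywords,
                                           std::vector<std::string> words)
{
    // std::set_intersection requires both input ranges to be sorted.
    std::sort(keywords.begin(), keywords.end());
    std::sort(words.begin(), words.end());

    // Drop duplicates so a word repeated on the page is reported only once.
    keywords.erase(std::unique(keywords.begin(), keywords.end()), keywords.end());
    words.erase(std::unique(words.begin(), words.end()), words.end());

    std::vector<std::string> found;
    std::set_intersection(keywords.begin(), keywords.end(),
                          words.begin(), words.end(),
                          std::back_inserter(found));
    return found;
}
```

Two limits of this design: the comparison is on whole whitespace-separated words, so a keyword embedded in markup such as `<b>keyword</b>` will not match, and it only tells you which keywords occur, not where, so extracting the surrounding sentences still needs a line-by-line search.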
Posts: 7
Solved Threads: 0
serhannn
Newbie Poster

Re: searching for keywords in multiple text files

  #7  
Jan 10th, 2009
Originally Posted by skatamatic
As for the actual problem, I would go with a simpler method. Push all the keywords into a vector of strings, then load the file to check against (a .txt file) into another vector of strings. Then filter out the keywords manually, or get a little more elegant and use a function such as std::set_intersection() (from <algorithm>; both ranges must be sorted first) to collect the words the two vectors have in common into a third vector. You can then use the resulting vector for whatever further processing you need (adding the URL info, etc.).

This will cut your code down by about 70%, and will get you brownie points with your teacher =)


Thanks for the explanation, but I still don't understand how I can load the text files I need to search into a string vector. Should I read them line by line with getline and push each line into the vector, or is there another way?
Posts: 391
Solved Threads: 38
skatamatic
Posting Whiz

Re: searching for keywords in multiple text files

  #8  
Jan 10th, 2009
Originally Posted by serhannn
Thanks for the explanation, but I still don't understand how I can load the text files I need to search into a string vector. Should I read them line by line with getline and push each line into the vector, or is there another way?


You can load the files using an ifstream.

ifstream inFile;
vector<string> data;
inFile.open("File.txt");
while (!inFile.eof())
{
    string sString = inFile.getline();
    data.push_back(sString);
}

Something like that should do the trick for filling a vector from a file. It might not be syntactically correct, since I didn't try to compile it.
Posts: 717
Solved Threads: 80
MosaicFuneral
Master Poster

Re: searching for keywords in multiple text files

  #9  
Jan 10th, 2009
Originally Posted by skatamatic
ifstream inFile;
vector<string> data;
inFile.open("File.txt");
while (!inFile.eof())
{
    string sString = inFile.getline();
    data.push_back(sString);
}


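Don't loop on eof(): it only becomes true after a read has already failed, so the loop body runs one extra time after the last line. Test the result of getline() instead, and don't put declarations in loops.
Simplified: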
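#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    ifstream file("name");
    vector<string> lines;
    string str;

    if (file.is_open())
    {
        while (getline(file, str))   // stops as soon as a read fails
        {
            lines.push_back(str);
        }

        file.close();
    }
    else
    {
        // failed: set error events, logs, etc.
        return 1;
    }
    return 0;
}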
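Once a page is held as a vector of lines, each keyword can be located with std::string::find, which, unlike comparing whole words, also matches a keyword embedded in HTML such as `<title>keyword</title>`. A minimal sketch: `find_keyword_lines` is a hypothetical helper that returns the 1-based numbers of the lines containing a keyword, so the caller can print those lines as the surrounding text. Treating a line as the unit of context is an assumption; splitting a page into real sentences would need extra rules for `.`, `!`, `?` and markup. The match is case-sensitive and also fires inside longer words.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the 1-based numbers of the lines that contain `keyword`
// anywhere in the line (plain substring match, case-sensitive).
std::vector<std::size_t> find_keyword_lines(const std::vector<std::string>& lines,
                                            const std::string& keyword)
{
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (lines[i].find(keyword) != std::string::npos) {
            hits.push_back(i + 1);
        }
    }
    return hits;
}
```

Calling this once per keyword over the same vector means each downloaded page is read from disk only once, however many keywords there are.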
Posts: 7
Solved Threads: 0
serhannn
Newbie Poster

Re: searching for keywords in multiple text files

  #10  
Jan 10th, 2009
Thanks, everyone. I have solved the problem =) It was a basic loop error.
©2003 - 2008 DaniWeb® LLC