Hi,
First post here.
I need to scan a file (NC milling output) with, potentially, tens of thousands of lines of text, looking for keywords.

Every time I find a keyword, say, a tool change command, I want to print out twenty lines - the ten lines above this line, and the ten lines below it, inclusive.

I would eventually need to save this output to a second file but that should be the easy part.

One approach:
I figured that I could scan the document twice:
On the first pass, store the line number of each line that gets a hit.
On the second pass, print each "hit" line together with the ten lines before it and the ten lines after it.

I have a program that does this using awk (Unix), but we are moving our systems to Windows, and I was hoping to port it over and use it as Windows programming practice.
I was initially hoping to do it as a Windows Forms utility (for the users to run), but that seems a bit clumsier, since RichTextBoxes don't really treat the text as lines.

Are there any other methods or logic that you would recommend to approach this? - something that doesn't require recursive bit shifting and magic potions, though - string handling in C/C++ already seems convoluted enough to me! ;)

Thank you in advance for any advice!

All 6 Replies

The double-scan technique would work well.
It might be better to do this as a command-line app -- that way, the user can redirect the output easily and manually where necessary.

Don't use Rich Text. Just read the file in as normal text.

Read in as much of the file as you can.
Do your search.
Since you then have most, if not all, of the file in memory, it's a simple matter to back up ten lines from each hit and start printing.
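
A minimal sketch of that read-everything-then-search idea, assuming the whole file fits in memory as a vector of strings. The names `readLines` and `collectContext` are made up for illustration, and skipping lines that were already emitted for an earlier hit is a design choice, not something the original awk program is known to do.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: read every line of the input into memory.
std::vector<std::string> readLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Hypothetical helper: returns every line that should be printed for `keyword`,
// i.e. each hit plus up to `context` lines on either side of it.
std::vector<std::string> collectContext(const std::vector<std::string>& lines,
                                        const std::string& keyword,
                                        std::size_t context) {
    std::vector<std::string> out;
    std::size_t nextUnprinted = 0;  // avoids repeating lines when hits sit close together
    for (std::size_t i = 0; i < lines.size(); ++i) {
        if (lines[i].find(keyword) == std::string::npos) continue;
        // Clamp the window so it never runs off the top or bottom of the file.
        std::size_t first = (i >= context) ? i - context : 0;
        std::size_t last = std::min(i + context, lines.size() - 1);
        first = std::max(first, nextUnprinted);
        for (std::size_t k = first; k <= last; ++k) out.push_back(lines[k]);
        nextUnprinted = last + 1;
    }
    return out;
}
```

To use it, open the file with a `std::ifstream`, pass it to `readLines`, call `collectContext(lines, "TOOL", 10)`, and write the returned lines to `std::cout` or to a second file.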

Thank you, both, for your advice.
I will have the whole file before starting the scanning/printing process.
Is there a specific amount of data/file size at which I should plan on the program not being able to read in anymore data?

Thank you

Well, you've got to have some expectation of the number of lines, so you can choose the largest (necessary) storage for the counter (or count in sections).

Is the max of an "unsigned long" large enough?
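
For a sense of scale, here is a small sketch (the function name is made up) that reports the largest value an `unsigned long` line counter can hold. The C++ standard guarantees at least 4,294,967,295, and with the 32-bit `unsigned long` used by Windows compilers that is exactly the limit, which is far beyond tens of thousands of lines.

```cpp
#include <limits>

// Hypothetical helper: largest line count an unsigned long counter can hold
// on the platform this is compiled for.
unsigned long maxCountableLines() {
    return std::numeric_limits<unsigned long>::max();
}
```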

As far as your string array or list, it should be the size of "the before group" plus the size of "the after group" plus the size of "the one", if you have enough memory.

Keep in mind: you might need to adjust the captured line amount based on the relative position from the top or bottom of the file.
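
One way to picture the "before group plus after group plus the one" storage is a rolling window that is fed one line at a time. This is only a sketch under that assumption: `ContextWindow` and `feed` are hypothetical names, and the window shrinks by itself near the top or bottom of the file because it only ever holds lines that were actually read.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical helper: feed lines one at a time; each call returns the lines
// that are now due for printing (possibly none).
struct ContextWindow {
    std::size_t context;             // lines wanted on each side of a hit
    std::size_t afterLeft;           // lines still owed after the latest hit
    std::deque<std::string> before;  // most recent lines not yet printed

    explicit ContextWindow(std::size_t n) : context(n), afterLeft(0) {}

    std::vector<std::string> feed(const std::string& line,
                                  const std::string& keyword) {
        std::vector<std::string> out;
        if (line.find(keyword) != std::string::npos) {
            // Hit: flush the remembered "before" lines, then the hit itself.
            out.assign(before.begin(), before.end());
            before.clear();
            out.push_back(line);
            afterLeft = context;
        } else if (afterLeft > 0) {
            // Still inside the "after" group of the previous hit.
            out.push_back(line);
            --afterLeft;
        } else {
            // Not near a hit: remember the line, dropping the oldest if full.
            before.push_back(line);
            if (before.size() > context) before.pop_front();
        }
        return out;
    }
};
```

In a `while (std::getline(file, line))` loop, print whatever `feed(line, "TOOL")` returns. The file is read only once and at most `context` lines are held at a time, so file size stops being a concern.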

Now that you mention "String Array"...

What do you typically use to create string arrays?

The simplest way I found to create a list of terms to search (w/ string::find) was this:
char* keywordArray[] = {"TOOL","M06", "N201"};
vector<string> strvector(keywordArray, keywordArray + 3); //Convert to string array

I tried this:
vector<string> strVector = {"TOOL","M06", "N201"};
...but the compiler didn't care for my creative vectoring. Not supported??

Thank you
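
On the brace form: initializing a vector directly from `{"TOOL","M06", "N201"}` needs a compiler with C++11 initializer-list support, and older Visual C++ releases do not have it, which would explain the rejection. A sketch of the array-range approach that works either way follows; `makeKeywords` is a made-up name, and the element count is computed rather than hard-coded as 3.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds the keyword list without C++11 features.
std::vector<std::string> makeKeywords() {
    // const char*, because string literals are read-only.
    static const char* const raw[] = { "TOOL", "M06", "N201" };
    // Let the compiler count the entries so the list can grow safely.
    const std::size_t count = sizeof(raw) / sizeof(raw[0]);
    return std::vector<std::string>(raw, raw + count);
}

// With C++11 initializer-list support, the direct form compiles as well:
//     std::vector<std::string> keywords = { "TOOL", "M06", "N201" };
```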

Getting a "Vector subscript out of range" error on line 33.

I was able to do the find and store for ONE word but now I am trying to expand it to work off of a LIST of keywords to find. Having problems with vectors, though.

Is it "legal" to create a vector of size_t type?

It seems to fail at this line:
foundVect[j] = line.find(keywordArray[j]);
Tried using foundVect.at(j) -> same "out of range" message.

Thank you for any help on that issue or any other programming eyesores you might find in my code.

#include <stdio.h>
#include <conio.h>
#include <time.h>
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <cstdlib>
#include <iomanip>
#include <vector>

using namespace std;

int main(int argc, char** argv) {
	ifstream myfile_stream ("C:\\OpenCV2.2\\TestParse.txt");
	string line;

	int linenumber = 0;		    // Line counter.
	int j = 0;  
		
	char* keywordArray[] = {"TOOL","M06", "N201"};  // words to look for.
	vector<string> strvector(keywordArray, keywordArray + 3); //convert to String array.
	vector<size_t> foundVect;  // To store find results.  Is it allowed?
	vector<int> myvector;      // To store line numbers.
							
	if (myfile_stream.is_open())  
	{
     while ( myfile_stream.good() ) 
      {
		getline (myfile_stream,line);  
			
				for (j=0; j<2; j++){  //Scroll through the keywords.
	   				foundVect[j] = line.find(keywordArray[j]);  
					if (foundVect[j] !=string::npos) {
						myvector.push_back (linenumber);  
						break;  // To avoid saving the same line number twice.
					} //END IF
	  			} //END FOR	
		linenumber ++ ;			 
	  } //END While

	 myfile_stream.close();
  }
  else cout << "Unable to open file"; 
return 0;
} //END main
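
A note on the "out of range" error, with a hedged sketch of one possible fix. A `vector<size_t>` is perfectly legal, but `foundVect` is created empty, and neither `foundVect[j]` nor `foundVect.at(j)` grows a vector, so both refer to an element that does not exist. Either give the vector a size up front or, as assumed below, keep the `find` result in a local variable. Two other things stand out: `j<2` only ever tests the first two keywords, and looping on `good()` runs the body once more after the last successful read. The function names here are made up for illustration.

```cpp
#include <cstddef>
#include <istream>
#include <string>
#include <vector>

// Hypothetical helper: true if `line` contains any of the keywords.
bool matchesAnyKeyword(const std::string& line,
                       const std::vector<std::string>& keywords) {
    // Loop to keywords.size() so no keyword is skipped.
    for (std::size_t j = 0; j < keywords.size(); ++j) {
        // A local result is enough; nothing needs to be stored per keyword.
        std::size_t found = line.find(keywords[j]);
        if (found != std::string::npos) return true;
    }
    return false;
}

// Hypothetical helper: zero-based numbers of all lines containing a keyword.
std::vector<int> findKeywordLines(std::istream& in,
                                  const std::vector<std::string>& keywords) {
    std::vector<int> hitLines;
    std::string line;
    int lineNumber = 0;
    // Testing getline itself ends the loop cleanly at end of file.
    while (std::getline(in, line)) {
        if (matchesAnyKeyword(line, keywords)) hitLines.push_back(lineNumber);
        ++lineNumber;
    }
    return hitLines;
}
```

To try it against the original test file, open `std::ifstream in("C:\\OpenCV2.2\\TestParse.txt");`, check `in.is_open()`, and pass `in` together with the keyword vector to `findKeywordLines`.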