Hi everybody. I come here again to ask for your advice. Thank you in advance for your attention.

I have a big text file containing, say, more than 10000 letters. I want to write a program that opens it, parses out each word, and compares each one with a target word to find out how often that word appears. After trying for a whole morning, I have written something that crashes immediately after running. Pretty annoying and frustrating.

The key problem is that I don't know exactly how to read a text file of unknown, possibly large size into a buffer and handle it. Below is just one of the versions I have already tried. It doesn't work.

//open the file and find the word "Bacteria"

#include <iostream>
#include <string.h>
#include <ctype.h>
#include <fstream>
#include "getline.h"
using namespace std;
    
    char *tempFile="60_10.out";           //define the Temp file ,global variable
    bool GetWord();
    char * word;     //hold the word


    
int main()
{
    string Target="Bacteria";

       
    while (GetWord())
    {
       if (!Target.compare(word))
       {cout<<"The word"<<Target<<" is found!\n";}
       else
       {cout<<"Nop, the word"<<Target<<" is not there.\n";}
    }

    return 0;
}


bool GetWord()
{   
    char Buffer[256];             //reading the file in memory
    int wordOffset=0;            //start at the beginning
    getline(tempFile,Buffer);
    
    if (Buffer[wordOffset]==0)   //end of the string?
    return false;

    char *p1, *p2;
    p1=p2=Buffer+wordOffset;    //point to the next word

    //eat leading spaces
    for (int i=0; i<(int)strlen(p1) && !isalnum(p1[0]);i++)
    p1++;                      //!isalnum letter and number 0, else 1

    //see if you have a word
    if (!isalnum(p1[0]) && p1[0]!='.')
    return false;

    //p1 now points to start of the next word
    //point p2 there as well
    p2=p1;

    //march p2 to the end of word
    while (isalnum(p2[0]))
    {p2++;}

    //p2 is now at end of the word
    //p1 is at beginning of word
    //length of word is the difference
    int len=int(p2-p1);

    //copy the word into the buffer
    strncpy(word,p1,len);

    //null terminate it
    word[len]='\0';

    //now find the beginning of the next word
    for (int j=int(p2-Buffer);j<(int)strlen(Buffer) && !isalnum(p2[0]);j++)
    {
        p2++;
        }

    wordOffset=int(p2-Buffer);
    
    return true;
}

The input file 60_10.out looks something like this:
../fc1f02091-es_om1328.r9t_1_3 Bacteria_Cyanobacteria_Prochlorales_Prochlorococcus:marinus
fcb203-1k12.ff40_b1.SCF_-1_-2 Bacteria_Proteobacteria_Deltaproteobacteria_Desulfuromonadales_Geobacter_Geobacter:uraniumreducens
fcb205-1l2.rf40_b1.SCF_1_3 Bacteria_Planctomycetes_Planctomycetacia_PlanctomycetalesCandidatus_Kuenenia_Candidatus:Kuenenia
anke5gh01-es_ot7.s9t_1_2 Bacteria_Firmicutes_Clostridia
anke5gi01-es_ot7.s9t_1_2 Bacteria_Firmicutes_Clostridia
anke5ca06-es_ot7.s9t_1_3 Bacteria_Firmicutes_Clostridia
fcb205-1a17.ff40_b1.SCF_1_2 Bacteria_Proteobacteria_Deltaproteobacteria_Desulfobacterales_Desulfococcus_Candidatus:Desulfococcus

No idea why my program keeps crashing.

Thank you once more for your help.

>No idea why my program keeps crashing.
I'd start here:

strncpy(word,p1,len);

And I'd start by recognizing that word doesn't point to anywhere meaningful yet. You're trying to write to an uninitialized pointer.
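
To make that concrete: `char * word;` only declares a pointer and reserves no storage for characters. As a global it starts out as a null pointer, so `strncpy(word,p1,len)` writes through an address that does not belong to any buffer. Below is a minimal sketch of two common ways to give the word real storage, a fixed-size array or a `std::string`. The names `copyWordToArray`, `copyWordToString`, and `kMaxWord` are hypothetical and made up for illustration; they are not part of the original program.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical capacity chosen for illustration; the original post defines no such limit.
const std::size_t kMaxWord = 256;

// Option 1: copy into an array that actually owns storage.
// Copies at most kMaxWord - 1 characters and always null-terminates.
void copyWordToArray(char (&dest)[kMaxWord], const char* start, std::size_t len)
{
    assert(start != nullptr);
    if (len >= kMaxWord)
        len = kMaxWord - 1;          // truncate rather than overflow the array
    std::memcpy(dest, start, len);
    dest[len] = '\0';
}

// Option 2: let std::string allocate and manage the memory.
std::string copyWordToString(const char* start, std::size_t len)
{
    assert(start != nullptr);
    return std::string(start, len);
}
```

In the posted program, the smallest change along these lines would probably be declaring `char word[256];` instead of `char * word;`, or making `word` a `string` and calling `word.assign(p1, len);` in place of the `strncpy` and manual null termination.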

>And I'd start by recognizing that word doesn't point to anywhere meaningful yet. You're trying to write to an uninitialized pointer.

I don't quite get it. I have defined it as a char * word. What should I do? Thanks.
