Hi everybody. I'm here again to ask for your advice. Thank you in advance for your attention.

I have a big text file containing, say, more than 10000 letters. I want to write a program that opens it, parses out each word, compares each one with a target word, and reports how often the target word appears. After trying for a whole morning I have written something that crashes immediately after starting. Pretty annoying and frustrating.

The key problem is that I don't really know how to read a text file of unknown, possibly large, size into a buffer and process it. Below is just one of the versions I have already tried. It doesn't work.

//open the file and find the word "Bacteria"

#include <iostream>
#include <string.h>
#include <ctype.h>
#include <fstream>
#include "getline.h"
using namespace std;
    
    char *tempFile="60_10.out";           //define the Temp file ,global variable
    bool GetWord();
    char * word;     //hold the word


    
int main()
{
    string Target="Bacteria";

       
    while (GetWord())
    {
       if (!Target.compare(word))
       {cout<<"The word"<<Target<<" is found!\n";}
       else
       {cout<<"Nop, the word"<<Target<<" is not there.\n";}
    }

    return 0;
}


bool GetWord()
{   
    char Buffer[256];             //reading the file in memory
    int wordOffset=0;            //start at the beginning
    getline(tempFile,Buffer);
    
    if (Buffer[wordOffset]==0)   //end of the string?
    return false;

    char *p1, *p2;
    p1=p2=Buffer+wordOffset;    //point to the next word

    //eat leading spaces
    for (int i=0; i<(int)strlen(p1) && !isalnum(p1[0]);i++)
    p1++;                      //!isalnum letter and number 0, else 1

    //see if you have a word
    if (!isalnum(p1[0]) && p1[0]!='.')
    return false;

    //p1 now points to start of the next word
    //point p2 there as well
    p2=p1;

    //march p2 to the end of word
    while (isalnum(p2[0]))
    {p2++;}

    //p2 is now at end of the word
    //p1 is at beginning of word
    //length of word is the difference
    int len=int(p2-p1);

    //copy the word into the buffer
    strncpy(word,p1,len);

    //null terminate it
    word[len]='\0';

    //now find the beginning of the next word
    for (int j=int(p2-Buffer);j<(int)strlen(Buffer) && !isalnum(p2[0]);j++)
    {
        p2++;
        }

    wordOffset=int(p2-Buffer);
    
    return true;
}

The input file 60_10.out looks something like this:
../fc1f02091-es_om1328.r9t_1_3 Bacteria_Cyanobacteria_Prochlorales_Prochlorococcus:marinus
fcb203-1k12.ff40_b1.SCF_-1_-2 Bacteria_Proteobacteria_Deltaproteobacteria_Desulfuromonadales_Geobacter_Geobacter:uraniumreducens
fcb205-1l2.rf40_b1.SCF_1_3 Bacteria_Planctomycetes_Planctomycetacia_PlanctomycetalesCandidatus_Kuenenia_Candidatus:Kuenenia
anke5gh01-es_ot7.s9t_1_2 Bacteria_Firmicutes_Clostridia
anke5gi01-es_ot7.s9t_1_2 Bacteria_Firmicutes_Clostridia
anke5ca06-es_ot7.s9t_1_3 Bacteria_Firmicutes_Clostridia
fcb205-1a17.ff40_b1.SCF_1_2 Bacteria_Proteobacteria_Deltaproteobacteria_Desulfobacterales_Desulfococcus_Candidatus:Desulfococcus

No idea why my program keeps crashing.

Thank you once more for your help.

All 4 Replies

>No idea why my program keeps crashing.
I'd start here:

strncpy(word,p1,len);

And I'd start by recognizing that word doesn't point to anywhere meaningful yet. You're trying to write to an uninitialized pointer.
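
To make that concrete: `char * word;` only declares a pointer; it does not reserve any memory for the characters, so `strncpy(word,p1,len)` writes to wherever the pointer happens to point. Below is a minimal sketch of two ways around that, not a drop-in patch for the posted program. The function names `copy_word` and `make_word` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Option 1: the caller supplies real storage (an array) plus its size.
// Copies at most capacity-1 characters and always null-terminates.
// Assumes src has at least len readable characters.
void copy_word(char *dest, std::size_t capacity, const char *src, std::size_t len)
{
    assert(dest != nullptr && capacity > 0);
    if (len >= capacity)
        len = capacity - 1;          // truncate rather than overflow the buffer
    std::memcpy(dest, src, len);
    dest[len] = '\0';
}

// Option 2: let std::string own the memory; it allocates what it needs.
// Assumes src has at least len readable characters.
std::string make_word(const char *src, std::size_t len)
{
    return std::string(src, len);
}
```

With option 1 the global would have to become an actual array, for example `char word[256];`, and the call would look like `copy_word(word, sizeof word, p1, len);`. Option 2 is usually the simpler choice in C++, since `Target` is already a `std::string`.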

>And I'd start by recognizing that word doesn't point to anywhere meaningful yet. You're trying to write to an uninitialized pointer.

I don't quite get it. I have defined it as char * word. What should I do? Thanks.

You probably need to read a pointer tutorial to find out what pointers are and how to use them.

>don't quite get it. I have defined it as a char * word. what should I do? thanks

Here's a visual aid for understanding pointers
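
For the original goal (counting how often a target word appears in a file of unknown size), one way to sidestep raw pointers and fixed buffers altogether is to read the stream character by character and build each word in a `std::string`. The following is a rough sketch under a few assumptions: a "word" is a run of alphanumeric characters, as in the posted `isalnum` loops, so `_`, `.`, `:` and `-` act as separators; the match is exact and case-sensitive; and the names `count_word`, `count_word_in_text` and `count_word_in_file` are invented for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Counts the alphanumeric words in the stream that are exactly equal to target.
std::size_t count_word(std::istream &in, const std::string &target)
{
    assert(!target.empty());
    std::size_t count = 0;
    std::string word;                 // grows as needed, no fixed buffer
    char c;
    while (in.get(c)) {
        if (std::isalnum(static_cast<unsigned char>(c))) {
            word += c;                // still inside a word
        } else {
            if (word == target)       // a separator ends the current word
                ++count;
            word.clear();
        }
    }
    if (word == target)               // last word if input has no trailing separator
        ++count;
    return count;
}

// Helper for trying the function on a string held in memory.
std::size_t count_word_in_text(const std::string &text, const std::string &target)
{
    std::istringstream in(text);
    return count_word(in, target);
}

// Helper for a file on disk; returns 0 if the file cannot be opened.
std::size_t count_word_in_file(const std::string &path, const std::string &target)
{
    std::ifstream file(path.c_str());
    if (!file)
        return 0;
    return count_word(file, target);
}
```

A call such as `count_word_in_file("60_10.out", "Bacteria")` would then give the number of occurrences. Note that under these rules `Cyanobacteria` and `Proteobacteria` do not count as matches, because they are different words and the comparison is case-sensitive.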
