Hi,
I am working on a program that has two parts.
For the first part, I have 1500-4000 text files containing strings. Some files are identical and some contain different strings. I have to find a subset of about 10-20 files that together cover all the strings in the 1500 files. Since the strings are repeated a lot across files, a few files can cover all the variation.
To implement this I used the following logic:

1. Read all the coverage information from each test folder for Statement Coverage and put it into a String array. The array should contain only distinct elements. Let this array be A.

2. Now create a similar array B of boolean values, with the same length as A and all values False. This will be our array to check whether everything is covered or not.

3. Read coverage info one by one from test0 to test1590 and start including tests in the set. As you include one in the set, mark the corresponding elements in array B as true.

4. Continue step 3 until all the values in B are true.
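The four steps above can be sketched in C roughly as follows. This is only an illustration under assumptions: the strings are taken to be already loaded into memory, and `find_string` and `select_tests` are hypothetical names that do not appear in the posted program. A test is kept only if it covers at least one string that is still uncovered, which is the condition described later in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return the index of s in the distinct-string array A, or -1 if absent. */
static int find_string(const char *A[], size_t n_a, const char *s)
{
    for (size_t i = 0; i < n_a; i++)
        if (strcmp(A[i], s) == 0)
            return (int)i;
    return -1;
}

/*
 * Walk the tests in order (step 3). Keep a test only if it covers a string
 * that is not covered yet, and stop once every entry of B is true (step 4).
 *   A, n_a    - distinct strings (step 1)
 *   tests[t]  - strings read from test t (hypothetical in-memory form)
 *   counts[t] - number of strings in tests[t]
 *   B         - caller-supplied array of n_a flags (step 2)
 *   chosen    - output: indices of the selected tests
 * Returns the number of selected tests.
 */
size_t select_tests(const char *A[], size_t n_a,
                    const char **tests[], const size_t counts[],
                    size_t n_tests, bool B[], size_t chosen[])
{
    size_t n_chosen = 0, n_covered = 0;

    assert(A != NULL && B != NULL && chosen != NULL);
    for (size_t i = 0; i < n_a; i++)
        B[i] = false;

    for (size_t t = 0; t < n_tests && n_covered < n_a; t++) {
        bool adds_something = false;
        for (size_t k = 0; k < counts[t]; k++) {
            int idx = find_string(A, n_a, tests[t][k]);
            if (idx >= 0 && !B[idx]) {
                B[idx] = true;          /* newly covered string */
                n_covered++;
                adds_something = true;
            }
        }
        if (adds_something)
            chosen[n_chosen++] = t;
    }
    return n_chosen;
}
```

The result depends on the order in which the tests are visited, so it is a covering subset but not necessarily the smallest one. Picking, at each step, the test that covers the most still-uncovered strings is a common variation that tends to give a smaller subset.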


My code has two parts. The part that reads all the files and collects the unique strings is below.
The reference file is initially chosen as the file, among the 1500 files, with the maximum number of statements.

int comparefiles(const char *filename1,const char *filename2, int x);
int countlines(const char *filename);
#include <stdio.h>
#include <stdlib.h>
#include <string.h> 

int main()
{ 
int i=0,j=0,cmpx,nl1=0,nl2=0,dt; 
char buffer[25000]={},buffer1[25000]={}, buffero[25000]={};
FILE *output;
char buf[20000]={};
char buf1[20000]={};
char bufo[20000]={};
char string2[]="/COND.txt";
char string[]="rctcas.txt";
//char string[]="TEST";
for(i=0;i<1;i++)
{ 
sprintf(buf,"/home/csgrads/akhan015/desktop/programs/benchmarks/tcas/coverage/test1260%s",string2);
printf("reading reference file %s \n",buf);


for(j=0;j<=1589;j++)
{
sprintf(buf1,"/home/csgrads/akhan015/desktop/programs/benchmarks/tcas/coverage/test%d%s", j,string2);
printf("reading inner loop %s \n",buf1); //sending reference file and another file to compare
dt=comparefiles(buf,buf1,j);
}
}
return 0;

}



int countlines(const char *filename) //count no of lines
{
FILE *fm;
char line[1024];
int NumberOfLines = 0;
fm=fopen(filename, "r");
while( fgets(line,sizeof(line),fm) != NULL)
NumberOfLines++;
return(NumberOfLines);
fclose(fm);
}


int comparefiles(const char *filename1,const char *filename2, int x)
{
FILE * fref;
FILE *output;
FILE * myfile1;
char bufo[20000]={};
char cx1[10000]={} ,cx2[10000]={},cx3[10000]={};
int cmpx,cmpx1;
signed int s=-1;
int nl1,nl2,nl3;
//fflush(fref);
fref= fopen(filename1, "r");
nl1=countlines(filename1);
myfile1= fopen(filename2,"r");
nl2=countlines(filename2);

if((fref== NULL) || (myfile1== NULL))

printf("Error occurs in the file \n");

else
{
int j = 0, k=0;
rewind(myfile1);
first:
while((fgets(cx2 ,30 ,myfile1)!= NULL)) //choose strings and compare and stop when all the strings from //a file match reference file, i.e. no unique string to add
{j++;
int i = 0;
rewind(fref);
while((fgets(cx1 ,30, fref)!= NULL))
{
i++;
if((cmpx=strcmp(cx2 ,cx1))== 0) 
{
k++;
if(k==(nl2))
{printf("%d=%d FILES ARE SAME\n",k,nl2 );
return;
}

goto first;


}







}

if((j!=k)&&((cmpx=strcmp(cx2 ,cx1))!= 0)) //here a different statement is found //and checked whether it has been saved in rctcas during comparison with some other file
{

printf(" STATEMENT DOESNOT EXIST\n");

output=fopen("rctcas.txt","a+");
int l=0;
nl3=countlines("rctcas.txt");
second:
while((fgets(cx3 ,30, output)!= NULL))
{puts(cx2);
puts(cx3);
l++;
if((cmpx1=strcmp(cx2 ,cx3))== 0)
{

return 0;}
else 
if(l==nl3)
{fputs(cx2,output);
fclose(output);}
}


}
}

return 0;
}

I am having two problems with my code:
First, after executing 255 times the code gives a segmentation fault. This could be because of a buffer overflow. Maybe using malloc would work, but I am not sure how to use malloc as I am a beginner.

Second, the code builds a reference file by comparing a large number of files (e.g. 1500 or 4000), extracting all the unique strings from them and storing them. Each time the loop runs, the reference file serves as a checklist of the strings covered so far. A file being compared is added only if it has at least one string not covered by previous files. From what I have observed, the reference file should contain about 100 unique strings, but mine ends up with 17000 strings because the code has some error. Please help me; I have to submit the code tomorrow and I can't find the fault. Also, when I open the file with fopen in 'a+' mode, after running two times the program prints a weird memory table and says aborted.

I haven't studied your code in detail, but I should point out that you should never create static buffer arrays with arbitrarily large sizes. This is especially true for a function that deals with files.

Instead, define a pointer and use malloc/realloc/free to manage its size dynamically so you can have confidence knowing that buffer overruns will not happen. It is only safe to use such static buffers when you are absolutely sure there is no way the buffer index will be exceeded.
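As a minimal sketch of the malloc/realloc/free approach just described, assuming the job is to read one text line of unknown length: `read_line` is a hypothetical helper name, not something from the posted program. The buffer starts small and is doubled whenever a line does not fit, so no fixed upper size is ever assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Read one line of any length from fp into a heap buffer.
 * Returns a malloc'd string without the trailing newline (the caller must
 * free it), or NULL at end of file or if memory runs out.
 */
char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);

    assert(fp != NULL);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';        /* complete line: strip newline */
            return buf;
        }
        /* No newline yet: grow the buffer and keep reading. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len == 0) {                     /* nothing read: end of file */
        free(buf);
        return NULL;
    }
    return buf;                         /* last line had no newline */
}
```

Typical use would be a loop such as `while ((line = read_line(fp)) != NULL) { ...; free(line); }`.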

You might have too many files open at one time. I mean perhaps you kept opening files without always closing the previous ones. C implementations limit how many files can be open at once: FOPEN_MAX in <stdio.h> is the number the implementation guarantees, and the operating system imposes its own per-process limit, which can differ from one installation to another depending on its resources.

I'll study your code for awhile, but it's very late, and I'm going to sleep, soon. Without a small sample of the data, it's probably not going to be productive.

Your code is very difficult to study because it is not indented to show which lines of code are subordinate to which, as it should be.

In comparefiles() you open two files and never close either of them. For the reference file, you can pass the FILE * (pointer), as a parameter, and not continually open and close, open and close, the file. Just rewind() it when you need to start at the beginning of the file again. For all other files that are opened in a function, they should be closed before the function returns.
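A rough sketch of that arrangement, under assumptions: the caller opens the reference file once and passes the FILE * in, the function rewinds it for every lookup, and the other file is opened and closed inside the function. The names `line_in_file` and `count_missing_lines` are hypothetical, and the 256-byte line buffer assumes lines shorter than 255 characters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if `line` occurs in the already-open reference file.
 * The file is rewound before the search and is NOT closed here. */
bool line_in_file(FILE *ref, const char *line)
{
    char buf[256];

    assert(ref != NULL);
    rewind(ref);
    while (fgets(buf, sizeof buf, ref) != NULL)
        if (strcmp(buf, line) == 0)
            return true;
    return false;
}

/* Count the lines of the file at `path` that are missing from `ref`.
 * This function opens `path` itself, so it also closes it before returning.
 * Returns -1 if `path` cannot be opened. */
int count_missing_lines(FILE *ref, const char *path)
{
    char buf[256];
    int missing = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    while (fgets(buf, sizeof buf, fp) != NULL)
        if (!line_in_file(ref, buf))
            missing++;
    fclose(fp);                 /* every exit path after fopen closes fp */
    return missing;
}
```

With this shape, main() would call fopen on the reference file once before the loop over the 1500 files and fclose it once after the loop.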


As NIGHTS has stated, you run a HUGE risk of crashing the static sized arrays, by overstepping their boundaries, AND there is a risk that you will crash the program because it has run out of stack space - that is a small space set aside by C, but it is NOT the major space available. That major space is available by using malloc() or calloc().

Another problem with your code is that countlines() never checks whether the file you requested has in fact been opened, and comparefiles() only checks after it has already called countlines(). fopen returns a NULL pointer if a file can't be opened, and passing that NULL pointer on to fgets can crash your program.
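A hedged sketch of a line counter that checks the result of fopen and closes the file before returning. In the posted countlines() the fclose call sits after the return statement, so it is never reached and every call leaves one file open. Together with the two fopen calls in comparefiles(), that is four leaked handles per comparison. If the per-process limit happens to be 1024 open files (a common Linux default, assumed here and not verified for the poster's machine), that would run out after roughly 255 comparisons, which would be consistent with the reported failure point. `count_lines` below is a hypothetical replacement name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Return the number of lines in the file, or -1 if it cannot be opened.
 * A final line without a trailing newline still counts as a line.
 */
long count_lines(const char *filename)
{
    long lines = 0;
    int c, last = '\n';
    FILE *fp;

    assert(filename != NULL);
    fp = fopen(filename, "r");
    if (fp == NULL) {
        perror(filename);       /* report why the open failed */
        return -1;
    }
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++;                /* unterminated last line */

    fclose(fp);                 /* close BEFORE returning */
    return lines;
}
```

The caller should test for -1 before using the count.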

This is the code with more explanation. I tried closing the files every time but it still gives a segmentation fault. Secondly, please tell me how to modify the code to add malloc. I haven't used it before. I read about it but I can't understand how to change this code to use malloc. Please help me.

void comparefiles(const char *filename1,const char *filename2, int x);
int countlines(const char *filename);//these functions are to compare files passed as an argument and compare files.the countline function is to count the number of lines in the code.
#include <stdio.h>
#include <stdlib.h>
#include <string.h> 

int main()
{  
int i=0,j=0,cmpx,nl1=0,nl2=0,dt;   
char buf[20000]={};//this buffer has the path copied in it for the reference file
char buf1[20000]={};//this buffer has the path copied in it for the files to be   compared
char string2[]="/COND.txt";//the text file to be accessed
char string[]="rctcas.txt";//name of reference file

for(i=0;i<1;i++) //this loop opens reference file
{   
sprintf(buf,"/home/csgrads/akhan015/desktop/programs/benchmarks/tcas/coverage/test1260%s",string2);
printf("reading reference file %s \n",buf);


	for(j=0;j<=1589;j++)//this loop opens the 1500 files to be compared
	{
	sprintf(buf1,"/home/csgrads/akhan015/desktop/programs/benchmarks/tcas/coverage/test%d%s", j,string2);//variable paths
	printf("reading inner loop %s \n",buf1);
	comparefiles(buf,buf1,j);
        }
}
return 0;

}



int countlines(const char *filename)
{
FILE *fm;
char line[1024];
int NumberOfLines = 0;
fm=fopen(filename, "r");
while( fgets(line,sizeof(line),fm) != NULL)
   NumberOfLines++;
return(NumberOfLines);
fclose(fm);
}


void comparefiles(const char *filename1,const char *filename2, int x)
{
FILE * fref;
FILE *output;
FILE * myfile1;
char cx1[10000]={} ,cx2[10000]={},cx3[10000]={};
int cmpx,cmpx1;
signed int s=-1;
int nl1,nl2,nl3;
//fflush(fref);
fref= fopen(filename1, "r");//opening reference file
nl1=countlines(filename1);//counting no of lines in reference file
myfile1= fopen(filename2,"r");//opening file one of 1500 files
nl2=countlines(filename2);//counting lines

if((fref== NULL) || (myfile1== NULL))

printf("Error occurs in the file \n");


else
{
int j = 0, k=0;
rewind(myfile1);
first:
while((fgets(cx2 ,30 ,myfile1)!= NULL)) //pick 1 string from file to compare it with the refernce file
	{j++; //j tells the no. of times this loop executes
	int i = 0;
	rewind(fref);
	while((fgets(cx1 ,30, fref)!= NULL))//this loop compares a string with all the strings in reference file
	{
	
	i++;// no of times 2nd loop execute
		if((cmpx=strcmp(cx2 ,cx1))== 0) 
		{
		
		if(k==(nl2))// if all strins of a file are in the reference file then we go to another file and repeat the procedure
		{printf("%d=%d FILES ARE SAME\n",k,nl2 );
		return;
		}
		k++,//no of matches in both files
                  else
		goto first;//continue comparing strings
		
		
		}


		
	}
		
		if((j!=k)&&((cmpx=strcmp(cx2 ,cx1))!= 0))// if all the strings are picked and all match then j==k but here some string doesnt match
		{
	
		printf(" STATEMENT DOESNOT EXIST\n");
		
	        output=fopen("rctcas.txt","a+");

int l=0;
nl3=countlines("rctcas.txt");//count lines in the file 
second:
while((fgets(cx3 ,30, output)!= NULL))// here before writing the string to the file i want to check it is not appended again i.e if the same string was there in some other file but not in reference that it would written again and again
{puts(cx2);
puts(cx3);
l++;// l shows how many times loop is executed
if((cmpx1=strcmp(cx2 ,cx3))== 0)
{
fclose(myfile1);
fclose(fref);
return;}
else 
if(l==nl3)//if string does not exist and the whole file is checked
	        {fputs(cx2,output);
	        fclose(output);}}
		
		}
		
		
	}
}
fclose(myfile1);
fclose(fref);
return;
}

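This is a complete example of how to use malloc with files opened with fopen()

// Needs <stdio.h> and <stdlib.h>. The fragment is assumed to sit inside a
// function that returns void.

// Define our file buffer. Notice how I don't hard code any size.
char *SafeBuffer = NULL;

// Open the file to load (read-only binary mode is enough for loading)
FILE *FileHandle = fopen("somefile", "rb");

// Make sure the file opened successfully
if (FileHandle == NULL) {
    printf("Failed to open the file");
    return;
}

// Point to the end of the file stream
fseek(FileHandle, 0L, SEEK_END);

// Get the size in bytes of the file that was just opened (ftell returns a long)
long FileSize = ftell( FileHandle );
if (FileSize < 0) {
    printf("Failed to get the file size");
    fclose( FileHandle );
    return;
}

// This is how you dynamically allocate the buffer memory
// (one extra byte so the data can be NUL-terminated)
SafeBuffer = malloc( (size_t) FileSize + 1 );

// Make sure the computer had enough memory for the job
if (SafeBuffer == NULL) {
    printf("Not enough memory for buffer");
    fclose( FileHandle );
    return;
}

// reset the file stream to the beginning
rewind( FileHandle );

// Load the file into the buffer. fread reports how many bytes it actually
// read, so nothing is written past the end of the buffer. (A loop controlled
// by feof() stores one extra character, because feof() only becomes true
// after a read has already failed.)
size_t Count = fread( SafeBuffer, 1, (size_t) FileSize, FileHandle );
SafeBuffer[Count] = '\0';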

/*** Do some work with the buffer here ***/

// Free resources
fclose( FileHandle );
free( SafeBuffer );

Hi all,
thanks for your suggestions.
1)
I added fclose but it gives a segmentation fault again.
2)
Please can you check the logic of the last part of the function comparefiles, which tries to avoid adding identical strings? My reference file, which should contain the unique strings from all the files, is far too large. It should contain only 100-200 strings, because that is how many unique strings there are across all 1500 files, but I am getting 17000 strings.
To choose the reference file I take the file with the maximum number of strings among the 1500 files and compare it with all the other files.
3)
Another thing I am confused about is that the last part of the logic in comparefiles gives a weird error. It shows a table of memory addresses, talks about glib.c and says aborted at the end. This error comes when I use 'a+' in fopen instead of 'a'. But if I use 'a' I can't read the reference file, and I have to read the existing reference file so that I can eliminate identical strings. Is my logic fine, or should I try comparing the reference file against string arrays instead?
I have a sample of data to give an idea of how things go in my code.
My reference file initially is:

tcas.c:63:0x8048447:0
tcas.c:73:0x8048470:1
tcas.c:79:0x80484B7:0
tcas.c:79:0x80484C3:0
tcas.c:79:0x80484D0:0
tcas.c:91:0x8048502:1
tcas.c:97:0x804854A:0
tcas.c:97:0x8048553:0
tcas.c:97:0x8048560:1
tcas.c:118:0x80485B6:0
tcas.c:118:0x80485C2:0
tcas.c:118:0x80485CE:0
tcas.c:120:0x80485F5:0
tcas.c:120:0x80485FE:0
tcas.c:124:0x8048612:0
tcas.c:124:0x804861C:0
tcas.c:124:0x8048622:1
tcas.c:126:0x804863C:0
tcas.c:126:0x8048645:1
tcas.c:127:0x8048662:0
tcas.c:127:0x804866B:0
tcas.c:128:0x804867E:1
tcas.c:133:0x8048693:1
tcas.c:135:0x80486A2:0
tcas.c:148:0x80486D9:1

with this reference file i am comparing 1500 files.
sample of two files out of 1500 are:
first file:

tcas.c:63:0x8048447:0
tcas.c:73:0x8048470:0
tcas.c:75:0x8048480:0
tcas.c:75:0x8048489:0
tcas.c:75:0x8048496:0
tcas.c:91:0x8048502:0
tcas.c:93:0x8048512:0
tcas.c:93:0x804851E:0
tcas.c:93:0x804852B:0
tcas.c:118:0x80485B6:0
tcas.c:118:0x80485C2:0
tcas.c:118:0x80485CE:0
tcas.c:120:0x80485F5:1
tcas.c:124:0x8048612:0
tcas.c:124:0x804861C:1
tcas.c:124:0x8048628:0
tcas.c:126:0x804863C:1
tcas.c:127:0x8048662:0
tcas.c:127:0x804866B:1
tcas.c:128:0x804867E:1
tcas.c:133:0x8048693:1
tcas.c:135:0x80486A2:1
tcas.c:148:0x80486D9:1

2nd file:

tcas.c:63:0x8048447:0
tcas.c:73:0x8048470:0
tcas.c:75:0x8048480:0
tcas.c:75:0x8048489:0
tcas.c:75:0x8048496:1
tcas.c:91:0x8048502:0
tcas.c:93:0x8048512:0
tcas.c:93:0x804851E:0
tcas.c:93:0x804852B:1
tcas.c:118:0x80485B6:0
tcas.c:118:0x80485C2:0
tcas.c:118:0x80485CE:0
tcas.c:120:0x80485F5:1
tcas.c:124:0x8048612:0
tcas.c:124:0x804861C:1
tcas.c:124:0x8048628:0
tcas.c:126:0x804863C:0
tcas.c:126:0x8048645:0
tcas.c:127:0x8048662:1
tcas.c:128:0x804867E:0
tcas.c:128:0x8048684:1
tcas.c:133:0x8048693:0
tcas.c:148:0x80486D9:1

In the output, the strings that are in the reference file are not repeated, but the strings that differ from the reference file are appended again and again. There is probably some error in the last part of the function comparefiles, after fopen("rctcas.txt","a+").

Please suggest what I should do; I would be really obliged.
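One thing visible in the posted loop is that it calls fclose(output) inside the while loop and then evaluates fgets(cx3, 30, output) again on the closed stream. That is undefined behaviour and could plausibly produce the abort with the memory table, though this has not been verified on the poster's machine. A minimal sketch of a check-then-append step that opens the file in "a+" mode, scans it once, appends at most once and closes it exactly once is shown here. `append_if_absent` is a hypothetical name, `line` is expected to end in a newline as returned by fgets, and the 256-byte buffer assumes short lines like those in the sample data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append `line` to the file at `path` unless an identical line is already
 * there. Returns 1 if appended, 0 if already present, -1 if the file
 * cannot be opened.
 */
int append_if_absent(const char *path, const char *line)
{
    char buf[256];
    int found = 0;
    FILE *fp;

    assert(path != NULL && line != NULL);
    fp = fopen(path, "a+");
    if (fp == NULL)
        return -1;

    rewind(fp);                         /* read from the start of the file */
    while (!found && fgets(buf, sizeof buf, fp) != NULL)
        found = (strcmp(buf, line) == 0);

    if (!found) {
        /* Reposition before switching from reading to writing. */
        fseek(fp, 0L, SEEK_END);
        fputs(line, fp);
    }
    fclose(fp);                         /* closed once, on every path */
    return found ? 0 : 1;
}
```

Calling this once per candidate string replaces the `second:` loop. It still rereads the file for every string, so collecting the unique strings in memory and writing them out once at the end would be the faster design.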


I wrote a sweet little program for you over on the other board:
http://cboard.cprogramming.com/c-programming/143397-large-reference-file.html#post1070790

Check the LAST post in the thread.

I couldn't just correct your program. You will LOVE the new one, though. ;)

1) Once the base file name is assigned, it finds the whole set of numbered files, like:

data1.txt
data2.txt

etc. If you want to extend the number of files, just enlarge the filename char array.

2) It uses a binary search to eliminate all duplicate strings.

3) It uses an index[] array to do the sorting, so no strings are ever moved.

4) It works in memory, as much as possible, so it's quick. This is not the absolute fastest way to handle the whole job, but it is quick.

(The fastest way to do this job would require more knowledge of the system it's running on, how the files are named, how much resources are available, and testing.)

5) It needs no "reference" file. Any file can be the starting file.

6) It needs smaller resources (much smaller), so no malloc or really large static arrays, are needed.

It handles the above data in the blink of an eye. Sweet! ;)
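The program itself is at the link above and is not reproduced here. As a rough illustration of the binary-search idea in point 2 only, the sketch below keeps an in-memory array of unique strings in sorted order and uses a binary search to decide whether a new string is a duplicate. It is a simplified stand-in, not the linked program: it grows the array with realloc and shifts pointers on insert rather than using an index[] array, and `StringSet`, `set_add` and `set_free` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char **items;    /* malloc'd strings, kept in sorted order */
    size_t count;    /* strings currently stored */
    size_t cap;      /* allocated slots */
} StringSet;

/* Binary search. Returns the position of s, or the position where it
 * should be inserted; *found tells which of the two it is. */
static size_t set_find(const StringSet *set, const char *s, int *found)
{
    size_t lo = 0, hi = set->count;

    *found = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(s, set->items[mid]);
        if (cmp == 0) {
            *found = 1;
            return mid;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Returns 1 if s was added, 0 if it was a duplicate, -1 if out of memory. */
int set_add(StringSet *set, const char *s)
{
    int found;
    size_t pos;

    assert(set != NULL && s != NULL);
    pos = set_find(set, s, &found);
    if (found)
        return 0;

    if (set->count == set->cap) {               /* grow the pointer array */
        size_t new_cap = set->cap ? set->cap * 2 : 16;
        char **tmp = realloc(set->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        set->items = tmp;
        set->cap = new_cap;
    }

    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, s);

    /* Shift only the pointers to open a slot at pos. */
    memmove(&set->items[pos + 1], &set->items[pos],
            (set->count - pos) * sizeof *set->items);
    set->items[pos] = copy;
    set->count++;
    return 1;
}

void set_free(StringSet *set)
{
    for (size_t i = 0; i < set->count; i++)
        free(set->items[i]);
    free(set->items);
    set->items = NULL;
    set->count = set->cap = 0;
}
```

Start with `StringSet set = {0};`, call set_add for every line read from every file, and `set.count` ends up as the number of unique strings.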

