hi all, i have a code which searches for a string in a file, the searching works well except that it will only return that the string is found if it is the first word in the file, otherwise it will return not found. help me please

#include<stdio.h>
#include<string.h>
#include<conio.h>
#include<stdlib.h>

void main()
{
	FILE *fp,*fout;
	int i=0,len_string;
	char SearchText[30];
	
	char temp[30];
	printf("Enter the string: ");
	scanf("%s", &SearchText);
	fp=fopen("log.txt","a+");
	
	rewind(fp); /* for going to start of file. */
	if(fp==NULL )
	{
		printf("File couldn't be opened ");
		exit(0);
	}
	len_string=strlen(SearchText);
	while(!feof(fp))
	{
		for(i=0;i<len_string;i++) temp[i]=fgetc(fp);
		temp[i]='\0';
		if(stricmp(SearchText,temp)==0) /* the stricmp() is used for comparing both string. */
		{
			printf("The string is found!");
			
			fflush(fp);
			fclose(fp);
			
			getch();
			exit(1);
		}
		else
		{
			printf("NOT FOUND");
			getch();
			exit(1);
		}
		fseek(fp,-(len_string-1),1);
	}
	fclose(fp);
	
}

after some testing, i found that it actually reads the first character or the first word. help me please

Why you have line 17 and 44 and why you signal success (exit 0) only when file could not be opened? Why void main? it should allways be int main and return 0 on success, no exit needed.

Edited 5 Years Ago by pyTony: n/a

Tony has a good point: whether or not you use command-line arguments for this program, you should get in the habit of defining your main() as:

int main (int argc, char *argv[])
{
    ...
}

It is also customary to return 0 from main on success, and non-zero on failure. The specific non-zero exit value would indicate what kind of error happened, but I that's probably better for programs that are intended to be run by other programs or the operating system, as opposed to programs run by users.

Your program doesn't find the string later in the file because, after copying the first N bytes into temp, you exit whether you found the string there or not. If you don't find the string, you'd want to try again starting at the next position, and only indicate 'NOT FOUND' after you've finished searching the file.

You will be much kinder to your disk, if instead of backing up the file N-1 bytes each time, and re-reading most of the same data, instead copy each character in your temp array down one position, and just read one more byte into the end:

0 <- 1 <- 2 <- ... <- n-1 <- byte_from_file

You no longer need the first character in the temp array (temp[0]), so you can just overwrite it. Don't forget to check fgetc() for EOF!

whether or not you use command-line arguments for this program, you should get in the habit of defining your main() as:

Why define parameters that aren't used? You're lying to readers of your code by saying that you'll use command line arguments and then not using them. Not only does this reduce readability, it might also introduce warnings about unused parameters. If you don't use command line arguments, define make as taking no parameters:

int main(void)
{
    ...
}

As for the original code, I can't help but offer a full review:

Best Practice

>#include<conio.h>
Note that <conio.h> is not a standard header, many compilers will not support it, and those that do will not all support the same things in the same ways. For example, Borland compilers support textcolor() while Microsoft compilers do not, yet both support getch(). I won't say not to use <conio.h> because there are legitimate uses (such as kbhit() and getch() for reading raw input), but most of the time I see stupid crap like this:

#include <conio.h>

int main(void)
{
    clrscr(); /* 1) Either stupid or anti-social */

    ...

    getch(); /* 2) Unnecessary destruction of code portability */

    return 0;
}
  1. Clearing the screen at the start of the program does one of two things:
    • Nothing: If you run the program from an IDE or other application where the console window is created exclusively for your program, there won't be any previous output to clear.
    • Clear previous program output: If you're running from a command line where other programs have been run previously and produced output, your program will wipe away all of that output. I don't know about you, but when a program I run does that, it's the last time I run that program. You don't own the output of previous programs, so you shouldn't act like you do. Clearing the screen at the beginning of a program is anti-social unless you own the screen.
  2. Calling getch() at the end of the program certainly gives you the "Press any key to continue" style of pausing, but you can get close enough to that with the standard getchar() function. This function is portable and will work everywhere; getch() has no such guarantee.

>void main()
raptr_dflo already touched on this, but main() is not required to support a return type of void. If the compiler supports it, fine, but you've tied yourself to that compiler. If the compiler doesn't support it, you've invoked undefined behavior, which is the kiss of death for a C program. There are exactly two standard blessed definitions of main(), one taking no parameters:

int main(void)
{
    ...
}

And one taking two parameters:

int main(int argc, char *argv[])
{
    ...
}

Equivalent things like typedef'ing any of the types or renaming the parameters are allowed, but ultimately the definitions must match the two above to be portable. Compilers are allowed to support further arguments (the envp parameter is a common one) as well as other return types (a la void), but those are strictly compiler extensions.

A note on return values (this also applies to calls to exit()). 0 means success and non-zero means failure. There are three portable return values:

  • return 0; : You'll see this most often, it's the canonical success return when one wants to return success or doesn't care about the return value.
  • return EXIT_SUCCESS; : EXIT_SUCCESS is a macro defined in <stdlib.h> and essentially expands to 0. So returning 0 and returning EXIT_SUCCESS are functionally identical.
  • return EXIT_FAILURE; : EXIT_FAILURE is the only portable failure code, which is also defined in <stdlib.h> and cannot be safely assumed to be a specific value.

Occasionally you'll see things like return -1; for generic failure, but -1 isn't a portable return code. Anything aside from the portable three will have implementation and platform dependent behavior.

>printf("Enter the string: ");
>scanf("%s", &SearchText);

I/O in C is buffered by default, so your prompt may or may not show up before scanf() blocks for input. If the output buffer isn't flushed before the input stream blocks for input, your user won't see a prompt and won't know what to do. The guideline in C is that if you need something to show up immediately, either print a '\n' character or call fflush(), both of which force the output stream to flush itself.

In this case you want the input to be on the same line as the prompt, so fflush() is the only option:

printf("Enter the string: ");
fflush(stdout);
...

>scanf("%s", &SearchText);
I have three comments on this call to scanf:

  • &SearchText does not have the correct type. The %s specifier expects a pointer to char. Passing in an unadorned SearchText would be correct as SearchText is an array, passing an array as a function argument is value context, and arrays are converted to a pointer to the first element in value context. However, adding the address-of operator changes things. &SearchText uses SearchText in object context, takes the address, and returns it. The address is still the same--which is why this bug typically works as expected--but the type of the object being passed to scanf() changes from char* to char(*)[30] . This type mismatch invokes undefined behavior.
  • The %s specifier is very dangerous without a field width because you're asking for buffer overflow. Your array can only hold 30 characters, so what happens if the user types 100? scanf() doesn't know when to stop, so it'll happily write to memory beyond the end of the array. A field width will tell scanf() when to stop writing characters (and don't forget that strings are terminated with '\0').
  • It's rarely a good idea not to check input functions for success. In this case you could end up with an uninitialized array, which I hope speaks for itself as you'd be invoking undefined behavior by accessing it for reading. scanf returns the number of successful conversions from the format string, of which there's only one in yours. So testing for failure is simple:
    if (scanf("%29s", SearchText) != 1) {
        /* Handle the error */
    }

>fp=fopen("log.txt","a+");
>rewind(fp);
>if(fp==NULL )

These three lines are out of order. If fopen() fails and returns NULL, rewind will bomb on it. You should immediately test for failure before trying to use the file. Also, I notice you're never writing to this file, so why open it in append-update mode? Just open it for reading and the rewind() call won't be necessary.

>printf("File couldn't be opened ");
>exit(0);

This buffering issue is similar to the prompt for input, but in this case you're terminating the program. When the program terminates, all open streams are flushed, so you'll definitely see this error message. What you might also see if running this program from the command line is the command line prompt on the same line:

File couldn't be opened C:\> _

That's because you didn't include a '\n' character at the end of the format string.

>while(!feof(fp))
This is typically a fencepost error. feof() only returns true after you've tried and failed to read from the file. Loops using feof() as the condition will typically execute once more than expected unless something in the body stops it. It's much easier to use a record input function and use the return value as your loop condition.

>for(i=0;i<len_string;i++) temp=fgetc(fp);
temp='\0';

What if there aren't len_string characters left in the file? What you're doing is poorly simulating the effect of fgets(), so you should just use fgets() in the first place.

>if(stricmp(SearchText,temp)==0)
Note that stricmp() is not a standard function. I'm not saying that you can't use it, but be aware that using it affects the portability of your code. Code portability corresponds to the number of changes required to compile successfully on another compiler. 100% portability (no changes at all) is the ideal.

>fflush(fp);
>fclose(fp);

Calling fflush() on an input stream or an update stream where the most recent operation was input invokes undefined behavior. Some standard libraries will support the flushing of input mode streams, but you can't depend on it. This call to fflush() should be removed entirely, especially since it wouldn't have any noticeable effect anyway. You're immediately closing the file and terminating...

Question: What's the point of putting this string search code in a loop if you terminate on both the found and not found cases?

>fseek(fp,-(len_string-1),1);
This is undefined behavior because the file is not open in binary mode. Arbitrary seeking on text mode streams is not allowed; you can only seek to a previously saved position using ftell() or the equivalent of rewind() ( fseek(fp, 0, SEEK_SET) ).

Design Issues

This is a very small program, so you won't have a large number of design issues. Really the only one that stands out is how you're reading substrings from the file and comparing them. Obviously that's a big deal because this is the entire purpose of your program. The algorithm is a bit confused because you read N characters and then jump the file position back N-1 characters.

When processing files, it's generally best to avoid any kind of seeking because that complicates one's mental image of the algorithm flow. Ideally we would load the whole file into memory and then call strstr() or something similar to perform the search, but for large files the memory cost could be prohibitive. For sheer simplicity, I would initially go with a sliding window buffer that holds the most recent N characters from the file, where N is the length of the search string. That way you can do a simple case insensitive comparison with the search string and the buffer at any given time. The benefits of this design are that it's quite straightforward and the file is only read sequentially character-by-character.

The downside is potential inefficiencies in performing the slide because all but one character will be shifted to make room for an append operation:

[a][c][d][e] <-- start
[c][d][e][e] <-- Shift left by 1 (first half of the slide)
[c][d][e][f] <-- Append 'f' (second half of the slide)

However, since this program is I/O bound, I'm less worried about CPU-based operations. Most likely time spent reading from the file will be the vast majority, but profiling can determine where any slowness occurs for potential optimization. There's no need to go out of your way with optimizations unless you see a clear and noticeable benefit from it.

Usually I finish up with my version of the code, which can be risky for cases where students just run off with my excellent code (;)) and turn it in as their own. I'll take that risk here as well, on the off chance that you'll learn a great deal from a higher quality variant of your program:

#include <ctype.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define nocase_equal(c1, c2) (tolower((unsigned char)c1) == tolower((unsigned char)c2))
#define nocase_less(c1, c2)  (tolower((unsigned char)c1) < tolower((unsigned char)c2))

static int nocase_compare(const char *a, const char *b)
{
    while (nocase_equal(*a, *b)) {
        if (*a == '\0')
            return 0;

        ++a;
        ++b;
    }

    return nocase_less(*a, *b) ? -1 : +1;
}

static int match_string(FILE *in, const char *search_key)
{
    size_t len = strlen(search_key);
    char *buf = (char*)calloc(len + 1, 1); /* +1 to allow for '\0' */
    char next;
    int diff;

    /* Fill the buffer for the first time */
    if (fread(buf, 1, len, in) != len)
        return 0;

    /* Slide the buffer by one character until finished */
    while ((diff = nocase_compare(buf, search_key)) != 0 
        && (next = fgetc(in)) != EOF) 
    {
        memmove(buf, buf + 1, len - 1);
        buf[len - 1] = next;
    }

    return diff == 0;
}

int main(int argc, char *argv[])
{
    char search_key[BUFSIZ];
    FILE *in;

    if (argc < 2) {
        fputs("usage: prog <filename>\n", stderr);
        
        return EXIT_FAILURE;
    }
    
    /* Prepare the search file */
    errno = 0;
    in = fopen(argv[1], "r");

    if (!in) {
        fprintf(stderr, "Unable to open the file '%s': %s\n", 
            argv[1], strerror(errno));

        return EXIT_FAILURE;
    }

    /* Prepare the search string */
    /* 
        Note: We're prompting for a search string rather than taking
        it as a command line argument to display proper user input practice
    */
    printf("Please enter a string to search for: ");
    fflush(stdout);

    if (fgets(search_key, sizeof search_key, stdin) == NULL) {
        fputs("Error reading search string\n", stderr);

        return EXIT_FAILURE;
    }

    search_key[strcspn(search_key, "\n")] = '\0'; /* Trim the newline */

    /* The real work of this program: perform a file-based string search */
    printf("The search string '%s' was %s\n", search_key,
        match_string(in, search_key) ? "found" : "not found");

    /* Clean up and terminate */
    fclose(in);

    return EXIT_SUCCESS;
}
Comments
Looks like a lot of hard work!!
This article has been dead for over six months. Start a new discussion instead.