
Program to remove all comments (comments start with //)

 

I've been trying to write this program and so far this is the best I could come up with. It doesn't work though.

#include<stdio.h>
#include<conio.h>
void main()
{
    FILE *fp,*fp1;
    int i=0;
    char str[80],fname[]="a1.txt",fname1[]="a2.txt";
    clrscr();
    fp=fopen(fname,"r");
    fp1=fopen(fname1,"w");

    if(fp==NULL)
    {
        printf("\n Cannot Open File : %s",fname);
        getch();
        exit(1);
    }

    printf("\n File Data : \n");

    while(!feof(fp))
    {
        fgets(str,80,fp);
           //   printf("%s",str);
           //   fputs(str,fp1);
        if(strstr(str,"//"))
        {

            while(str[i]!='\0')
            {
                if(str[i]!='/'&&str[i+1]!='/')
                {
                    putc(str[i],fp1);
                    i++;
                }
                else
                {
                    putc('\0',fp);
                    i++;
                    break;
                }
            }   
        }
        else
        {
            fputs(str,fp1);
        }
    }

    fclose(fp);
    fclose(fp1);
 

There's some edge cases that could make it somewhat complex. Mostly because you could have lines bigger than your buffer. This results in two kinds of reads: those that end in a newline and those that do not. Since a newline is exactly what marks the end of the section you want to ignore, you'll need to keep track of whether you're inside a comment or not. In addition, one read could end in "<text>/" and the next could start with "/<text>", so the "//" is split across two reads. You need to detect that too.

Below is a little example. A little sloppy due to time constraints but it shows the concept I think.

#include <stdio.h>
#include <string.h>
#include <stdbool.h>

#define BUFFER_SIZE (80)

int main(void)
{
    FILE *fp  = NULL,
         *fp1 = NULL;
    char *token_pos = NULL,
         *copy_pos  = NULL;
    const char *fname  = "a1.txt",
               *fname1 = "a2.txt";
    char str[BUFFER_SIZE] = {0};
    int character         = 0,   // int rather than char: fgetc returns an int so that EOF can be represented
        buffer_length     = 0;
    bool inComment        = false;


    // Open the files.
    fp  = fopen(fname ,"r");

    // Opening of the input file failed.
    if(fp == NULL)
    {
        printf("Error on opening input file \"%s\".\n", fname);
    }
    else
    {
        fp1 = fopen(fname1,"w");

        // Opening of the output file failed.
        if (fp1 == NULL)
        {
            printf("Error on opening output file \"%s\".\n", fname1);
        }
        else
        {
            // Note that feof only becomes true after a read has already hit the end of the file.
            // An easier way is to use the result of fgets as your loop condition.
            while (fgets(str, BUFFER_SIZE, fp) != NULL)
            {
                // Store the size of the buffer as it is possible it is not fully filled.
                buffer_length = strlen(str);

                // Look for a position to start copying from. If we were in a comment this
                // position starts after the first newline that is read, otherwise from the start.
                if (inComment)
                {
                    copy_pos = strchr(str, '\n');
                }
                else
                {
                    copy_pos = str;
                }

                // A position to copy was found.
                if (copy_pos != NULL)
                {
                    // We found a position to copy, which is not a comment section by definition.
                    inComment = false;

                    // Look for "//".
                    token_pos = strstr(copy_pos, "//");

                    // A "//" was found!
                    if (token_pos != NULL)
                    {
                        // Copy everything to our output until that point.
                        token_pos[0] = '\0';
                        fputs(copy_pos, fp1);

                        // If reading stopped because of a newline, include that one in the output too.
                        if (str[buffer_length - 1] == '\n')
                        {
                            fputc('\n', fp1);
                        }

                        // read didn't include line termination. Set our flag so we know next reads will be part of a comment.
                        else
                        {
                            inComment = true;
                        }
                    }
                    else
                    {
                        // It is possible our read ends with a single '/'. This could be the first half of a "//" completed by the next read.
                        if (buffer_length > 0 && str[buffer_length - 1] == '/')
                        {
                            // Look at the next character in the stream.
                            character = fgetc(fp);

                            // It will be part of a comment!
                            if (character == '/')
                            {
                                str[buffer_length - 1] = '\0';
                                inComment = true;
                            }

                            // Put it back so the next read starts with it. (Only strictly needed when it was not a '/'.)
                            ungetc(character, fp);
                        }

                        fputs(copy_pos, fp1);
                    }
                }
            }

            // Close the output file.
            fclose(fp1);
        }

        // Close the input file.
        fclose(fp);
    }


    return 0;
}

This should remove comments from a file, but it keeps the newline that follows each one. I'm not sure whether you want to remove that newline when the comment is the only thing on the line. If you do, you'd have to modify the code above (or your own) to achieve that.
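If dropping comment-only lines is what you want, something like the following could be a starting point. It is only a sketch: strip_line_comment is a made-up helper name, and it assumes the whole line (including its newline, if any) is already in one buffer, so the split-read cases discussed above do not apply here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: cuts `line` at the first "//" (in place).
   Returns false when nothing but whitespace preceded the comment,
   meaning the caller should not write the line at all. */
bool strip_line_comment(char *line)
{
    assert(line != NULL);

    char *comment = strstr(line, "//");
    if (comment == NULL)
        return true;                 /* no comment: keep the line as it is */

    bool had_newline = strchr(comment, '\n') != NULL;

    /* Is there anything but whitespace in front of the comment? */
    const char *p = line;
    while (p < comment && isspace((unsigned char)*p))
        p++;

    if (p == comment) {
        line[0] = '\0';
        return false;                /* comment-only line: drop it, newline included */
    }

    /* Keep the code part. The "//" occupied two chars, so a newline
       plus the terminator always fit in their place. */
    comment[0] = had_newline ? '\n' : '\0';
    comment[1] = '\0';
    return true;
}
```

The caller would then do something like `if (strip_line_comment(str)) fputs(str, fp1);`. Note that whitespace in front of the comment is kept when there is code on the line.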

 

There's some edge cases that could make it somewhat complex.

Yes, but they have nothing to do with long lines. ;)

Mostly because you could have lines bigger than your buffer.

Actually, this is a good place for an extended fgets() that isn't restricted by a buffer size:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
    @description: 
        Reads a string from the given input stream up to and
        including one of the characters in a string of delimiters.
    @return:
        The length of the final string.
    @notes:
        The result string must be released with free() from stdlib.h.
*/
size_t readline(FILE *in, char **s, const char *delim)
{
#define READLINE_CHUNK 16

    size_t capacity = 0, size = 0; /* Buffer sizing */
    char *dst = NULL, *temp;       /* Buffer contents */
    int ch, done = 0;              /* Character processing */

    while (!done) {
        /* Resize with an extra chunk if necessary */
        if (size == capacity) {
            capacity += READLINE_CHUNK;

            if (!(temp = (char*)realloc(dst, capacity + 1)))
                break;

            dst = temp;
        }

        /* Fill in the newest chunk with string data */
        while (!done && size < capacity) {
            if ((ch = getc(in)) != EOF)
                dst[size++] = (char)ch;

            done = (ch == EOF || strchr(delim, ch));
        }
    }

    /* Finalize the string */
    if (dst)
        dst[size] = '\0';

    *s = dst; /* Save the string */

    return size;

#undef READLINE_CHUNK
}

As far as edge cases go, I'd be more worried about true edge cases for comments. Two that come to mind are // in places where they don't start a comment (such as inside a string) and line continuation:

// This is a comment \
this is still the same comment!

Though I suspect the OP's project doesn't need to be absolutely thorough in terms of all possible cases.
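For what it's worth, the line continuation case on its own is not much code. The sketch below works on text that is already in memory; strip_with_continuation is a hypothetical name, the destination buffer is assumed to be at least as large as the source, and it deliberately ignores the "// inside a string" case.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies src to dst without "//" comments.
   A backslash directly followed by a newline inside a comment
   continues the comment onto the next line.
   dst must have room for at least strlen(src) + 1 chars. */
void strip_with_continuation(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        if (strncmp(src, "//", 2) == 0) {
            /* Skip to the newline that really ends the comment. */
            while (*src != '\0' && *src != '\n') {
                if (strncmp(src, "\\\n", 2) == 0)
                    src += 2;        /* continuation: stay in the comment */
                else
                    src++;
            }
            /* src now points at '\n' or '\0'; the newline, if any,
               is copied by the next iteration of the outer loop. */
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

This only looks for a backslash immediately before '\n', so files with "\r\n" line endings opened in binary mode would need extra handling.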

 

Yes, but they have nothing to do with long lines. ;)

They do in the context I posted that in; you cannot guarantee that a comment found within read data ends within that same data.
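One way to sidestep that problem entirely is to not use a line buffer at all and filter the stream one character at a time. A minimal sketch, assuming only plain "//" comments matter (strip_stream is a made-up name, not something from the code earlier in the thread):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copies `in` to `out`, dropping everything from
   "//" up to, but not including, the next newline. */
void strip_stream(FILE *in, FILE *out)
{
    int ch;

    assert(in != NULL && out != NULL);

    while ((ch = getc(in)) != EOF) {
        if (ch == '/') {
            int next = getc(in);

            if (next == '/') {
                /* Inside a comment: discard until newline or end of file. */
                do {
                    ch = getc(in);
                } while (ch != EOF && ch != '\n');

                if (ch == EOF)
                    break;
                /* ch is '\n' here and is written below. */
            } else if (next != EOF) {
                /* A lone '/': put the look-ahead character back. */
                ungetc(next, in);
            }
        }
        putc(ch, out);
    }
}
```

Because there is no fixed-size buffer, a comment can be as long as it likes and a "//" can never be split across two reads. The standard guarantees one character of pushback for ungetc, which is all this needs.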

Actually, this is a good place for an extended fgets() that isn't restricted by a buffer size:

Yeah, that's probably useful.

As far as edge cases go, I'd be more worried about true edge cases for comments. Two that come to mind are // in places where they don't start a comment (such as inside a string) and line continuation. Though I suspect the OP's project doesn't need to be absolutely thorough in terms of all possible cases.

Adding that would introduce quite a bit of complexity, yes. I don't think it's what the OP wants, though, given his minimal description of the problem. (Read more freely, his title may imply that he wants to remove block comments, i.e. "/* */", as well, even though the part in parentheses only mentions a subset of the possible comment forms.)

 

Unfortunately, it would have to be even more complicated than that. You have to be able to intelligently ignore // when it occurs in quotes; otherwise, you would end up cutting strings apart. Take the following code, for example:

#include <stdio.h>

int main(void)
{
    printf("I like to // eat apples.\n");
    return 0;
}

Without a check to make sure the double slashes aren't in quotes, you would end up with this after running your program:

#include <stdio.h>

int main(void)
{
    printf("I like to 
    return 0;
}

This would, of course, generate a compiler error.
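A check like that could be done with a small state machine that remembers whether it is currently inside a string or character literal. The following is only a sketch under a few assumptions: the text is already in memory, strip_outside_quotes and the state names are invented for the example, and block comments and line continuation are not handled.

```c
#include <assert.h>
#include <string.h>

enum state { CODE, STRING, CHARLIT };   /* hypothetical state names */

/* Hypothetical helper: copies src to dst, removing "//" comments only
   when the "//" is outside string and character literals.
   dst must have room for at least strlen(src) + 1 chars. */
void strip_outside_quotes(const char *src, char *dst)
{
    enum state st = CODE;

    assert(src != NULL && dst != NULL);

    while (*src != '\0') {
        if (st == CODE && strncmp(src, "//", 2) == 0) {
            /* A real comment: skip up to the newline (which is kept). */
            while (*src != '\0' && *src != '\n')
                src++;
            continue;
        }

        if (st != CODE && src[0] == '\\' && src[1] != '\0') {
            /* Escape sequence inside a literal, e.g. \" or \\ :
               copy both chars so the quote is not seen as a delimiter. */
            *dst++ = *src++;
            *dst++ = *src++;
            continue;
        }

        if (*src == '"') {
            if (st == CODE)        st = STRING;
            else if (st == STRING) st = CODE;
        } else if (*src == '\'') {
            if (st == CODE)         st = CHARLIT;
            else if (st == CHARLIT) st = CODE;
        }

        *dst++ = *src++;
    }
    *dst = '\0';
}
```

With this, the printf line from the example above passes through untouched, while a trailing comment after the closing parenthesis and semicolon would still be removed.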

 

Unfortunately, it would have to be even more complicated than that. You have to be able to intelligently ignore // when it occurs in quotes

That was already mentioned as an edge case. ;) But the example is helpful for making it clear what would happen.
