Dear Internets,

I am currently trying to write a program that extracts the "header" from a FASTA file and then writes a portion of the header to a new file. The part of the header I need is the "CONTIG_X" part. When I run this program, it prints the same "CONTIG_X" many times before moving on to the next one.

Can you please tell me what I am doing wrong?

#include <stdio.h>
int main(int argc, char *argv[]){
    char line[200];
    char part1[25], part2[25], part3[100], part4[25], part5[25];
    int c;
    FILE *file_in = fopen(argv[1], "r");
    FILE *file_out = fopen("header_chart1.txt", "w");

    while(fgets(line, 200, file_in) != NULL){

        if(line[0] == '>')  
            sscanf(line, "%s | %s | %s | %s | %s", part1, part2, part3, part4, part5);
            fprintf(file_out, "%s\n", part2);
    }
    fclose(file_out);
    fclose(file_in);
}

The output looks like this:
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_1
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
CONTIG_2
...

The file (argv[1]) the program opens looks like this, except with thousands of contigs instead of just 2:

>ACDU01000003 | CONTIG_3 | part of supercont3.1 of Allomyces macrogynus ATCC 38327 | [14502-15283] | 782 nt
CGTCGTGGCCATAATAGTCGTCCTTGTCATCATGGCCATAGCCATACGAGTCGTACGAAT
CATGGCCGTAGTAATCGTCCTTCTTGTCGTCATGACCATAGCCATACGAGTCGTACGAGT
CGTAGCTGTCGTAGTCATCCTTGTTGTCATAGCCATAGCCATACGAATCATACGAATCAT
>ACDU01000001 | CONTIG_1 | part of supercont3.1 of Allomyces macrogynus ATCC 38327 | [1-833] | 833 nt
CTCCGACTCGCCAGAGTCAAATGGGCTTGCCGAGCGGACGCAGGGTGTGCTCAAGTCGAT
GGTGCGTGCGGCCATGACGGCCGCCAAGGCGCCGGATTCCCTCTGGCCAGAGTGTGTGCG
CGCGGCGTGCTATGTGCGCAACCGTGTGCCAAGTGACTCGCTCGATGGTCGCTCGCCATA

Edited by Slayer_Dude_420


Try enclosing the entire if body in braces.

while(fgets(line, 200, file_in) != NULL){
   if(line[0] == '>') {
      sscanf(line, "%s | %s | %s | %s | %s", part1, part2, part3, part4, part5);
      fprintf(file_out, "%s\n", part2);
   }
}

What you currently have behaves like this instead:

while(fgets(line, 200, file_in) != NULL){
   if(line[0] == '>') {
      sscanf(line, "%s | %s | %s | %s | %s", part1, part2, part3, part4, part5);
   }
   fprintf(file_out, "%s\n", part2);
}

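Notice how, in the second example, part2 is printed no matter how the if statement evaluates. Without braces, only the single statement after the if belongs to it; C ignores indentation. Since part2 still holds the value from the most recent header, that value is printed once for every sequence line that follows.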
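
If it helps, here is a slightly more defensive sketch of the same idea. The names extract_contig, copy_contig_ids and ID_MAX are made up for this example and are not from your program, and the 63-character limit on the identifier is an assumption. It checks the return value of sscanf, so a header without a second field is skipped instead of printing stale data, and it uses a field width so a long token cannot overflow the buffer.

```c
#include <stdio.h>

#define ID_MAX 64  /* assumed upper bound on identifier length, including '\0' */

/* Hypothetical helper: copies the second '|'-separated field of a FASTA
   header line into out. Returns 1 on success, 0 if the line is not a
   header or has no second field. */
int extract_contig(const char *line, char out[ID_MAX])
{
    if (line[0] != '>')
        return 0;
    /* %*s reads and discards the first token; %63s (ID_MAX - 1) bounds the
       write into out. sscanf returns the number of fields it stored. */
    return sscanf(line + 1, "%*s | %63s", out) == 1;
}

/* Hypothetical helper: writes one identifier per header line found in
   `in` to `out` and returns how many were written. */
int copy_contig_ids(FILE *in, FILE *out)
{
    char line[200];
    char id[ID_MAX];
    int count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (extract_contig(line, id)) {  /* braces keep both statements inside the if */
            fprintf(out, "%s\n", id);
            count++;
        }
    }
    return count;
}
```

In main you would open argv[1] and the output file as before, check that neither fopen returned NULL, and then call copy_contig_ids(file_in, file_out).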

OH WOW.

That was so trivial yet so helpful!

Thanks from the bottom of my random access memory.
