I did some code to refresh my C. So many things have happened since 1987, but at least I started with ANSI instead of K&R. As I used the language so little, I think it is time to do some practise and even to learn some C++ to support my Python coding.

See my code and tell, where I am doing stupid things. This is half copy of one other thread's code, but as I did not understand/trust the linked list part, I simplified it to simple array based fixed memory allocation version to practice structs and file IO. %s format I could not get to work with my input so I resorted to other formats. Fields are separated by ';' and unquoted.

#include <stdio.h>

#define MAXDATA 100

struct entry {
	char first_name[15];
	char last_name[20];
	char date_of_birth[11];
	char email[45];
	char phone[15];
	char address[50];
	char city[15];
	char zipcode[6];
};

int main(int argc, char* argv[])
{
    char buffer[255];
    int i, index;

    struct entry contacts[MAXDATA];
    FILE* myfile;
    struct entry next_data;

    // with one parameter use it for input file, otherwise use "data.csv" default
    if(argc == 2){
        myfile = fopen(argv[1], "r");
        if (myfile) printf("Opened datafile %s\n", argv[1]);
    } else myfile = fopen("data.csv", "r");

    if(!myfile) {
        printf("Input file did not exist!");
        return 1;
    }

    printf("Reading from the file...");
    i = 0;
    while(fgets(buffer, 255, myfile) != NULL && i < MAXDATA) {
            sscanf(buffer,
                   " %15[^;\t\n] ; %20[^;\t\n] ; %11[^;\t\n] ; %45[^;\t\n] ; %15[^;\t\n] ; %50[^;\t\n] ; %15[^;\t\n] ; %6[^;\t\n]",
                   next_data.first_name, next_data.last_name, next_data.date_of_birth, next_data.email, next_data.phone,
                   next_data.address, next_data.city, next_data.zipcode);
            contacts[i++] = next_data;
            //printf("\nzip %s", next_data.zipcode); //debug print of last info at line
    };
    printf("\n%d names entered.\n---------------------\n", i);

    // printing various info for test
    contacts[i].first_name[0] = '\0';
    for(index = 0; contacts[index].first_name[0] != 0 && index < MAXDATA; index++) {
        printf("\n%s, %s (born %s)\n%s (%s)\nphone: %s\n",
               contacts[index].last_name, contacts[index].first_name, contacts[index].date_of_birth,
               contacts[index].address, contacts[index].city, contacts[index].phone);
    }

    return 0;
}

example data.csv file (data scrambled):

Tony;Fine;01.07.1965; tony.fine@gmail.com;0503711775;Kalastajantie 1 C 4;ESPOO;02230
Edlira;Alimerko; 01.12.1965;edlira.alimerko@gmail.com;0502220912; Kalastajantie 1 C 4;ESPOO;02230
Seppo;Suomalainen; 01.02.1950;seppo.suomalainen@suomi.fi; 050342134927;Suvikumpu 4;ESPOO;02130

Edited 5 Years Ago by pyTony: n/a

Not such a improvement but looks like I got extra semicolon at line 45, also compiler does not complain removing it (like it did about all those semicolons I forgot, being spoiled with Python).

1. Error reporting. The message at line 32 is not informative. Failure to open file doesn't mean that the file doesn't exist. Use strerror(errno) to get a real cause. It is also a common courtesy to include the filename in the error message.

2. An unnecessary copy of a (fairly large) struct entry at line 43. You may scan data directly into contacts, and get rid of next_entry altogether.

3. Check the return value of sscanf.

4. The large argument list for scanf is better to be avoided. It becomes very hard to follow which conversion in a format string correcponds to which argument. You may want to consider scanning elements one by one.

Thanks for suggestions.

1. I have the default file name behaviour and was lazy to change my code logic of deciding between that and argv[1] as that was added in the end. So I change a variable to have either default or argv[1], then I am ready to include the name in error message. Thanks for info on strerror.

2. Laziness to write subscripts, but it is really easy to just do find and replace.

3. Hmmm must see what kind of return value it has from documentation.

4. I would have produced it in loop in Python (except there I would have done simple split), but C string expressions being so clumsy, I had not skill to do it so here.


However I was thinking what kind of first problems I had with the code with original limited widths in struct. I had longer than reserved limit string in some fields. Result was that that field included the next field as that field had overwriten the end '\0' from previous field and so that field was not terminated properly. So I think the limits should be reduced by one in the sscanf to avoid end null overwriting.

I tried to reduce the maximum, but extra long field seems to mix up the sscan, which should drop the letters until next ';' before continueing.

Data to overflow the field length is easy to do, say:

Tony;Finesssssssssssssssssssssssssssssssssssssssssssssse;01.07.1965; tony.fine@gmail.com;0503711775;Kalastajantie 1 C 4;ESPOO;02230

And result in printing is that output become mess for that line:

Opened datafile data.csv
Reading from the file...
3 names entered.
---------------------

Finessssssssssssssss, Tony (born )
 ( )N┬wBN┬wá↑@)
phone: ┬wÞ→┼w, "

Alimerko, Edlira (born 01.12.1965)
Kalastajantie 1 C 4 (ESPOO)
phone: 0502220912

Suomalainen, Seppo (born 01.02.1950)
Suvikumpu 4 (ESPOO)
phone: 050342134927

Process returned 0 (0x0)   execution time : 0.016 s
Press any key to continue.

Edited 5 Years Ago by pyTony: n/a

The for loop on line 50 can be reduced to

for(index = 0;index <i;index++){
       // Your printing stuff here
}

In this piece of code, yes. But I was testing the marking of the first unused record so that later I can return the read values to main function for processing (search/filter, sort, add/delete)

I was testing the marking of the first unused record so that later I can return the read values to main function for processing (search/filter, sort, add/delete)

In case you do read MAXDATA records, contacts[i].first_name[0] = '\0'; will be an out-of-bounds write.

@mitrmkar: I was suspicious of that, but did not test that yet, I tested with MAXDATA = 2, and this worked with 3 records in file. I also changed the order of limits checks to compare to MAXDATA first

#include <stdio.h>
#include <string.h>
#include <errno.h>
#define MAXDATA 2

struct entry {
	char first_name[15];
	char last_name[20];
	char date_of_birth[11];
	char email[45];
	char phone[15];
	char address[50];
	char city[15];
	char zipcode[6];
};

int main(int argc, char* argv[])
{
    char* buffer[255];
    char* filename;
    int i, index, result;

    struct entry contacts[MAXDATA];
    FILE* myfile;

    // with one parameter use it for input file, otherwise use "data.csv" default
    filename = (argc == 2)? argv[1] : "data.csv";

     myfile = fopen(filename, "r");

    if(!myfile) {
        printf("%s: \"%s\"", strerror(errno), filename);
        return 1;
    }

    printf("Reading from the file \"%s\"...", filename);
    i = 0;
    while(fgets(buffer, 255, myfile) != NULL && i < MAXDATA) {
            result = sscanf(buffer,
                   " %14[^;\t\n] ; %19[^;\t\n] ; %10[^;\t\n] ; %44[^;\t\n] ; %14[^;\t\n] ; %49[^;\t\n] ; %14[^;\t\n] ; %5[^;\t\n]",
                   contacts[i].first_name, contacts[i].last_name, contacts[i].date_of_birth, contacts[i].email, contacts[i].phone,
                   contacts[i].address, contacts[i].city, contacts[i].zipcode);
            if(result < 8) {
                printf("\nData reading failed in field number %d, record %d:\n%s\n", result, i+1, buffer);
                exit(2);
            }
            i++;
    }
    printf("\n%d names entered.\n---------------------\n", i);

    // mark last if not full
    if (i < MAXDATA) contacts[i].first_name[0] = '\0';
    // printing various info for test
    for(index = 0; index < MAXDATA && contacts[index].first_name[0] != 0; index++) {
        printf("\n%s, %s (born %s)\n%s (%s)\nphone: %s\n",
               contacts[index].last_name, contacts[index].first_name, contacts[index].date_of_birth,
               contacts[index].address, contacts[index].city, contacts[index].phone);
    }

    return 0;
}

Is there any easy way to iterate over the records in struct instead of access them by name, and find the maximum length of them without storing them in some constant array first. Then I can loop mayby with strtok instead of sscanf and check bounds before copying each to structure. Now I stop for unsuccessfull record read. I can not separate the sscanf's as the read would start again from beginning of string (without kludges like buffer = &buffer[length_of_this_record+1]), strtok has possibility of passing NULL as argument to continue.

Edited 5 Years Ago by pyTony: n/a

As other practise I figured out that "%*s" could be used as word counter:

#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(int argc, char* argv[])
{   FILE* myfile;
    int   words;
    char* filename;
    char* default_file = "data.csv";

    filename = (argc == 2)? argv[1] : default_file;

    myfile = fopen(filename, "r");

    if(!myfile) {
        printf("%s: \"%s\"", strerror(errno), filename);
        return 1;
    }

    for(words = 0; fscanf(myfile, "%*s") != EOF; words++);
    printf("Wordcount of file \"%s\" is %i.", filename, words);
    return 0;
}

Edited 5 Years Ago by pyTony: n/a

Next experiment moving to constant int limits and splitting line with strtok:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#define MAXDATA 2000

struct entry {
	char first_name[15];
	char last_name[20];
	char date_of_birth[11];
	char email[45];
	char phone[15];
	char address[50];
	char city[15];
	char zipcode[6];
};

int main(int argc, char* argv[])
{
    char buffer[BUFSIZ];
    char* filename;
    int i, index;
    char* result;

    struct entry contacts[MAXDATA];
    struct entry* current;
    // tried to use limits in struct definition, not possible
    const int limits[]={15, 20, 11, 45, 15, 50, 15, 50, 15, 6};
    FILE* myfile;

    // with one parameter use it for input file, otherwise use "data.csv" default
    filename = (argc == 2)? argv[1] : "data.csv";

     myfile = fopen(filename, "r");

    if(!myfile) {
        printf("%s: \"%s\2", strerror(errno), filename);
        return 1;
    }

    printf("Reading from the file \"%s\"...\n", filename);
    for(i = 0; fgets(buffer, 255, myfile) != NULL && i < MAXDATA; i++) {
            current = &contacts[i];
            result = strtok(buffer,  ";");
            for(index = 0; index < 8 && result != NULL; index++) {
                // pass spaces in front
                while(result[0]==' ') result = &result[1];
                //printf("%d: limit %d, actual %d\n", index, limits[index], strlen(result));
                if(limits[index] > strlen(result)){
                    switch(index){
                        case(0):
                            strcpy(current->first_name, result);
                            break;
                        case(1):
                            strcpy(current->last_name, result);
                            break;
                        case(2):
                            strcpy(current->date_of_birth, result);
                            break;
                        case(3):
                            strcpy(current->email, result);
                            break;
                        case(4):
                            strcpy(current->phone, result);
                            break;
                        case(5):
                            strcpy(current->address, result);
                            break;
                        case(6):
                            strcpy(current->city, result);
                            break;
                        case(7):
                            strcpy(current->zipcode, result);
                            break;
                    }
                } else {
                    break; // if comment out this check failure it succeeds even too long field!
                }
                if(index < 8) result = strtok(NULL, ";");
            }
            if(index < 8) {
                printf("\nData reading failed in field number %d, record %d:\n\"%s\"\n", i + 1, index, result);
                return 2;
            }
            //else printf("Record %d OK\n\n", i + 1);
    }
    printf("\n%d names entered.\n---------------------\n", i);

    // printing various info for test
    for(index = 0; index < i; index++) {
        printf("\n%s, %s (born %s)\n%s (%s)\nphone: %s\n",
               contacts[index].last_name, contacts[index].first_name, contacts[index].date_of_birth,
               contacts[index].address, contacts[index].city, contacts[index].phone);
    }

    return 0;
}

Edited 5 Years Ago by pyTony: n/a

Hey tonyjv,

If you use strok(), it successively replaces the next occurrence of the ';' (or whatever) with a '\0' (indicating end of string), and returns the location of the start of the preceding string. So, if you're stuck with fixed-size fields in your struct, you can use strncpy to copy the first N characters of each word into the next field, or if dynamic memory is OK, then replace each:

char fieldName[fixed_field_size];

with

char *fieldName;

and then with each return from strtok(), for the corresponding field:

...
    if ((ptr = strtok(NULL, ';') != NULL)
        contacts[i].fieldName = strdup(ptr);
    ...

Don't forget to correctly clean up your unused ENTRY objects by free()'ing the fields!

I did not quite understand the details of previous post. Here is my malloc version of code, still I did not figure out how to get rid of the array limit MAXDATA of number of records.

Check for extraneous fields in line not yet implemented.

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <malloc.h>
#define MAXDATA 2000

struct entry {
	char *first_name;
	char *last_name;
	char *date_of_birth;
	char *email;
	char *phone;
	char *address;
	char *city;
	char *zipcode;
};

int main(int argc, char* argv[])
{
    char buffer[BUFSIZ];
    char* payload[MAXDATA];
    char* filename;
    int i, index;
    char* result;

    struct entry contacts[MAXDATA];
    struct entry* current;
    FILE* myfile;

    // with one parameter use it for input file, otherwise use "data.csv" default
    filename = (argc == 2)? argv[1] : "data.csv";

     myfile = fopen(filename, "r");

    if(!myfile) {
        printf("%s: \"%s\2", strerror(errno), filename);
        return 1;
    }

    printf("Reading from the file \"%s\"...\n", filename);
    for(i = 0; fgets(buffer, 255, myfile) != NULL && i < MAXDATA; i++) {
            current = &contacts[i];
            payload[i] = malloc(strlen(buffer) + 1);
            strcpy(payload[i], buffer);
            //printf("Payload copied: %s", payload[i]);
            result = strtok(payload[i],  ";");
            for(index = 0; index < 8 && result != NULL; index++) {
                // pass spaces in front
                while(*result == ' ') {
                    *result = '\0';
                    result++;
                }
                switch(index){
                    case(0):
                        current->first_name = result;
                        break;
                    case(1):
                        current->last_name = result;
                        break;
                    case(2):
                        current->date_of_birth = result;
                        break;
                    case(3):
                        current->email = result;
                        break;
                    case(4):
                        current->phone = result;
                        break;
                    case(5):
                        current->address = result;
                        break;
                    case(6):
                        current->city = result;
                        break;
                    case(7):
                        current->zipcode = result;
                        break;
                }
                if(index < 8) result = strtok(NULL, ";");
            }
            if(index < 8) {
                printf("\nData reading failed in field number %d, record %d:\n\"%s\"\n", i + 1, index, result);
                return 2;
            }
            else printf("Record %d OK\nZIP was %s\n", i + 1, current->zipcode);
    }
    printf("\n%d names entered.\n---------------------\n", i);

    // printing various info for test
    for(index = 0; index < i; index++) {
        printf("\n%s, %s (born %s)\n%s (%s)\nphone: %s\n",
               contacts[index].last_name,
               contacts[index].first_name,
               contacts[index].date_of_birth,
               contacts[index].address,
               contacts[index].city,
               contacts[index].phone);
    }
    // deallocation
    for(index = 0; index < i; index++) {
        free(payload[index]);
    }

    return 0;
}

Edited 5 Years Ago by pyTony: n/a

Two things to consider:

1) Include your "char *payload;" as a member of your struct entry, rather than maintaining it in a separate array, especially since you're having your other struct members reference into it. At the end, your deallocation becomes

free(contacts[index].payload);

2) As far as getting away from the MAXDATA constant and static allocation, take a look at function "realloc" (it's in the same family with malloc() and free()). While it's time-inefficient to grow an array for each new element, and space-inefficient to have an array much bigger than needed, an approach I've seen here and there (and now use myself as needed), is a combination: start with "a few" items in the array, and when you run out, grow the array by "a few" more. If you have a sense how the array is likely to be used, you can pick a sensible constant number for "a few" (maybe 10 for your case?). If you can't determine ahead of time how fast a program is likely to need more, or how many it might want over all, increasing by a factor works (e.g. 2 to double the length of the array each time, something more like 1.2 or 1.5 to grow by less, but more often). Then in addition to "i" (the number of entries you've populated), you need one more scalar variable "num_contacts_allocated" (to replace MAXDATA) so you can tell when i catches up to it and you need to grow again:

for(i = 0; fgets(buffer, 255, myfile) != NULL; i++) {
    if (i == num_contacts_allocated) {
        struct entry * grown_contacts = (struct entry *)realloc(contacts, (num_contacts_allocated + 10)*sizeof(struct entry));
        if (grown_contacts == NULL) {
            fprintf(stderr, "out of memory!");
            /* stop reading input and continue with program */
            break;
        }
        contacts = grown_contacts;
        num_contacts_allocated += 10;
    }
    current = &contacts[i];
    ...

still I did not figure out how to get rid of the array limit MAXDATA of number of records.

Maybe that's a hint that you shouldn't be using an array to store records. ;) For an address book, searching is the primary task, so a data structure that supports dynamic growth along with fast searching and the capability of alphabetical ordering would be ideal.

Might I suggest a skip list? They're a nice alternative to balanced trees when you might have ordered insertions (likely with an address book) and don't want to write a lot of code but still want the benefits of a randomized tree.

This article has been dead for over six months. Start a new discussion instead.