How to read different files and store some data from them to a single file

Question

okwy 0 Newbie Poster

14 Years Ago

This is a follow-up to a question I asked earlier. With the help of some people here I was able to start on the function I want to write, but I have not completed it yet. Here is my earlier question: I have a series of files with the extension .msr. They contain measured numerical values of more than ten parameters, ranging from date, time, temperature, pressure, ..., separated by semicolons. Examples of the data values are shown below.

2010-03-03 15:55:06; 8.01; 24.9; 14.52; 0.09; 84; 12.47;
2010-03-03 15:55:10; 31.81; 24.9; 14.51; 0.08; 82; 12.40;
2010-03-03 15:55:14; 45.19; 24.9; 14.52; 0.08; 86; 12.32;
2010-03-03 15:55:17; 63.09; 24.9; 14.51; 0.07; 84; 12.24;

The files are named REG_2010-03-03, REG_2010-03-04, REG_2010-03-05, ... and they are all contained in a single file.

I want to:
1. Extract from each file the date information (in this case 2010-03-03), column 3 and column 6.
2. Find the statistical mean of each of columns 3 and 6.
3. Store the results in a new file that contains only the date and the calculated means of the columns above, for further analysis.

My question now:
I want to be able to open the first file (source), which contains 30 files with the .msr extension. I want to open the source file, then, for each file inside it, extract the information needed as I explained earlier, and for each file read store the date (the same throughout each file) and the mean values of columns 3 and 6 in a single file. Thus each line of the destination file will contain three fields (the date, the mean of the 3rd column and the mean of the 6th column) separated by spaces, making a total of 30 lines. Below is the code I started with; I would appreciate your guidance on how to implement this.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

/* strline must point to a buffer of at least BUFSIZ chars */
int file_getline_analyse(char *infile, char *outfile, char *path, char *strline) {

    int return_value = 0;
    FILE *fd = NULL;    // pointer for data source
    FILE *fo = NULL;    // destination file
    char *file_path = NULL;

    char *date, *tmp, *time;
    double sum = 0, mean = 0;   // sum must start at 0 before accumulating

    // room for path + file name + the terminating '\0'
    file_path = calloc(strlen(path) + strlen(infile) + 1, sizeof(char));
    if (file_path == NULL) {
        printf("file_path in get_line\n");
        exit(EXIT_FAILURE);
    }

    strcpy(file_path, path);    // copies the path passed to the function into the allocated memory
    strcat(file_path, infile);  // appends the source file name to it

    fd = fopen(file_path, "r");

    fo = fopen(outfile, "w");

    if ((fd == NULL) || (fo == NULL)) {   // fail if either file could not be opened
        if (fd != NULL) fclose(fd);
        if (fo != NULL) fclose(fo);
        return_value = -1;
    }
    else {
        int i = 0;
        int j = 0;
        while (fgets(strline, BUFSIZ, fd) != NULL) {
            date = strtok(strline, " ");
            time = strtok(NULL, " "); // skip over time
            tmp = strtok(NULL, ";");
            if (i == 3 || i == 6) { // get only the 3rd and 6th value
                sum += strtod(tmp, NULL);
                ++i;
                if (j == '\n') {
                    // Replacing the characters at the end of the line by 0:
                    char *p = strchr(strline, '\n');
                    if (p) {
                        *p = 0;
                    }
                    return_value = 0;
                    break;
                }
                j++;
            }

            mean = sum / (double)(j + 1);

            fprintf(fo, "%s: %.2f\n", date, mean);
        }
        fclose(fd);
        fclose(fo);
    }

    free(file_path);
    file_path = NULL;

    return return_value;
}

Edited 12 Years Ago by mike_2000_17 because: Fixed formatting


Adak 419 Nearly a Posting Virtuoso

14 Years Ago

You'll find this much easier if you use a bit of top-down design here, especially:

Put Off The Details

Get the basic control and flow working, then get your details going, LATER.

You have a source file, and it has the names of 30 other files in it, or what? Because your source file won't have 30 other files in it. ;)

Please get in the habit of always putting your code between [code] tags; otherwise your code won't look like code at all. The forum editor will turn it into HTML text, and then it's ugly to study. The code tags icon is at the top of the editor window.

Prepare and post a 5-line sample file, in a properly formatted source file, so it can be studied. Include 5 msr files as well (just small ones). Your descriptions are helpful, but they are not what will be needed.

Make a list of what your program needs to do, from start to finish:

1) Open the master file
2) Read all the other msr filenames
3) Open each msr file
4) extract data 1 (specify the data)
5) extract data 2 (and specify it again: details are important to know now, but not code)
6) list the computations you need to make on the data (mean, etc.)
7) output to file certain data (list specifics)
8) close msr files
9) close master file

Short and punchy. This is just an example, of course. Each numbered item should be a functional SOMETHING, and small. Each item may wind up being a function of its own in the program.

Edited 14 Years Ago by Adak because: n/a

Adak 419 Nearly a Posting Virtuoso

14 Years Ago

Good.

For #1, will the msr files always be found in the same directory, or do we need to search for them, on the flash drive?

What is your operating system for this program, and what compiler and IDE are you going to be using?

For the data:

The first column is the character '2' in each record. If you need the date, please refer to it as the first field instead of the first column.

I confuse easily. ;)

The fields for one record are below, and you need the fields that are bold typed, right?

2010-03-07 00:00:11; 70.68; 15.8; 5.66; 0.00; 86; 9.47; 9,55; 1,089; 48,5; 221,59; 372,6; 16.4; 0
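To make the field numbering concrete: counting the date-and-time as field #1 and splitting on ';', field #3 of the record above is 15.8 and field #6 is 86. A minimal sketch of pulling out the date and those two fields with strtok() and strtod(), assuming every record uses ';' as the separator and '.' as the decimal point in fields #3 and #6 (parse_record is a hypothetical name, not from the thread):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits one record on ';', counting the
   "date time" part as field #1. Copies the date (first 10 chars)
   into date[11] and stores fields #3 and #6 as doubles.
   Returns 0 on success, -1 if the record has fewer than 6 fields. */
int parse_record(const char *record, char date[11], double *f3, double *f6)
{
    char buf[256];
    char *field;
    int n = 0;
    int found = 0;

    /* strtok() modifies its input, so work on a copy */
    strncpy(buf, record, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    for (field = strtok(buf, ";"); field != NULL; field = strtok(NULL, ";")) {
        ++n;
        if (n == 1) {
            strncpy(date, field, 10);   /* "YYYY-MM-DD" */
            date[10] = '\0';
        } else if (n == 3) {
            *f3 = strtod(field, NULL);  /* strtod() skips leading spaces/tabs */
            ++found;
        } else if (n == 6) {
            *f6 = strtod(field, NULL);
            ++found;
            break;                      /* later fields are not needed */
        }
    }
    return found == 2 ? 0 : -1;
}
```

Stopping at field #6 also means the later fields, which use commas as decimal separators in this record (9,55; 1,089), are never converted.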

On the mean calculations, do you need the mean from one msr file's data, or from ALL the msr files data?

Do the msr files vary in length a great deal, or are they close to being the same size, each time?

And #10 you can cross off - directories or folders are just locations, and don't need to be closed.

And you're welcome. ;)

Adak 419 Nearly a Posting Virtuoso

14 Years Ago

1. Yes, all the msr files are in the same folder in the USB key directory.
2. The OS is Linux, the compiler is gcc, and the IDE is Debian.
3. On the data, you're right: it's the first field, as you highlighted above.
4. The mean is calculated for each msr file separately (i.e. for each msr file, calculate the means within that file); that is what will be written to the output file with their respective dates.
5. Their lengths are close

Yes, I am practicing my counting <j/k> ;)

1) You know how to compile and run a simple program on your system? I ask because your linux distro is Debian, not your IDE, and I have not used gcc myself.

2) Separately is good, that makes it easier.
3) I'm always right -- c'mon! (Except when I'm not) ;)
4) Two means get written to the output file per input msr file. How many of the #1 fields do we write out to the output file?

What's the format for the output?

Field #1: <which date??> Mean#1: <double var> Mean#2: <double var>
??

First and last dates, maybe?
From: <first date>

To: <last date>

Mean#1: <double var>

Mean#2: <double var>

??

Each output file will have just a very small amount of data, then?

Oh! Before I forget, when you post any code, be sure to put it between [code] tags, so the forum will keep the code looking like code, instead of html text.

The [code] tags icon is at the top of the editor window.

That will end the study part. Now we look at your code.


okwy 0 Newbie Poster · Answer 1 · 2010-08-02T17:36:19+00:00

Thank you for your guidance. Just as you outlined above, here is the outline of what I want to achieve:

1) Open the directory that contains the files (here, the USB key).
2) Read all the msr filenames inside it.
3) Open each msr file.
4) Extract the date (it's the first column in the file), ignoring the time and the separator (;).
5) Extract data 1 (data in the 3rd column).
6) Extract data 2 (data in the 6th column).
7) Calculate the mean for the 3rd column and the 6th column.
8) Output to file (date, mean of 3rd column, mean of 6th column).
9) Close the msr files.
10) Close the directory (if possible).
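Steps 1 to 3 and 10 can be done directly in C on Linux with the POSIX opendir()/readdir()/closedir() calls, without going through the shell. A minimal sketch, assuming all .msr files sit directly in one folder (the path you pass is wherever the USB key is mounted on your system); has_msr_extension and list_msr_files are hypothetical names, and the handle callback stands in for whatever per-file processing you end up writing:

```c
#define _POSIX_C_SOURCE 200809L   /* expose POSIX declarations under -std=c99 */
#include <dirent.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: true if name ends in ".msr", ignoring case,
   so "REG_2010-03-03.MSR" matches as well. */
int has_msr_extension(const char *name)
{
    size_t len = strlen(name);
    return len > 4 && strcasecmp(name + len - 4, ".msr") == 0;
}

/* Hypothetical helper: calls handle(dirpath, name) for every *.msr entry
   in dirpath and returns how many there were, or -1 if the directory
   cannot be opened. readdir() returns entries in no particular order. */
int list_msr_files(const char *dirpath,
                   void (*handle)(const char *dir, const char *name))
{
    DIR *dir = opendir(dirpath);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;

    while ((entry = readdir(dir)) != NULL) {
        if (has_msr_extension(entry->d_name)) {
            if (handle != NULL)
                handle(dirpath, entry->d_name);
            ++count;
        }
    }
    closedir(dir);   /* step 10: release the directory handle */
    return count;
}
```

Because readdir() does not sort, collecting the names into an array and sorting them with qsort() first would be one way to get the output lines in date order.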

I attached here 5 samples of the files.
Thank you

okwy 0 Newbie Poster · Answer 2 · 2010-08-03T05:11:17+00:00

1. Yes, all the msr files are in the same folder in the USB key directory.
2. The OS is Linux, the compiler is gcc, and the IDE is Debian.
3. On the data, you're right: it's the first field, as you highlighted above.
4. The mean is calculated for each msr file separately (i.e. for each msr file, calculate the means within that file); that is what will be written to the output file with their respective dates.
5. Their lengths are close
Thank you

okwy 0 Newbie Poster · Answer 3 · 2010-08-03T16:24:36+00:00

1. Yes I know how to compile and run the programme in my system.
2. The mean is calculated once for each file. There is only one output file, where each line contains the date, the mean of column 3 and the mean of column 6 of one file.
3. Just as you pointed out, the output format will be like:

Field #1: <date file1> Mean#1: <double var> Mean#1: <double var>
Field #2: <date file2> Mean#2: <double var> Mean#2: <double var>
Field #3: <date file3> Mean#3: <double var> Mean#3: <double var>
.
.
.
Field #last: <date last_file> Mean#last: <double var> Mean#last: <double var>
This is the code I am working with:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define MAX_BUFFER_SIZE 1024   /* assumed maximum line length; the post does not define it */

int file_getline_analyse(char *source, char *dest, char *path) {

    int return_value = 0;

    FILE *fd = NULL;
    FILE *fo = NULL;
    char *file_path = NULL;
    char *strline = NULL;
    char *date, *tmp;
    double sum = 0;
    double mean = 0;

    // memory for holding the character string that will be read from the source file
    strline = calloc(MAX_BUFFER_SIZE, sizeof(char));
    if (strline == NULL) {
        printf("Error calloc strline\n");
        exit(EXIT_FAILURE);
    }

    // room for path + file name + the terminating '\0'
    file_path = calloc(strlen(path) + strlen(source) + 1, sizeof(char));
    if (file_path == NULL) {
        printf("file_path in get_line\n");
        exit(EXIT_FAILURE);
    }

    strcpy(file_path, path);
    strcat(file_path, source);

    fd = fopen(file_path, "r");   // opens the source file for reading
    fo = fopen(dest, "a");        // "a" creates the file if it does not exist yet

    if ((fd == NULL) || (fo == NULL)) {
        printf("Cannot open input or output file.\n");
        if (fd != NULL) fclose(fd);
        if (fo != NULL) fclose(fo);
        return_value = -1;
    }
    else {
        int i = 0;
        int j = 0;
        while (fgets(strline, MAX_BUFFER_SIZE, fd) != NULL) {
            date = strtok(strline, " ");
            strtok(NULL, " "); // skip over time (field two on first column)
            tmp = strtok(NULL, ";");
            if ((i == 3) || (i == 6)) { // get only the 3rd and 6th value
                sum += strtod(tmp, NULL);
                ++i;
                if (i == '\n') {
                    // Replacing the characters at the end of the line by 0:
                    char *p = strchr(strline, '\n');
                    if (p) {
                        *p = 0;
                    }
                    return_value = 0;
                    break;
                }
                j++;
            }

            mean = sum / (double)(j + 1);

            fprintf(fo, "%s: %.2f\n", date, mean);
        }
        fclose(fd);
        fclose(fo);
    }

    free(strline);
    free(file_path);
    file_path = NULL;

    return return_value;
}

Adak 419 Nearly a Posting Virtuoso · Answer 4 · 2010-08-03T17:10:02+00:00

I don't know who helped you with the code previously, but they were quite good, imo.

Questions for you:

My compiler is C89, and won't handle this style of variable declarations. Can you set your compiler to format the code for C89 style declarations in your options menu?

If so, please do that.

Since your current code looks so good, at what point is it not doing what you need? I was going to start from almost scratch, but now that I see the program more closely, there's no reason to do that.

Can you list what the current program does RIGHT from your numbered list? And if you can't re-format the code, I can do it by hand.

TTYou Soon. Thanks for posting the program in code tags - I can study it a lot better.

P.S.
In File #1:

Record #1: field #1 < >  field #2 < > field #3 < >
Record #2: field #1 < >  field #2 < > field #3 < >
Record #3: etc.

Each row is another record, and field numbers start with 1 or zero, again.

okwy 0 Newbie Poster · Answer 5 · 2010-08-03T18:35:59+00:00

Thank you, Adak, for your words on my code. I asked questions in forums like this one and people gave me insight on how to get started, so I read other people's code and was able to put together something similar to what I wanted.
What the program does now:
1. It can open just a single file, which is passed to the function as one of its arguments.
2.

if ((i == 3)||(i==6)) { // get only the 3rd and 6th value
sum += strtod(tmp, NULL);

The sum here holds only the combined sum of the 3rd and 6th columns, as a single value.
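That single sum is what merges the two columns. Getting one mean per column needs one running total per column plus a count of records. A minimal sketch; struct col_stats and its functions are hypothetical names:

```c
/* Running totals for one file: one sum per column, one shared count. */
struct col_stats {
    double sum3;   /* total of field #3 */
    double sum6;   /* total of field #6 */
    long   count;  /* number of records added */
};

/* Start a new file with empty totals. */
void stats_reset(struct col_stats *s)
{
    s->sum3 = 0.0;
    s->sum6 = 0.0;
    s->count = 0;
}

/* Add the two values taken from one record. */
void stats_add(struct col_stats *s, double f3, double f6)
{
    s->sum3 += f3;
    s->sum6 += f6;
    s->count++;
}

/* Computes both means; returns -1 for a file with no records, 0 otherwise. */
int stats_means(const struct col_stats *s, double *mean3, double *mean6)
{
    if (s->count == 0)
        return -1;
    *mean3 = s->sum3 / s->count;
    *mean6 = s->sum6 / s->count;
    return 0;
}
```

Reset once per file, call stats_add() for each record, and call stats_means() after the last record. With fields #3 and #6 of the four sample records from the question (24.9 in every record; 84, 82, 86, 84), the means come out at 24.9 and 84.0.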

Then What I want it to do:
1. Instead of passing the second argument to the function (i.e. dest), I want to use the folder name. The function will then read the first file inside the folder, do the calculations as above, then move on to the second file and do the same, then the 3rd, ..., up to the last file, and store for each file its date (1st column), the mean value of the 3rd column and the mean value of the 6th column.
Note that the date field is the same throughout each file, so each row in the destination file represents one file; i.e. if I have 30 files in a month, I should have 30 lines in the destination file.

2. Since this calculation will be performed every day for each newly added data file, I want the function to ignore all the already-processed files in the same folder and extract information only from the last file added.

3. In the destination file, I want to keep only the records of the last 30 days, counting back from the present day (i.e. if 30 July is the present day, I want to keep the records from 1st to 30th July; on 31st July, keep the records from 2nd to 31st July, and so on).

4. On your question regarding the field numbers, I am not sure I understand you correctly, but below is an example of what I want to have in the destination file.

2010-3-1 60.23 6.10
2010-3-2 63.53 6.20
2010-3-3 62.23 6.50
2010-3-5 61.25 6.10
.
.
.
2010-3-30 61.25 6.10
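Points 2 and 3 both come down to working with the destination file. A minimal sketch under these assumptions: the destination holds one "date mean3 mean6" line per file, in date order, each day appears at most once, and lines are shorter than 256 characters. date_already_recorded and keep_last_lines are hypothetical names. Keeping the last 30 lines only equals keeping the last 30 days when no day is missing (the example above skips 2010-3-4).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if some line of the destination file
   starts with `date` followed by a space, 0 otherwise.
   `dest` must be open for reading; it is rewound first. */
int date_already_recorded(FILE *dest, const char *date)
{
    char line[256];
    size_t n = strlen(date);

    rewind(dest);
    while (fgets(line, sizeof line, dest) != NULL) {
        if (strncmp(line, date, n) == 0 && line[n] == ' ')
            return 1;
    }
    return 0;
}

/* Hypothetical helper: keeps only the last `keep` lines of the file at
   `path` by copying them to "<path>.tmp" and renaming that over the
   original. Returns 0 on success, -1 on any error. */
int keep_last_lines(const char *path, long keep)
{
    char line[256];
    char tmp_path[512];
    long total = 0, skip, i = 0;
    FILE *in, *out = NULL;

    in = fopen(path, "r");
    if (in == NULL)
        return -1;

    while (fgets(line, sizeof line, in) != NULL)   /* pass 1: count lines */
        ++total;
    skip = total > keep ? total - keep : 0;

    if (snprintf(tmp_path, sizeof tmp_path, "%s.tmp", path) >= (int)sizeof tmp_path
        || (out = fopen(tmp_path, "w")) == NULL) {
        fclose(in);
        return -1;
    }

    rewind(in);
    while (fgets(line, sizeof line, in) != NULL) { /* pass 2: copy the tail */
        if (i++ >= skip)
            fputs(line, out);
    }
    fclose(in);

    if (fclose(out) != 0)
        return -1;
    return rename(tmp_path, path) == 0 ? 0 : -1;   /* replaces path on POSIX */
}
```

Before processing a file, it could be skipped when date_already_recorded() returns 1. If the destination is opened with "a+", it can be both read and appended to, but C requires a positioning call such as fseek(dest, 0, SEEK_END) when switching from reading to writing on the same stream. keep_last_lines(path, 30) is meant to be called once the destination has been closed.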

Thank you so much for your time and help.

Adak 419 Nearly a Posting Virtuoso · Answer 6 · 2010-08-03T22:01:32+00:00

Of course, I'm on Windows, and wanted to try to do this with Turbo C, which is a mess, because it only wants to give me the old DOS 8.3 file names instead of the long ones.

So I've been having quite a lot of fun trying to work around that. The funny answer is to have TC call a bat file, then the bat file calls the right cmd.exe (the Windows command interpreter), and THEN I can get my long file names (after boldly leaping through a few more hoops). Ah me! ;) ;)

You know until 2 weeks ago, I had a Ubuntu system up and running - that's just weird! (Ubuntu is based on Debian)

Anyway, I have a version that now opens the second or inner file, and reads each line, in that file. I'll finish roughing it out, tomorrow.

I'm putting off the details for now, until the big stuff is up and working OK.

On the averages - if our data is:

field #3:         field #6:

50.00             8.0
51.00             8.2  
49.00             7.8

Would our output averages be:

50.00             8.0

or are you averaging field #3 together with field #6?

If it's the latter, you're going to have to give me an example. I wasn't expecting that kind of a merge of field data.

Given the above numbers (simplified I know), what would the correct output be if they're merging together, somehow?
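Averaging each field on its own, the three records above work out as:

$$
\bar{x}_{3} = \frac{50.00 + 51.00 + 49.00}{3} = \frac{150.00}{3} = 50.00,
\qquad
\bar{x}_{6} = \frac{8.0 + 8.2 + 7.8}{3} = \frac{24.0}{3} = 8.0
$$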

TTYL

okwy 0 Newbie Poster · Answer 7 · 2010-08-04T05:23:42+00:00

Yes, you're quite right there; actually I am running Ubuntu 10.04. I just checked the compiler I am using, and it is C99.
Regarding the mean, the output as you indicated above is what I want. The two fields must not be averaged together.

okwy 0 Newbie Poster · Answer 8 · 2010-08-04T20:29:03+00:00

The output is exactly what you showed above. There will be no merging of the means.

Adak 419 Nearly a Posting Virtuoso · Answer 9 · 2010-08-05T21:46:05+00:00

This is not a polished program, consider it a starting point. The "penguin boys" can give you a hand with the shell commands that it will need, for Linux. (They probably can be done in C, but not on either of my compilers [which are for Windows and DOS respectively]).

I used Turbo C for this, BUT didn't use any TC-specific extensions that you couldn't use in Linux. Because of TC, this program requires all the filenames to fit the DOS-compatible 8.3 maximum length. Your version on Linux will have no such trouble, so don't worry about that.

These are the five files I worked with:
10_03_03.MSR
10_03_04.MSR
10_03_05.MSR
10_03_06.MSR
10_03_07.MSR
which you provided, after being shortened. I just tested it with the longer filenames in the shell, and Windows XP handles them OK, using some behind-the-curtain renaming of its own. So these filenames:

Reg_10_03_03.MSR
Reg_10_03_04.MSR
Reg_10_03_05.MSR
Reg_10_03_06.MSR
Reg_10_03_07.MSR
all worked, although they're longer than 8.3 limit.

They were located in a known path and filename. This program can be run from anywhere it can access the command shell.

This is the output from the five files:
2010-03-03 23.231148 86.688525
2010-03-04 22.540984 88.180328
2010-03-05 20.672131 88.688525
2010-03-06 20.670492 89.672131
2010-03-07 15.526230 88.475410

The first field is the date from the first row of the file, the next two fields are the averages for the data in field #3, and field #6, respectively. The output can be set for any directory you want.

The program is:

/* 
Uses the OS to get a list of all the *.msr files, output to 
the msrdir.txt. Then it reads the msrdir.txt file, and:

1) opens the first file listed

2) from the first row of this second file, it gets the date

3) from every row of data, it gets fields #3 and #6
   a) and computes an average for these two fields, in that file

4) when all data in a file has been read, it outputs the date 
   and the two averages, into the msrdat.txt file.

5) then it closes the current file, and opens the next file listed 
   in the msrdir.txt file

All the msr files must be in the same directory. The program can
be in any directory which has access to the command shell.

*/

#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <string.h>
#include <math.h>


/* only forces floating point formats to load -- this is
   a quirk of TC, you won't need this
*/
static void forcefloat(float *p)
{ float f = *p;
  forcefloat(&f);
}
int main() {
  int len, count; 
  double f3=0.0, f6=0.0;
  double avg3=0.0, avg6 = 0.0,sum3, sum6;
  FILE *fpin1, *fpin2, *fpout;
	
  char line1[120]= {'\0'};
  char line2[120]= {'\0'};
  char filename[120]= {"*.msr /B /A:-d >C:\\TC\\msrdir.txt"};
  char dir1[120]= {"dir "};
  char dir2[120]={"C:\\TC\\Regfiles\\"};
  char dirCopy[120]={"C:\\TC\\Regfiles\\"};
  char date1[11];
  char time1[10];
		
  /*A bat file can call the newer windows cmd file, so long filenames 
    are OK, then get the *.msr dir into the text file, without headers,
    directories, or summary info - just filenames. <slick!>

    I removed it because long file names are not a problem in linux.
  */

  printf("\n\n\n");
  strcat(dir1,dir2);
  strcat(dir1,filename);
	
  system(dir1);  /* output is to msrdir.txt */
  printf("Command is: %s", dir1);
	
  fpout=fopen("C:\\TC\\msrdat.txt", "wt");
  fpin1=fopen("C:\\TC\\msrdir.txt", "rt");
  if(fpin1==NULL) {
    printf("\n Error opening first input file - terminating\n");
    getchar();
    return 1;
  }
  if(fpout==NULL) {
    printf("\n Error opening output file - terminating\n");
    getchar();
    return 1;
  }
  /* read a file name from the master file */
  while((fgets(line1, sizeof(line1), fpin1)) !=NULL)  {
    count=avg3=avg6=sum3=sum6=0;
         //printf("\n%s", line1); //for debug only
    len=strlen(line1);
    if(len>0 && line1[len-1]=='\n')  //if a newline char is present
      line1[len-1]='\0';             //delete it
    strcat(dir2, line1);
         //printf("\ndir2 is: %s", dir2);

    /* open the listed file name #2*/
    fpin2=fopen(dir2, "rt");
    if(fpin2==NULL) {
      printf("\n Error opening second file - terminating\n");
      getchar();
      return 1;
    }
    /* read all entries in file name #2 */
    while((fgets(line2, sizeof(line2), fpin2)) !=NULL) {
         //printf("\n %s", line2); //for debug only
      if(count==0) {
        sscanf(line2, "%10s", date1);  /* date1 holds 10 chars + '\0' */
      }
      sscanf(line2+27, "%lf", &f3);
      sscanf(line2+46, "%lf", &f6);
      ++count;
      sum3+=f3;
      sum6+=f6;
      avg3 = (sum3/count);
      avg6 = (sum6/count);
         //printf("\n%s %s %lf %lf %lf %lf",date1,time1,f3,f6,avg3,avg6);
    }
    fclose(fpin2);
    printf("\n%s %lf %lf", date1,avg3,avg6);
    fprintf(fpout, "%s %lf %lf\n", date1, avg3, avg6);
    len=0;
    line2[0]='\0';
    line1[0]='\0';
    date1[0]='\0';
    time1[0]='\0';
    strcpy(dir2, dirCopy);
    f3=f6=0.0;
  }
  fclose(fpin1);
  fclose(fpout);

  printf("\n\n\t\t\t     press enter when ready");
  count = getchar(); 
  return 0;
}

A good deal of the odd code is there to handle data files in another directory and/or drive. It's awkward in TC (which has its own extensions to handle this), but this code should be easy to port to Linux.
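On Linux, the directory-plus-filename handling that dir2 and dirCopy do with strcat() and strcpy() can be done in one bounded call. A minimal sketch using snprintf(), which reports truncation instead of overflowing the buffer; build_path is a hypothetical name:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes dir + "/" + name into out (size bytes),
   adding the '/' only if dir is non-empty and does not already end
   with one. Returns 0 on success, -1 if the result would not fit. */
int build_path(char *out, size_t size, const char *dir, const char *name)
{
    size_t len = strlen(dir);
    const char *sep = (len == 0 || dir[len - 1] == '/') ? "" : "/";
    int n = snprintf(out, size, "%s%s%s", dir, sep, name);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Since out is rebuilt from dir on every call, there is no need to restore a copy of the directory after each file, as the strcpy(dir2, dirCopy) line does.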

Something odd I noticed was that in the IDE, the editor would take the line of data that was being put into a char array and automatically replace the '\t' (tab) characters with the number of spaces assigned to a tab in the editor! Since this data had a tab after nearly every data field, it really had me bewildered.
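Because the fields may be separated by tabs or by spaces, reading fields #3 and #6 at fixed character offsets (line2+27, line2+46) only works while every line has exactly the same layout. A sketch of an alternative, assuming every record starts with "date time;" and fields #2 to #6 are plain numbers: a blank in a scanf() format matches any run of white space, tabs included, so one sscanf() call can pick out the date and the two fields wherever they fall on the line. parse_fields is a hypothetical name:

```c
#include <stdio.h>

/* Hypothetical helper: reads the date, field #3 and field #6 from one
   record such as "2010-03-03 15:55:06; 8.01; 24.9; 14.52; 0.09; 84; 12.47;".
   %*... conversions are read but not stored. Returns 1 on success. */
int parse_fields(const char *line, char date[11], double *f3, double *f6)
{
    return sscanf(line,
                  "%10s %*[^;]; %*lf ; %lf ; %*lf ; %*lf ; %lf",
                  date, f3, f6) == 3;
}
```

In Adak's loop, this call could take the place of the two sscanf(line2+27, ...) and sscanf(line2+46, ...) lines.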

I have NOT tested the accuracy of the averages.