
Sorting a CSV file

hello friends,

I have generated a CSV file with 27 columns.
The third column holds a time, and I want to sort all the data by this column in ascending order.
Can you please tell me how to do this?

I have no clue where I should start.

I am fetching 18 columns from the original CSV file (say a1) using fscanf(), then calculating some variables and writing them to another file (say a2) using fprintf().

The a2 file needs to be sorted according to its third column.

regards,

rjbrjb777
Light Poster
42 posts since Feb 2011
Reputation Points: 10
Solved Threads: 0
 

You can:
- Define a struct that has time stamp and the whole string (as read from file) as members.
- Define an array of this struct.
- Write a function to read the file and populate the array.
- Write a function that can sort the array of structs, by comparing the timestamps.
- Write the array back into a2.

I guess you can figure out how to write the rest.

If you're dealing with huge files, this might be bad from a memory usage point of view. In that case the struct can store the line number from the original file instead of the whole line itself.

thekashyap
Practically a Posting Shark
811 posts since Feb 2007
Reputation Points: 254
Solved Threads: 75
 

Standard practice would be to sort such data through an index or pointer array. I'll describe how to use an index:

Make an int array (unsigned long if needed), with at least as many elements as there are rows of data in the file to be sorted.

Now initialize the index[] (array), so each number matches its current position number in the index[]:

for(i=0;i<SIZE;i++) //SIZE is the number of elements in the index[]
  index[i] = i;     //simple, no?


Now, you're ready to start sorting, but instead of sorting (comparing and moving) the actual data, you'll do both of these by going through your index. If you haven't seen it before, it's a bit confusing.

<< NO data is moved, whatsoever >>

Upon completion (and it's fast), you can print out (on screen, or to file, etc.), all the rows, in sorted order.

You guessed it - by using the now sorted, index. Here's what the final print out to screen might look like:

for(i=0;i<SIZE;i++)
  printf("%5d   ", data[index[i]]);


This is an example of Insertion sort, using an index array:

/* Demonstrates Insertion Sort by using an index[] array.

   Status: OK 
   Adak December 20, 2010

   Requires a file named "soup.txt" with this format:
Tomato 2.28 245678
Broth  1.60 313926
Pea 2.35 455092
Stew 3.85 210420
Noodle 2.41 110288
Vegetable 2.33 699240
*/

#include <stdio.h>
#include <string.h>
#define SIZE 15

void sort(float num[], int lo, int hi);
 
int main() {
  int i,ok; 
  float num[SIZE];
  char str[25];
  unsigned long num_id;
  FILE *fp;

  printf("\n\n\n");
  fp=fopen("soup.txt", "r");
  if(fp==NULL) {
    printf("Error opening file - Correct file name is soup.txt \n");
    printf("File format is up to SIZE rows with 3 columns:\n\
    Name (1 word) Price (float) and 6 digit unsigned long product ID number\n");
    printf("    Column separators are spaces\n");
    return 1;
  }
  i=ok=0;
  printf("\n Name        Price   Product ID\n");
  printf  (" =======================================\n");
  while(i < SIZE) {   //never read more rows than num[] can hold
    ok=fscanf(fp, "%24s %f %lu%*c", str, &num[i], &num_id);
    if(ok < 3)        //stop at end of file or on an incomplete row
      break;
    printf(" %-10s %6.2f   %lu\n", str, num[i], num_id);
    ++i;
  }
  fclose(fp);
  putchar('\n');
  sort(num, 0, i);  //sort num[], through the index

  printf("\n\n\t\t\t     press enter when ready\n");
  
  (void) getchar(); 
  return 0;
}
/* Insertion sort, through an index[] array */
void sort(float A[], int lo, int hi) {
  int i, j;            //the indices for the array locations
  float val;           //sorting floats here
  int idx[SIZE];       //the index array
  for(i=0;i<SIZE;i++)  //initialize the index array
    idx[i] = i;
    
  printf("\nThe Prices of Soups in their Original Order:\n");
  for(i=lo;i<hi;i++)     printf("%5.2f   ",A[idx[i]]);   getchar();

  for(i=lo+1;i<hi;i++) {  
    val = A[idx[i]];      //get a value of A[], through the index array
    j = i-1; 
    while(A[idx[j]] > val) {  //the fast insertion sort "shuffle" 
      idx[j + 1] = idx[j];    //only the index int's are moved
      --j;                    //look Ma! no temp variable ;)
      if(j<0) break;
    }   
    idx[j+1] = i;             //drop it into place
  }
  printf("\nThe Prices of Soups After Insertion Sort:\n");
  for(i=lo;i<hi;i++)     printf("%5.2f   ",A[idx[i]]);   
  getchar();

}
/* Just an example of an Insertion sort function, without an index array.
void insertionSort(int A[], int lo, int hi) {
  int i, j, val; 
    
  for(i=lo+1;i<hi;i++) {  
    val = A[i];
    j = i-1;
    while(A[j] > val) {
      A[j + 1] = A[j];
      --j;
      if(j<0) break;
    }   
    A[j+1] = val;
  }
}
*/
Adak
Nearly a Posting Virtuoso
1,479 posts since Jun 2008
Reputation Points: 425
Solved Threads: 185
 

thank you all for attention.

well i have understood your code and it works. i want one more piece of advice.
i need all the data in each row to be sorted with respect to that column.

for ex,
the original file is like :

abc def 4.5 jkl mno
abc qwe 1.0 uio iop
def abc 3.6 iop nmb

should be like :

abc qwe 1.0 uio iop
def abc 3.6 iop nmb
abc def 4.5 jkl mno

then what changes do i need to make.?

rjbrjb777
 

One way to do it:

make a char buff[100] array, to hold one line of text
make a char data[SIZE][COLS]; //fit your sizes for rows and columns. Leave room for the
end of string char and newline char, in your column number (COLS should be at least as big as buff)
make a float floats[SIZE]; //the numbers you will sort on, one per row
still have index[SIZE];

while(i < SIZE && (fgets(buff, sizeof(buff), filePointerName)) != NULL) {
  ok = sscanf(buff, "%*s %*s %f", &floats[i]); //skip the 2 leading strings, store the float number into an array of floats
  if(ok > 0) {
    strcpy(data[i], buff);//copy buff string (including float), into data[i]
    ++i;
  }
}

You'll be sorting ON the floats[] array, using the index[]. The index array will be sorted, using the floats[] numbers.

After the sorting, the entire string can be printed out through the index array, as shown in my example program. The difference is that now you want to print out data[index[i]] (the entire string from that row of data), instead of just the numbers, as I did above.

Adak
 

okay thank you..
but is it possible if i have data like following..
01 2.3 23 25 32.23
25 2.5 25 14 14 12
85 5.6 12 14 58 89
and so on..

i want to sort on first column data..
i take first column in floats array and rest whole string
in data[]..

float floats[100];
char data[100];

i.e. sscanf(buff,"%f %s\n",floats[i],data[i]);

and then i call the sort function as
sort(floats,data, 0,i) and sort as u said earlier????

i tried this but they are giving me segmentation fault..
i guess problem is because my data array is of char type..

and second problem is, instead of the whole row..only second value is being stored in data[i]..
do i need to write %s*%c ??

rjbrjb777
 

help me..
why data and floats are not being passed in sort function???

#include <stdio.h>
#include <string.h>
#define SIZE 15
#define COLS 20

void sort(float num[],char data[], int lo, int hi);
int main() 
{

int i,ok;
float num[SIZE];
char str[25];
char buff[100];
float BUFF[100];
float floats[100];
char data[100];
int j=0;
float num_id;
FILE *fp;
printf("\n\n\n");

fp=fopen("address.csv", "r");

if(fp==NULL) 
{
	printf("Error opening file - Correct file name is soup.txt \n");
	printf("File format is up to SIZE rows with 3 columns:\n\
Name (1 word) Price (float) and 6 digit unsigned long product ID number\n");
	printf(" Column separators are spaces\n");
	return 1;
}

i=ok=0;

while((fgets(buff, sizeof(buff), fp)) != NULL) 
{

ok = sscanf(buff, "%f %s\n", &floats[i],&data[i]); 
printf("data: %s\n",data[i]);
printf("num :%f\n",floats[i]);
	++i;
}


fclose(fp);

putchar('\n');

sort(floats, data, 0, i); 

(void) getchar();

return 0;
}



void sort(float A[], char D[], int lo, int hi) 
{
int i;
printf("floats:%f\n",A[i]);
printf("data:%s\n",D[i]);
i++;
}


:(

rjbrjb777
 
help me.. why data and floats are not being passed in sort function???


They are.
Your print statements are wrong, so if you expected a certain output, you won't get it.
You are only printing 1 single value from both A and D. But since i has no defined value, it could be literally anything. It might even be displaying A[2653491554]. I'm surprised your program didn't blow up.

WaltP
Posting Sage w/ dash of thyme
Moderator
10,506 posts since May 2006
Reputation Points: 3,348
Solved Threads: 944
 

okay..lets just not consider function right now..
i just want first column data into floats.. and rest all data in data[i]..
and if i've written this program

#include <stdio.h>
#include <string.h>
#define SIZE 15
#define COLS 20

int main()
{

int i,ok;

float num[SIZE];

char str[25];

char buff[100];

float BUFF[100];

float floats[100];

char data[100];

int j=0;

float num_id;
FILE *fp;

fp=fopen("address.csv", "r");
if(fp==NULL)
{
printf("Error opening file - Correct file name is soup.txt \n");
printf("File format is up to SIZE rows with 3 columns:\n\
Name (1 word) Price (float) and 6 digit unsigned long product ID number\n");
printf(" Column separators are spaces\n");
return 1;
}

i=ok=0;
 
while((fgets(buff, sizeof(buff), fp)) != NULL)
{
printf("buff:%s",buff);

ok = sscanf(buff, "%f,%s\n", &floats[i],&data[i]);

printf("data: %s\n",data[i]);
printf("num :%f\n",floats[i]);
++i;
}
fclose(fp);

putchar('\n');
(void) getchar();
return 0;
}


assuming my address file is..

23,2.6,65
2.3,26,3.6
5.3,56,23

is there any mistake in sscanf syntax?
i am getting segmentation fault.

rjbrjb777
 

Yes, one problem is that data[] is just a ONE dimension array. When you put your string into it, then it's full, and that's all it can hold:

printf("data: %s\n",data[i]);


Makes no sense. data[i] is one single char, not a string (%s), see what I mean?

If you want data[] to hold more strings, then you need to make it a 2D array:
data[][], where the first dimension is the rows, and the second dimension can be viewed as a column within the row. Then data[i] could indeed hold an entire string.

I'm not sure of what you need, because I don't know what you want to do with the data, yet.

Adak
 

okay.. i got u..
well i want to sort the CSV file according to first column..

so what i was thinking that take first column data in floats array.
and rest whole data in other data array..
then will do bubble sort for both of this array...

any other simple way to do this.?

rjbrjb777
 

to make it more clear let me tell u..
suppose my file is in format..

time,day,year,AET,PET
2.5,23,2011,7.8,9.0
15.5,56,2011,8.9,0.9
1.0,78,2011,8.0,9.0
1.5,34,2011,9.0,1.2

i want to sort according to first column and file should be like

time,day,year,AET,PET
1.0,78,2011,8.0,9.0
1.5,34,2011,9.0,1.2
2.5,23,2011,7.8,9.0
15.5,56,2011,8.9,0.9

rjbrjb777
 

okay.. i got u.. well i want to sort the CSV file according to first column..

so what i was thinking that take first column data in floats array. and rest whole data in other data array.. then will do bubble sort for both of this array...


That will work fine. During the swap, you swap both the float values and the 'other data' values.

WaltP
 

okay..
i made data as two dimensional array.. still i am getting no result..
is there any problem with my sscanf syntax..?? as far as i googled examples for sscanf it is right..

#include <stdio.h>
#include <string.h>
#define SIZE 15
int main() 
{

int i,ok;
float num[SIZE];
char str[25];
char buff[100];
float BUFF[100];
float floats[100];
char data[100][100];
int j=0;
float num_id;
FILE *fp;
printf("\n\n\n");

fp=fopen("E:\\address.csv", "r");

if(fp==NULL) 
{
	printf("Error opening file - Correct file name is soup.txt \n");
	printf("File format is up to SIZE rows with 3 columns:\n\
Name (1 word) Price (float) and 6 digit unsigned long product ID number\n");
	printf(" Column separators are spaces\n");
	return 1;
}

i=ok=j=0;

while((fgets(buff, sizeof(buff), fp)) != NULL)
{

ok = sscanf(buff, "%f,%s", &floats[i],&data[i][j]);
printf("data: %s\n",data[i][j]);
printf("num :%f\n",floats[i]);
	++i;
	++j;
	getch();
}


fclose(fp);

putchar('\n');

(void) getchar();

return 0;
}
rjbrjb777
 

You still have that data[i][j] stuff in there.

LOSE THE DANG [j]! ;)

Maybe not with this small a program, but try to come up with better names than two arrays named "buff" and "BUFF". Maybe buff1 and buff2? Capitals are normally used for macros (see SIZE?), and sometimes for globals (Globalvariable) and typedefs, if needed.

Adak
 

well that would be fine..
i am not using BUFF anyways..
i jus want to know what is the problem...
why string is not being stored in data[i][j]??

rjbrjb777
 

Because data[i][j] is just ONE element, big enough for ONE char.

The string goes into data[i] (which designates the row with i).

Print up some data[i] and see what the contents are, after you change it as above.

Adak
 

now i am confused totally..
can i store the whole string in data[i] or not??

rjbrjb777
 

In data[i]? Yes! ;)

In data[i][j]? No! Only one char will fit into data[i][j]

Adak
 

Draw data out on paper. 1 dimension is a row. 2 dimensions is row and col. Try loading a string into it and see what happens.


If you don't understand something, draw a picture.

WaltP
 
