read in using fread() for arbitrary length data

Thread Solved

Join Date: Mar 2008
Posts: 7
Reputation: cheesy_mel is an unknown quantity at this point 
Solved Threads: 0
cheesy_mel
Newbie Poster

  #1
Apr 23rd, 2008
Hi,

I want to read each record (one per line) using fread(). The problem is that the record length is arbitrary.

e.g.
1 "Joshua" "Rosenthal" "34 Mellili Ln" "Earlwood" 1 "000113133121" 0.000
2 "Martin" "Serrong" "45 Rosenthal Ccl" "Doveton" 1 "000113133121" 0.000
3 "Jacob" "Leramonth" "59 Dalion Pl" "Belmont" 1 "000113133121" 0.000

Since fread() requires the number of characters to read, I'm doing it like this:

  while (fgets(str, 100, fp) != NULL) // read each line using fgets
  {
      recordLen[j] = strlen(str); // so i can get length of each record
      j++;
  }

  do
  {
      len = 0;
      randNum = lrand48() % 830000;
      // read record randomly, 830000 is number of records
      rewind(fp);

      for (i = 0; i < randNum-1; i++)
      // in order to seek the file pointer, i'm sum up length of each record
      // until the record i want to read
      {
          len += recordLen[i];
      }

      fseek(fp, len, SEEK_SET);
      fgets(str, 100, fp);
      // another fgets() to read the record I want to read, to find the length

      fseek(fp, len, SEEK_SET);
      result = fread(buffer, CHAR_BYTE, strlen(str), fp);
      count++;

  } while (count < MAX_READ); // MAX_READ = 50000, read in 50000 times

Does anyone have a better way to do it? The way I'm doing it takes a lot of time, and time is the important thing in my assignment (I'm calculating file access time).
Join Date: Sep 2004
Posts: 7,804
Reputation: Narue has a reputation beyond repute
Solved Threads: 747
Team Colleague
Narue
Code Goddess

  #2
Apr 23rd, 2008
Why use fread at all? fgets seems to be a better fit for your problem, especially since you're using it anyway to preprocess the record lengths. The more input requests you make, the slower your code will be, so if optimization is your goal, minimize the number of calls that read from the file.

Ideally you would keep a portion of the file in memory at any given time, but it's difficult to make this efficient when the access pattern is random.
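
A minimal sketch of the "keep it in memory" idea, assuming the whole file fits in RAM (the thread does not say whether it does). It reads the file with a single fread() call and then records where each line starts. The names load_file and index_lines are hypothetical helpers, not part of the original code, and finding the size with fseek()/ftell() on a binary stream is an assumption that holds on mainstream platforms.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read the whole file with a single fread() call.
 * Returns a malloc'd, NUL-terminated buffer (caller frees) and stores
 * the number of bytes read in *size. Returns NULL on failure. */
static char *load_file(const char *path, size_t *size)
{
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    /* Find the file size by seeking to the end (assumed to work here). */
    if (fseek(fp, 0L, SEEK_END) != 0) { fclose(fp); return NULL; }
    long end = ftell(fp);
    if (end < 0) { fclose(fp); return NULL; }
    rewind(fp);

    char *buf = malloc((size_t)end + 1);
    if (buf == NULL) { fclose(fp); return NULL; }

    *size = fread(buf, 1, (size_t)end, fp);  /* the only read request */
    buf[*size] = '\0';
    fclose(fp);
    return buf;
}

/* Hypothetical helper: store the start offset of each line in starts[].
 * Returns the number of lines found (at most max_lines). */
static size_t index_lines(const char *buf, size_t size,
                          size_t *starts, size_t max_lines)
{
    size_t n = 0, pos = 0;
    assert(buf != NULL && starts != NULL);
    while (pos < size && n < max_lines) {
        starts[n++] = pos;
        const char *nl = memchr(buf + pos, '\n', size - pos);
        if (nl == NULL)
            break;               /* last line has no trailing newline */
        pos = (size_t)(nl - buf) + 1;
    }
    return n;
}
```

After these two calls, record i starts at buf + starts[i] and no further file reads are needed, so a random access pattern costs nothing extra. Whether this still counts as measuring "file access time" depends on the assignment.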
I'm here to prove you wrong.
cheesy_mel

  #3
Apr 23rd, 2008
The requirement says that I need to use fread().

Actually, the fgets() is a "cheat" for me to get the length of the records, and that part seems to be what makes my program slower, since I read the file twice.

So there's no way to get the length of a record directly?
Join Date: Dec 2005
Posts: 5,850
Reputation: Salem has a reputation beyond repute
Solved Threads: 751
Team Colleague
Salem
Void main'ers are DOOMed

  #4
Apr 23rd, 2008
If all you're doing is calculating file access time, what's wrong with just doing random position + random length?

The length of an individual record doesn't seem that important if you're not actually using that record when you're done.

All you're going to produce is something like bytes/sec as the answer - right?
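
A rough sketch of that "random position + random length" measurement, under stated assumptions: random_reads is a hypothetical helper, the caller supplies the file size, and timing uses clock(), which reports CPU time rather than wall-clock time and so may under-report time spent waiting on the disk. Note also that rand() may return values no larger than 32767 on some platforms, which would limit the positions reached in a large file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical benchmark helper: perform `reads` fread() calls at random
 * offsets, each with a random length of 1..max_len bytes.
 * Returns the total bytes read; stores elapsed CPU seconds in *seconds. */
static size_t random_reads(FILE *fp, long file_size, int reads,
                           size_t max_len, double *seconds)
{
    assert(fp != NULL && file_size > 0 && max_len > 0);
    char *buf = malloc(max_len);
    if (buf == NULL)
        return 0;

    size_t total = 0;
    clock_t start = clock();
    for (int i = 0; i < reads; i++) {
        long pos = rand() % file_size;              /* random position */
        size_t len = (size_t)rand() % max_len + 1;  /* random length   */
        if (fseek(fp, pos, SEEK_SET) != 0)
            break;
        total += fread(buf, 1, len, fp);  /* may be short near EOF */
    }
    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    free(buf);
    return total;
}
```

Dividing the returned byte count by the elapsed seconds gives the bytes/sec figure, provided the elapsed time is not zero.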
Narue

  #5
Apr 23rd, 2008
>so there's no way to get the length of record directly?
No, but you're already assuming that the maximum length of a record is 99 characters, so I don't really see how the length matters. Why not build the length array as you need it and, instead of storing the record length, store the offset of the record? Something like so:
  recordLen: array[0..830000]
  n: int as 0

  # The first record is always at offset 0
  recordLen[n] := 0

  while count < MAX_READ do
    randNum: int as rand() % 830000

    if randNum > n then
      # Fill the lengths up to randNum
      seek fp to recordLen[n]

      # Build the offsets up to randNum
      while n < randNum and read str from fp do
        n := n + 1
        recordLen[n] := recordLen[n - 1] + length str
      loop
    else
      seek fp to recordLen[randNum]
    endif

    # At this point we're at the correct offset
    read str from fp

    # Process str

    count := count + 1
  loop
That way you only read as much of the file as necessary. There's also an added benefit of being able to calculate the record length from the offsets if you need it. As long as seeking is quick, you should see at least some improvement, unless you're unlucky and you hit the upper end of the random range immediately.
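
One possible C rendering of the pseudocode above, as a sketch rather than a definitive implementation. The names rec_index, rec_index_init and read_record are hypothetical. It assumes the file is opened in binary mode ("rb") so that line lengths add up to byte offsets, keeps the original 99-character limit per record, and does the final read with fread() because the assignment requires it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 100  /* same 99-character assumption as the original code */

/* Hypothetical index type: offsets[i] is the byte offset of record i;
 * entries offsets[0..known] are valid. */
struct rec_index {
    long  *offsets;
    size_t known;     /* highest record whose offset has been computed */
    size_t capacity;  /* number of slots in offsets */
};

static int rec_index_init(struct rec_index *idx, size_t capacity)
{
    assert(idx != NULL && capacity > 0);
    idx->offsets = malloc(capacity * sizeof *idx->offsets);
    if (idx->offsets == NULL)
        return -1;
    idx->offsets[0] = 0;  /* the first record is always at offset 0 */
    idx->known = 0;
    idx->capacity = capacity;
    return 0;
}

/* Read record `want` (0-based) into buf (at least MAX_LINE bytes) using
 * fread(). Returns the record length in bytes, or 0 if it does not exist. */
static size_t read_record(FILE *fp, struct rec_index *idx,
                          size_t want, char *buf)
{
    char line[MAX_LINE];

    if (want >= idx->capacity)
        return 0;

    /* Extend the offset table only as far as this request needs. */
    if (want > idx->known) {
        if (fseek(fp, idx->offsets[idx->known], SEEK_SET) != 0)
            return 0;
        while (idx->known < want && fgets(line, sizeof line, fp) != NULL) {
            idx->offsets[idx->known + 1] =
                idx->offsets[idx->known] + (long)strlen(line);
            idx->known++;
        }
        if (idx->known < want)
            return 0;  /* file ended before record `want` */
    }

    if (fseek(fp, idx->offsets[want], SEEK_SET) != 0)
        return 0;

    /* One fread() of up to MAX_LINE - 1 bytes, then cut at the newline. */
    size_t got = fread(buf, 1, MAX_LINE - 1, fp);
    buf[got] = '\0';
    char *nl = memchr(buf, '\n', got);
    if (nl != NULL) {
        nl[1] = '\0';
        got = (size_t)(nl - buf) + 1;
    }
    return got;
}
```

For any record i below idx.known, its length can also be computed without touching the file as offsets[i + 1] - offsets[i].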
Last edited by Narue; Apr 23rd, 2008 at 12:37 pm.
cheesy_mel

  #6
Apr 24th, 2008
Hi Narue, thanks for your help, it really helped me.
The first time, it took 35 secs just to read 500 records; now it takes a little over 5 secs for 50000 records.

regards,
MeL