read in using fread() for arbitrary length data

Question

cheesy_mel 0 Newbie Poster

17 Years Ago

Hi..

I want to read each record (one per line) using fread(). The problem is that the record length is arbitrary.

e.g.
1 "Joshua" "Rosenthal" "34 Mellili Ln" "Earlwood" 1 "000113133121" 0.000
2 "Martin" "Serrong" "45 Rosenthal Ccl" "Doveton" 1 "000113133121" 0.000
3 "Jacob" "Leramonth" "59 Dalion Pl" "Belmont" 1 "000113133121" 0.000

Since fread() requires the number of characters to read, I'm doing it like this:

while (fgets(str, 100, fp) != NULL)   // read each line using fgets
{
    recordLen[j] = strlen(str);       // so I can get the length of each record
    j++;
}

do
{
    len = 0;
    // pick a record at random; 830000 is the number of records
    randNum = lrand48() % 830000;
    rewind(fp);

    // to find the seek position, sum the lengths of the records
    // that come before the one I want to read
    for (i = 0; i < randNum - 1; i++)
    {
        len += recordLen[i];
    }

    fseek(fp, len, SEEK_SET);
    // another fgets() on the record I want to read, to find its length
    fgets(str, 100, fp);

    fseek(fp, len, SEEK_SET);
    result = fread(buffer, CHAR_BYTE, strlen(str), fp);
    count++;

} while (count < MAX_READ);   // MAX_READ = 50000, so 50000 reads

Does anyone have a better way to do this? My approach takes too much time, and time is what matters in my assignment (I'm measuring file access time).

c


All 5 Replies

Narue 5,707 Bad Cop

17 Years Ago

Why use fread at all? fgets seems to be a better fit for your problem, especially since you're using it anyway to preprocess the record lengths. The more input requests you make, the slower your code will be, so if optimization is your goal, minimize the number of calls that read from the file.

Ideally you would keep a portion of the file in memory at any given time, but it's difficult to make this efficient when the access pattern is random.

Narue 5,707 Bad Cop

17 Years Ago

>so there's no way to get the length of record directly?
No, but you're already assuming that the maximum length of a record is 99 characters, so I don't really see how the length matters. Why not build the array as you need it and, instead of storing each record's length, store its offset? Something like so:

recordLen: array[0..830000]
n: int as 0

# The first record is always at offset 0
recordLen[n] := 0

while count < MAX_READ do
  randNum: int as rand() % 830000

  if randNum > n then
    # Not indexed yet: resume from the last known offset
    seek fp to recordLen[n]

    # Build the offsets up to randNum
    while n < randNum and read str from fp do
      n := n + 1
      recordLen[n] := recordLen[n - 1] + length str
    loop
  else
    seek fp to recordLen[randNum]
  endif

  # At this point we're at the correct offset
  read str from fp

  # Process str

  count := count + 1
loop

That way you only read as much of the file as necessary. There's also an added benefit of being able to calculate the record length from the offsets if you need it. As long as seeking is quick, you should see at least some improvement, unless you're unlucky and you hit the upper end of the random range immediately. ;)
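
For anyone who wants to try this in C, here is a rough sketch of the lazy offset table described above. It is only one possible reading of the pseudocode, not tested against the original 830000-record file. The names OffsetTable, table_init, read_record and MAX_LINE are made up for illustration, and it assumes the file is opened in binary mode ("rb") so that string lengths match byte offsets, and that no record is longer than 99 characters.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 100  /* assumption: records fit in 99 characters */

/* Hypothetical helper type: a table of record offsets, filled in on demand. */
typedef struct {
    long  *offsets;   /* offsets[i] = byte offset where record i starts */
    size_t known;     /* highest index whose offset has been filled in  */
    size_t capacity;  /* number of slots in offsets                     */
} OffsetTable;

/* Allocate the table; record 0 always starts at offset 0. Returns 0 on success. */
int table_init(OffsetTable *t, size_t capacity)
{
    assert(t != NULL);
    if (capacity == 0) return -1;
    t->offsets = malloc(capacity * sizeof *t->offsets);
    if (t->offsets == NULL) return -1;
    t->offsets[0] = 0;
    t->known = 0;
    t->capacity = capacity;
    return 0;
}

/* Read record `index` (0-based) into buf, which must hold MAX_LINE bytes.
   Returns the record length including the newline, or 0 on failure. */
size_t read_record(FILE *fp, OffsetTable *t, size_t index, char *buf)
{
    assert(fp != NULL && t != NULL && buf != NULL);
    if (index >= t->capacity) return 0;

    if (index > t->known) {
        /* Not indexed yet: scan forward from the last known record. */
        if (fseek(fp, t->offsets[t->known], SEEK_SET) != 0) return 0;
        while (t->known < index) {
            if (fgets(buf, MAX_LINE, fp) == NULL) return 0;
            t->offsets[t->known + 1] = t->offsets[t->known] + (long)strlen(buf);
            t->known++;
        }
    } else if (fseek(fp, t->offsets[index], SEEK_SET) != 0) {
        return 0;
    }

    /* The stream is now positioned at the start of the requested record. */
    if (fgets(buf, MAX_LINE, fp) == NULL) return 0;
    return strlen(buf);
}
```

A caller would run table_init(&t, 830000) once and then call read_record(fp, &t, randNum, buf) inside the timing loop. The final read uses fgets for brevity.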


cheesy_mel 0 Newbie Poster · Answer 1 · 2008-04-23T21:01:51+00:00

The requirement says that I need to use fread().

Actually, the fgets() pass is a "cheat" to get the record lengths, and that part seems to be what makes my program slower, since I read the file twice.

so there's no way to get the length of record directly?
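
A text file does not store record lengths anywhere, so they have to come from scanning for newlines at least once. One way to keep fread() for the actual reads is to index the file in a single pass and derive each length from two neighbouring offsets. The sketch below is an illustration under assumptions: build_index, fread_record and push_offset are made-up names, and the file is assumed to be opened in binary mode ("rb").

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Append one offset, doubling the array when it is full. Returns 0 on success. */
static int push_offset(long **offsets, size_t *n, size_t *cap, long value)
{
    if (*n == *cap) {
        long *grown = realloc(*offsets, 2 * *cap * sizeof **offsets);
        if (grown == NULL) return -1;
        *offsets = grown;
        *cap *= 2;
    }
    (*offsets)[(*n)++] = value;
    return 0;
}

/* Scan the file once. Returns a malloc'd array of *count + 1 offsets:
   offsets[i] is where record i starts and offsets[*count] is the file size. */
long *build_index(FILE *fp, size_t *count)
{
    assert(fp != NULL && count != NULL);
    size_t cap = 1024, n = 0;
    long *offsets = malloc(cap * sizeof *offsets);
    if (offsets == NULL) return NULL;

    long pos = 0;
    int c, last = '\n';
    rewind(fp);
    offsets[n++] = 0;                       /* record 0 starts at offset 0 */
    while ((c = getc(fp)) != EOF) {
        pos++;
        last = c;
        if (c == '\n' && push_offset(&offsets, &n, &cap, pos) != 0) {
            free(offsets);
            return NULL;
        }
    }
    /* A final record without a trailing newline still needs an end offset. */
    if (last != '\n' && push_offset(&offsets, &n, &cap, pos) != 0) {
        free(offsets);
        return NULL;
    }
    *count = n - 1;
    return offsets;
}

/* Read record `index` with a single fseek + fread. Returns bytes read, 0 on failure. */
size_t fread_record(FILE *fp, const long *offsets, size_t count,
                    size_t index, char *buf, size_t bufsize)
{
    if (index >= count) return 0;
    size_t len = (size_t)(offsets[index + 1] - offsets[index]);
    if (len > bufsize) return 0;
    if (fseek(fp, offsets[index], SEEK_SET) != 0) return 0;
    return fread(buf, 1, len, fp);
}
```

With the index built before the timer starts, each timed access is one fseek and one fread, with no summing loop and no extra fgets. Note that fread does not add a terminating '\0' to the buffer.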

Salem 5,265 Posting Sage · Answer 2 · 2008-04-23T21:16:03+00:00

If all you're doing is calculating file access time, what's wrong with just doing random position + random length?

The length of an individual record doesn't seem that important if you're not actually using that record when you're done.

All you're going to produce is something like bytes/sec as the answer - right?
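
A minimal sketch of that idea, for illustration only: seek to a random byte position, fread a random number of bytes, and add up what was read. The names file_size_of and random_reads are hypothetical. The sketch uses rand(), whose range may be as small as 0..32767 on some platforms, so for a large file something like the lrand48() from the original code would be needed to cover every position. It also assumes a binary-mode stream on which seeking to SEEK_END works, which is the usual case but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Return the file size in bytes, or -1 on error; leaves the stream rewound. */
long file_size_of(FILE *fp)
{
    assert(fp != NULL);
    if (fseek(fp, 0L, SEEK_END) != 0) return -1;
    long size = ftell(fp);
    rewind(fp);
    return size;
}

/* Do `reads` fread() calls at random positions with random lengths in
   [1, max_len]. buf must hold max_len bytes.
   Returns the total number of bytes read, or -1 if a seek fails. */
long random_reads(FILE *fp, long file_size, size_t max_len, long reads, char *buf)
{
    assert(fp != NULL && buf != NULL && file_size > 0 && max_len > 0);
    long total = 0;
    for (long i = 0; i < reads; i++) {
        long   pos = rand() % file_size;              /* random start position  */
        size_t len = 1 + (size_t)rand() % max_len;    /* random request length  */
        if (fseek(fp, pos, SEEK_SET) != 0) return -1;
        /* Near the end of the file fread may return fewer bytes than asked. */
        total += (long)fread(buf, 1, len, fp);
    }
    return total;
}
```

Dividing the returned byte count by the elapsed time gives the bytes/sec figure. If clock() is used for the timing, keep in mind that it reports processor time rather than wall-clock time.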

cheesy_mel 0 Newbie Poster · Answer 3 · 2008-04-24T09:34:50+00:00

Hi Narue, thanks for your help, it really helped me. :)
At first it took 35 seconds just to read 500 records; now it's a little over 5 seconds for 50000 records.

regards,
MeL
