Hi all,

I was reading Vijaya Mukhi's The 'C' Odyssey UNIX - The Open-Boundless C (1st ed.). This is in reference to program 43 in the first chapter when file buffering is introduced.

#include <stdio.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>     /* fork() */

int main(void) {
 FILE *fp;
 char buff[11];
 int pid;
 fp=fopen("baby1","r");
 pid=fork();
 if(!pid) {
  /* child: reads through its own copy of the FILE object */
  printf("initial file handle :: %ld\n",ftell(fp));  /* ftell() returns long */
  fread(buff,sizeof(buff),1,fp);
  buff[10]='\0';
  printf("child read :: %s\n",buff);
  printf("child file handle :: %ld\n",ftell(fp));
 }
 else {
  wait(NULL);  /* let the child finish first */
  printf("parent file handle :: %ld\n",ftell(fp));
  fread(buff,sizeof(buff),1,fp);
  buff[10]='\0';
  printf("parent read :: %s\n",buff);
  printf("end file handle :: %ld\n",ftell(fp));
 }
 return 0;
}

The contents of baby1 are:

abcdefghijklmnopqrstuvwxyz

As per the book, the output should be:

initial file handle :: 0
child read :: abcdefghij
child file handle :: 11
parent file handle :: 1024
parent read ::
end file handle :: 1024

The book explains this by saying that fread() reads 1024 characters (or whatever the block size is) into a buffer by default. Hence the file pointer also moves by 1024 bytes, and this shows up in the parent because it is a different process.

I tried the same code on Solaris and RHEL and got interesting results.

Output on Solaris:

initial file handle :: 0
child read :: abcdefghij
child file handle :: 11
parent file handle :: 11
parent read :: lmnopqrstu
end file handle :: 22

This indicates that on Solaris, the block size is the same as the size specified in the fread() arguments. There seems to be no such thing as a default block size.

Output on RHEL:

initial file handle :: 0
child read :: abcdefghij
child file handle :: 11
parent file handle :: 26
parent read :: ¸
end file handle :: 26

This indicates that in RHEL, block size is a number greater than 26 but the file pointer never exceeds the size of file.

From what I understood, system calls differ across operating systems but library calls behave the same way. Library calls should be system independent, which is not the case here. I would like some clarification on this point.

I would also like to have some idea of how the file buffering takes place internally in case of system calls and library calls.

Thanks in advance.



Interesting. On MS-Windows I get the same results that you got on Solaris, which is correct. That book is wrong. The operating system may read into memory any amount of the file it wants to (buffering), but the file pointer only moves to the location requested in the program. There is little, or no, connection between the operating system's buffer size and the location of the file pointer.

Why the difference on RHEL, I don't know. Maybe there was some other kind of error.

initial file handle :: 0
child read :: abcdefghij
child file handle :: 11
parent file handle :: 11
parent read :: lmnopqrstu
end file handle :: 22
Press any key to continue . . .

#include <stdio.h>
#include <Windows.h>

FILE* fp = 0;
char buff[11] = {0};

/* Plays the role of the "child"; it shares fp and buff with main(). */
DWORD WINAPI ThreadProc(void* lpParameter)
{
    printf("initial file handle :: %ld\n",ftell(fp));  /* ftell() returns long */
    fread(buff,sizeof(buff),1,fp);
    buff[10]='\0';
    printf("child read :: %s\n",buff);
    printf("child file handle :: %ld\n",ftell(fp));
    return 0;
}


int main()
{
    DWORD dwThreadID = 0;
    HANDLE hThread;

    fp = fopen("TextFile1.txt", "r");
    hThread = CreateThread(0,0,ThreadProc,0,0,&dwThreadID);
    WaitForSingleObject(hThread,INFINITE);  /* wait for the thread to finish */
    printf("parent file handle :: %ld\n",ftell(fp));
    fread(buff,sizeof(buff),1,fp);
    buff[10]='\0';
    printf("parent read :: %s\n",buff);
    printf("end file handle :: %ld\n",ftell(fp));
    return 0;
}

The operating system may read into memory any amount of the file it wants to (buffering), but the file pointer only moves to the location requested in the program. There is little, or no, connection between the operating system's buffer size and the location of the file pointer.

That is what I thought. It seems to be quite logical too. I got my doubts after running it on RHEL. And from that I was surprised to see library calls behaving differently across two different OS.

There was no error in running the code on RHEL. Maybe someone with easy access to RHEL can give it a shot?

Thanks.

This is my output on a Mac:

~$ ./test
initial file handle :: 0
child read :: abcdefghij
child file handle :: 11
parent file handle :: 28
parent read ::
end file handle :: 28
~$

Maybe someone could shed some more light on this matter.

By the way, how is the book?
What topics does it cover?

~$ ./test
initial file handle :: 0
child read :: abcdefghij
child file handle :: 11
parent file handle :: 28
parent read ::
end file handle :: 28
~$

See, that is similar to what I got on RHEL! It is showing 28 instead of 26, maybe due to some leading/trailing whitespace. I am also waiting in case anybody comes up with some sort of explanation :)

The book is good, introduces unix as a multitasking multiuser OS, covers file handling, and exhaustively covers IPC, some system calls and curses.

Tried it on RHEL.
Got 27 for the parent instead of the 26 in your output. :-O

In fork
When creating a process, the child gets a duplicate of the parent's memory space in fork(). So the address &fp need not be the same, but the value of fp is the same.

Then
I used vfork() instead of fork() with the same program [just gave it a try].
The output obtained was different :-/. It was like the expected output, showing the parent fp at 22, just like your Solaris output.

In vfork() the parent and child use the same memory space [shared].
So I think this is a problem with fork() rather than fread().
This also aligns with your suggestion that system calls can behave differently but library calls must behave the same.
Maybe the fork implementations in different OSs are different.

But I must confess that I still don't understand how, even if fork uses different memory spaces for parent and child, ftell(fp) returns a different value after the child exits.

One wild, wild guess is that when the child terminates, the system resources associated with it are freed. So the location that fp points to [I mean the location from which it gets the number of bytes read; since fp is a pointer, it will point to the same location for both parent and child] also gets freed and is filled with some junk value. :?:


Maybe someone with experience in fork can explain better?


MS-Windows creates threads just like vfork, where all threads share the same memory. I didn't realize fork() creates new data space :-O

MS-Windows creates threads just like vfork, where all threads share the same memory. I didn't realize fork() creates new data space :-O

That just aligns with what I thought!
Is it just that the code is not using fork the way it is supposed to be used?
That would also mean that the author was wrong, which is hard to believe.

This is the output on Solaris in the case of vfork()

initial file handle :: 0
child read :: abcdefghij
child file handle :: 11
parent file handle :: -1
Segmentation Fault (core dumped)

I will add a detailed post later.

This is the output on Solaris in the case of vfork()

I will add a detailed post later.

We need to clean up before the child terminates, so add an exit(0); or _exit(0); before the process terminates. This should solve the problem.

This is the output on Solaris in the case of vfork():
Segmentation Fault (core dumped)

A vforked child is confined to _exit() and execve(). Anything else causes undefined behaviour.

MS-Windows creates threads just like vfork, where all threads share the same memory.

It does not. After vfork the parent process is blocked until the child does either _exit() or execve. Parent and child never run in parallel while sharing memory space.

But i must confess that i still dont understand , even if fork uses different memory spaces for parent and child, how the ftell(fp) points to a different value after child exits.

The point is that even though parent and child have distinct FILE objects, both of them refer to the same file object in the kernel. The parent's ftell() obtains the read pointer from this shared object, which is of course affected by the child's read().


It does not. After vfork the parent process is blocked until the child does either _exit() or execve. Parent and child never run in parallel while sharing memory space.

I have no clue how vfork() works -- I was just comparing the posted results with how MS-Windows CreateThread() works. The test I posted and ran works just as you described for vfork() -- the win32 api function WaitForSingleObject() causes the thread to wait until the child thread terminates.

The point is that even though parent and child have distinct FILE objects, both of them refer to the same file object in the kernel. The parent's ftell() obtains the read pointer from this shared object, which is of course affected by the child's read().

But if that is the case, then why are the parent and child file pointers different on some operating systems and the same on others? (Note: I'm not trying to state an opinion here, just asking a question of someone who is probably more knowledgeable about this than I am.)

we need to do a clean up before child terminates. So add an exit(0); or _exit(0); before the process terminates. This should solve the problem

Thanks sree_ec. I added exit(0) for child (the vfork case) and it gave the same output as Solaris gives for fork().

The point is that even though parent and child have distinct FILE objects, both of them refer to the same file object in the kernel. The parent's ftell() obtains the read pointer from this shared object, which is of course affected by the child's read().

This is understood. The question arises when only 10 characters are read by the child and fp is pointing to the EOF when parent resumes. It should point to 11 only.

Now, let's see what we have concluded so far. One of the earlier questions was why the library call fread() works differently across operating systems. It looks like the differing outputs are a consequence of using the fork() system call. So that is okay.

Let me list down the questions that remain:

Q-1) Why does fp point to EOF in case of RHEL and Mac after the parent resumes when only 10 characters are read in the child process?

Q-2) How is the value returned by ftell(fp) after the parent resumes affected by the fact that fork() is defined differently in different OS?

Q-3) Why is the author of the book expecting the fp to point to 1024 (or the block size) after parent resumes irrespective of the OS?

http://www.linuxquestions.org/questions/programming-9/fork-and-file-buffering-791825/

Before trying to answer all these, I would suggest you read this (thanks to Google for finding it).

Now you might be in a position to answer all these questions yourself.
For the third question, I guess the block size was 1024 for the author.

So as it turns out, the buffering parameter also varies between operating systems. So in the case where we use vfork() and assume that it shares the same file pointer structure, we can still come to a logical conclusion.

So finally it seems that fork() [using different memory spaces] and the differences in buffering parameters between operating systems cause this problem.


Q3: The author may be wrong. Wouldn't be the first time wrong info was printed in a book.

Q-3) Why is the author of the book expecting the fp to point to 1024 (or the block size) after parent resumes irrespective of the OS?

Adding to my previous post:
it might not be irrespective of the OS/file system. It must have been 1024 where he tested.

Use 'stat -f .' to see the default block size of your system

Q3: The author may be wrong. Wouldn't be the first time wrong info was printed in a book.

OK. Let's leave Q-3 aside for the time being.

Let me get back to you after reading your link and doing some further experiments.

Thanks sree_ec. This is the code I experimented on. It is derived from the code posted there.

#include <stdio.h>
#include <unistd.h>

/* Note: every lseek() below uses offset 1 with SEEK_CUR, so each call also
   moves the shared kernel offset forward by one byte. */
int main(void)
{
    FILE *fp;
    char buff[11] = "";   /* start empty so the first printf is well defined */
    int pid, i;

    /* build a test file: 8192 each of 'A', 'B', 'C' and 'D' */
    fp=fopen("vivek","w+");
    for(i=0;i<8192;i++)
        fprintf(fp,"A");
    for(i=0;i<8192;i++)
        fprintf(fp,"B");
    for(i=0;i<8192;i++)
        fprintf(fp,"C");
    for(i=0;i<8192;i++)
        fprintf(fp,"D");
    fclose(fp);
    fp=fopen("vivek","r");   /* keep the new stream; the old fp is invalid after fclose() */
    printf("M0 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
    pid=fork();
    if(pid==0)
    {
        /* child */
        printf("C1 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
        fread(buff,sizeof(buff),1,fp);
        buff[10]='\0';
        printf("C2 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
        sleep(2);
        printf("C5 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
        fread(buff,sizeof(buff),1,fp);
        buff[10]='\0';
        printf("C6 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
        _exit(0);
    }
    else
    {
        /* parent: the sleeps interleave its steps with the child's */
        sleep(1);
        printf("P3 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
        fread(buff,sizeof(buff),1,fp);
        buff[10]='\0';
        printf("P4 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
        sleep(3);
        printf("P7 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
        fread(buff,sizeof(buff),1,fp);
        buff[10]='\0';
        printf("P8 ftell-> %ld lseek-> %ld buff-> %s\n",ftell(fp),(long)lseek(fileno(fp),1,SEEK_CUR),buff);
        _exit(0);
    }
}

The output is:

M0 ftell-> 0 lseek-> 1 buff->
C1 ftell-> 1 lseek-> 2 buff->
C2 ftell-> 13 lseek-> 8195 buff-> AAAAAAAAAA
P3 ftell-> 8195 lseek-> 8196 buff->
P4 ftell-> 8207 lseek-> 16389 buff-> BBBBBBBBBB
C5 ftell-> 8208 lseek-> 16390 buff-> AAAAAAAAAA
C6 ftell-> 8220 lseek-> 16391 buff-> AAAAAAAAAA
P7 ftell-> 8210 lseek-> 16392 buff-> BBBBBBBBBB
P8 ftell-> 8222 lseek-> 16393 buff-> BBBBBBBBBB

This output helped me understand the behaviors satisfactorily.

Thanks all for the help. I highly appreciate it.

Still, I would like to state an interesting conclusion:

The high-level call fread() is buffered. By default, it reads a block of data into a system buffer. The block size depends on the system. Hence, in spite of being a library call, its behavior is system-dependent. However, this statement holds true only for RHEL and Mac (among the operating systems that I have tested on). For Solaris it is valid only if the file size is greater than the default block size. On Solaris, if the file size is less than the default block size, then the system buffers only as many bytes as specified in the function call.

Another situation is one in which the _exit(0) calls are removed from the parent and child processes. I did a lot of experimenting and observed some output patterns. But that is probably out of the scope of this forum.
