>How does a file open work (in c, as I'm using C functions)?
fopen returns a pointer to FILE (usually a structure), which is basically the bookkeeper for the low-level information required to read and write the file on disk and to maintain a buffer that avoids excessive device requests. A very simple example would be:
struct _iobuf {
    int fd;
    int offset;
    char *buff;
    char *curr;
    int bufsize;
    int left;
};

typedef struct _iobuf FILE;
The fd field is the file identifier, the offset field is our current position in the file (counted in characters), the buff field is the buffer, the curr field points to the next character in the buffer, bufsize is the size of the buffer, and left is how many characters are left in the buffer. The left field is needed because the buffer might not be completely filled, such as when a read error occurs or end-of-file is reached.
Say you open a file with the contents "This is a test" and the call fopen ( "somefile", "r" ). If we use a buffer size of 6, the contents of the FILE object would be (leaving out fd and bufsize because they don't change):
offset = 0
buff = "This i"
curr = "This i"
left = 6
If you read a single character from the file using fgetc, the resulting object would be:
offset = 1
buff = "This i"
curr = "his i"
left = 5
This continues until the end of the buffer is reached by counting left down to 0:
offset = 6
buff = "This i"
curr = ""
left = 0
At this point the next request for input will require a device read starting at offset and reading bufsize characters into buff. Then curr is set to buff, and left is set to the number of characters read:
offset = 6
buff = "s a te"
curr = "s a te"
left = 6
Once again, the buffer is completely filled and can be completely read, as above. But the next buffer fill is different because there are only two characters left in the file. Presumably, in this hypothetical implementation, the system call used to read from the file will signal end-of-file.
offset = 12
buff = "st"
curr = "st"
left = 2
Two characters are read, left goes to 0, and the next request for input fails due to end-of-file. Okay, this is all well and good in theory, but how does it really work? Here is a quick and dirty C implementation: :)
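#include <stdio.h>
#include <stdlib.h>

struct jsw_iobuf {
    FILE *fd;       /* Underlying stream, standing in for a file descriptor */
    char *buff;     /* Start of the buffer */
    char *curr;     /* Next unread character in the buffer */
    int bufsize;    /* Capacity of the buffer */
    int left;       /* Unread characters remaining in the buffer */
};

typedef struct jsw_iobuf JSW_FILE;

JSW_FILE *jsw_fopen ( const char *path )
{
    JSW_FILE *ret = malloc ( sizeof *ret );

    if ( ret == NULL )
        return NULL;
    ret->bufsize = 6;
    ret->buff = NULL;
    ret->fd = fopen ( path, "r" );
    if ( ret->fd == NULL )
        goto error;
    ret->buff = malloc ( ret->bufsize );
    if ( ret->buff == NULL )
        goto error;
    /* Prime the buffer with the first block of the file */
    ret->left = (int)fread ( ret->buff, 1, ret->bufsize, ret->fd );
    ret->curr = ret->buff;
    return ret;
error:
    /* Close the stream if it was opened, so it isn't leaked */
    if ( ret->fd != NULL )
        fclose ( ret->fd );
    free ( ret );
    return NULL;
}

void jsw_fclose ( JSW_FILE *fp )
{
    fclose ( fp->fd );
    free ( fp->buff );
    free ( fp );
}

int jsw_fgetc ( JSW_FILE *in )
{
    if ( in->left == 0 ) {
        /* Refill */
        in->left = (int)fread ( in->buff, 1, in->bufsize, in->fd );
        in->curr = in->buff;
        if ( in->left == 0 )
            return EOF; /* End-of-file or read error */
    }
    --in->left;
    /* Convert through unsigned char so no valid character compares equal to EOF */
    return (unsigned char)*in->curr++;
}

int main ( void )
{
    JSW_FILE *in = jsw_fopen ( "test.txt" );
    int ch; /* int rather than char, so EOF can be told apart from data */

    if ( in == NULL ) {
        fprintf ( stderr, "Could not open test.txt\n" );
        return EXIT_FAILURE;
    }
    while ( ( ch = jsw_fgetc ( in ) ) != EOF )
        printf ( "%c\n", ch );
    jsw_fclose ( in );
    return 0;
}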
Not very magical, is it? Sure, I used a portable approach (layering on top of fopen and fread instead of making system calls), so this implementation is very contrived and somewhat silly, but (barring the really irritating details) that's the gist of what happens when you open and read from a file stream.
>but are you moving the whole file into active memory, or disk
>reading it, which causes no extra space to be used?
A combination of the two. Portions of the file are buffered in memory, so some extra space is used, but only one buffer's worth at a time, not the entire file if it's large. A small file, say less than a few kilobytes, will typically fit in the buffer completely.
>because the pointers to the position in the file will be overwritten or something
It depends on how dynamic the file is. If you can expect simultaneous writes to the file while using the sliding window (such as in a multi-threaded environment), then without locking the file you risk having your offsets invalidated when the file changes. If the file isn't going to change, there isn't a problem, but the entire concept of a framed file viewer is tricky.