Hello,

I made an arbitrary-precision integer library a few weeks ago. It works well, but it is bounded by 256^(8*sizeof(size_t)) because it uses a simple dynamically allocated array to do its work. I want some way to have theoretically unbounded storage. I think that file I/O is probably best, but then I realized that stdio functions (and thus their fstream counterparts) use size_t to indicate the position in a file, so at best I will merely double the number of bits I can store. So I thought of a format that would 'cascade' files in the following way:

Typical file: +/- (indicates sign), data stored in little-endian, base-256.

File for really, really big numbers: f (indicates that this is a file of files), followed by a list of files that represent integers, in little-endian base 115792089237316195423570985008687907853269984665640564039457584007913129639936 (256^32, if Wolfram is to be believed).
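A minimal sketch of both layouts, assuming a number is held as a sign flag plus a std::vector<unsigned char> of base-256 digits, least significant first. The names write_number, read_number, split_into_limbs and join_limbs are hypothetical, not part of the original library. The last two show why base 256^32 is convenient: each top-level digit is just a run of 32 consecutive base-256 bytes, so splitting a number across sub-files is plain byte slicing.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <vector>

using Bytes = std::vector<unsigned char>;  // base-256 digits, least significant first

// "Typical file": one sign byte ('+' or '-') followed by the digits.
// Assumes f is positioned at the start of an empty (or freshly truncated) file.
bool write_number(std::FILE* f, bool negative, const Bytes& digits) {
    if (std::fputc(negative ? '-' : '+', f) == EOF) return false;
    if (!digits.empty() &&
        std::fwrite(digits.data(), 1, digits.size(), f) != digits.size())
        return false;
    return std::fflush(f) == 0;  // hand buffered bytes to the OS
}

// Reads a number written by write_number; the digit count is simply
// however many bytes follow the sign byte.
bool read_number(std::FILE* f, bool& negative, Bytes& digits) {
    std::rewind(f);  // back to the sign byte (also a valid switch from writing to reading)
    int sign = std::fgetc(f);
    if (sign != '+' && sign != '-') return false;
    negative = (sign == '-');
    digits.clear();
    unsigned char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0)
        digits.insert(digits.end(), buf, buf + n);
    return std::ferror(f) == 0;
}

// "File of files": in base 256^k (k = 32 above), top-level digit i is
// bytes [i*k, i*k + k) of the base-256 array. Each chunk would go to its own sub-file.
std::vector<Bytes> split_into_limbs(const Bytes& bytes, std::size_t k = 32) {
    std::vector<Bytes> limbs;
    if (k == 0) return limbs;  // guard against an infinite loop
    for (std::size_t i = 0; i < bytes.size(); i += k) {
        std::size_t end = std::min(bytes.size(), i + k);
        limbs.emplace_back(bytes.begin() + i, bytes.begin() + end);
    }
    return limbs;
}

// Inverse of split_into_limbs (every limb but the last must be exactly k bytes).
Bytes join_limbs(const std::vector<Bytes>& limbs) {
    Bytes bytes;
    for (const Bytes& limb : limbs) bytes.insert(bytes.end(), limb.begin(), limb.end());
    return bytes;
}
```

For example, the digits {0x39, 0x30} describe 0x3039 = 12345, and a 70-byte number splits into three top-level digits of 32, 32 and 6 bytes.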

My issue is that this will take a LOT of work to incorporate, and I want to know that it will work before I start. I also have questions about the implementation. For example, when a number gets too big to be stored as a dynamic array, I want to move it to a temporary file in the above format. However, since this is an active number, I would like to keep the file open so that I don't have to keep opening and closing it. The issue is that, as far as I know, it's when you close the file that you actually save it.

My first question then is: is it okay to keep the file open and still use it as large storage?

Then there is the issue of the really, really big numbers. A file of files would require that I open and close each file every time I need it, right? Is there no way to keep it open and then 'find' it?

My second question then is: is there any way to NOT have to re-open those files every time I need them?

Finally, if I want to use temporary files, I will have to remove them in my destructor; however, the remove function in the stdio library only accepts file names, not files themselves.

My third question then is: is there any way to get the file name out of a FILE structure in stdio?

I realize that using stdio makes this more of a C-esque question, but my program is in C++, so fstream IS an option, except that I definitely need speed and, as far as I know, I can get more of it from stdio.

tl;dr: If you keep a FILE (stdio file type) pointer open, can you still use it for long-term storage in your program without having to continuously close and open it? Is there any way to keep a FILE pointer open and then access it via its file name without having to close and reopen it? Is there any way to get the file name out of a FILE pointer?


> I realized that stdio functions (and thus their fstream counterparts) use size_t to indicate the position in a file,

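If you use the Win32 API functions WriteFile() and ReadFile(), file positions are 64-bit: the offset is passed as two 32-bit DWORDs (Offset and OffsetHigh in an OVERLAPPED structure) or as a 64-bit LARGE_INTEGER to SetFilePointerEx(). On a 32-bit system that is the equivalent of two size_t integers.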
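For what it's worth, the standard fseek() and ftell() actually take and return long rather than size_t, and long is only 32 bits on 64-bit Windows. A common workaround is a thin wrapper over each platform's 64-bit variants: _fseeki64()/_ftelli64() in the Microsoft CRT, and fseeko()/ftello() with off_t on POSIX (on 32-bit Linux, off_t is only 64 bits when compiled with _FILE_OFFSET_BITS=64). The names seek64 and tell64 are hypothetical; this is a sketch, not a complete portability layer.

```cpp
#include <cstdint>
#include <cstdio>
#ifndef _WIN32
#include <sys/types.h>  // off_t
#endif

// Hypothetical wrapper: seeks to a 64-bit byte offset from the start of f.
bool seek64(std::FILE* f, std::int64_t offset) {
#ifdef _WIN32
    return _fseeki64(f, offset, SEEK_SET) == 0;                     // Microsoft CRT
#else
    return fseeko(f, static_cast<off_t>(offset), SEEK_SET) == 0;    // POSIX
#endif
}

// Hypothetical wrapper: current position as a 64-bit value, or -1 on error.
std::int64_t tell64(std::FILE* f) {
#ifdef _WIN32
    return _ftelli64(f);
#else
    return static_cast<std::int64_t>(ftello(f));
#endif
}
```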

You will find there is no such thing as unbounded storage, because the size of any one file is bounded by operating system (and file system) limits and by the size of the hard drive.

> A file of files would require that I open and close each file every time I need it, right?

Not necessarily -- you can have multiple files open at the same time.

> The issue is that, as far as I know, it's when you close the file that you actually save it.

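Data isn't only saved when you close the file. You can force buffered data out to the OS at any time: with stdio.h it's fflush(stream), with fstream it's stream.flush(). (That hands the data to the operating system; forcing it all the way onto the physical disk takes a further call such as fsync() on POSIX or FlushFileBuffers() on Windows.)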
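A small check that data reaches the file without closing it: after fflush(), a second, independent stream opened on the same path can read what the first stream wrote while the first one stays open. The function name write_flush_and_reread is hypothetical.

```cpp
#include <cstdio>
#include <string>

// Writes text through an open stream, flushes it (without closing), then reads
// the file back through a second stream. Returns the text read back, or an
// empty string on failure.
std::string write_flush_and_reread(const char* path, const std::string& text) {
    std::string result;
    std::FILE* out = std::fopen(path, "wb");
    if (!out) return result;
    std::fwrite(text.data(), 1, text.size(), out);
    std::fflush(out);  // hand buffered bytes to the OS; 'out' stays open

    if (std::FILE* in = std::fopen(path, "rb")) {
        char buf[256];
        std::size_t n;
        while ((n = std::fread(buf, 1, sizeof buf, in)) > 0) result.append(buf, n);
        std::fclose(in);
    }
    std::fclose(out);  // closed only at the very end
    return result;
}
```

Without the fflush() call, a short write like this would typically still be sitting in the stream's buffer, and the second stream would read nothing.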

> My second question then is: is there any way to NOT have to re-open those files every time I need them?

If you close the file, then you will have to open it again. But why close it? Just leave it open for as long as the program needs it. As I said before, you can have a lot of files open at the same time; the limit is fairly large.

> My third question then is: is there any way to get the file name out of a FILE structure in stdio?

No. I've had that same question years ago. FILE does not contain the name of the file, just a handle to it.

> Finally, if I want to use temporary files, I will have to remove them in my destructor; however, the remove function in the stdio library only accepts file names, not files themselves.

If there are a lot of files to remove, you can get their names by calling the Win32 API functions FindFirstFile() and FindNextFile(), which together return the names of all files that match the pattern you pass to FindFirstFile().
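FindFirstFile()/FindNextFile() are Windows-only. As a portable alternative (a swapped-in technique, not what the reply describes), C++17's std::filesystem can enumerate a directory and delete the files whose names match a pattern. The function remove_with_prefix and the idea of giving every temporary file a common prefix such as "bigint_" are assumptions for illustration.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Removes every regular file in dir whose name starts with prefix
// (e.g. a hypothetical "bigint_" naming scheme). Returns how many were removed.
std::size_t remove_with_prefix(const fs::path& dir, const std::string& prefix) {
    std::vector<fs::path> doomed;
    std::error_code ec;  // errors (e.g. a missing dir) just end the scan early
    for (const auto& entry : fs::directory_iterator(dir, ec)) {
        const std::string name = entry.path().filename().string();
        if (entry.is_regular_file(ec) && name.compare(0, prefix.size(), prefix) == 0)
            doomed.push_back(entry.path());
    }
    // Delete after the scan so the directory isn't modified mid-iteration.
    std::size_t removed = 0;
    for (const auto& p : doomed)
        if (fs::remove(p, ec)) ++removed;
    return removed;
}
```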

Another way would be to keep a vector of file names as the program generates them.
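Since a FILE* can't give you its name back, one way to follow the "keep the names yourself" suggestion is to bundle the path and the handle in one RAII object whose destructor closes and then removes the file. TempFile is a hypothetical class; it is move-only so two objects never delete the same file.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical RAII wrapper: owns a FILE* together with the path it was
// opened from, so the destructor can close it and then std::remove() it.
class TempFile {
public:
    explicit TempFile(std::string path)
        : path_(std::move(path)), f_(std::fopen(path_.c_str(), "wb+")) {}

    TempFile(const TempFile&) = delete;
    TempFile& operator=(const TempFile&) = delete;
    TempFile(TempFile&& other) noexcept
        : path_(std::move(other.path_)), f_(std::exchange(other.f_, nullptr)) {}

    ~TempFile() {
        if (f_) {
            std::fclose(f_);             // close first: removing an open file typically fails on Windows
            std::remove(path_.c_str());  // then delete it by the remembered name
        }
    }

    std::FILE* get() const { return f_; }
    const std::string& path() const { return path_; }

private:
    std::string path_;  // declared before f_, so it is initialized first
    std::FILE* f_;
};
```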

> except that I definitely need speed and, as far as I know, I can get more of it from stdio.

I don't think it will matter very much; one is about as slow as the other. I tried to test that a few years ago with a Microsoft Windows compiler (I forget which version) and found that both FILE and fstream eventually called the Win32 API functions ReadFile() or WriteFile() to do the actual file I/O. The implementation may have changed since then, I don't know. You might also want to test C++/CLI, which is the .NET version of C++. It is a very similar language to standard C++ but calls .NET functions instead of Win32 API functions. I've heard Microsoft will eventually deprecate the Win32 API.
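If speed matters, measuring with your own compiler is more reliable than anyone's recollection. A rough timing sketch that writes the same bytes once with fwrite() and once with std::ofstream::write(); the function names, file names and sizes are arbitrary, and results will vary with compiler, standard library, OS caching and buffer sizes.

```cpp
#include <chrono>
#include <cstdio>
#include <fstream>
#include <vector>

// Milliseconds taken to write `chunks` copies of `block` through stdio.
double time_stdio(const char* path, const std::vector<char>& block, int chunks) {
    auto start = std::chrono::steady_clock::now();
    if (std::FILE* f = std::fopen(path, "wb")) {
        for (int i = 0; i < chunks; ++i) std::fwrite(block.data(), 1, block.size(), f);
        std::fclose(f);
    }
    std::chrono::duration<double, std::milli> ms = std::chrono::steady_clock::now() - start;
    return ms.count();
}

// The same work through std::ofstream.
double time_fstream(const char* path, const std::vector<char>& block, int chunks) {
    auto start = std::chrono::steady_clock::now();
    {
        std::ofstream out(path, std::ios::binary);
        for (int i = 0; i < chunks; ++i)
            out.write(block.data(), static_cast<std::streamsize>(block.size()));
    }  // the destructor flushes and closes before the clock stops
    std::chrono::duration<double, std::milli> ms = std::chrono::steady_clock::now() - start;
    return ms.count();
}
```

Run each several times and compare; the first run often pays for cache warm-up.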

> If you keep a FILE (stdio file type) pointer open, can you still use it for long-term storage in your program without having to continuously close and open it? Is there any way to keep a FILE pointer open and then access it via its file name without having to close and reopen it?

Why do all that work? Of course there's a way to keep FILE handles open throughout the lifetime of the program -- just keep a vector of them, e.g. vector<FILE*> files. If you want to associate file names with FILE pointers then use std::map to store them.
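A minimal sketch of that suggestion: a std::map from file name to an open FILE*, so each file is opened once, looked up by name afterwards, and closed in one place. FileRegistry and the "wb+" open mode are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical registry: keeps FILE* handles open for the program's lifetime
// and finds them again by name instead of reopening.
class FileRegistry {
public:
    FileRegistry() = default;
    FileRegistry(const FileRegistry&) = delete;             // avoid double fclose
    FileRegistry& operator=(const FileRegistry&) = delete;
    ~FileRegistry() { close_all(false); }

    // Returns the already-open handle for name, or opens it ("wb+") on first use.
    std::FILE* get(const std::string& name) {
        auto it = files_.find(name);
        if (it != files_.end()) return it->second;
        std::FILE* f = std::fopen(name.c_str(), "wb+");
        if (f) files_.emplace(name, f);
        return f;
    }

    // Closes every handle and, if remove_files is true, deletes the files too.
    void close_all(bool remove_files) {
        for (auto& [name, f] : files_) {
            std::fclose(f);
            if (remove_files) std::remove(name.c_str());
        }
        files_.clear();
    }

    std::size_t open_count() const { return files_.size(); }

private:
    std::map<std::string, std::FILE*> files_;
};
```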

Whew, a lot of questions. ;) I'll answer them, but first I strongly recommend that you consider memory-mapping your files if you go the route of file lists. That way you only need to manage the organization of the files and their memory maps; after that, your code can just pretend to be working with really large arrays, so the meat of your existing code won't need much modification, if any.
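A POSIX-only sketch of the memory-mapping idea (on Windows the equivalent calls are CreateFileMapping() and MapViewOfFile()): the file is sized with ftruncate() and mapped with mmap(), after which its bytes behave like an ordinary unsigned char array. map_bytes and unmap_bytes are hypothetical names. A single mapping is still limited by the process's address space, which is why it pairs with the multi-file layout rather than replacing it.

```cpp
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: creates/resizes the file at path to `size` bytes and
// maps it read-write. Returns nullptr on failure. With MAP_SHARED, writes
// through the pointer go to the file.
unsigned char* map_bytes(const char* path, std::size_t size) {
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) return nullptr;
    if (ftruncate(fd, static_cast<off_t>(size)) != 0) { close(fd); return nullptr; }
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return p == MAP_FAILED ? nullptr : static_cast<unsigned char*>(p);
}

// Flushes the mapped bytes to the file and releases the mapping.
bool unmap_bytes(unsigned char* p, std::size_t size) {
    bool ok = msync(p, size, MS_SYNC) == 0;
    return munmap(p, size) == 0 && ok;
}
```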

> My first question then is: is it okay to keep the file open and still use it as large storage?

Yes, but keep in mind that there's a limit to the number of files you can have open at one time. Usually both a system limit and a stream library limit are imposed, and they may or may not match. In the C stdio library, the FOPEN_MAX macro gives the number of streams the implementation guarantees can be open simultaneously; the actual limit may be higher.
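To see both numbers on a given system: FOPEN_MAX is a compile-time constant from <cstdio> (the C standard requires it to be at least 8), while on POSIX systems the per-process descriptor limit can be queried with getrlimit(RLIMIT_NOFILE, ...). The second part is POSIX-only, and the function names are hypothetical.

```cpp
#include <cstdio>
#include <sys/resource.h>  // POSIX only

// Soft per-process limit on open file descriptors, or 0 if it cannot be read.
// The value may equal RLIM_INFINITY on some systems.
unsigned long long open_descriptor_limit() {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return 0;
    return static_cast<unsigned long long>(lim.rlim_cur);
}

// Prints the stdio guarantee next to the operating system's limit.
void report_open_file_limits() {
    std::printf("FOPEN_MAX (guaranteed by stdio): %d\n", static_cast<int>(FOPEN_MAX));
    std::printf("RLIMIT_NOFILE soft limit: %llu\n", open_descriptor_limit());
}
```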

> is there any way to NOT have to re-open those files every time I need them?

You need some sort of way to access the files, whether it be keeping them open or re-opening them.

> is there any way to get the file name out of a FILE structure in stdio?

Not portably, no. You'll need to maintain a map from each file to its path.

> fstream IS an option, except that I definitely need speed and, as far as I know, I can get more of it from stdio.

That's debatable, especially if you're using a modern compiler.

Perfect answers. I think I will use fstream (for ease and portability; I assume that if the Win32 API gets deprecated, fstream will probably be one of the first things to be 'fixed'). I will open one file per integer while keeping track of how many integers are in existence (something I already do). If that number goes over FOPEN_MAX, I will close all of them and then only open them while needed. As for the recursive files for really, really big numbers, they will already take FOREVER to work with, so the slight overhead of opening and closing a file really shouldn't be noticeable. Thank you for the help :)
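A sketch of the plan described above, under stated assumptions: each integer is backed by its own named file, fstream handles stay open while the number of file-backed integers is within a cap (FOPEN_MAX by default), and once the count goes over the cap every stream is closed and files are reopened only for the duration of each access. BackingFiles and its members are hypothetical names, and what happens when the count later drops back under the cap is left as a design choice.

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical manager: one backing file per integer, kept open while the
// number of file-backed integers is within `cap`.
class BackingFiles {
public:
    explicit BackingFiles(std::size_t cap = FOPEN_MAX) : cap_(cap) {}
    BackingFiles(const BackingFiles&) = delete;
    BackingFiles& operator=(const BackingFiles&) = delete;

    ~BackingFiles() {
        open_.clear();  // close every stream before deleting the files
        for (const auto& [id, path] : paths_) std::remove(path.c_str());
    }

    // Creates an empty backing file for a new integer and returns its id.
    int add(const std::string& path) {
        int id = next_id_++;
        paths_[id] = path;
        auto stream = std::make_unique<std::fstream>(
            path, std::ios::in | std::ios::out | std::ios::binary | std::ios::trunc);
        if (paths_.size() <= cap_)
            open_[id] = std::move(stream);  // under the cap: keep it open
        else
            open_.clear();                  // over the cap: close them all
        return id;                          // an unkept stream closes here
    }

    // Runs fn on the integer's stream, reopening the file only if it is closed.
    void with_stream(int id, const std::function<void(std::fstream&)>& fn) {
        auto it = open_.find(id);
        if (it != open_.end()) { fn(*it->second); return; }
        std::fstream tmp(paths_.at(id), std::ios::in | std::ios::out | std::ios::binary);
        fn(tmp);
    }  // tmp is closed here

    // Destroys an integer: closes its stream if open, then deletes the file.
    void erase(int id) {
        open_.erase(id);
        std::remove(paths_.at(id).c_str());
        paths_.erase(id);
    }

    std::size_t open_count() const { return open_.size(); }

private:
    std::size_t cap_;
    int next_id_ = 0;
    std::map<int, std::string> paths_;
    std::map<int, std::unique_ptr<std::fstream>> open_;
};
```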
