I am working on a software project that uses SQL Server Express. My question is...

I have checked out software that does the same thing I want to do, but it seems the competition uses a file-based database system. I intend to use SQL because it seems easier and more reliable to me. Anyway, the issue I have is this: the competition stores its data in generic .dbf files that can be opened in Excel or any other similar program. So I checked out their data structure, and they store all addresses in one file, all contact names in another file, all billing information in a third, all billing addresses in a fourth, and so on.

I am just wondering (because I know the company won't tell me if I ask) what the reason would be for splitting this information up into so many fragments. I assume that, because it is file-based, it helps avoid read/write I/O problems, but I can't tell. I would think that having to look in multiple locations for one "record" would cause significant performance problems. I do know that once the data is retrieved from a file, the file is "closed," so to speak, so that other instances across the network can view/modify the data (I tricked them into telling me that), but that's about all I can get out of them.

Any thoughts?

If I'm using an SQL database to store my information, would it improve or hinder performance to split data among all those different tables the way the file-based system does? I just can't believe a company would write file-based software like that. Over the years these files could grow to many gigabytes in size; it seems silly and a major performance oversight to store everything on disk that way. Am I missing something? Would it really be better to be file-based?

There is overhead with an SQL-type database. A file-based system lets them avoid that overhead, at the cost of some performance and a limited size. Splitting the tables into different files enhances performance, since you don't have to read what you don't need. I suspect they also use fixed-length records, which allow direct access within the files rather than having to read through each record to find the one you want. There is a book on how to do this (I have it on my shelf at home), the name of which I can't remember (Build Your Own C++ Relational Database, or something like that).
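The fixed-length-record trick works because every record occupies the same number of bytes, so record N starts at byte offset N * record_size and a single seek replaces a scan. A minimal sketch in Python (the 64-byte layout and field widths are invented for illustration):

```python
import struct

RECORD_SIZE = 64   # hypothetical fixed record width
FMT = "<30s30si"   # name (30 bytes) + city (30 bytes) + int id = 64 bytes

def write_record(f, index, name, city, rec_id):
    # Jump straight to the record's slot; struct pads the strings with NULs.
    f.seek(index * RECORD_SIZE)
    f.write(struct.pack(FMT, name.encode(), city.encode(), rec_id))

def read_record(f, index):
    # One seek plus one fixed-size read, no matter how large the file is.
    f.seek(index * RECORD_SIZE)
    name, city, rec_id = struct.unpack(FMT, f.read(RECORD_SIZE))
    return name.rstrip(b"\x00").decode(), city.rstrip(b"\x00").decode(), rec_id
```

Reading record 10,000 costs the same as reading record 0, which is why a flat-file layout can stay usable even as the files grow.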

As for splitting your data into multiple tables, it can both help and hinder performance. Multiple tables reduce duplicate data, which makes searching faster, but too many cross-table joins can slow queries down. If you really need the best performance you can get, you'll have to experiment with normalization and see how your particular usage fares.
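To make that trade-off concrete, here is a minimal sketch using Python's built-in SQLite (standing in for SQL Server Express; the table and column names are invented): two contacts share one address row, so the address text is stored only once, but reassembling a complete "record" costs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE addresses (id INTEGER PRIMARY KEY, street TEXT, city TEXT);
    CREATE TABLE contacts  (id INTEGER PRIMARY KEY, name TEXT,
                            address_id INTEGER REFERENCES addresses(id));
""")
conn.execute("INSERT INTO addresses VALUES (1, '123 Main St', 'Springfield')")
conn.executemany("INSERT INTO contacts VALUES (?, ?, ?)",
                 [(1, "Alice", 1), (2, "Bob", 1)])   # one shared address row

# Normalized: the street is stored once; the join reassembles each record.
rows = conn.execute("""
    SELECT c.name, a.street, a.city
    FROM contacts c JOIN addresses a ON a.id = c.address_id
""").fetchall()
```

Denormalizing (copying the street into every contact row) would skip the join but duplicate the address for every contact at that location, which is exactly the experiment worth running against your own query patterns.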


If you get a chance, can you find the name of that book? I might want to check it out and see if it's any help.

I'm not really sure why it's split up. I understand that SQL has overhead; it's a server. But in a network environment where hundreds of potential users access the same data, a file-based system seems (to me) like a terrible choice. A programmer would have to create, I would think, some sort of manager to update the file as new queries come in, which is in essence an SQL server, just a custom-made one. Plain Windows file sharing wouldn't suffice. Right?
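Not necessarily a full custom server: shared-file databases of that era typically lean on the operating system's byte-range locks, so each client locks only the record it is changing while the rest of the file stays available over the share. A Unix-flavored sketch of the idea (Windows code would use LockFile instead; the record size is made up):

```python
import fcntl
import os

RECORD_SIZE = 64  # hypothetical fixed record width

def update_record(path, index, data):
    """Exclusively lock one record's byte range, overwrite it, release.

    Other clients can still read or write every other record in the
    file while this lock is held."""
    assert len(data) == RECORD_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Lock RECORD_SIZE bytes starting at this record's offset.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, index * RECORD_SIZE)
        os.pwrite(fd, data, index * RECORD_SIZE)
        fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, index * RECORD_SIZE)
    finally:
        os.close(fd)
```

The catch is what you suspect: these locks only work if every client cooperates, and over some network file systems they are unreliable, which is a large part of why a dedicated server process wins in a hundreds-of-users environment.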