I have the following scenario:
There are several trips, each with several people, and each people has several data, like... idk., height, weight, hair color, and of course passport number, name...
So the point is to retrieve these from several sources, store them, then make statistics.
I'm only interested in speed, the disk usage is irrelevant. I use Java DB (Apache Derby), or I may use anything cross platform through JDBC.

I can think of 2 designs:
- Normal: I create an autoinc surrogate key for the people, and use joins between trips and people tables.
- Simpler: I store the joined data in only one table, for each person the trip datas will be replicated. Key is tripDate/tripName/person.seat.

And here is the problem:
For sequential querying the Normal form is faster, because there is no replicated data about trips, so optimally there are less clusters accessed. Also this way there are 2 lookups, but I think this is irrelevant. But the problem occurs when some data source is slower than others, or they inserts data in random order (like paper based ones), and the data becomes fragmented: I can sort the trips table by date, but the autoinc field in the person table keeps it fragmented. On the other hand, if I create only 1 table, there are large amounts of data replicated, but everything can be sorted by date and searched sequentially.
So which way is the better after all? Or any other idea?
Thanks for helping!


Derby/JavaDB & JDBC/type2/3 is a perfect basis, I also use it in netbeans-developement. If you put all attributes together in just one relation/table, you will run into problems like insert/update/delete anomalies. Then you will be forced to patch by Java code for SQL queries will not work or too difficult to design. Therefore NF (3NF) is a must!

You can post your tables and key-definitions so we can discuss them.

-- tesu

Thanks for the post, I'm almost done, and I've noticed the anomalies: Trip data should be separated. But even after then I can simply use the tripDate/tripName/seat key for persons, or I can create an autoinc. The long one keeps the table properly sorted, but the autoinc is smaller... Well, I choose the 1st one, and make sure it's the primary key, and I just hope the engine will use a tree structure, so dates and names won't be replicated for every seat.