Hi Friends,

Our project is developed in J2ee technology using EJB, now we are migrating to hibernate. We are using Oracle 10g. Architecture of this project has not been satisfactory (including database) and we have several performance and maintenance issues, we are changing the architecture and database design now. I have some questions about database refactoring.
I would like to get your point of view on some database issues. I have following questions, please give your opinion about them:

1. Foreign Keys - We do not have tables related to each other through foreign keys, although we do have common column for joins, but FK constraints are not there. Should we enforce FKs on DB level? There are some concerns about adding FKs like - performance issues may occur, truncating tables will be a problem (which we need to do for data merge across databases). But we do want to main referential integrity to not to have any orphan records in DB. One alternative is to write triggers which will enforce data integrity. According to you what is the best way - use triggers or FKs?

2. Different tables for pieces of data that look same but are logically different - We have many kinds of properties in our project - Product Properties, Quote properties, LineItem properties are some examples. Since each type of property has SAME columns (fields), so we have a single table for storing all properties. In addition, we have a column - parentid, which contains productid for product properties, quoteid for Quote properties and lineitemid for lineitem properties. parentid column is not mapped as a foreign key from any MaSTER table. Should we either
a) have different table for different set of properties so that tables could simply be linked to their parent master table, OR

b) remove parentid from Property table, and create multiple 2-column tables for each type of property to link parent to property - that means create product_property_relation table for (productid, propertyid) ,
quote_property_relation table for (quoteid, propertyid) and similarly for lineitem properties.

3. Surrogate Numeric Keys - Right now we have text values acting as PKs. Should we change all of them to numeric surrogate keys to have small sized tables and fast retrieval. But a different opinion is that text keys are descriptive and help understand what that key is for. What is your opinion?

4. Splitting a big table - Right now we have a table which contains 3 types of data.
a) ParentProduct to ChildProducts relation
b) ParentProduct to VirtualChildProducts relation
c) VirtualChildProduct to OptionalChildProducts relation

And we have a column "Relation_type" which tells us what type of relation this is. For e.g.

prod1 subprod1 CHILD
prod1 subprod2 CHILD
prod1 subprod3 VIRTUAL
subprod3 subprod4 OPTIONAL

This table is very important for majority of queries in our application. Many times we need to get all subproducts of a product, or all virtual and optional products (together or separately). This table gets huge because of a lot of product relations and affects performance everywhere (because it needs to be self joined). One idea we are pondering on is to split it into 3 tables:


Then we will be dealing with much smaller tables, but JOINS will still be a concern. What is your suggestion about it?

Try to respond as fast as possible. Thanks in advance.


Hi varun077,

Most of your questions including your several performance and maintenance issues will simply get solved if you (re-)design a true relational datamodel based on correct relationships (cardinalities) and effective normalization, as for example:

1. Foreign keys: will appear automatically if relationships are correctly taken into account and implemented in database. They are a must for ensuring referential integrity. Why shall one painfully reinvent that at Java programming level or even simulate that by triggers if database has already the most effective referential-integrity feature? It is true myth that foreign keys would provoke serious performance issues. As for triggers: Simulating foreign keys by means of triggers is like breaking flies on wheels. In both cases you probably need indexed columns to gain reasonable performance depending on the size of tables. Additionally for triggers, interpreted program code must be executed on server, how to coordinate triggers? Did you ever worked with databases where some hundreds triggers were firing every which way?

2a. Never put different data/information described by similar attributes into one table what would lead to poor normalization what also would entail typical anomalies. Is there really a common primary key? And how to write effective-SQL select statements within such context?

2b. Never drop foreign keys, which implements one-to-many relationships, from a table and replace them by additional two-column tables containing two foreign keys from both associate tables to joining them together: You would then have many-to-many relationships only. For example, such constellation would allow to assign items of an invoice of a particular invoice, between them exists a real one-to-many relationship and nothing else, to every other existing invoice. To monitor and avoid this, you must invest additional plain programming efforts.

3. Surrogate keys: If a column or set of columns is able to identify each row uniquely and permanently over the course of time you should use this naturally given key. OK, if such natural degenerates into many columns sometimes it's cleverer to introduce a surrogate key what is then usual a numeric and auto-increment key. But attention: Surrogate keys never avoid duplicate data. Therefore, this must be done in Java program itself. (Don't forget to put indexes on all relevant columns you want to check for duplicates in consequence of beloved surrogate keys. You got it? Yes, you wouldn't have run into such mess, if you hadn't that surrogate thing.)

4. Splitting big table: There is no reason to split tables what contains tree-like data into various tables because trees can be effectively handled by recursive tables. You may google for Joe Celko trees.

Quintessence: All above enumerate problems would never arise, if database designers follow Edgar Frank Codd's idea of relational database design strictly.

Don't forget: In a well designed datamodel all selects should be implemented on SQL level rather then by plain programming code no matter how complex the query is. (It's really poor if one finds in a Java program two nested loops what implements fetching the rows from a one-to-many relationship (aka master-detail tables).

Best regards,