Howdy All,

I'm creating a database for UFO sightings and would appreciate some feedback or suggestions. Yes I know, strange topic.

Some background info so that the problem is understood:

I'm extracting data from several large UFO database catalogs, news articles, and other miscellaneous sources, so that the data can be queried, and the user pointed toward the original resources.

I'm up to about 250 data points that need to be searchable. Things like UFO shape, external light colors, internal light colors, surface characteristics, sound, direction, altitude, estimated speed, etc...

The problem is that I can't normalize the database like would typically be done, because an advanced query would create too many table joins. In my experience, too many joins makes the query very ineffecient - unless someone has a suggestion about that, as I am not a database expert.

So I have one table with 250 or so fields. But the query length gets hard to manage.

Some of the fields are related, such as external light colors. Multiple colors may apply, so there are about 17 fields for external light colors that are coded either 'Y' or 'N'

What I need feedback on:
Trying to make the table more manageable, I thought about coding related fields into one, such as external light colors mentioned above. Example: 1 for true and 0 false for each color, so you would end up with one field with a string of 17 ones and zeros. As long as the query structure matches the input structure, they should match for queries. Kind of like a hashed password value. Any thoughts on why would work, not work, or have any suggestions?

Any feedback much appreciated. Thanks!

You can coelesce related fields, but that can wind up causing more of a headache than you'd expect at first.

As with all things SQL, it's often hard to predict where the bottlenecks in your database are going to be. Most of the time, however, you can analyze slow queries to figure out what those bottlenecks are. Sometimes, it's just a matter of coming up with the right fields to index. Occasionally, specific views can help. But analyzing a database is a lot easier than trying to prognosticate.

My advice would be to put your database in third normal form, get some data in there, and start running queries. The really slow ones will show up soon enough; then you can analyze the queries and address the specific issues that come up.

You have to find a balance between normalization and de-normalization. De-normalization is commonly used for reporting and OLAP workloads. So for tables that will have workloads use de-normalization when designing them for other tables that will not workloads use the normalization. Of course, all of this depending on your bussiness. You have to know the bussiness dramatically.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of 1.18 million developers, IT pros, digital marketers, and technology enthusiasts learning and sharing knowledge.