0

Howdy! I'm trying to get my head wrapped around normalization and it's not working out to well. I'm using datasheets from the general and primary election results of 2008. The data sheets have a lot of duplicate info in them and I understand that is what normalization is supposed to get rid of. However if I try to break the duplicate info out into other tables it's still duplicated. If I turned it into a table it would look like this

tbl_prim_pres_elect
state
state_id
date
cand_id
cand_name
party_id
numb_votes
Because each candidate can be listed on ballets of every state, data in every field is duplicated throughout the table except for num_votes. It doesn't make sense to me to make a table for state with the candidate in it if the candidate will be listed in every state. That would cause the state and the candidate to be listed over and over again. Maybe it's already normalized.
I did have some luck with normalizing the congressional elections both primary and general elections but still end up with tables with duplicate info but with senate and house tables and primary and general election tables.
Here is the original House and senate table General election fields
STATE
STATE ABBREVIATION
DISTRICT FEC ID# I
NCUMBENT INDICATOR (I)
CANDIDATE NAME (First)
Candidate Name (Last)
CANDIDATE NAME
PARTY
PRIMARY_votes
PRIM_RUNOFF_votes
GENERAL_votes
GE RUNOFF_votes
I broke that down into four tables
As follows
Senate_PE
state
state_id
cand_id
cand_name
party_id
incumbent_indicator
numb_votes

House_PE
state
state_id
district_numb
cand_id
cand_name
party_id
incumbent_indicator
numb_votes

Senate_GE
state
state_id
cand_id
cand_name
party_id
incumbent_indicator
numb_votes

House_GE
state
state_id
district_numb
cand_id
cand_name
party_id
incumbent_indicator
numb_votes
Any pointers would be appreciated
Thanks

2
Contributors
1
Reply
2
Views
7 Years
Discussion Span
Last Post by drjohn
0

This is NOT properly normalised
tbl_prim_pres_elect
state
state_id
date
cand_id
cand_name
party_id
numb_votes


Because cand_id dictates cand_name and party_id
SO you should have a candidate table containing that data.

next state and state_id - will there be two states with the same name??? No, so the true primary key for state is state.

Oh, and will there be two parties with the same name but a different party_id??? No, so party is the true key for that item.

There is a tendency for beginners to assume that all tables must have a numeric id as the primary key THIS IS WRONG. Use the natural key were ever appropriate. If the natural key was a triple composite key, yes, adding a numeric key does make life a bit easier, of course.

Edited by drjohn: n/a

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.