Howdy! I'm trying to get my head wrapped around normalization and it's not working out to well. I'm using datasheets from the general and primary election results of 2008. The data sheets have a lot of duplicate info in them and I understand that is what normalization is supposed to get rid of. However if I try to break the duplicate info out into other tables it's still duplicated. If I turned it into a table it would look like this
tbl_prim_pres_elect
state
state_id
date
cand_id
cand_name
party_id
numb_votes
Because each candidate can be listed on ballets of every state, data in every field is duplicated throughout the table except for num_votes. It doesn't make sense to me to make a table for state with the candidate in it if the candidate will be listed in every state. That would cause the state and the candidate to be listed over and over again. Maybe it's already normalized.
I did have some luck with normalizing the congressional elections both primary and general elections but still end up with tables with duplicate info but with senate and house tables and primary and general election tables.
Here is the original House and senate table General election fields
STATE
STATE ABBREVIATION
DISTRICT FEC ID# I
NCUMBENT INDICATOR (I)
CANDIDATE NAME (First)
Candidate Name (Last)
CANDIDATE NAME
PARTY
PRIMARY_votes
PRIM_RUNOFF_votes
GENERAL_votes
GE RUNOFF_votes
I broke that down into four tables
As follows
Senate_PE
state
state_id
cand_id
cand_name
party_id
incumbent_indicator
numb_votes
House_PE
state
state_id
district_numb
cand_id
cand_name
party_id
incumbent_indicator
numb_votes
Senate_GE
state
state_id
cand_id
cand_name
party_id
incumbent_indicator
numb_votes
House_GE
state
state_id
district_numb
cand_id
cand_name
party_id
incumbent_indicator
numb_votes
Any pointers would be appreciated
Thanks