I'm working on a website that will be based on user contributed data, submitted using a regular HTML form.

To simplify my question, let's say that there will be two fields in the form: "User Name" and "Country" (this is just an example, not the actual site).

There will be two tables in the database : "countries" and "users," with "users.country_id" being a foreign key to the "countries" table (one-to-many).

The initial database will be empty. Users from all over the world will submit their names and the countries they live in and eventually the "countries" table will get filled out with all of the country names in the world.

Since one country can have several alternative names, input like Chile, Chili, Chilli will generate 3 different records in the countries table, but in fact there is only one country. When I search for records from Chile, Chili and Chilli will not be included.

So my question is - what would be the best way to deal with a situation like this, with conditions such that the initial database is empty, no other resources are available and everything is based on user input?

How can I organize it in such way that Chile, Chili and Chilli would be treated as one country, with minimum manual interference.

What are the best practices when it comes to normalizing user submitted data and is there a scientific term for this? I'm sure this is a common problem.

Again, I used country names just to simplify my question, it can be anything that has possible different spellings.

7 Years
Discussion Span
Last Post by urtrivedi

You just fill up your country database with country code and country name. you will find it very easily. make country code as foreign key.

Then give user dropdownlist to select country. dont allow user to type country name.


I feel like country names is a bad example to explain my problem. Another example: MoMA and Museum of Modern Art. Lexically different but refer to one entity. I understand that it would be impossible to detect this programmatically. I'm just looking for general ways to deal with it, with a hope that someone dealt with something similar before. One solution would be something in the user interface that will allow users to flag MoMA as a variation of Museum of Modern Art, website admin can then apply appropriate changes to the database. But what if there are thousands of records like this?


You may give two options for entering MoMA, one dropdown list from moma database which are entered in database and another text box. tell user to select from list and if not found then they may enter it in textbox.

<select name=moma1><option>........</select> &nbsp;
<input type=text name=moma2>

when you update your database if moma1 is having non null value then ignore moma1. if moma1 is null then insert moma2 in your database.

Edited by urtrivedi: n/a

This topic has been dead for over six months. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.