0

Hi there,

This question relates to data structures, performance and approach to a problem. It is best explained with example so bear with me. For the record, I will be using a combination of PHP, MySQL and Apache server on a dedicated Linux server.

The problem is how we would model the world within a database and use our data structure to efficiently execute queries on this world structure. For instance, take a chunk of the USA part of our model of the world:

The world
- North America
- - USA
- - - California
- - - - Los Angeles
- - - - San Diego
- - - - San Francisco
- - - - ...
- - - Florida
- - - - Orlando
- - - Nevada
- - - - Las Vegas
- - - etc.
- - Canada
- South America

The way I initially thought of doing this would be to have a kind of parent child structure in a MySQL table. Each place would have it's own record with an ID, name, any other stuff, and a parent ID. For example say the USA record had an ID of 81, California's parent ID would be 81 and say California's ID is 101, Los Angeles would have a parent ID of 101 and so on. I also thought that a "type" field would come in handy if I wanted to store different data and represent each different type of place differently, i.e. country, state, city etc.

Now the issue comes when say I land on a page at a low level, like LA for instance. I want to be able to say.. grab everything within this place's country and print them out on the page. This would likely involve a process similar to the following:
- Execute a query to select LA's parent which is California
- Execute a query to select California's parent which is USA (And is the country we are looking for)
- Execute a query to select all of USA's children i.e. all records that have parent ID equal to USA's ID, which are California, Florida and Nevada
- Execute multiple queries (Or potentially a single query with a fair few arguments depending on the number of children USA has) to grab the children of California, Florida and Nevada
- Continue this process all the way down the structure until we no longer get anything else back

This would result in quite a lot of processing (Some kind of caching would probably come into play to save having to repeat it) to do something which you would think would be rather simple. You can imagine in the example of the LA page you would end up with a hell of a lot of data considering there are 52 states, and each state containing a lot of cities and you could even go as far as to include counties, districts (I'm not American so I'm not sure what exactly comes after cities hah) etc. You can imagine the number of queries this would require.

Can anyone think of a more efficient way to model and play around with this data? Perhaps there is a more efficient way to grab all the data I want from this structure or perhaps there's a better structure I could use?

Thanks for your input.

Edited by bops: Added code tags

5
Contributors
9
Replies
39
Views
4 Years
Discussion Span
Last Post by bops
0

The way I would approach this is exactly what you suggested - a good old traditional tree structure saved in a relational database (which mysql is). Relational databases are good at processing this kind of data, all the rest depends on the way the site would be used. If a user selects a city there are a few queries (or one nested query) to get the parents and the data the user is interested in. I think performance-wise this can be handled without problems.

The problem could only be if you have a really busy site (i.e. millions of users - like Facebook, Google etc.). In that case maybe other options are to be looked at. Maybe NoSQL database but I am really not an expert on that topic so I might wait for other opinions to pop up.

0

Make a column called parents and list every parent in some order of inheritance which is sane to you, as a comma delimited list of IDs.

0

Make a column called parents and list every parent in some order of inheritance which is sane to you, as a comma delimited list of IDs

This might break normalization rules.

0

That's fair, it would probably be more sane to make a purely relational table. If the OP has to ask about this though, it makes me wonder what I could suggest that would remain feasible.

0

In huge sites relational databases might not be efficient solution anymore, especially if data is not tightly structured. I read something about NoSQL that approaches the problem more horizontaly but I am realy a noob here so I won't comment on that much :-)

Edited by broj1

0

Wouldn't this type of scenario be a tightly structured scenario though? A relational approach in my opinion would be more than adequate. 1 continent will have 1 or many Countries, 1 country will have 1 or many provinces or states, 1 state will have 1 or many counties or municipalities, 1 county will have 1 or many towns or cities, 1 town or city will have 1 or many neighborhoods.

Really, we're talking about global demographics right? There can be no short cuts in my opinion.

My 2 pennies.

Edited by Stuugie

1

This kind of problem is always kind of interesting. You can either use a dynamic tree structure, which involves recursive queries, which can be a performance issue; or you can do like Stuugie suggests and make a more rigid structure where each level is in itself a table, linked via foreign keys. - Personally, for MySQL at least, I would be more inclined to go with Stuugie on this. It's more controlled and will likely perform much better. You just need to make sure to pick a structure that can be applied globally. (Countries are divided differently.)

I've also seen other, more complex, ways of dealing with this. Somewhere I saw a B-tree implementation of a dynamic hierarchical structure in MySQL, though I can't remember where. That one was interesting, but kind of over the top for most uses I would say. - There are also ways to store the hierarchy in a sort of linked-list manner, with a depth index assigned to each item. That one supposedly performs great, because you can query the whole tree with a single command, but can be a pain to manage.

If your choice of databases isn't limited to MySQL, you can approach this differently. The other major RDBMS (MSSQL, Oracle, PostgreSQL) have a feature known as Common Table Expressions, which allows you to deal with dynamic hierarchies with actual SQL statements, rather than having to involve external programming or proceedures, which is a performance boost. If you are interested in that, this is an excellent article on the topic.

Developing Robust Hierarchical Data In MSSQL

Make a column called parents and list every parent in some order of inheritance which is sane to you, as a comma delimited list of IDs.

First rule of relational database design: No repeating groups of data. Which means: only one value per field, and no repeating columns for multiple, identical fields. - There are often times good reasons to violate the 2NF or 3NF (performance issues mostly), but you simply should not violate the 1NF. Ever.

0

I read something about NoSQL that approaches the problem more horizontaly but I am realy a noob here so I won't comment on that much :-)

Some NoSQL databases would handle parts of this kind of problem better than a RDBMS. For example, MongoDB is a document oriented NoSQL database, where each database is a list of documents (roughly equivalent to a RDBMS row) that are essentially JSON objects. There is no fixed structure to the documents, so the children of each document can be structured however they need to be structured. - It's extremely flexible, yet extremely fast. Fetching the entire world tree would be practically instant.

The only thing that would worry me about this is if you need to search the lowest levels of the tree. Not having a fixed, predictable structure, they may require you to do a full recursive scan of the tree in the program itself to go through it fully. (Unless MongoDB has features to do this. I've never come across them, though I've not really looked for them either.)

0

Thanks for the responses guys. Being that my actual project isn't as big as this and it was more of a future thought I think I will probably go for my current method with a tree modelled in MySQL. Interesting to see some other suggestions though. Thanks again.

This question has already been answered. Start a new discussion instead.
Have something to contribute to this discussion? Please be thoughtful, detailed and courteous, and be sure to adhere to our posting rules.