User Name Password Register
DaniWeb IT Discussion Community
All
What is DaniWeb IT Discussion Community?
You're currently browsing the Database Design section within the Web Development category of DaniWeb, a massive community of 402,699 software developers, web developers, Internet marketers, and tech gurus who are all enthusiastic about making contacts, networking, and learning from each other. In fact, there are 2,412 IT professionals currently interacting right now! Registration is free, only takes a minute and lets you enjoy all of the interactive features of the site.
Please support our Database Design advertiser: Programming Forums
Views: 839 | Replies: 5
Reply
Join Date: Oct 2007
Posts: 7
Reputation: sjvr767 is an unknown quantity at this point 
Rep Power: 0
Solved Threads: 0
sjvr767 sjvr767 is offline Offline
Newbie Poster

database design advice

  #1  
Apr 7th, 2008
Hi,

Firstly, I'm not a programmer nor a web-developer, but rather a statistician. I am doing equity-related (shares / stocks) research and I need to place my data in a database (MySQL), since it is too big to work with in a spreadsheet.

I have weekly closing price, volume, market cap, PE etc data for about 900 companies over a 16 year period. Obvioulsy, I can't throw it all into one table, so I was considering a table per company with year and week defining the primary keys. But a join accross 900 table in order to find the 20 companies with the lowest market cap in the first week of 1990 would be very cumbersome.

So I'd like to know from anyone here how they would go about designing this database.
AddThis Social Bookmark Button
Reply With Quote  
Join Date: Sep 2007
Location: North Bay Ontario
Posts: 174
Reputation: trudge is an unknown quantity at this point 
Rep Power: 1
Solved Threads: 20
trudge trudge is offline Offline
Junior Poster

Re: database design advice

  #2  
Apr 8th, 2008
First I suggest you sit down with a good book on relational database design ('Database Design For Mere Mortals' by Michael Hernandez and 'MySQL' by Paul DuBois come to mind). We can help you help yourself, but can't do your work for you.
Amer Neely - Web Mechanic
"Others make web sites. We make web sites work!"
Reply With Quote  
Join Date: Oct 2007
Posts: 7
Reputation: sjvr767 is an unknown quantity at this point 
Rep Power: 0
Solved Threads: 0
sjvr767 sjvr767 is offline Offline
Newbie Poster

Re: database design advice

  #3  
Apr 8th, 2008
Oh, I fully understand that you can't do the work for me and I'll have a look to see if I can't get hold of those books.

Was thinking (read "drawing diagrams") that I should maybe do a table per year. It will still result in a large number of records (less than 47 000), but fewer records than if I dumped all into one table. Making joins across 16 tables (one per year) should be easier than doing it across 900!

My big concern is the type of performance I'll get if I have a table with so 47 000 records... Is that big? Is too big?

Anyway, I'll experiment and report back on my results (might take a few days). Any further pointers are more than welcome in the mean time.
Reply With Quote  
Join Date: Sep 2007
Location: North Bay Ontario
Posts: 174
Reputation: trudge is an unknown quantity at this point 
Rep Power: 1
Solved Threads: 20
trudge trudge is offline Offline
Junior Poster

Re: database design advice

  #4  
Apr 8th, 2008
47000 records is trivial - MySQL can handle billions of records, and many companies use it for large databases and heavy traffic.

I'm not sure what kinds of queries you will be making, but one point to consider in any relational database design, is that it is driven by those anticipated queries. Maybe post a few pseudo-queries that you expect to run into, and that might get some suggestions.

You mentioned some of your data already - 'closing price, volume, market cap, PE '. And 900 companies is also trivial. I would look at holding your various points of interests in one table, along with a company ID. Then you would need a bridge table to tie that with company data in a separate table.

Or you could put all your various points of interest in separate tables (how many of these are you tracking?). Again, bridge tables would be needed to tie these to the various companies.

It depends on how you are going to be using the data.

Indexing can greatly speed up queries too.
Amer Neely - Web Mechanic
"Others make web sites. We make web sites work!"
Reply With Quote  
Join Date: Apr 2008
Posts: 295
Reputation: tesuji is an unknown quantity at this point 
Rep Power: 1
Solved Threads: 41
tesuji tesuji is offline Offline
Posting Whiz in Training

Re: database design advice

  #5  
Apr 15th, 2008
hi,

storing some data (physically) in 900 different tables seems to be a rather bad idea, because you would have union them firstly to draw any useful information from them. Aside from the fact that handling 900 different tables when adding actual data every week isn't that easy-going. I would like to suggest you that give a meaningful example with concrete data, something like this:

week, clPrice, volume, marketC, PE(?), Company, ...
1, 100, 1000, 30%, ??, INTG
2, ....
...
10, ....

You should add further missing but important attributes to that big table. Don't worry about decomposition into smaller tables; this 2nd step is quite simple if one has broadly understand your data. To speed up processing your data you can add some indexes.
btw, 900x52x16 = 748800 records isn't that much data for any RDBMS, even MS Access will cope with this task.

brs, cliff
Reply With Quote  
Join Date: Oct 2007
Posts: 7
Reputation: sjvr767 is an unknown quantity at this point 
Rep Power: 0
Solved Threads: 0
sjvr767 sjvr767 is offline Offline
Newbie Poster

Re: database design advice

  #6  
Apr 16th, 2008
Hi,

Thanks trudge for all your input. I couldn't get my hands on those specific books, but stumbled across some tutorials on the net that got me started in the right direction.

While my database isn't exactly 2NF it is pretty close, with slight amount of redundancy in order to facilitate the type of queries I'll do. Taking your advise of keeping the type of queries I'd make to the database really helped in the design process.

So, the end result (after much python scripting in order to populate the tables) is quite good. I have a table containing the share code (primary key), full name and status (delisted or not) for all the companies in my study. Then I have a table for each year which contains the share code, week, close, volume, cap, pe, dy and year for each company. Share code and week form the primary key. The year column, though somewhat redundant, is there to ease joins across multiple year tables.

Tesuji will notice that the above design looks quite similar to his suggestion. I appreciate his input too, but separating the company name out of the year tables seemed like the logical thing to do. I have also gotten my hands on some fundamental data, which is yearly, and separating the company details out of the yearly tables helps when adding in that kind of data.

Thanks again for all the help!
Reply With Quote  
Reply

Only community members can participate in forum threads. You must register or log in to contribute.

DaniWeb Database Design Marketplace
Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)

 

Thread Tools Display Modes

Similar Threads
Other Threads in the Database Design Forum

All times are GMT -4. The time now is 5:45 am.
Forum system based on vBulletin Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
©2003 - 2008 DaniWeb® LLC