954,597 Members — Technology Publication meets Social Media
Username:
Password:
Lost login information?
Have something to say? Contribute New Article Reply to this Article

Which Database?

Hi folks. I'm looking for a bit of help and guidance on what you think would be the most suitable DB for a project I'm working on. This isn't a web-based project but as I couldn't find a more suitable sub-forum to post this query, I thought I would ask on here.

My DB will initially hold approximately 300 columns and 3 million rows but this will grow to tens if not hundreds of millions as time passes. Each row will contain information relating to a specific file - author, date created etc... Now the main purpose of this DB is to identify commonalities between entries - a query for example to see if any records contain a created date of between 01/01/2011 - 05/01/2011 AND the same author? Similarly, would it be better to spread my data across multiple tables or just one? The sources for my data will be from various sources but the information will all fall under the same headings (columns) and I'd want to compare the data across all info from the various sources.

Many thanks and enjoy your weekend

scrivomcdivo
Newbie Poster
13 posts since Jan 2011
Reputation Points: 11
Solved Threads: 0
 

300 columns and 3 million rows is not that huge as it sounds. Any standard RDBMS application would do that. To decide the database a lot would depend on how much you can spend on the DB and server(hardware).

debasisdas
Posting Genius
6,872 posts since Feb 2007
Reputation Points: 666
Solved Threads: 434
 

debasisdas is correct as usual, but may I ask what data types are the 300 columns?
If 200 columns are text fields (or any other large data type) and you want them to be searched for some text this might be a challenge for the "hundreds of millions" records.
This decision must also take into account number of concurent users and hits per minute, time allowed for processing and hardware.

adam_k
Practically a Posting Shark
804 posts since Jun 2011
Reputation Points: 256
Solved Threads: 149
 

for this magnitude of data i think you need a real database not like MS-Access, which sometimes called as database also but actually it is not

asaukani
Light Poster
44 posts since Mar 2010
Reputation Points: 13
Solved Threads: 9
 
would it be better to spread my data across multiple tables or just one?

Bearing in mind what debasidas and adam_K have said, For such large volume of data, keeping all in one table may cause difficulty when you want to make some update on the database structure and may also affect the rate at which data is retrieved.

Netcode
Veteran Poster
1,021 posts since Jun 2009
Reputation Points: 43
Solved Threads: 67
 

I was just about to post the same question, having found that BigTable isn't available to the public :confused: (and I don't want to rely on Google via their App engine)

I doubt I'll be dealing with millions of rows, especially not within the same table, however a lot of my fields will be text based.

So what implementations of SQL are on the table?

Since all seem to have their own twist on the SQL standard, I'm guessing theres no cross compatibility between systems, so picking the right one straight off is essential(?).

SQLite seems relatively robust, and isolation makes back-ups easy. Having said that, its probably not the best for multitasking (I'm expecting lots of people writing/reading at the same time). I'm versed in MySQL, but there aren't many platforms that support it.

PostgreSQL looks like the way forward (its also widely supported among public web hosts), unless anyone can say otherwise...?

rom.
Newbie Poster
19 posts since Oct 2010
Reputation Points: 23
Solved Threads: 1
 

Sql server

xuqi
Newbie Poster
23 posts since Jan 2008
Reputation Points: 10
Solved Threads: 2
 

This article has been dead for over three months

Post: Markdown Syntax: Formatting Help
You