Hi folks. I'm looking for a bit of help and guidance on what you think would be the most suitable DB for a project I'm working on. This isn't a web-based project but as I couldn't find a more suitable sub-forum to post this query, I thought I would ask on here.

My DB will initially hold approximately 300 columns and 3 million rows but this will grow to tens if not hundreds of millions as time passes. Each row will contain information relating to a specific file - author, date created etc... Now the main purpose of this DB is to identify commonalities between entries - a query for example to see if any records contain a created date of between 01/01/2011 - 05/01/2011 AND the same author? Similarly, would it be better to spread my data across multiple tables or just one? The sources for my data will be from various sources but the information will all fall under the same headings (columns) and I'd want to compare the data across all info from the various sources.

Many thanks and enjoy your weekend

rom. commented: yep, had the same question in mind +1

Recommended Answers

All 6 Replies

300 columns and 3 million rows is not that huge as it sounds. Any standard RDBMS application would do that. To decide the database a lot would depend on how much you can spend on the DB and server(hardware).

debasisdas is correct as usual, but may I ask what data types are the 300 columns?
If 200 columns are text fields (or any other large data type) and you want them to be searched for some text this might be a challenge for the "hundreds of millions" records.
This decision must also take into account number of concurent users and hits per minute, time allowed for processing and hardware.

for this magnitude of data i think you need a real database not like MS-Access, which sometimes called as database also but actually it is not

would it be better to spread my data across multiple tables or just one?

Bearing in mind what debasidas and adam_K have said, For such large volume of data, keeping all in one table may cause difficulty when you want to make some update on the database structure and may also affect the rate at which data is retrieved.

I was just about to post the same question, having found that BigTable isn't available to the public :confused: (and I don't want to rely on Google via their App engine)

I doubt I'll be dealing with millions of rows, especially not within the same table, however a lot of my fields will be text based.

So what implementations of SQL are on the table?

Since all seem to have their own twist on the SQL standard, I'm guessing theres no cross compatibility between systems, so picking the right one straight off is essential(?).

SQLite seems relatively robust, and isolation makes back-ups easy. Having said that, its probably not the best for multitasking (I'm expecting lots of people writing/reading at the same time). I'm versed in MySQL, but there aren't many platforms that support it.

PostgreSQL looks like the way forward (its also widely supported among public web hosts), unless anyone can say otherwise...?

Sql server

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.