The web application will have a database that consists of millions (over 20mil ... maybe 100mil or more) of hosts (ipv4,ipv6 and domains) and additional data to each host like blacklist, uptime, history, geo data (country, region, city, isp). I could, of course, use other RMDBS system like MySQL, Oracle, PostgreSQL ... but i want speed and I will have a lot of data. I stumbled upon at NoSQL and MongoDB and want to know what are the best way to develop and manage a big database.

I was thinking about using b-tree search on hosts but i will have to sort them and have 4 columns for ipv4 and 8 columns on ipv6. On domains i don't know how i would manage to do that ... i might just use full text search, maybe just a sort will help me.

Database will be constantly updated and used so a clever lock per row must be enabled. Main searches will be by host, geo code, isp, tags and hostId.

hosts table
hostId , host (must be either text or multiple columns) , tags (array) , blacklists (array) , uptime , geoCountry , geoRegion , geoCity, geoIsp , dateInserted

There is also a possibility that database might have a lot of offline hosts and multiple records per host so another history table must be created and whenever a host goes offline or a change occures (blacklist,geo ip data), must go in history.

hosts_history table
hostHistoryId , host (must be either text or multiple columns), tags (array) , blacklists (array) , uptime , geoCountry , geoRegion , geoCity, geoIsp , dateInserted

What would be the best approach and how would can I optimize this project for the best speed as possible?

Recommended Answers

All 2 Replies

Today the servers are switching from hard disk to SSD along with the usual bumps in CPU and RAM. I can't tell from your post if you are using a cloud or self hosting but MySQL is fairly common and about speed. A lot of speed is about your database design and then the query(s). I decline from duplicating the web here but there's a lot to consider so I'll suggest you get a team member dedicated to the database design and using a modern system for hosting.

If you read this page, from MYSQL itself, you will see that your database is nowhere near the size limit.
https://dev.mysql.com/doc/mysql-reslimits-excerpt/8.0/en/table-size-limit.html
It talks about 256TB as its max. On another forum, several years ago, a real expert said that your sort of size would be a small database when asked a similar question. The real limit will be what your server or host sets. If it's a private server, you configure it to do what you want.

You talk about searching - MYSQL creates indexes to speed up searches. The manual says that it can use b-trees for its default InnoDB storage engine - I didn't check, but you might have to specifically tell it to do this.
https://dev.mysql.com/doc/refman/8.0/en/innodb-introduction.html

Remember that the design of the database can make a big difference to its speed in searches.

Assuming you are not entering all this data by hand ;), why not design and build a test database, populate it, then test it?

But as suggested above, if you are a team member then seek out the database expert and talk to him/her. Or hire one for a couple of hours of consultation where you can ask many more questions face to face or via Zoom. Don't immediately go to Fiverr or the like, as you probably will not get the expertise you might need for pocket money, and might spend a lot of time searching for the right person - and recognising them might be tricky. Ask others who they have had as a consultant. Don't ask the question you have asked here, instead ask which of a list of databases would be recommended to solve a problem like yours. Your question is channelled down a route you are thinking of, it should really be asking what is the correct way to tackle the problem. It's like asking how to do something using Java on a website when really the best way would be using javascript (a totally different thing) or a simple CSS hover effect.

commented: correct, the number of records mentioned is an interesting exercise to get performance (indexing etc.) but nowhere near a massive database. +16
commented: thank you for your response, it is interesting, right now i have 230+ mil records in my database and some internal queries are taking like 2h +0
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.