I'm trying to load DXF CAD files into a database for searching.

Some of these files are huge, with hundreds of thousands of separate entities.

I've separated the file into the following tables:

  • drawings
  • layers
  • style
  • entities
  • points
  • flags
  • floats
  • text

The drawings table contains a path to the drawing and an ID for each drawing.
The layers table has an ID, the drawing ID, a text name, a colour and a linetype.
The style table has an ID, the drawing ID and a text name.
The entities table has an ID, the drawing ID, an entity type, a layer, a style and a location (point).
The points table has the ID of the entity to which it belongs and three floats for the x, y and z coordinates.
Some entities use flags to further define the entity; a bit may be set to indicate a closed polyline, for example. The flags table contains the ID reference to the entity, the flag type and the flag integer itself.
Similarly, an entity may have additional floats (a radius for a circle, say), or text.
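For illustration, the schema I have in mind could be created with statements along these lines (the column names and types here are only a sketch of what I expect to end up with, not the final definitions):

#include <mysql/mysql.h>
#include <cstdio>

// Illustrative only: the tables described above, with guessed column types.
static const char *schema[] = {
    "CREATE TABLE drawings (id INT AUTO_INCREMENT PRIMARY KEY,"
    " name VARCHAR(255), path VARCHAR(1024))",
    "CREATE TABLE layers (id INT AUTO_INCREMENT PRIMARY KEY, drawing_id INT,"
    " name VARCHAR(255), colour INT, linetype VARCHAR(64))",
    "CREATE TABLE style (id INT AUTO_INCREMENT PRIMARY KEY, drawing_id INT,"
    " name VARCHAR(255))",
    "CREATE TABLE entities (id INT AUTO_INCREMENT PRIMARY KEY, drawing_id INT,"
    " type INT, layer_id INT, style_id INT)",
    "CREATE TABLE points (entity_id INT, x DOUBLE, y DOUBLE, z DOUBLE)",
    "CREATE TABLE flags (entity_id INT, flag_type INT, flag_value INT)",
    "CREATE TABLE floats (entity_id INT, float_type INT, value DOUBLE)",
    "CREATE TABLE text (entity_id INT, value TEXT)"
};

void createSchema(MYSQL *conn) {
    for (const char *stmt : schema)
        if (mysql_query(conn, stmt))                 // run each CREATE TABLE
            fprintf(stderr, "Schema error: %s\n", mysql_error(conn));
}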

I am writing a C++ program to parse the drawing file and insert the info into a database.

Currently I'm using MySQL:

#include <mysql/mysql.h>
#include <cstdio>
#include <string>
using std::string;

MYSQL mysql;   // global connection handle

struct connection_details
{
    string server;
    string user;
    string password;
    string database;
};

// Open the connection described by mysql_details on the global handle.
void initDXFDB(const connection_details &mysql_details){
  mysql_init(&mysql);
  if (!mysql_real_connect(&mysql, mysql_details.server.c_str(),
                                  mysql_details.user.c_str(),
                                  mysql_details.password.c_str(),
                                  mysql_details.database.c_str(), 0, NULL, 0)){
    printf("Failed to connect to database: Error: %s\n", mysql_error(&mysql));
  }
}
void load_new_file(string DB, string name, string path){
  connection_details mysqlD;
  mysqlD.server = "localhost";   // where the MySQL server is running
  mysqlD.user = "root";          // the MySQL user
  mysqlD.password = "********";  // that user's password
  mysqlD.database = DB;          // the database to use
  initDXFDB(mysqlD);
  unsigned long long drawing_ID = insertRecords("drawings", "name,path",
                                                "'" + name + "','" + path + "'");
  insertRecords("entities", ..., ...);
  ...
  ...
  mysql_close(&mysql);
}
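For completeness, insertRecords() is a thin helper along these lines (simplified): it builds the INSERT statement, runs it on the global connection, and hands back the auto-increment ID that MySQL assigned to the new row.

// Build and run one INSERT, returning the new row's auto-increment ID (0 on failure).
unsigned long long insertRecords(const string &table, const string &columns,
                                 const string &values){
  string query = "INSERT INTO " + table + " (" + columns + ") VALUES (" + values + ")";
  if (mysql_query(&mysql, query.c_str())){
    printf("Insert failed: %s\n", mysql_error(&mysql));
    return 0;
  }
  return mysql_insert_id(&mysql);
}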

Firstly, would I be better off using a SQLite database? The database will need to be interrogated over the Web, which is why I'm using MySQL at the moment.

Secondly, my test file (which is admittedly 50 MB) is taking a few hours to process, and presumably there is a more efficient way of inserting thousands of records other than sequentially.

I have never used SQLite, so I cannot comment on that one - however, if you are going to be using an internet-based system of data management, MySQL is a fine choice.

Without some sort of pre-processing, where you build a tokenized batch and send it to either a stored procedure or a series of functions to untokenize it, there is little you can do to speed up a mass update like this. However, tokenization should be very fast, and since it's just string manipulation up front, whatever you are using should also process very fast.

The speed deficit you are experiencing is most likely because you are doing thousands of individual inserts, each of which takes a lock on the tables and has to honour any foreign key constraints and locks you have in place. No matter what you do, since you have to insert or update, you will always have these table locks, even with tokenization - however, if you tokenize, you take one more thing away from your database software to have to deal with.

On the other hand, if you are pre-parsing, and all you are doing is running the same "insert into A (col1, col2, col3) values (val1, val2, val3);" over and over, you can't really get any faster than that. At that point, you will have to see whether your processing code or the database (or both) is taking the time, and begin to optimize whichever is taking the longest. If you are trying to pull the whole contents of the 50 MB file into memory, maybe that's your issue. If you are reading from disk and constantly seeking, maybe that's your issue (try both methods and see which is faster; I would assume loading it all into memory first, then parsing, would be faster - and since memory is fairly cheap, even a 500 MB file wouldn't hurt performance too much).
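For example, pulling the whole file into memory first only takes a few lines of C++ (a rough sketch; the function name is mine):

#include <fstream>
#include <sstream>
#include <string>

// Read the entire DXF file into one string so the parser never has to seek on disk.
std::string slurpFile(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();       // pull the whole file in one go
    return buffer.str();
}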

Another option, in case you are making multiple calls and opening multiple sessions (so each INSERT INTO ... makes a unique call), is to make a buffer, load it up with all the SQL, and run everything as a batch when you have finished processing the file. This way you are only technically making a single call to the database (and only one round trip to the MySQL server; network traffic may also be a bottleneck if you are making thousands of requests for each file). The downside of this is that if you have any errors you can potentially kill the batch - so you may want to invest in making stored procedures to limit the chance of failure. Your other potential limit is the MySQL server itself, and whether it has an input buffer big enough to handle the batch request you are trying to make.
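As a sketch of what I mean (this assumes the connection was opened with the CLIENT_MULTI_STATEMENTS flag so several statements can go over in one round trip; the function is just an illustration, not working code from your project):

#include <mysql/mysql.h>
#include <cstdio>
#include <string>
#include <vector>

// Collect every statement into one buffer and send it in a single round trip.
// Requires the connection to have been opened with CLIENT_MULTI_STATEMENTS.
void runBatch(MYSQL *conn, const std::vector<std::string> &statements) {
    std::string batch = "START TRANSACTION;";
    for (const std::string &stmt : statements)
        batch += stmt + ";";
    batch += "COMMIT;";

    if (mysql_query(conn, batch.c_str())) {
        fprintf(stderr, "Batch failed: %s\n", mysql_error(conn));
        return;
    }
    // None of these statements return a result set, but each one's status
    // still has to be consumed before the connection can be reused.
    do {
        if (MYSQL_RES *res = mysql_store_result(conn))
            mysql_free_result(res);
    } while (mysql_next_result(conn) == 0);
}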

Hope that gives you some ideas.

Good luck!

Ryan

Agree with ryantroop: haven't done C++ in a while, but I'd choose to generate a SQL script file and run it on the server (don't forget to use a transaction on the whole thing).
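Roughly something like this on the C++ side, for instance (the file name is arbitrary), then feed the file to the mysql command-line client:

#include <fstream>
#include <string>
#include <vector>

// Write every generated INSERT into one script, wrapped in a transaction,
// then run it with e.g.  mysql mydatabase < load_drawing.sql
void writeScript(const std::vector<std::string> &statements) {
    std::ofstream script("load_drawing.sql");
    script << "START TRANSACTION;\n";
    for (const std::string &stmt : statements)
        script << stmt << ";\n";
    script << "COMMIT;\n";
}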

Thanks very much for your comments, both of you.

That gives me a lot to go on, and a few of the suggestions yielded useful information when I used some of the terms in a Google search. Specifically, I've found out that you can insert multiple rows using the syntax INSERT INTO tbl_name (a,b,c) VALUES (1,2,3),(4,5,6),(7,8,9);
The only problem would be accessing the record IDs for insertion into the other linked tables.
Also, this page mentions ways of importing data from a file, which looks as though it would be worth looking at.
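From what I've read, the multi-row version could look something like this on the C++ side (ParsedEntity is just a stand-in for my parser's output; it relies on mysql_insert_id() returning the ID of the first inserted row, with the rest of the batch following on consecutively, which depends on the server's auto-increment lock mode):

#include <mysql/mysql.h>
#include <string>
#include <vector>

struct ParsedEntity { int type; unsigned long long db_id; };  // stand-in for parser output

// Insert a batch of entities with one multi-row INSERT and recover their IDs.
void insertEntityBatch(MYSQL *conn, unsigned long drawing_ID,
                       std::vector<ParsedEntity> &batch) {
    if (batch.empty()) return;
    std::string sql = "INSERT INTO entities (drawing_id, type) VALUES ";
    for (size_t i = 0; i < batch.size(); ++i) {
        if (i) sql += ",";
        sql += "(" + std::to_string(drawing_ID) + "," + std::to_string(batch[i].type) + ")";
    }
    if (mysql_query(conn, sql.c_str()) == 0) {
        unsigned long long first = mysql_insert_id(conn);   // ID of the FIRST new row
        for (size_t i = 0; i < batch.size(); ++i)
            batch[i].db_id = first + i;                      // the rest follow on
    }
}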

Thanks again!

In MS SQL you could do a BULK INSERT. In MySQL it is

LOAD DATA INFILE 'C:\\MyTextFile'
INTO TABLE myDatabase.MyTable
FIELDS TERMINATED BY ','

Should be much faster than inserting record by record. You just have to have a text file in a regular (record) format; specify a field terminator of your choice.
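For instance, the C++ side could write the parsed points out as a comma-separated file and then hand the whole thing to MySQL in one statement (a sketch; the path, struct and table layout are placeholders):

#include <mysql/mysql.h>
#include <cstdio>
#include <fstream>
#include <vector>

struct Point { unsigned long entity_id; double x, y, z; };   // placeholder parser output

// Dump the parsed points as a comma-separated file, then load the whole
// file with one LOAD DATA INFILE statement instead of row-by-row INSERTs.
void bulkLoadPoints(MYSQL *conn, const std::vector<Point> &points) {
    std::ofstream out("/tmp/points.csv");
    for (const Point &p : points)
        out << p.entity_id << ',' << p.x << ',' << p.y << ',' << p.z << '\n';
    out.close();

    const char *sql = "LOAD DATA INFILE '/tmp/points.csv' "
                      "INTO TABLE points "
                      "FIELDS TERMINATED BY ','";
    if (mysql_query(conn, sql))
        fprintf(stderr, "LOAD DATA failed: %s\n", mysql_error(conn));
}

Depending on the server's settings (secure_file_priv, or the data file living on a different machine than the server), LOAD DATA LOCAL INFILE may be needed instead.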

Writing to a file instead of the database has certainly improved the speed. What was taking a few hours is now taking a few minutes.

However, there is one issue I need to resolve.

How can I get the value of the next record's auto-incremented ID?

I either need to lock the database and get the next record's ID before the import, or do the import and get the ID of the last record after the import has finished.
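For the last option, I'm thinking of something along these lines, which is only safe if nothing else inserts into the table while the import is running:

#include <mysql/mysql.h>
#include <cstdlib>

// Highest entities.id after the import; only reliable while no other
// process is inserting into the table.
unsigned long lastEntityId(MYSQL *conn) {
    if (mysql_query(conn, "SELECT MAX(id) FROM entities"))
        return 0;
    MYSQL_RES *res = mysql_store_result(conn);
    MYSQL_ROW row  = mysql_fetch_row(res);
    unsigned long id = (row && row[0]) ? strtoul(row[0], NULL, 10) : 0;
    mysql_free_result(res);
    return id;
}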
