I'm relatively new to MSSQL, and I'm trying to improve the DB write performance.

I have some legacy Java code that's generating about 1500 records every second. These 1500 records are generated by multiple Java threads. Right now, each thread creates its own SQL statement and writes each record into the MSSQL DB individually; this is proving to be terribly slow.

What are my options specific to MSSQL and Java? How do I insert so many records into the DB without seriously impacting performance?

Thanks in advance.

Hi there.

Have you by chance attempted any kind of batching? I'd queue the records up and insert them in batches, say 50,000 records per batch. See if that makes things any faster, and let us know what you find.

EDIT: Also check your disk I/O. You can set up filegroups to ease this and distribute the ldf/mdf files across multiple drives. That is, if you need to keep your code as-is.
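
To make the batching idea concrete, here is a minimal sketch using a JDBC PreparedStatement, so the server parses the INSERT once and reuses the plan. The connection string, table, and column names are made up for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchInsertSketch {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=tracking", "user", "password")) {
            conn.setAutoCommit(false); // one transaction per batch, not per row
            String sql = "INSERT INTO vehicle_position (vehicle_id, lat, lon) VALUES (?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < 1000; i++) {        // accumulate a batch...
                    ps.setInt(1, i);
                    ps.setDouble(2, 51.5 + i * 0.0001); // dummy GPS values
                    ps.setDouble(3, -0.12 + i * 0.0001);
                    ps.addBatch();
                }
                ps.executeBatch(); // ...then send it together instead of one call per row
            }
            conn.commit();
        }
    }
}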

I understand that batch inserts could be faster; the challenge will be figuring out the right "delay" while I cache records before insertion. My client applications need *near* real-time read access to the data being inserted (the data is actually GPS data - latitude and longitude, among some 50 fields - for 1500 vehicles worldwide that are being tracked in real time).
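
One way to bound that delay, as a rough sketch: have a single writer thread flush whenever either a size or a time threshold is hit, so readers never lag by more than, say, 250 ms. The PositionRecord type and flushBatch method below are hypothetical stand-ins; flushBatch would run the kind of batched INSERT shown above:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class RecordFlusher implements Runnable {
    public static class PositionRecord { int vehicleId; double lat, lon; }

    private final BlockingQueue<PositionRecord> queue = new LinkedBlockingQueue<>();

    public void submit(PositionRecord r) { queue.add(r); } // called by the producer threads

    @Override
    public void run() {
        List<PositionRecord> buffer = new ArrayList<>();
        while (!Thread.currentThread().isInterrupted()) {
            try {
                // Wait up to 250 ms for the first record, then grab whatever else is queued.
                PositionRecord first = queue.poll(250, TimeUnit.MILLISECONDS);
                if (first != null) {
                    buffer.add(first);
                    queue.drainTo(buffer, 999); // cap each batch at 1,000 records
                }
                if (!buffer.isEmpty()) {
                    flushBatch(buffer); // hypothetical: the batched JDBC insert
                    buffer.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // shut down cleanly
            }
        }
    }

    private void flushBatch(List<PositionRecord> records) {
        // run the PreparedStatement batch from the earlier sketch
    }
}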

Would it be faster/more efficient to do bulk inserts directly from my Java program, like so:

conn = getMySqlConnection();   // helper returning a JDBC connection to the DB
conn.setAutoCommit(false);     // group the batch into a single transaction
stmt = conn.createStatement(); // scrollable/updatable hints aren't needed for inserts
stmt.addBatch("INSERT INTO survey(id, name) VALUES('11', 'Alex')");
stmt.addBatch("INSERT INTO survey(id, name) VALUES('22', 'Mary')");
stmt.addBatch("INSERT INTO survey(id, name) VALUES('33', 'Bob')");
stmt.executeBatch();
conn.commit();

Or to store the records in CSV files and periodically do a BULK INSERT (maybe from a separate Java program, or within SQL Server itself... like a cron job), like so:

BULK INSERT OrdersBulk 
            FROM 'c:\file.csv' 
            WITH 
            ( 
                FIELDTERMINATOR = '\t', 
                ROWTERMINATOR = '\n' 
            )
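
If you go the CSV route, the BULK INSERT can be fired from a small Java scheduler instead of a cron job. A sketch, reusing the table and file names from the example above (the connection string is assumed, and the file path must be visible from the SQL Server machine):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class BulkLoadJob {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(BulkLoadJob::load, 0, 5, TimeUnit.SECONDS);
    }

    static void load() {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:sqlserver://localhost;databaseName=tracking", "user", "password");
             Statement stmt = conn.createStatement()) {
            // The \t and \n must reach SQL Server literally, hence the doubled backslashes.
            stmt.execute("BULK INSERT OrdersBulk FROM 'c:\\file.csv' "
                    + "WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n')");
        } catch (Exception e) {
            e.printStackTrace(); // real code would rotate the file before loading it
        }
    }
}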

Any suggestions?
TIA.

Hmm... I've done some beginner-level Java programming; does Java by chance have something like an ADODB Recordset object? You could load records into the object, then push them into the table when the batch reaches capacity.

You could assemble a wicked insert (as you demonstrated), but that will create the same number of SQL commands as doing them one at a time.

Regarding the CSV files, you may run into even more disk I/O issues if you use that approach, and if the call is not made asynchronously, it may add latency to your application.

We seem to be looking at workarounds here (which is productive), although we haven't determined the cause of your problem. Are you familiar with perfmon.msc? You can use it to monitor processor utilization, disk I/O, read/write queue lengths, etc. You might want to take a peek at it before you invest time in changing your code, only to find the server still has the same problem. :)

Let me know what you find from perfmon and we can determine a fitting solution.

My understanding is that building indexes online is the best tuning step, more so than the inserts themselves. Keep in mind that the client wants real-time data: if the data volume grows to 10 crore (100 million) rows or more, what happens? Client queries will never return. So set up online indexing for this table first, and then think about how to insert.
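
For what it's worth, an online index build issued over JDBC might look like the sketch below. The table, columns, and index name are made up, and ONLINE = ON requires Enterprise Edition:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateOnlineIndex {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:sqlserver://localhost;databaseName=tracking", "user", "password");
             Statement stmt = conn.createStatement()) {
            // ONLINE = ON keeps the table readable and writable while the index builds.
            stmt.execute("CREATE NONCLUSTERED INDEX IX_position_vehicle_time "
                    + "ON vehicle_position (vehicle_id, recorded_at) "
                    + "WITH (ONLINE = ON)");
        }
    }
}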

For the insert issue:
I think you could dump the records periodically and process the dump with a small native (pure C) program. Build it as an executable and drive it from a bash script to load the data. Concentrate on the dump file: whenever a new dump appears, process it. Use parameterized queries, which are SQL Server's equivalent of Oracle bind variables. Performance is not bad at all.

Thanks, helped a lot.
