Hi all,
I have a log file over 45GB in size, with more than 110 million lines, stored on one server. I want to read the file from that server and insert its contents into an Oracle database on another server, but it takes a long time. Is there any way to handle these files faster?

thanks in advance

Ehm... best way: update regularly, and don't wait until you're sitting on a log file of 45GB.
You could automate it so that it runs every night for a fixed period of time, performing part of the task each time until it's completely done. But 45GB... a log file is (normally) flat text. Do you have any idea how many lines of text you have at 45GB? Personally, I can't see a way to get this done very quickly.

Thanks for the reply.
The file contains 110,000,000 lines, and that's for one day only. So far it has taken 4 days to insert, and I have to read a new file every night. So you see, to insert yesterday's information I have to wait four days.

I feel like I'm missing something here. What application logs 110 million events per day - do you work for Google?

Nah, just a company trying to screw with the NSA. Can you imagine their analysts going over each and every line of that? :)

I'm with James on this:
either you have far too much informational logging going on, and should switch your logging scope to ERROR only,
or your colleagues wrote some very, VERY crappy code in order to generate that much error logging in a single day.

I've seen very buggy code in my day, but never code that generated over 100MB of error logs in a single day (testing environment), let alone 45GB.

... anyway, my guess is that you can read a sequential log file faster than you can add records to a live database, so I would be inclined to have a serious talk with your DBAs about the best possible SQL for that task.

It sounds like you are reading the records from a sequential file, and adding them to a database. Yes?
My guess is that maybe 25% of the time is in the reading, and 75% in the database writing. If that's right, then the best way to improve the speed is to optimise the database structure and the SQL you are using.

One thing that may also help a bit is separating the reads and the writes into two concurrent threads, so the file reading and the database writing overlap instead of alternating.
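To make the reader/writer split concrete, here's a minimal sketch of that two-thread pipeline. The "file" and "database" are stand-ins (an in-memory list and a counter) so the shape is visible without any I/O; in real code the reader loop would wrap a BufferedReader over the log file, and the writer would perform the JDBC inserts.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a reader thread feeding a writer thread through a bounded queue.
public class PipelineSketch {
    private static final String POISON = "\u0000EOF"; // sentinel telling the writer to stop

    public static int process(List<String> lines) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10_000);
        final int[] written = {0};

        Thread writer = new Thread(() -> {
            try {
                for (String line = queue.take(); !line.equals(POISON); line = queue.take()) {
                    written[0]++; // real code would batch JDBC inserts here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        try {
            for (String line : lines) { // reader: in real code, a BufferedReader over the log
                queue.put(line);        // blocks if the writer falls behind
            }
            queue.put(POISON);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return written[0];
    }
}
```

The bounded queue is the important part: it stops the reader from pulling the whole 45GB into memory when the database writer is the slower side.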


I think you're right, because I ran the code without the insert statement and it reached 24,000,000 lines in half an hour. So how can I insert without hurting the performance?

So, it's the database inserts that are the problem. Even so, at 4 days for 110M records that's still over 300 per second; depending on the hardware, that may be as good as it gets.
That's not really a Java problem, it's a pure database issue (assuming you're not doing anything silly like creating a new Connection for each insert!), and it will come down to the table structure and the exact SQL commands.
I would try asking those questions in our database forum, then just implement whatever they suggest in your Java code.
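On the Java side, the one thing worth checking before talking to the DBAs is that the inserts are batched. A sketch of what that looks like, assuming a hypothetical comma-separated log format and a placeholder table `log_entries` (the column split, table name, and connection URL are all illustrative, not from the original post):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Batched JDBC inserts: one PreparedStatement reused for every row,
// one round trip and one commit per BATCH_SIZE rows instead of per row.
public class BulkInsert {
    static final int BATCH_SIZE = 1000;
    static final String INSERT_SQL =
        "INSERT INTO log_entries (log_time, severity, message) VALUES (?, ?, ?)";

    public static void load(String path, String url, String user, String pw) throws Exception {
        try (Connection con = DriverManager.getConnection(url, user, pw);
             PreparedStatement ps = con.prepareStatement(INSERT_SQL);
             BufferedReader in = new BufferedReader(new FileReader(path))) {
            con.setAutoCommit(false); // commit per batch, never per row
            int pending = 0;
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                String[] f = line.split(",", 3); // placeholder parsing; adapt to your format
                ps.setString(1, f[0]);
                ps.setString(2, f[1]);
                ps.setString(3, f[2]);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    con.commit();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();
            }
            con.commit();
        }
    }
}
```

Unbatched, each insert pays a full network round trip to the other server; with `addBatch`/`executeBatch` the driver sends rows in bulk, which is usually the single biggest win for this kind of load.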

If your database is Oracle, did you consider using SQL*Loader or External Tables?
Even if your source data needs to be pre-processed in Java, it may be a lot quicker to write it to a sequential file that SQL*Loader can bulk load. I think that's probably the fastest possible way to do this.
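For reference, a SQL*Loader run is driven by a small control file. This is only a sketch under the same assumptions as above: the table name `log_entries`, column names, and comma-separated format are placeholders, not details from the original post.

```
-- hypothetical control file: logdata.ctl
LOAD DATA
INFILE 'app.log'
APPEND
INTO TABLE log_entries
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(log_time, severity, message)
```

You would then run something like `sqlldr userid=scott/tiger control=logdata.ctl direct=true`; direct-path loading bypasses much of the normal SQL insert machinery, which is why it tends to beat row-by-row JDBC inserts by a wide margin.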

