I have a problem that has left me banging my head against the wall. Here is my situation.
I have to scan a directory continuously, and as soon as a CSV file arrives in it, I need to process that file and pass some of its data on to another application. Since this external application can only process a limited amount of data at a time, I can only send part of the CSV file; when it is done processing that data, I need to send some more, and so on.
So far this looks pretty straightforward. The problem is that multiple CSV files could arrive in the directory, though all of them follow a standard format. To add to the problem, I need to send data to the external application in proportion to the number of records in each CSV file. For example, here is the structure of the CSV file:
CustomerID, EmailID, ContactNumber, campaignID
CampaignID remains the same for all records in one file, i.e. each file has its own unique CampaignID.
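To make the file format concrete, here is a minimal sketch in Python of how one file could be read and checked for a single campaign ID. It assumes the file has a header row with the column names shown above and a space after each comma; `read_campaign_file` and the sample values are made up for illustration.

```python
import csv
import io


def read_campaign_file(handle):
    """Return (campaign_id, records) for one CSV file.

    Assumption: the first row is a header with the four column names.
    """
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.DictReader(handle, skipinitialspace=True)
    records = list(reader)
    campaign_ids = {row["campaignID"] for row in records}
    if len(campaign_ids) != 1:
        raise ValueError(f"expected exactly one campaignID, got {sorted(campaign_ids)}")
    return campaign_ids.pop(), records


# Made-up sample data standing in for a real file on disk.
sample = io.StringIO(
    "CustomerID, EmailID, ContactNumber, campaignID\n"
    "1, a@example.com, 5550001, CAMP-A\n"
    "2, b@example.com, 5550002, CAMP-A\n"
)
campaign_id, records = read_campaign_file(sample)
```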
Now let's say there are 3 files in the directory, i.e. File A, File B and File C, with 5000, 3000 and 2000 records respectively, and that I can only send 300 records at a time to the external application.
Now when I send 300 records, I need to populate the batch from all 3 files based on their share of the total, i.e. 150 from File A (50%), 90 from File B (30%) and 60 from File C (20%).
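The split I have in mind can be sketched roughly like this in Python. The function name `allocate_batch` is made up, and the rounding rule is my own assumption: when the shares do not divide evenly, the leftover records go to the files with the largest fractional remainders (largest-remainder rounding). Whether the counts passed in should be the original totals or the records still unsent is also an open choice.

```python
def allocate_batch(counts, batch_size):
    """Split batch_size across files in proportion to their record counts."""
    total = sum(counts.values())
    size = min(batch_size, total)  # never ask for more records than exist
    if size == 0:
        return {name: 0 for name in counts}
    # Integer floor of each file's proportional share.
    shares = {name: count * size // total for name, count in counts.items()}
    # Hand out the records lost to rounding, largest remainder first.
    leftover = size - sum(shares.values())
    by_remainder = sorted(
        counts, key=lambda name: (counts[name] * size) % total, reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


print(allocate_batch({"A": 5000, "B": 3000, "C": 2000}, 300))
# {'A': 150, 'B': 90, 'C': 60}
```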
I also need to keep track of the records I send, so I don't send them again.
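One way I could imagine tracking this is a per-file offset that only moves forward. Below is a minimal in-memory sketch in Python; `FileCursor` is a made-up name, and a real version would presumably have to persist the offset (for example in the database) so it survives a restart.

```python
class FileCursor:
    """Remembers how many records of one file have already been sent."""

    def __init__(self, records):
        self.records = records
        self.offset = 0  # number of records already handed out

    @property
    def remaining(self):
        return len(self.records) - self.offset

    def take(self, count):
        """Return up to count unsent records and advance the offset."""
        chunk = self.records[self.offset:self.offset + count]
        self.offset += len(chunk)
        return chunk


cursor = FileCursor([f"record-{i}" for i in range(5)])
first = cursor.take(2)   # records 0 and 1
second = cursor.take(2)  # records 2 and 3, never the first two again
```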
Also, the external application sends me a response later in the day for each campaignID in another CSV file. So I need to store the records from the files in a database as well and provide reporting.
The interface to the external application is either ActiveMQ or web services. I have experience with both, so that's not a problem. The problem is choosing the right approach to send records percentage-wise and keep track of them. Also, I am not sure when to load the data into the database: should I do it after processing the file, or before processing and then read from the database instead of the file?
I need to design this keeping in mind that up to 500 CSV files could arrive within an hour, containing a total of a million records. I would really appreciate any help.