I use a webservice to pull some data from it everyday & update it on my MySQL database. I'll make around 600 calls to this service. The webservice response is very slow (takes around 30 minutes to complete 600 transactions) & during this time, the MySQL is also giving slow response which is affecting other DB operations.

So, I was thinking of an alternate way to store the data obtained from the webservice on filesystem or something temporarily & dump it all at once to database after completing 600 transactions instead of keeping the MySQL connection ON for 30mins.

Which is the best way to store this kind of data? Should I use filesystem or any caching mechanisms etc?

I was thinking of storing it as JSON like this:

{
  "table_name": "webservice",
  "entry": {
    "a": [
      "1",
      "2",
      "3"
    ],
    "b": [
      "1",
      "2",
      "3"
    ],
    "c": [
      "1",
      "2",
      "3"
    ]
  },
  "status": "OK"
}

About “MySQL is also giving slow response which is affecting other DB operations” as I understood you keep open the connection throughout all processes , you could just open the connection make your update – insert or what ever - close it , and then from the start.

Thank you @jkon . What I'm doing is open a connection to MySQL DB, run a while loop where 600 transactions & MySQL inserts/updates happen & then while loop closes & the DB connection closes. Do you think this persistent connection to DB might be a problem & could be solved by opening & closing the connection for each transaction? :)

Edited 1 Year Ago by v1shwa

Hello v1shwa, first of all what you thought is good and could work , even the way you wrote storing a json to a file (I was planning to write a tutorial about application scope caching here but it would be focused in objects and lists of them). If you go that way the caching mechanism doesn't play a role because you will have only one file , that doesn't need to be opened by other processes as you run your operation , and you will just append to it , so there isn't any need for a special caching mechanism.

I wrote about opening and closing the connection because caching is meant to be used primarily for other reasons ( I repeat , you can do it that way appending e.g. in a json “file” for example). Yes keeping the connection open for 30 minutes or so , may play a performance role depending your server and MySQL settings, so opening – update or whatever and closing the connection might seem a more straightforward way.

Edited 1 Year Ago by jkon

A lot of people (including my group at Panasonic) use CSV files to cache data that will be uploaded to the database. You can also use XML files, but they are more verbose, and a lot of databases will require a translater program on the front-end. Most will allow direct import of CSV files if the data is in column order.

Thank you very much @Jkon, That's really helpful. Now, I also want consider @rubberman's suggestion to use CSV files which will be easier for exporting. Thank you @rubberman.

One more thing. You started something about objects & lists. Do you suggest storing in an array? The data will be around 10-15MB for all 600transactions & I'm afraid I might run out of memory using arrays. I'm using PHP, BTW.

No you don't have any reason to store them as objects or objects list for that job. If you are talking about 10 – 15 mb consider again opening and closing the connection or even a csv file if you choose to go that way. It doesn't seem a good idea to use json_decode on purpose with that amount of data.

Alright! I'll try both open&close and CSV, calculate the performance of each method & stick with the better one. Thank you so much for your help! :)

it's been almost a year since this question been asked, but the solution I am about to propose, might help someone in the future (or even the OP if he decides to use it).

I suggest using MySQL partitions.
1) store the data in a CSV file
2) create a partition containing copy of the table you'll be updating
3) update the partition (load data from file)
4) switch partition with table

using this solution, it takes no longer than 7 min to update approx 1100 tables containing between 30k to 12kk entries

Edited 1 Month Ago by Tomaski

This question has already been answered. Start a new discussion instead.