I am using mySQL to develop a database collected from internet sources. Python is used for the spider coding, but I'm having issues with the mySQL portion. I'm pretty new to mySQL, so I'm a bit lost.
I need a way to lock a single row of a table and then unlock it. We have multiple machines access a single job table and picking the jobs labelled pending. Unfortunately, without locking and unlocking, the machines will, in milliseconds, pick the same jobs and screw up the whole proccess. We are currently locking the whole table, selecting a job, writing its status as in progress, then unlocking the table. That wastes time. Any ideas on how to make this more efficient? I was thinking, lock the single row being accessed, then the other rows could be accessed by the other machines. I do not, however, have any clue how to do that. Thanks in advance.

Recommended Answers

All 3 Replies

This depends on the database engine you're currently using. By default MySQL uses MyISAM (Also uses MyISAM internally). However, MyISAM only supports table-level locking so when you're writing it locks the entire table. InnoDB on the other hand uses row-level locking along with a host of other features like foreign key constraints

Perhaps you could do it with your own MySQL Function that returned the next free rowid while updating status on the row. Optionally you could combine this with 3rd party MySQL UDF that allows global variables. That would:
- centralise the processing on the db server side, making it faster and avoiding long table locks (if you're not doing it already)
- optionally use a global variable for the id of the next pending row. If the variable is not defined (db restart, etc) load it with the id of the first pending row (eliminating using select on the work table).
- optionally use insert trigger on the jobs table to set the last available id in a global variable. The function could then use this to check that there are free rows available. (further speeding it up a bit).
Haven't done any of this myself, so it's just idle speculation.

I'm not sure of the terminology you've used, as I know nothing about mySQL. I was commissioned to write python code, but I'm helping a small bit with the mySQL planning just as a helpful hand. I know that the way they run the proccess is multiple machines accessing the same database over the internet. Due to this, any sort of update that labelled the row "in progress" of any kind is out of the question. Mere milliseconds is just enough time for more than one machine to try and access the same row, thus creating confusion and chaos. I don't know if you're ideas fall into that catagory, but it sounded like they might. I wish I could understand the speak a bit better. My apologies.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.