Insert Multiple Records with PDO Prepared Statement

diafol

I've been playing with PDO and got quite a shock, so I thought I'd run this past you all.
I love PDO, and I thought that the "prepare once, execute many" idea behind prepared statements was beautiful.

Previously, before using prepared statements, I would build up a long SQL 'VALUES' clause by looping through an array of "records" to insert, instead of running a separate INSERT query for every 'record' in the array.
Sensible enough. So should I now dump that in favour of "multiple executes" in my array loop? Well, I tried. Here's my experience...

I'm using a simple 'members' table: id (PK/int) | firstname (varchar/25) | surname (varchar/25) | title (tinyint/1)
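
For reference, the table used in the tests can be created with something like this (a sketch; adjust the engine and id definition to taste):

CREATE TABLE members (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    firstname VARCHAR(25) NOT NULL,
    surname VARCHAR(25) NOT NULL,
    title TINYINT(1) NOT NULL
);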

$d = ['Osian', 'Williams', 1];
$data = array_fill(0, 1000, $d); //create 1000 identical members

$db = new PDO('mysql:host=localhost;dbname=myDB','root','');

$stmt = $db->prepare('INSERT INTO members (firstname, surname, title) VALUES (?,?,?)');

$s = microtime(true);
foreach ($data as $row) {
    $stmt->execute($row);
}
$e = microtime(true);

echo ($e - $s);

This test gave repeated results of between 3.07 and 3.18 seconds. Perhaps you're wondering why I didn't use a higher number, like 10,000?
Well, I did, and it timed out (>30 seconds). OK, I'm not using transaction/commit, but still, that's some serious arse-dragging [EDIT - See First Reply for use of transaction].

So, really, really disappointed with this, I thought I'd try the old way of building a long SQL statement. This isn't quite as straightforward any more, as we need to use placeholders to keep the full security benefits of prepared statements.

Here's the code:

$d = ['Osian', 'Williams', 1];
$data = array_fill(0, 1000, $d); //create an array of 1000 records

$s = microtime(true);

$db = new mPDO('mysql:host=localhost;dbname=myDB','root',''); //NOTE the extended mPDO class

$stmt = $db->multiPrepare('INSERT INTO members (firstname, surname, title)', $data); //New Method

$stmt->multiExecute($data); //New Method

$e = microtime(true);

echo ($e - $s);

Using a couple of extended classes, each providing just one simple extra method, I achieved 1,000 inserts in 0.018 seconds. Over 150x quicker.
The 10,000 inserts no longer timed out; they went in in 0.15 seconds.

So here's my code. The first class, mPDO, just provides the multiPrepare() method, which takes the first half of an INSERT statement (without the VALUES clause) as its first parameter and the data array to be inserted as its second.
It also points prepare() (and therefore multiPrepare()) at an mPDOStatement object instead of the default PDOStatement object. That's the purpose of the setAttribute() call in the constructor:

$this->setAttribute(PDO::ATTR_STATEMENT_CLASS, array('mPDOStatement', array($this)));

The second class, mPDOStatement, has the multiExecute() method, which takes the data array as its only parameter.
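
In effect, multiExecute() just flattens the array of records into one long list of bind values to match the long placeholder list built by multiPrepare(). For example:

$data = array(
    array('Osian', 'Williams', 1),
    array('Bethan', 'Jones', 0),
);
// after array_walk_recursive(), the values bound by execute() are:
// array('Osian', 'Williams', 1, 'Bethan', 'Jones', 0)
// matching the VALUES (?,?,?),(?,?,?) built by multiPrepare()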

Please note that an mPDOStatement created by the multiPrepare() method should not be executed again unless the next data array to be inserted contains exactly the same number of 'records'.

I know this needs some work before implementing for real, so I'm open to suggestions, criticisms, improvements, etc. Also, did I get anything wrong with the testing? Did I miss something? Is there a better way?

Thanks d.

cereal commented: This is brilliant! Thanks for sharing
broj1 commented: Thanks for sharing
<?php
/**
 * Class mPDO
 *
 * Extends PDO with one additional method, multiPrepare(), in order to
 * prepare for adding multiple records at a time. The constructor also
 * points PDO at mPDOStatement, so prepare() returns the extended
 * statement class below.
 */
class mPDO extends PDO
{
    /**
     * @param string $dsn      as for PDO, e.g. 'mysql:host=localhost;dbname=mydb'
     * @param string $username optional, as for PDO, e.g. 'root'
     * @param string $password optional, as for PDO
     * @param array  $options  optional, as for PDO
     */
    public function __construct($dsn, $username = null, $password = null, $options = null)
    {
        parent::__construct($dsn, $username, $password, $options);
        $this->setAttribute(PDO::ATTR_STATEMENT_CLASS, array('mPDOStatement', array($this)));
    }

    /**
     * Prepare an INSERT for a whole array of records in one statement.
     *
     * @param string $sql  the INSERT statement up to, but not including, the VALUES clause
     * @param array  $data the array of 'records' to insert (one sub-array per row)
     *
     * @return mPDOStatement
     */
    public function multiPrepare($sql, $data)
    {
        $rows = count($data);
        $cols = count($data[0]);
        $rowString = '(' . rtrim(str_repeat('?,', $cols), ',') . '),'; // e.g. "(?,?,?),"
        $valString = rtrim(str_repeat($rowString, $rows), ',');        // one group per record
        return $this->prepare($sql . ' VALUES ' . $valString);
    }
}

/**
 * Class mPDOStatement
 *
 * Extends PDOStatement with one additional method, multiExecute(), in order
 * to bind multiple records to a prepared statement in a single execution.
 */
class mPDOStatement extends PDOStatement
{
    public $dbh;

    protected function __construct($dbh)
    {
        $this->dbh = $dbh; // keep a reference to the mPDO connection
    }

    /**
     * Flatten the array of records and execute the statement once.
     *
     * @param array $data the array of 'records' to insert (one sub-array per row)
     *
     * @return bool as for PDOStatement::execute()
     */
    public function multiExecute($data)
    {
        $bindArray = array();
        array_walk_recursive($data, function($item) use (&$bindArray) {
            $bindArray[] = $item;
        });
        return $this->execute($bindArray);
    }
}
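
One caveat for implementing this "for real": MySQL caps a single prepared statement at 65,535 placeholders, so a very large data array would need chunking, roughly like this (an untested sketch):

// keep each statement under MySQL's 65,535-placeholder limit
// (3 placeholders per record for the members table)
$maxRows = (int) (65535 / 3);
foreach (array_chunk($data, $maxRows) as $chunk) {
    $stmt = $db->multiPrepare('INSERT INTO members (firstname, surname, title)', $chunk);
    $stmt->multiExecute($chunk);
}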
diafol
UPDATE

Just thought I'd add the bit about transactions:

$d = ['Osian', 'Williams', 1];
$data = array_fill(0, 1000, $d); //create 1000 identical members

$s = microtime(true);
$db = new PDO('mysql:host=localhost;dbname=myDB','root','');
$db->beginTransaction(); //**** ADD THIS
$stmt = $db->prepare('INSERT INTO members (firstname, surname, title) VALUES (?,?,?)');
foreach ($data as $row) {
    $stmt->execute($row);
}
$db->commit(); //**** ADD THIS
$e = microtime(true);
echo ($e - $s);

Previously this took over 3 seconds.

Using transaction/commit, it runs in 0.2 seconds! About 15x quicker. Amazing what two extra lines of code can do! Still nowhere near the build-and-execute-once method (0.018 s).
Using transaction/commit on the build-and-execute-once method didn't have much effect.

David_50

We configured our RDBMS with a flat file or pipe as a selectable table, so we could do an insert-select and just write all the data we wanted to insert to the pipe. We did this with a combination of UDFs and a stored procedure that generated the rows. Each UDF call would take the next cell from somewhere in the input record, based on the call's input parameters. The insert rate was way above other methods.

diafol

When you say the rate was "above", do you mean that it was quicker? If so, could you please share your solution?
In addition, is the data properly escaped?

David_50

Yes, much faster. It depends on a client running on the same box as the RDBMS, so the local file or pipe can be opened and read inside the database engine by the UDF. The UDF opens the file, and calls to the UDF determine whether the file has delimited lines or fixed-length records and fields. Selecting fields by length from fixed files has the side effect of allowing substringing of delimited fields, as the former is treated as a one-field delimited record. A UDF call in the WHERE clause reads the next record, or stops the insert at EOF.

This is very similar to the technique of mapping files as tables, as with external tables in Oracle.

diafol

Interesting. My example here is really for trivial form input. For example, inputting multiple entries from a form and then pressing the big Create button, where 'records' would be passed as arrays (rough handling sketched below):

<input name="firstname[]" /><input name="surname[]" />
<input name="firstname[]" /><input name="surname[]" />
<input name="firstname[]" /><input name="surname[]" />
<input name="firstname[]" /><input name="surname[]" />
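
A minimal sketch of feeding those form arrays into the classes above (untested; assumes the $_POST arrays line up and a default title of 1 for every row):

$rows = array();
foreach ($_POST['firstname'] as $i => $firstname) {
    $rows[] = array($firstname, $_POST['surname'][$i], 1);
}

$db = new mPDO('mysql:host=localhost;dbname=myDB', 'root', '');
$stmt = $db->multiPrepare('INSERT INTO members (firstname, surname, title)', $rows);
$stmt->multiExecute($rows);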

Could you show a quick example of how you'd do this with your method? It's always good to learn a new technique.

David_50

You could create an "insert ... select ... union select ... union select ..." from your array, within batch and statement size limits. The pipe and file input methods are more for ETL situations. The biggest challenge with an RDBMS is getting stuff in! A bcp to a staging table in tempdb (globally named, or via bcp calls within the same session) is also very fast, but again, very ETL-oriented.

diafol

The pipe and file input methods are more for ETL situations

I had an inkling that may have been the case. But good to know anyhow.

I shall look at the insert-select-union-select (ISUS) approach and see how I get on. Thanks again.
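
Presumably something like this, with placeholders for safety (an untested sketch of the union-select idea, not David_50's actual code; $db and $data as in the tests above):

// build: INSERT INTO members (firstname, surname, title)
//        SELECT ?,?,? UNION ALL SELECT ?,?,? ...
$selects = array_fill(0, count($data), 'SELECT ?,?,?');
$sql = 'INSERT INTO members (firstname, surname, title) '
     . implode(' UNION ALL ', $selects);

$stmt = $db->prepare($sql);
$bind = array();
array_walk_recursive($data, function($item) use (&$bind) { $bind[] = $item; });
$stmt->execute($bind);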
