I'm trying to track all changes made to a PHP variable. The variable can be an object or array.

For example, it looks something like this:

$object = array('a', 'b');

This object is then persisted to storage using an object cache, so that it is still available when the PHP script runs again.

So when the script runs the second time, or another script runs and modifies that object, I want those modifications to be tracked, either as they are being done, or in one go after the script executes.

eg:

$object[] = 'c';

I would like to know that 'c' was added to the object.

Now, the actual code looks something like this:

$storage = new Storage();
$storage->object = array('a', 'b');

On the second load:

$storage = new Storage();

var_dump($storage->object); // array('a', 'b')

$storage->object[] = 'c';

What I want is to know that 'c' was pushed into $storage->object, so that the Storage class can write that value to persistent storage.
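To make the problem concrete, here is a minimal sketch of a hypothetical Storage class (an assumption, not the library's real code) that keeps its values in a private array behind __set() and a by-reference __get(). Assigning the whole property goes through __set(), but the push only goes through __get(), so Storage never gets a chance to record the 'c'. Without the by-reference __get(), PHP instead raises an "Indirect modification of overloaded property" notice and the push is lost entirely.

```php
<?php
// Hypothetical minimal Storage class, not the library's real code.
// Values live in a private array; __set() logs whole-property writes and
// __get() returns by reference so that "$storage->object[] = ..." works at all.
class Storage
{
    private $data = array();
    public $writes = array(); // property names seen by __set()

    public function __set($name, $value)
    {
        $this->writes[] = $name;
        $this->data[$name] = $value;
    }

    public function &__get($name)
    {
        return $this->data[$name];
    }
}

$storage = new Storage();
$storage->object = array('a', 'b'); // calls __set(): logged
$storage->object[] = 'c';           // calls only &__get(): not logged

var_dump($storage->writes);          // only the first assignment appears
var_dump($storage->object);          // the array does contain 'c'
```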

I have tried a few methods, that work, but have downsides.

1) Wrap all objects in a class "Storable" which tracks changes to the object

The class "Storable" just saves the actual data object as a property, and then provides __get() and __set() methods to access it. When a member/property of the object is modified or added, the "Storable" class notes this.
When a property is accessed, __get() on the Storable class returns the property wrapped in another Storable instance, so that changes to it are tracked too, recursively for each new level.
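A stripped-down sketch of this wrapping approach might look like the following. It assumes the wrapped values are arrays and that changes are recorded in a log keyed by dotted paths such as `child.x`; the class body is illustrative, not the actual Storable implementation. Nested arrays are wrapped by reference so that writes at deeper levels still reach the original data.

```php
<?php
// Hypothetical sketch of a Storable-style wrapper (not the author's class).
// Assumes the wrapped value is an array; every write through __set() is
// recorded in a shared log keyed by a dotted path such as "child.x".
class Storable
{
    private $data; // reference to the wrapped array
    private $log;  // reference to the change log shared by all levels
    private $path; // key path of this level ('' for the root)

    public function __construct(array &$data, array &$log, $path = '')
    {
        $this->data = &$data;
        $this->log = &$log;
        $this->path = $path;
    }

    public function __get($name)
    {
        if (!array_key_exists($name, $this->data)) {
            return null;
        }
        $value = &$this->data[$name];
        if (is_array($value)) {
            // Wrap nested arrays by reference so deeper writes are tracked
            // and still reach the original data.
            return new Storable($value, $this->log, $this->childPath($name));
        }
        return $value;
    }

    public function __set($name, $value)
    {
        $this->data[$name] = $value;
        $this->log[$this->childPath($name)] = $value;
    }

    private function childPath($name)
    {
        return $this->path === '' ? (string) $name : $this->path . '.' . $name;
    }
}

$data = array('a' => 1, 'child' => array('x' => 1));
$log = array();
$s = new Storable($data, $log);

$s->a = 2;        // logged as "a"
$s->child->x = 5; // logged as "child.x", written into $data['child']
```

Because `$s->child` is a Storable object rather than an array, `is_array($s->child)` returns false, which is exactly the limitation with native array functions.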

The problem is that the objects are no longer native data types, so you cannot run native array functions on wrapped arrays.

eg:

$storage = new Storage();

var_dump($storage->object); // array('a', 'b')

array_push($storage->object, 'c'); // fails

So instead we'd have to implement these array functions as methods of Storable.

eg:

$storage = new Storage();

var_dump($storage->object); // array('a', 'b')

$storage->object->push('c');
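A minimal sketch of one such method, assuming a Storable-like class (hypothetical here) that records the index and value of every element added through push(). push() returns the new element count, mirroring array_push().

```php
<?php
// Hypothetical Storable-like class showing "array functions as methods":
// push() appends a value like array_push() and records what was added.
class Storable
{
    private $data;
    public $added = array(); // index => value for every push()

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    public function push($value)
    {
        $this->data[] = $value;
        end($this->data);                        // move to the new last element
        $this->added[key($this->data)] = $value;
        return count($this->data);               // array_push() also returns the new count
    }

    public function toArray()
    {
        return $this->data;
    }
}

$object = new Storable(array('a', 'b'));
$object->push('c');

var_dump($object->added);    // index 2 => 'c': the change to persist
var_dump(is_array($object)); // false: native array functions won't accept it
```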

This all works, but I'd like to know if it's possible to somehow use native functions, to reduce the overhead of the library I'm developing, while still tracking changes so they can be written to persistent storage.

2) Forget about tracking changes, and just update whole object structures

This is the simplest method of keeping the objects in the program synchronized with the objects actually stored in the object-cache (which can be on a different machine).

However, it means whole structures, like an array with 1000 indexes, have to be sent through a socket to the object cache when a single index changes.
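For a rough sense of scale, the sketch below compares the serialized size of a 1000-element array with that of a single changed entry. serialize() is only a stand-in here; the real cost depends on whatever wire format the object cache uses.

```php
<?php
// Rough comparison (serialize() as a stand-in wire format): method 2 sends the
// whole array back, whereas change tracking would only send the modified index.
$big = array();
for ($i = 0; $i < 1000; $i++) {
    $big[$i] = "value $i";
}
$big[500] = 'changed';

$wholeStructure = serialize($big);
$singleChange = serialize(array(500 => 'changed'));

printf("whole array:   %d bytes\n", strlen($wholeStructure));
printf("single change: %d bytes\n", strlen($singleChange));
```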

3) Keep a mirror of the object locally

I've also tried cloning the object and keeping the clone untouched. Then, when the PHP script has finished all its processing, I compare the clone to the modified object recursively and submit the changed properties back to the object cache.

This, however, requires that the whole object be downloaded in order to use it.
It also requires twice as much memory for the object, since it is cloned.
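The comparison step of this approach can be sketched as a recursive walk over the untouched copy and the modified array. This version assumes plain arrays (objects would need converting first, e.g. with get_object_vars()), reports only added and changed keys, and would need a second pass to detect removed keys; the function name is hypothetical.

```php
<?php
// Hypothetical helper for method 3: compare the untouched copy with the
// modified array and return only added or changed entries. Nested arrays are
// compared recursively. Removed keys are NOT reported (that needs a second
// pass over $original).
function diff_recursive(array $original, array $modified)
{
    $changes = array();
    foreach ($modified as $key => $value) {
        if (!array_key_exists($key, $original)) {
            $changes[$key] = $value; // added
        } elseif (is_array($value) && is_array($original[$key])) {
            $nested = diff_recursive($original[$key], $value);
            if ($nested) {
                $changes[$key] = $nested; // changed somewhere below this key
            }
        } elseif ($value !== $original[$key]) {
            $changes[$key] = $value; // changed
        }
    }
    return $changes;
}

$original = array('a', 'b', 'meta' => array('count' => 2));
$modified = $original; // arrays are copied on assignment
$modified[] = 'c';
$modified['meta']['count'] = 3;

var_dump(diff_recursive($original, $modified));
```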

---

I know this is pretty vague, but there is quite a bit of code involved. If anyone wants to see the code I can post it, or put it up on an open SVN repo. The project is open source, but I haven't set up a public repository yet.


The easiest way is method 2. Frankly, there would be very little overhead compared to the first method: both are function calls, except that one (the magic method... method) doesn't have to check whether __get/__set exist, since PHP knows you're calling a function.

Now, given that you're not using __get/__set, you would have to emulate or wrap every function you want to apply to the object, such as sort() or array_walk(), to check for modifications.
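One way to do that wrapping, sketched with a hypothetical tracked_call() helper: run the native function on the array by reference and compare the array before and after. It keeps native functions usable, but every call pays for a full comparison (and a copy, once the function writes to the array). This sketch assumes PHP 5.6+ for the variadic arguments.

```php
<?php
// Hypothetical helper: run a native array function on a tracked array and
// report whether the call modified it, by comparing before and after.
// Requires PHP 5.6+ (variadic arguments).
function tracked_call(array &$array, $function, ...$extraArgs)
{
    $before = $array; // cheap until $array is written (copy-on-write)
    $result = $function($array, ...$extraArgs);
    return array('result' => $result, 'changed' => $before !== $array);
}

$list = array('b', 'a');

$sorted = tracked_call($list, 'sort');   // sorts $list in place
$counted = tracked_call($list, 'count'); // read-only call
$walked = tracked_call($list, 'array_walk', function (&$value) {
    $value = strtoupper($value);
});
```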

The objects are actually stored in some persistent storage. At the moment I'm using a custom-built object cache, but it also interfaces with memcached, APC, etc.

The problem is that the storage could be on an external domain, so I'd like to minimize the data sent between the storage and the actual PHP script.

Say the object is 1 MB: that would be OK inside the PHP script, but not if it has to be sent back and forth between the storage and the PHP scripts requesting it.

I'm actually starting to think I should take different approaches for different types of objects: maybe a different mechanism for large arrays, or even restricting the types of objects that can be stored.

If you want smaller transfer sizes, then you're now staring at a performance vs. size trade-off. If you want smaller transfers at the cost of performance, the obvious solution is compressed serialization (using either PHP's serialize() or another method, then compressing the data with gzip or similar). That method would, just as obviously, be a big hit to performance.

The "perfect" solution would probably be a middle ground, but it really just ends up being an engineering decision: can you afford to throw hardware at it, or do you have a fast connection? :)
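A minimal sketch of compressed serialization using serialize() together with the zlib functions gzcompress() and gzuncompress(). It assumes the zlib extension is available; pack_value() and unpack_value() are hypothetical names.

```php
<?php
// Hypothetical helpers: serialize, then compress before sending to the cache,
// and reverse both steps on the way back. Requires the zlib extension.
function pack_value($value, $level = 6)
{
    return gzcompress(serialize($value), $level);
}

function unpack_value($packed)
{
    return unserialize(gzuncompress($packed));
}

$big = array();
for ($i = 0; $i < 1000; $i++) {
    $big[] = "value $i";
}

$plain = serialize($big);
$packed = pack_value($big);
printf("serialized: %d bytes, compressed: %d bytes\n", strlen($plain), strlen($packed));
```

The compression level trades CPU time for size, which is the performance vs. size decision described here.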

I started using JSON since it is leaner than serialize() while having pretty much the same performance on PHP 5.3.

However, JSON cannot distinguish an associative array from an object (they are the same thing in JavaScript), so I've opted to use serialize(). Serialize is about twice as bulky in notation, though.
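The ambiguity is easy to demonstrate: json_encode() produces the same text for an associative array and for a stdClass object with the same properties, so json_decode() has to be told which type to rebuild, while serialize() records the type at the cost of a longer string. A small sketch:

```php
<?php
// An associative array and an object with the same data encode identically.
$asArray = array('name' => 'value');
$asObject = new stdClass;
$asObject->name = 'value';

$jsonArray = json_encode($asArray);   // {"name":"value"}
$jsonObject = json_encode($asObject); // {"name":"value"}

// json_decode() cannot tell from the text; the caller picks the type.
$decodedDefault = json_decode($jsonArray);      // stdClass object
$decodedAssoc = json_decode($jsonObject, true); // associative array

// serialize() keeps the type ("a:" for arrays, "O:8:"stdClass"" for objects).
$serialArray = serialize($asArray);
$serialObject = serialize($asObject);
```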

Gzip is definitely a good option.

--

I've noticed that Doctrine (http://doctrine-project.org/) will actually copy objects and then compare them to find changes before synchronizing changes with the relational db.

I've been told objects are copy-on-write, but my tests don't seem to confirm this.

clone($obj); // copies the object value

$obj2 = $obj; // copies the reference to $obj, so changes made through $obj2 are reflected in $obj

There seems to be no way to copy-on-write like in PHP4.
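The two behaviors can be checked side by side. In PHP 5 and later an object variable holds a handle, so plain assignment makes both variables point at the same instance, while clone creates a separate one. A minimal sketch:

```php
<?php
// Assignment copies the object handle; clone makes a new (shallow) instance.
$obj = new stdClass;
$obj->name = 'original';

$alias = $obj;      // same instance as $obj
$copy = clone $obj; // separate instance

$alias->name = 'changed via alias';

var_dump($obj->name);  // sees the change made through $alias
var_dump($copy->name); // still 'original'
```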

I'm a big fan of Doctrine. Wage (the lead developer) is a smart guy, but Doctrine is kind of heavyweight. As for copy-on-write, that's true for every variable in PHP, if I remember correctly.

Doctrine looks really good. Looks like a good amount of work has been put into it.

I'd always thought copy-on-write applied to every variable, too. But it is quite clear to me that in PHP5 there is no copy-on-write for objects.

eg:

$obj = new stdClass;
$obj2 = $obj;

$obj2->name = 'value';

var_dump($obj); // object(stdClass)#1 (1) { ["name"]=>  string(5) "value" }

So modifying the properties of $obj2 modifies $obj.

However, when you reassign $obj2 itself:

$obj2 = 'hi';
var_dump($obj); // object(stdClass)#1 (1) { ["name"]=>  string(5) "value" }

it does not affect the original object.

Other types seem to be copy-on-write however.
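For comparison, a minimal sketch of how arrays behave: assignment acts like a copy (PHP only duplicates the data when one side is written to), but the copy is shallow with respect to objects stored inside, because the array holds object handles rather than the objects themselves.

```php
<?php
// Arrays are values: writing to the copy leaves the original alone.
$list = array('a', 'b');
$copy = $list;
$copy[] = 'c'; // $list is unaffected

// An array copy still shares any objects it contains.
$item = new stdClass;
$item->name = 'value';
$withObject = array($item);
$copyWithObject = $withObject;
$copyWithObject[0]->name = 'changed'; // visible through $withObject[0] too
```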

My head is a bit sore.

Looks like:

When you clone() an object, it creates a copy-on-write object, but only the immediate properties are copy-on-write (a shallow copy is made of the object).
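A small sketch of the shallow copy and of the usual fix, a __clone() method that clones nested objects as well. The Config, DeepConfig and Settings classes are made up for illustration.

```php
<?php
// Hypothetical classes: Config holds a nested Settings object.
class Settings
{
    public $values = array();
}

class Config
{
    public $name;
    public $settings;

    public function __construct($name)
    {
        $this->name = $name;
        $this->settings = new Settings();
    }
}

class DeepConfig extends Config
{
    // Runs on the new copy after the shallow copy is made.
    public function __clone()
    {
        $this->settings = clone $this->settings;
    }
}

$a = new Config('a');
$b = clone $a;
$b->name = 'b';                       // only $b changes
$b->settings->values['debug'] = true; // also visible through $a: shared Settings

$c = new DeepConfig('c');
$d = clone $c;
$d->settings->values['debug'] = true; // $c keeps its own Settings
```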

There really should be some better documentation on the behavior of PHP objects.

If you just do normal assignment, then both variables point at the same object, so its properties are effectively shared, but the variables themselves are not references.

If you assign with =&, then the variable itself becomes a reference.

All other types are copy-on-write unless specifically referenced with &.
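The difference between = and =& can be shown with a short sketch (variable names are illustrative):

```php
<?php
// Plain assignment: two variables holding the same object handle.
$obj = new stdClass;
$obj2 = $obj;
$obj2 = 'hi'; // rebinds only $obj2; $obj is still the object

// Reference assignment: both names refer to the same variable.
$ref = new stdClass;
$ref2 = &$ref;
$ref2 = 'hi'; // $ref is now the string 'hi' as well

// The same applies to arrays: & shares the array instead of copying it.
$list = array('a', 'b');
$listRef = &$list;
$listRef[] = 'c';
```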

I'm still testing this but that seems to be how it goes as far as I can see.

What I'm trying to find is some docs on how zval containers are manipulated by PHP5 object assignment and passing. I've got some for PHP4 but not PHP5.

http://derickrethans.nl/files/phparch-php-variables-article.pdf

Any idea where I can get that info?

Be a part of the DaniWeb community
