Thursday, July 16, 2009

Lazy loading of objects from database

What is lazy loading? It is a practice in object-relational mapping that simulates the presence of the whole object graph in memory. There are several techniques for producing this illusion, which I explain here.

An example
Suppose we have the classic User and Group objects in a PHP application. The same process is valid for any language that uses a relational database as a backend, for example Java with Hibernate, but I will include PHP code snippets.
Typically User and Group have a many-to-many relation: every user can subscribe to many groups, and every group has a bunch of users as its members. This means that if all objects are loaded from the database you can navigate like this:

$user = find(42); // find the user with id == 42
echo $user->groups[3]->users[2]->groups[4]->name;

Having potentially infinite navigability in an object graph is not a great practice, but sometimes many-to-many relationships are needed, and the simplest approach is to accept some additional overhead (why there is overhead will be explained later) and provide this simple API.
The main problem is that we cannot load the whole object graph: it may not fit in the server's memory, and it would take a long time to build, depending on the size of the database. Nor can we load an arbitrary subset of it, because client code is free to navigate groups and users to whatever depth it needs: on reaching the edge of the loaded subset, it would find null values where objects should be.

Solution: Lazy loading
The Proxy pattern comes to our aid:
A proxy, in its most general form, is a class functioning as an interface to something else. The proxy could interface to anything: a network connection, a large object in memory, a file, or some other resource that is expensive or impossible to duplicate.

In the first navigation step ($user->groups[3]) the object returned is a proxy, an instance of a subclass of Group. Without further instructions, the Data Mapper will provide this type of object graph:

var_dump($user); // User
var_dump($user->groups[3]); // Group_SomeOrmToolNameProxy
var_dump($user->groups[3] instanceof Group); // true

As said previously, an ORM that provides lazy loading will produce a proxy class to substitute for the original one. The code for this class is generated on the fly and will look like this:

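class Group_SomeOrmToolNameProxy extends Group
{
    private $_mapper;
    private $_identifier;
    private $_loaded = false;

    public function __construct(DataMapper $mapper, $identifier)
    {
        // save the arguments as fields; no query is executed yet
        $this->_mapper = $mapper;
        $this->_identifier = $identifier;
    }

    private function _load()
    {
        // go to the database only on first use
        if (!$this->_loaded) {
            $this->_mapper->load($this, $this->_identifier);
            $this->_loaded = true;
        }
    }

    public function sendMessageToAllUsers($text)
    {
        $this->_load();
        return parent::sendMessageToAllUsers($text);
    }
}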

The new class proxies to the original methods but calls _load() first, giving the object a usable state. Until _load() runs, either directly or through one of the proxied methods, the domain object has only its identifier fields in its internal data structure.
Since it is a subclass of Group, it provides the same interface to the client code, which will not even notice that it is not dealing with its clean, infrastructure-free Group class.
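
To make the mechanism concrete, here is a minimal hand-written sketch of such a proxy. It is not the code an ORM would generate: Group::getName(), InMemoryGroupMapper and GroupProxy are hypothetical names, and an in-memory array stands in for the database so that the number of simulated queries can be counted.

```php
<?php
// Hypothetical domain class: it knows nothing about persistence.
class Group
{
    public $id;
    public $name;

    public function getName()
    {
        return $this->name;
    }
}

// Hypothetical in-memory stand-in for a Data Mapper backed by a database.
class InMemoryGroupMapper
{
    public $queries = 0; // counts simulated database hits
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Fills an existing object with the row stored under $id.
    public function load(Group $group, $id)
    {
        $this->queries++;
        $group->id = $id;
        $group->name = $this->rows[$id]['name'];
    }
}

// Hand-written equivalent of a generated proxy class.
class GroupProxy extends Group
{
    private $mapper;
    private $loaded = false;

    public function __construct(InMemoryGroupMapper $mapper, $id)
    {
        $this->mapper = $mapper;
        $this->id = $id; // only the identifier is known up front
    }

    public function getName()
    {
        if (!$this->loaded) {
            $this->mapper->load($this, $this->id);
            $this->loaded = true;
        }
        return parent::getName();
    }
}

$mapper = new InMemoryGroupMapper(array(7 => array('name' => 'Admins')));
$group = new GroupProxy($mapper, 7);

$isGroup = $group instanceof Group; // the proxy passes type checks
$queriesBefore = $mapper->queries;  // nothing has been loaded yet
$name = $group->getName();          // first access triggers the load
$group->getName();                  // second access reuses the loaded state
$queriesAfter = $mapper->queries;
```

Note that in this sketch only method calls are intercepted: reading a public field such as $group->name directly would bypass the proxy, so a real tool has to handle field access in some other way.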

What does it mean?
It means that the first-level objects are fully loaded, while the second-level ones are placeholders that contain only the information needed to load themselves. Only when you access them will they go to the database to fetch all their fields:

$user = $em->find(42); // a query on the user table
echo $user->groups[3]->name; // another query is executed on the groups and user_groups tables

We can complicate this pattern as far as we want:
  • We can specify, with join() commands on the query objects or 'join' options for the methods of the Data Mapper, how far it should go with the initial loading. This way, if we know that we need to access the second level of the graph starting from the user, only one query is executed. Still, if we go to the third level ($user->groups[3]->users[2]->role) without specifying that when $user is reconstituted, additional queries will be sent to the database and performance will suffer.
  • We can activate or deactivate lazy loading, or log it to see where it is executed and where it impacts performance.
Hibernate for Java uses this approach to provide lazy loading of object properties and relations. Doctrine 1.x uses a simpler approach because it is based on Active Record, and the code is put in the models' base class, Doctrine_Record.
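
The trade-off between joining up front and loading lazily, and the idea of logging where lazy loads happen, can be sketched as follows. This is only an illustration under stated assumptions: QueryLog, LazyCollection, UserMapper and the 'join' option name are all hypothetical, the "queries" are just strings recorded in a log, and none of this mirrors the API of Doctrine or Hibernate.

```php
<?php
// Hypothetical query log used to see when simulated queries run.
class QueryLog
{
    public $entries = array();

    public function record($description)
    {
        $this->entries[] = $description;
    }
}

// A collection that runs its loader only on first access.
class LazyCollection
{
    private $loader;
    private $items = null;

    public function __construct($loader)
    {
        $this->loader = $loader;
    }

    public function toArray()
    {
        if ($this->items === null) {
            $this->items = call_user_func($this->loader);
        }
        return $this->items;
    }
}

class User
{
    public $id;
    public $groups; // a LazyCollection
}

// Hypothetical mapper; the 'join' option name is illustrative only.
class UserMapper
{
    private $log;
    private $groupsByUser;

    public function __construct(QueryLog $log, array $groupsByUser)
    {
        $this->log = $log;
        $this->groupsByUser = $groupsByUser;
    }

    public function find($id, array $options = array())
    {
        $user = new User();
        $user->id = $id;
        $log = $this->log;
        $groups = $this->groupsByUser[$id];

        if (isset($options['join']) && in_array('groups', $options['join'])) {
            // Eager: one simulated query fetches the user and its groups.
            $log->record('SELECT user JOIN groups');
            $user->groups = new LazyCollection(function () use ($groups) {
                return $groups; // already in memory, no further query
            });
        } else {
            // Lazy: a second simulated query runs on first access.
            $log->record('SELECT user');
            $user->groups = new LazyCollection(function () use ($log, $groups) {
                $log->record('SELECT groups (lazy)');
                return $groups;
            });
        }
        return $user;
    }
}

$data = array(42 => array('Admins', 'Editors'));

$lazyLog = new QueryLog();
$lazyMapper = new UserMapper($lazyLog, $data);
$lazyGroups = $lazyMapper->find(42)->groups->toArray();

$eagerLog = new QueryLog();
$eagerMapper = new UserMapper($eagerLog, $data);
$eagerGroups = $eagerMapper->find(42, array('join' => array('groups')))->groups->toArray();
```

Both calls return the same groups, but the lazy log ends up with two entries and the eager log with one, which is the kind of difference a lazy-loading log would reveal in a real application.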

Today I contributed to the ORM\Proxy namespace of Doctrine 2, the component that generates the proxy classes and objects based on metadata about the original classes, and non-invasive lazy loading will soon become a reality.

9 comments:

  1. great article, it was really what i was searching for. thanks.

    stoimen

  2. Can you illustrate an example of lazy loading using proxy on instantiating an Aggregate? e.g. one to many entity relationship.

  3. "Only if you access them they will go to the database to fetch all their fields"

    How about using lazy loading in a DDD context? That would mean that a child entity of an Aggregate needs to have access to the database, assuming no ORM of some sort is used. Please advise.

  4. Yes, you need an ORM in this case so that it transparently substitutes the composed object with a subclass, and the client code or the other classes of the Aggregate do not gain dependencies on the database. A code sample for Doctrine 2 is here:
    http://css.dzone.com/books/practical-php-patterns/practical-php-patterns-2

  5. Hi, just a follow up on my previous question, sorry I overlooked this line in the proxy class:

    public function __construct(DataMapper $mapper, $identifier)

    With this, child entities in an Aggregate will not rely on or directly access the database to lazy load themselves, as the DataMapper is injected into their proxy. Thanks giorgio, nice work.

  6. Hi,

    Lazy loading of objects from the database is good when loading an entity from a repository (i.e. a previously created entity that just needs to be reconstituted by the repository), but what if the entity is just about to be created? Its associated objects are still empty, and you call the methods that will call _load. How should this be handled?

  7. I'm not sure I understand the use case, but if you have created the entity in the current request *and* did not save it in the database, it won't show up in the Repository. A Repository is like a Collection: if you don't add it, the object is not there.

  8. Assume that no ORM (e.g. Doctrine) with a lazy loading feature is in use in this case, and that we are at the first step of creating the entity (not getting it from the repository).

    I assume that an associated entity like Group can be marked as using lazy loading or not (maybe through an annotation or a property). Let's say the Group entity is set to use lazy loading and that Group is an associated object of User. When the User is first created, the Group is also instantiated with lazy loading enabled. Later in the code, when a Group method that is actually proxied is called, it will call _load to do the lazy loading, but the identifier to reference it is not there yet because the entity has just been created. How is this handled?

  9. If there is no ORM in use, there is no lazy loading, so either you will be calling a method on a newly created Group object (created in the constructor of User), or on a null value, but no ORM will ever interfere until you save these entities. No lazy loading and no proxies are used when you have a new transient graph.
