Thursday, August 06, 2009

Doctrine 2 now has lazy loading

Lazy loading is the ability to perform an expensive operation on demand, only when a client request proves it necessary; in the ORM field, the expensive operation is loading part of an object graph. In Doctrine 2 I made some architectural choices while implementing the proxy approach, which substitutes an instance of a generated subclass for every associated object that is not eager-loaded.

Disclaimer: Doctrine 2 does not have a stable API yet. This is a design post about how I coded the lazy-loading features with the help of the lead developer, Roman Borschel. The names of methods and classes may change slightly in the future.

*-to-one associations: dynamic proxies
The technical solution for a one-to-one or many-to-one association is to use a dynamic proxy, an object whose class is generated on the fly by subclassing the original one. I discussed this approach extensively in the previous post about lazy loading. However, I want to add some clarifications that emerged while putting the theoretical approach discussed there into practice:
  • the example was about subclassing a Group class as Group_SomeOrmToolNameProxy and injecting a proxy object as the User property ($user->group). This class is generated only the first time the lazy-loading capability is used with Group as a target entity, and it is saved in a temporary folder to be reused in subsequent requests. This folder should be cleaned out when rolling out new code.
  • if the dynamic proxy class is already defined, there is no need to generate another one, and an attempt to redefine it would raise a PHP fatal error.
  • the proxy object should contain the foreign keys of the source object ($user in the example), put there when the source object is fetched. The original entity class has no fields to store foreign keys, since it is persistence-agnostic, so the most cohesive place for them is the proxy class.
  • I will not go into the details of how the proxy loads itself, but a reference to an instance of a Doctrine\ORM\Mapping\AssociationMapping subclass is passed to its constructor. This allows the proxy behavior and the actual hydration of data into an object (the load() method on AssociationMapping) to be unit tested independently.
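The points above can be illustrated with a minimal sketch. It is not Doctrine's generated code: the class Group_SketchProxy is written by hand here instead of being generated, and the loader callable stands in for the AssociationMapping collaborator. Both names, and the shape of the $identifier array, are assumptions made for this example.

```php
<?php
declare(strict_types=1);

// Hypothetical entity: plain PHP with no persistence code and no foreign-key field.
class Group
{
    protected $name;

    public function __construct(string $name = '')
    {
        $this->name = $name;
    }

    public function getName(): string
    {
        return $this->name;
    }
}

// Stand-in for the class an ORM would generate; written by hand for readability.
class Group_SketchProxy extends Group
{
    private $loader;      // callable(array $identifier): array of field values
    private $identifier;  // key values copied from the owning row, e.g. ['id' => 5]
    private $loaded = false;

    public function __construct(callable $loader, array $identifier)
    {
        // The parent constructor is deliberately skipped: state comes from the loader.
        $this->loader = $loader;
        $this->identifier = $identifier;
    }

    private function load(): void
    {
        if ($this->loaded) {
            return;
        }
        $row = ($this->loader)($this->identifier);
        $this->name = $row['name'];
        $this->loaded = true;
    }

    public function getName(): string
    {
        $this->load();              // first access triggers the load
        return parent::getName();
    }
}

// Usage: a fake loader that counts how often the "database" is hit.
$queries = 0;
$loader = function (array $identifier) use (&$queries): array {
    $queries++;
    return ['name' => 'Group #' . $identifier['id']];
};

$group = new Group_SketchProxy($loader, ['id' => 5]);
$before = $queries;           // nothing has been loaded at this point
$name = $group->getName();    // triggers the single load
$group->getName();            // already loaded, no second call to the loader
```

Because the proxy extends Group, client code that type-hints Group keeps working, and because the loader is injected, the trigger logic can be tested without a database.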
*-to-many associations and the need for a Collection interface
While generating a proxy is mandatory for a one-to-one or many-to-one association, in order to fulfill the same contract as a complete graph, handling the loading of collections of objects is somewhat simpler. In Doctrine 2, entities are required to hold the collections in their fields as instances of the Doctrine\Common\Collections\Collection interface, which is commonly done by instantiating a Collections\ArrayCollection in the constructor. When an object is reconstituted from the database, a Doctrine\ORM\PersistentCollection instance is substituted during hydration; it holds the references to the AssociationMapping object and to the EntityManager that it needs to load itself when required.
The Collection interface is not ORM-dependent, and it has been placed in the Common namespace to let the user build a truly persistence-ignorant Domain Model.
A quick way to implement lazy loading would be to fill the PersistentCollection instance with dynamic proxies. However, this proves to be slow, as every object in the collection issues a separate query to the database to retrieve its internal data, and it requires joining the association tables even if the collection is never used.
Instead, the current implementation injects the needed collaborators into the PersistentCollection (which obviously implements Collection), as no constructor is specified in the interface:
  • the collaborators are always the EntityManager and the AssociationMapping instance.
  • there is no need to store foreign keys in the PersistentCollection, since *-to-many relations do not use foreign keys from the source object. In our example, the Groups a User belongs to are joined on the primary key of User, and other collections do the same.
  • here too, the unit testing of the PersistentCollection trigger capabilities is separated from the loading itself, which is performed by an AssociationMapping instance. There are also functional tests that run the lazy-loading process in its entirety.
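As a rough illustration of a collection that loads itself in one go on first use, here is a minimal sketch. LazyCollectionSketch is a hypothetical class, not Doctrine's PersistentCollection: it implements only Countable and IteratorAggregate, and a plain callable plays the role of the injected collaborators.

```php
<?php
declare(strict_types=1);

// Hypothetical lazy collection; a simplified stand-in, not PersistentCollection.
final class LazyCollectionSketch implements Countable, IteratorAggregate
{
    private $loader;       // callable(int $ownerId): array, one query for all elements
    private $ownerId;      // primary key of the owning entity (the User in the example)
    private $elements = [];
    private $initialized = false;

    public function __construct(callable $loader, int $ownerId)
    {
        $this->loader = $loader;
        $this->ownerId = $ownerId;
    }

    private function initialize(): void
    {
        if (!$this->initialized) {
            $this->elements = ($this->loader)($this->ownerId);
            $this->initialized = true;
        }
    }

    public function count(): int
    {
        $this->initialize();
        return count($this->elements);
    }

    public function getIterator(): Iterator
    {
        $this->initialize();
        return new ArrayIterator($this->elements);
    }
}

// Usage: the fake loader counts calls; each call stands for one SELECT
// joined on the owner's primary key.
$queries = 0;
$loader = function (int $ownerId) use (&$queries): array {
    $queries++;
    return ['admins', 'editors', 'guests'];
};

$groups = new LazyCollectionSketch($loader, 42);
$before = $queries;                    // no query issued yet
$total = count($groups);               // first access loads every element at once
$names = iterator_to_array($groups);   // reuses the already loaded elements
```

The whole collection comes from a single call to the loader, whatever its size, which is the difference from filling it with one proxy per element.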
Final notes
Remember that lazy loading is a handy feature, but it can easily be abused. Performance can suffer when the load is performed, because as many queries are issued as there are objects to load, instead of a few eager queries that hydrate the part of the graph you need to work on: this is the point of join() in the Doctrine\ORM\Query class. Enabling lazy loading will probably make your PHP script more chatty, but it saves time when not all the objects are needed.
Eager loading can be impossible if the relations are bidirectional, as in the User-Group example: try to simplify your model by removing one side of an association when it is not strictly needed.
I hope you will enjoy using Doctrine 2 and its lazy-loading feature, and I think I have done a good job of implementing it and explaining the architectural issues I encountered. Feel free to ask about anything I have left out.
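To make the trade-off concrete, the following sketch counts simulated queries for the two styles. It does not use Doctrine at all: the in-memory arrays and the three fetch closures are hypothetical stand-ins for database access, and there is no identity map, so every lazy access costs one query.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory "tables"; each call to a fetch closure counts as one query.
$groupsTable = [1 => 'admins', 2 => 'editors'];
$usersTable = [
    ['name' => 'ann', 'group_id' => 1],
    ['name' => 'bob', 'group_id' => 2],
    ['name' => 'cid', 'group_id' => 1],
];

$queries = 0;

$fetchUsers = function () use (&$queries, $usersTable): array {
    $queries++;
    return $usersTable;
};
$fetchGroup = function (int $id) use (&$queries, $groupsTable): string {
    $queries++;
    return $groupsTable[$id];
};
$fetchUsersJoinedWithGroups = function () use (&$queries, $usersTable, $groupsTable): array {
    $queries++;
    $rows = [];
    foreach ($usersTable as $user) {
        $rows[] = $user + ['group' => $groupsTable[$user['group_id']]];
    }
    return $rows;
};

// Lazy style: one query for the users, then one more for each group access.
$lazyNames = [];
foreach ($fetchUsers() as $user) {
    $lazyNames[] = $fetchGroup($user['group_id']);
}
$lazyQueries = $queries;

// Eager style: a single joined query returns everything that is needed.
$queries = 0;
$eagerNames = array_column($fetchUsersJoinedWithGroups(), 'group');
$eagerQueries = $queries;
```

Both styles end up with the same group names, but the lazy loop issues one query per user on top of the initial one, while the eager version issues a single query. If the loop touched only a few users, the lazy style would be the cheaper one.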

8 comments:

  1. Great post!

    What about the identity cache? If I, say, lazy load the group with ID 5 and later explicitly query for that object, will I retrieve the proxy that had already been loaded?

  2. Currently I think proxy objects are not placed in the IdentityMap since they are created independently by a proxy factory. I'll write some tests for this, thanks for the feedback.

  3. But this also means that the assertion

    $group->author_id = 1;
    $group->Author == $authorTable->find(1)

    is not always true, is that correct? I see that this might lead to some problems in complex applications where you need to switch between lazy loading and prefetching from time to time.

  4. Note that the model classes you write are persistence-ignorant, so there is no author_id property. If the partial objects directive is not activated, the Author property will be populated either with a proxy for lazy loading or with the real object.

  5. How do I define an entity to use lazy loading? I would be glad to see an example.

  6. Any time you use an entity without defining the join in DQL (that is, in your query), the collection or object will be a lazy-loaded proxy. You can also ask the Entity Manager for a proxy instead of a real object.

  7. This comment has been removed by the author.

  8. As far as I know, collections support lazy loading and the addition of elements without loading. I do not know whether the batch size is configurable or the whole collection is fetched.
