Friday, July 31, 2009

Global state is rarely a salvation

This post is a response to Domain Events - Salvation by Udi Dahan, where he shows a design to manage the Domain Events pattern. He suggests using a static class to handle publishing and subscribing, but I argue that a static-class solution is not as pure as it seems.

Disclaimer: Udi Dahan is an authority in DDD and enterprise application development. I think he is a person who gets things done, and I do not contest the validity of his approach in production environments. Active Record often gets the job done too, and I used it a lot in the past, but cleaner solutions are emerging.

In the Domain Events - Salvation post, Udi proposes the third reworking of his Domain Events implementation. Domain Events is a pattern similar to Observer or Publish/Subscribe applied to a Domain Model, where the domain objects are the observed or the observers.
He maintains that you should never inject anything into Entities, and I agree, since Entities are not injectables. Here's how he raises an event:
public class Customer
{
public void DoSomething()
{
DomainEvents.Raise(new CustomerBecamePreferred() { Customer = this });
}
}
Then he says:
We’ll look at the DomainEvents class in just a second, but I’m guessing that some of you are wondering “how did that entity get a reference to that?” The answer is that DomainEvents is a static class. “OMG, static?! But doesn’t that hurt testability?!” No, it doesn’t. Here, look:
and he proceeds to show a (not) unit test where an event is raised.
Now, let's clarify some points:
  • A static class is more or less a singleton. They both have reset-like methods and a unique copy of data that is globally accessible wherever you want. You can make the class package-private in some languages, but every class still has potential access to the singleton instance.
  • Singletons are pathological liars. They hide dependencies and carry global state around, between tests.
  • This solution uses a static class.
We can conclude that this Domain Events implementation is not as beautiful an architecture as it was presented. Let's explore the problems it raises:
  • If you forget to reset() the static class between tests, the test suite can blow up suddenly in a strange place and the fault will be hard to locate. An instance can fire an event that is forwarded to subscribers that should no longer exist. This is global state in action.
  • Every time you test an entity class, you're also testing the DomainEvents class, since it cannot be mocked.
  • There is a compile-time dependency on DomainEvents. In other languages like Php this would not be a problem since there's no compiling, but Udi's examples use .NET and Customer is now dependent on DomainEvents. A possible solution would be to define events support directly in the language, but that is a big shift in the object-oriented paradigm.
  • Production code will exhibit action at a distance.
Let's explore the last issue.
The code of a service or controller layer will be similar to:
// in the setup
DomainEvents.Register(
b => creditCardProcessor.charge(b.Customer, b.Amount)
);
// Customer class, an Entity
public class Customer
{
public void bill(int amount)
{
DomainEvents.Raise(new CustomerBilled() { Customer = this, Amount = amount });
}
}
// some controller, client code
Customer c = new Customer();
c.bill(1000); // someone is billed? But who? What happens when I call this? If I do not persist this customer instance nothing has happened, right? No.
Compare it with:
billingService.charge(c, 1000);
or
c.bill(billingService, 1000);
The billing service does not lie: it requires a CreditCardProcessor in the constructor. Now I see what happens: the credit card of the customer is charged for 1000.
My solution is always the same: if a method of an entity requires an injected dependency, place it on a service class or ask for the dependency in the method parameters, extracting an interface to put in the signature. Asking for it in the constructor is cumbersome, since we want to be able to create stateful objects like entities simply by calling new; otherwise, whenever we create one we have to ask for a factory.
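The second option (asking for the dependency in the method parameters) can be sketched in Php, the language of the other posts on this blog. This is only an illustration under assumptions: BillingService, RecordingBillingService and getName() are hypothetical names and are not part of Udi's design.

```php
<?php
// Hypothetical interface extracted for the method signature.
interface BillingService
{
    public function charge(Customer $customer, $amount);
}

// A newable Entity: created with `new`, no constructor dependencies.
class Customer
{
    private $_name;

    public function __construct($name)
    {
        $this->_name = $name;
    }

    public function getName()
    {
        return $this->_name;
    }

    // The dependency is visible in the signature instead of hidden in a static class.
    public function bill(BillingService $billing, $amount)
    {
        $billing->charge($this, $amount);
    }
}

// Hypothetical test double: records charges instead of contacting a processor.
class RecordingBillingService implements BillingService
{
    public $charges = array();

    public function charge(Customer $customer, $amount)
    {
        $this->charges[] = array($customer->getName(), $amount);
    }
}

$billing = new RecordingBillingService();
$customer = new Customer('Alice');
$customer->bill($billing, 1000);
```

Nothing has to be reset between tests here: each test creates its own double and throws it away.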

My conclusion is that using a static class in an entity class and saying that you're not injecting anything is technically right, because you're not applying Dependency Injection at all. The phrase:
The main assertion being that you do *not* need to inject anything into your domain entities.
should be changed to: The main assertion being that you do *not* need to inject anything into your domain entities: simply throw in a static class so that your entity stops asking for things and begins looking for things, abandoning Inversion of Control/Dependency Injection.
Again, I do not question that this design gets the job done. But don't say "I'm not injecting anything, that's good" if you're nailing a lightweight newable class to a static infrastructure one.

Wednesday, July 29, 2009

When to inject: the distinction between newables and injectables

Dependency Injection is a great technique for producing an application with decoupled components; but injecting every single object of any lifetime is not useful. Where do you find a Mail that sends itself?

In the last post, I introduced Dependency Injection and showed useful cases where it allows classes to be decoupled. I also wrote about the problem of how to inject a service into a class that has to be instantiated not application-wide but inside the business logic.
class Mail
{
public function __construct(MailService $service)
{
...
}
}

class CommentsRepository
{
public function sendAMail()
{
$mail = new Mail(...); // what should I pass?
$mail->send();
}
}
The problem is that CommentsRepository sends a mail based on some input data (whether someone has posted a comment), so we cannot instantiate Mail at the bootstrap of the script like we do with CommentsRepository or other request-lifetime objects (session-lifetime in the case of a Java application instead of a Php one).
The simplest solution is, of course, to use a factory:
class CommentsRepository
{
public function __construct(MailFactory $f)
{
$this->_mailFactory = $f;
}

public function sendAMail()
{
$mail = $this->_mailFactory->createMail();
$mail->send();
}
}
This approach leaves CommentsRepository depending only on Mail and MailFactory (which could be an interface); Dependency Injection is correctly applied from the technical point of view. A factory should be created for every object lifetime: the main services, like an MVC stack, are request-lifetime objects and should be created by an application factory, while other objects that are created during the execution should be taken care of by a smaller factory, passed where it is needed. Because a parameter in the constructor expresses a dependency, CommentsRepository declares that it creates Mail objects in its methods.
However, a factory for Mail causes some problems:
  • Every time we create a mail, even for testing purposes, we have to use a factory to obtain an instance, which shields our code from changes in the constructor of Mail. Considering that an instance of Mail is likely to be passed around as a method parameter and that we'll need to mock it every time, this is a smell that the current design has some flaws.
  • In production code, if a class creates a mail but does not send it, it must still use a factory and so is coupled to the mail sending classes. An example is a collaborator of CommentsRepository that creates a nicely formatted html Mail object.
  • Serializing a mail to send it at a later moment is not possible, since at wakeup it won't know where to find its MailService.
  • If Mail were a string, would you inject a StringFactory? I don't think so.
What is the problem? That a mail knows how to send itself. Have you ever seen a mail that sends itself, or a credit card that processes itself (as Misko Hevery, the Google testing guru, often denounces)?
This leads to a fundamental distinction in business objects: Entities and Services. I write them starting with an uppercase letter to denote that they are precise concepts in object-oriented programming and that the meaning of the two words is different from their use in common language.
  • Entities are stateful; their job is to maintain their state and to be saved (not to save themselves) in some place or in memory. Services are stateless: you can instantiate a MailService many times, but it is always the same service for the end user.
  • Entities are newables, Services are injectables (this is Misko Hevery's terminology). Entities should be created with the new operator every time you need them, while Services should be created by a factory or a dependency injection container.
  • Services depend on Entities, while the opposite should not happen. In some programming models it does happen, when Entities are coupled to a Service interface.
With these distinctions in mind, let's review our design:
  • Mail is an Entity
  • String is an Entity (it is a ValueObject, but in a pure Entity/Service distinction it is an Entity)
  • CommentsRepository is a Service
  • MailService is, obviously, a Service
How do we evaluate such a difference? Mail is a stateful object: we can change its subject, text, formatting. It must be an Entity. CommentsRepository has no properties that maintain a state - we cannot make our repository tall or short, thin or fat. The same is true for MailService: every time a factory creates it the result is always the same, and it is probably a singleton in our application, or there is a limit on the number of instances if it is a generic class.
So Mail should not know about Services; only other Services should:
class Mail
{
public function __construct($title, $text)
{
...
}
}

class CommentsRepository
{
public function __construct(MailService $service)
{
...
}

public function sendAMail()
{
$mail = new Mail(...);
$this->_service->send($mail);
}
}

interface MailService
{
public function send(Mail $mail);
}
Now CommentsRepository is tied to the MailService interface, but it was already coupled to it indirectly through Mail when we started refactoring. Mail does not know about anything else, and MailService has a dependency on a concrete but small and compact class.
In the first refactoring we treated Mail as if it were a Service; that treatment should be reserved for more complex classes, while Mail is more or less a Value Object. If we wanted to create the MailService lazily, we would have to inject a MailServiceFactory or a DI container into CommentsRepository. Beware of sliding towards an Anemic Domain Model, a procedural approach that puts methods that belong to Mail in Services: Mail is still a class, not a C structure or an array of fields.
I hope that this design satisfies you - it is very simple to test and loosely coupled, as we should strive for every day.
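To make the testability claim concrete, here is a minimal runnable sketch of the final design. It is an illustration under assumptions: the getters on Mail, the notifyNewComment() method (the original sendAMail() elides its arguments) and InMemoryMailService are hypothetical names.

```php
<?php
// Entity: stateful, newable, knows nothing about Services.
class Mail
{
    private $_title;
    private $_text;

    public function __construct($title, $text)
    {
        $this->_title = $title;
        $this->_text = $text;
    }

    public function getTitle() { return $this->_title; }
    public function getText() { return $this->_text; }
}

interface MailService
{
    public function send(Mail $mail);
}

// Service: receives its collaborator in the constructor.
class CommentsRepository
{
    private $_service;

    public function __construct(MailService $service)
    {
        $this->_service = $service;
    }

    // Hypothetical method standing in for sendAMail().
    public function notifyNewComment($author)
    {
        $mail = new Mail('New comment', $author . ' has posted a comment.');
        $this->_service->send($mail);
    }
}

// Hypothetical test double: keeps the mails in memory instead of sending them.
class InMemoryMailService implements MailService
{
    public $sent = array();

    public function send(Mail $mail)
    {
        $this->sent[] = $mail;
    }
}

$service = new InMemoryMailService();
$repository = new CommentsRepository($service);
$repository->notifyNewComment('Giorgio');
```

Since Mail no longer holds a reference to a service, it can also be serialized and sent later without any wakeup logic.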

Tuesday, July 28, 2009

Never write the same code twice: Dependency Injection

Is Dependency Injection difficult? Is it hard to do? Certainly it provides value. Particularly in Php, but also in other object-oriented languages, Dependency Injection supercharges class reuse by designing components as loosely coupled objects.

In object-oriented programming, Dependency Injection is an emerging pattern to achieve Inversion of Control. It consists in breaking the dependencies between classes and injecting the collaborators of an object instead of having it find them without external aid. These collaborators are other objects which are passed to the unit in question in the constructor or by setters. There are many benefits of this technique other than code reuse, but today I will talk about this aspect.
DI is a fundamental practice that leads to testable code: Test-Driven Development forces the programmer to write code that is testable at the unit level. Unit testable code is necessarily decoupled code: if a class is totally coupled to another, the unit test becomes an integration test.
The hardware industry follows this pattern - integrated circuits are designed for testability. They're also very decoupled, as they can be connected on boards to build nearly anything. At the high level, the PC industry is full of standards for interchangeable parts: PCI bus, USB and more. This decoupling is what lets you change your monitor or keyboard without throwing away the pc.
Many posts have been written on Dependency Injection, so I prefer to show an example here to get to the point: it is simple to write injectable classes that can be reused later because their collaborators are wired but not soldered together. The language used here is php, but the concept is universal and any object-oriented language could be adopted. Sorry for the bad indentation, but it's Blogger's fault (unit test it and you will see that it mangles spaces).
Let's start with this class:
class MailApplication
{
public function __construct()
{
$this->_service = new GmailService();
}

public function list()
{
$mails = array();
foreach ($this->_service->getMailFrom('xxx@gmail.com', 'password') as $mail) {
// .. do some work and highlight, producing $text
$mails[] = $text;
}
return $mails;
}
}
Here's a component which is tightly coupled to its collaborator and is not unit testable: we cannot call list() if we are not connected to the Internet, and when its unit tests are run they will take a lot of time talking to the Gmail servers. There's more: we are testing not only our class but also the GmailService class; if one of them breaks, the test does not tell us which one is not fulfilling its responsibility. If the interaction is not between two objects but five or six, the test becomes useless since we do not know where to search for a bug.
From the reuse point of view, imagine we have a customer that uses Yahoo and wants to adopt our beautiful application, but only if he can integrate his email management. We write a YmailService class, but MailApplication knows only Gmail and we cannot tell it to use YmailService. There's no class reuse or extension.
According to the DI principle, we should inject into MailApplication its dependency:
class MailApplication
{
public function __construct(GmailService $service)
{
$this->_service = $service;
}
// ....
}
This way, we could subclass GmailService (mocking it) and inject a stub object that returns canned results when its getMailFrom() method is called. We can also extract an interface from GmailService if we want to have multiple implementations, like in the Ymail example:
class MailApplication
{
public function __construct(MailService $service)
{
$this->_service = $service;
}
// ....
}

class GmailService implements MailService ...
The test is now a real unit test and not an integration one. If it fails, we know who to blame: MailApplication since the other component is stubbed and returns fake results.
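As an illustration, a stubbed collaborator could look like the sketch below. It rests on assumptions: the MailService signature is inferred from the getMailFrom() call in the first listing, StubMailService is a hypothetical name, the method is called listMails() here to sidestep the reserved word list, and strtoupper() merely stands in for the elided work.

```php
<?php
interface MailService
{
    // Signature assumed from the getMailFrom() call in the first listing.
    public function getMailFrom($address, $password);
}

class MailApplication
{
    private $_service;

    public function __construct(MailService $service)
    {
        $this->_service = $service;
    }

    public function listMails()
    {
        $mails = array();
        foreach ($this->_service->getMailFrom('xxx@gmail.com', 'password') as $mail) {
            $mails[] = strtoupper($mail); // stand-in for the elided work
        }
        return $mails;
    }
}

// Stub: returns canned results and never touches the network.
class StubMailService implements MailService
{
    public function getMailFrom($address, $password)
    {
        return array('hello', 'world');
    }
}

$application = new MailApplication(new StubMailService());
$result = $application->listMails();
```

The test runs offline and exercises only MailApplication.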
That's all very good; but how is a MailApplication object built?
This is the job of a factory:
class ApplicationFactory
{
public function getMailApplication()
{
if ($this->_config == ...) {
$mailService = new GmailService(...);
} else {
$mailService = ....
}
return new MailApplication($mailService);
}
}
// in the "main" php script:
$factory = new ApplicationFactory($configuration);
$mailApplication = $factory->getMailApplication();
Depending on the configuration (again, injected configuration), the factory instance builds an instance of the application. Ironically, it's a factory that builds the major parts of your computer: you're certainly not expected to weld hardware components together, because it's not your job and you do not have the necessary skills.
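A complete, runnable version of such a factory might look like the following sketch. The 'provider' configuration key, the getService() accessor and the empty service classes are assumptions made for illustration; the original listing leaves the condition unspecified.

```php
<?php
interface MailService {}
class GmailService implements MailService {}
class YmailService implements MailService {}

class MailApplication
{
    private $_service;

    public function __construct(MailService $service)
    {
        $this->_service = $service;
    }

    // Hypothetical accessor, only used to inspect the wiring.
    public function getService()
    {
        return $this->_service;
    }
}

class ApplicationFactory
{
    private $_config;

    public function __construct(array $config)
    {
        $this->_config = $config;
    }

    public function getMailApplication()
    {
        // 'provider' is a hypothetical configuration key.
        if ($this->_config['provider'] == 'gmail') {
            $mailService = new GmailService();
        } else {
            $mailService = new YmailService();
        }
        return new MailApplication($mailService);
    }
}

// In the "main" php script:
$factory = new ApplicationFactory(array('provider' => 'ymail'));
$mailApplication = $factory->getMailApplication();
```

All the new operators for Services live in the factory, so the choice between Gmail and Ymail is made in exactly one place.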
This simple approach needs refinement, as it leads to problems: suppose we have a CommentsRepository. When you write a comment on this blog, it sends a mail.
class Mail
{
public function __construct(MailService $service)
{
$this->_mailService = $service;
}

public function send($to)
{
$this->subject = '...' . strtoupper(...); // some work
$this->_mailService->mail($to, $this->subject, $this->text);
}
}

class CommentsRepository
{
public function sendAMail()
{
$mail = new Mail(...); // what should I pass?
$mail->send();
}
}
Again, reuse is important: now we send mail with these classes, but in the future we could switch to a library which provides html or attachment management, or reuse this CommentsRepository without sending mails at all because another application does not require it.
However, we cannot create the Mail object and pass it to CommentsRepository in the constructor, since we don't know at creation time (bootstrap of the application, pre-MVC dispatch, or whatever) how many Mail objects it will need, or whether they will be used at all. Putting every dependency of the Mail class in the constructor of CommentsRepository is ugly and couples CommentsRepository to collaborators that it does not use.
There's more than one solution, and we will see them in the next post. But I'll tell you in advance that using another factory is not always the best approach.

Monday, July 27, 2009

How to stop getting megabytes of text when dumping an object

Php has a useful var_dump() function that helps in debugging and in writing unit tests by providing a text dump of all the properties of an object, whether public, private or protected. However, with an object that has a reference to the whole object graph, like a factory or an object that uses an abstract factory, the dump will be recursive and so long that it seems to run forever (the output easily reaches several megabytes). Here's how to avoid it.

In the browser
When you are using a browser, the Xdebug extension comes to your aid by generating an html dump truncated at an arbitrary level of depth. The installation on a unix box is pretty simple:
pecl install xdebug
Run it as root or with sudo. Pecl is the sister repository of Pear and contains C extensions instead of Php userland packages; its binary is provided along with the pear one in most installations of php.
Then add the following lines to php.ini:
zend_extension="/usr/local/lib/php/extensions/no-debug-non-zts-20090626/xdebug.so"
xdebug.var_display_max_depth=2
The first line has probably already been added by the installer. The second directive sets the maximum depth of var_dump(), which is overloaded by the extension, to 2 levels.
There are other directives that influence the behavior of var_dump(), and many more that activate other Xdebug features.
Also make sure that html_errors is on in php.ini, or ini_set() it.
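To illustrate what truncating at a fixed depth means, here is a rough userland sketch that does not rely on Xdebug at all. limitedDump() is a hypothetical helper, not an Xdebug function, and its output format is only loosely inspired by the dumps shown in this post.

```php
<?php
// Hypothetical helper: renders $var up to $maxDepth nested levels,
// printing '...' for anything deeper instead of recursing further.
function limitedDump($var, $maxDepth = 2, $level = 0)
{
    if (!is_object($var) && !is_array($var)) {
        return var_export($var, true) . "\n";
    }
    $label = is_object($var) ? 'object(' . get_class($var) . ')' : 'array';
    if ($level >= $maxDepth) {
        return $label . " ...\n";
    }
    $out = $label . "\n";
    $indent = str_repeat('  ', $level + 1);
    // Casting an object to array also exposes private and protected properties.
    foreach ((array) $var as $key => $value) {
        // Non-public keys carry NUL-delimited prefixes; keep only the property name.
        $parts = explode("\0", (string) $key);
        $out .= $indent . "'" . end($parts) . "' => "
              . limitedDump($value, $maxDepth, $level + 1);
    }
    return $out;
}

// A self-referencing object, which would otherwise lead to endless recursion.
class Node
{
    private $name;
    public $next;

    public function __construct($name)
    {
        $this->name = $name;
    }
}

$node = new Node('a');
$node->next = $node;
$out = limitedDump($node, 2);
echo $out;
```

The depth cut-off is what keeps the output finite; Xdebug applies the same idea with far better formatting.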

On the command line
The command line environment is more complex to use, but it is ideal for debugging, given its speed and automation capability. The var_dump() usage must be tweaked because the var_dump() overloading, as of Xdebug 2.0, works only with html_errors active, producing an html dump that is ugly to view as plain text.
Assuming you have done the same setup as in the browser section above, here's a workaround function to use:
function dump($var)
{
ini_set('html_errors', 'On');
ob_start();
var_dump($var);
$dump = ob_get_contents();
ob_end_clean();
echo html_entity_decode(strip_tags($dump)); // strip the markup first, so decoded '<' in values are not mistaken for tags
}
It captures the html output of var_dump() even in a Cli environment where the html_errors directive is not set, and cleans it up for display as plain text.
Here's what I get dumping a PersistentCollection of Doctrine 2 now:
[16:54:23][giorgio@Marty:~/svn/doctrine2/tests]$ phpunit Doctrine/Tests/ORM/Functional/DetachedEntityTest.php
PHPUnit 3.3.17 by Sebastian Bergmann.
object(Doctrine\ORM\PersistentCollection)[180]
private '_type' => null
private '_snapshot' =>
array
empty
private '_owner' => null
private '_association' => null
private '_keyField' => null
private '_em' => null
private '_backRefFieldName' => null
private '_typeClass' => null
private '_isDirty' => boolean true
protected '_initialized' => boolean true
protected '_elements' =>
array
0 =>
object(Doctrine\Tests\Models\CMS\CmsPhonenumber)[181]
...
1 =>
object(Doctrine\Tests\Models\CMS\CmsPhonenumber)[182]
The '...' ellipses are added by Xdebug instead of showing thousands of objects and properties: the PersistentCollection holds a reference to the EntityManager in order to lazy-load itself.
With the old var_dump() instead:
[16:54:23][giorgio@Marty:~/svn/doctrine2/tests]$ phpunit Doctrine/Tests/ORM/Functional/DetachedEntityTest.php > test.txt
I redirect the output to a text file because it is impossible to navigate in a terminal, but you have to wait some minutes, because it takes a very long time to generate a full dump of the EntityManager and all the other objects involved. Without redirection, lines are sent to the terminal for an unpredictably long time, and they are also duplicated, because var_dump() is not so smart in detecting recursion and lets it happen for a while before it stops outputting the same objects over and over. It was very frustrating, but now it is easy to view even the most complex and coupled objects.
I know that the more coupled an object is, the less well written it is; but even if it only depends on an interface (a parameter in the constructor), at runtime that interface could be implemented by a very heavy object. Decoupling limits this process, but objects must communicate and so they have to keep references to their collaborators: the Publish/Subscribe paradigm is very loosely coupled, but dumping a publisher will output it, the blackboard and all the subscribers on Earth.
Summing up, Xdebug can boost your php productivity, and it has many other features that make it easy to see what happens under the hood of your php application. Profiling and tracing are easily enabled with other xdebug.* directives. If you are a php developer, install it as soon as possible.

Thursday, July 23, 2009

Php 5.3 without screwing up apt-get

Php 5.3 is stable, and if you want to experience improved performance and lower memory usage, and also play with nice tools like Doctrine 2 that are built for this version, you have to install it on your box. But a .deb is better than 'make install': it does not send binaries and configuration files all over your system without a means to trace where they end up.

Php 5.3 is a new minor version of Php, so it should not break the compatibility of your application. However, it deprecates some old features and practices and could cause problems, so you should be cautious about using it in a production environment. That's what staging exists for.
However, if you choose to install it, your best choice is to use a .deb package that can be easily removed when the distributors catch up and provide a php5 package: the 'make install' command issued after compiling will spread files all over the filesystem, without letting you know what is being overwritten or created. A .deb will also help with upgrading, thanks to its simple removal procedure.
This example is based on Ubuntu Jaunty (9.04), but it will probably work on other versions and on Debian-derived distros.

Step 0: downloading the source
To build a package, the C source code is needed. A tarball for the release is provided by the php team:
wget http://www.php.net/get/php-5.3.0.tar.bz2/from/a/mirror
tar xvjf php-5.3.0.tar.bz2
cd php-5.3.0
Now we have source files at hand.

Step 1: compiling
Compiling is the longest and most fragile part: the compile time can be very long, especially if you use many bundled extensions and your machine is performing other tasks at the same time.
First, the build configuration has to be created.
./configure --disable-short-tags --with-zlib --enable-bcmath --enable-exif --enable-ftp --with-gd --with-jpeg-dir --with-png-dir --enable-mbstring --with-pdo-mysql --with-sqlite --enable-sqlite-utf8 --enable-zip --with-pear
The "with" and "enable" options are listed by ./configure --help, which tells you what bundled extensions are available in this release. The more extensions you pull into the compilation, the more time it will take, but you do not want to be surprised by an undefined function: mysql_connect error. With this configuration, the mysql extension is not included because PDO is used instead.
The ./configure command will fail often, and this probably means that you lack some header files or libraries that the extension needs to compile and link against. They are normally not installed on the average system, so if something is needed you will probably run commands such as:
sudo apt-get install libjpeg62-dev
sudo apt-get install libbz2-dev
depending on the extensions chosen (the headers live in the -dev packages; the exact package names vary between releases). To find the name of the library, see the Requirements section on php.net/manual for the extension in question.
When ./configure does not fail anymore, we can start the compilation:
make
This will take time, so you should consider running it when you're not at the pc.

Step 2: building a package
When the compiling is finished, a .deb has to be built from the produced binaries. checkinstall is the command line tool that will do the job for us. Of course if you do not have it, sudo apt-get install checkinstall.
sudo checkinstall -D --install=no --fstrans=no --maintainer=piccoloprincipeazzurro@gmail.com --reset-uids=yes --nodoc --pkgname=php5 --pkgversion=5.3 --pkgrelease=200907011400 --arch=x86
We are telling checkinstall to create a Debian package (-D), not to install it for now, not to use a fake filesystem since it is not necessary, and not to include the documentation, since we aren't going to distribute this package but only use it at home.
To be useful, checkinstall must be run as root, so we use sudo.
After this command, checkinstall asks you to confirm the options; press Enter and your first deb (probably the first, if you're still reading this simple guide) is created.

Note: use php with apache
If you included the --with-apxs option to build a mod_php instance, checkinstall (but also make install) will tell you that at least one LoadModule directive has to exist. This happens because the installation process reads /etc/apache2/httpd.conf, which is not used in Ubuntu, so let's fake it. Add the following line:
LoadModule php5_module /usr/lib/apache2/modules/libphp5.so
to /etc/apache2/httpd.conf; create the file if it does not exist. You will need sudo to write it.
After generating the package, you can clean this file and leave it empty, as Ubuntu uses the /etc/apache2/mods-enabled/ folder to maintain the LoadModule directives.

Step 3: avoid conflicts
The compilation process has not touched your system yet, but there is probably an old installation of php hanging around that will get in the way. Let's remove all its packages and extensions: if you need a particular extension you should have included it in Step 1; if you need a PECL or PEAR package you will grab it later, in the new installation.
This will remove any package whose name contains php:
sudo apt-get remove --purge `dpkg -l | grep php | awk '{print $2}';`
You should also delete /usr/share/php, the PEAR folder. A new pear will be installed if you enabled it in Step 1, while the old files point to the old php binary and would create a mess. Take them out:
sudo rm /usr/bin/php
sudo rm /usr/bin/pear
sudo rm -r /usr/share/php
If you had any PEAR packages installed, the folder was not removed by apt because it was not empty.

Step 4: install your brand new, fine-tuned package
Assuming that your package is named php5_5.3.0-200907181600_i386.deb, install it:
sudo dpkg -i php5_5.3.0-200907181600_i386.deb
You will find it in the source folder.
If you're using php from the command line, you have finished. You can now run pear on php 5.3 and install what you want.
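A quick sanity check of the new binary can be done with a small script like the following sketch. The list of extension names is an assumption derived from the ./configure flags of Step 1, and missingExtensions() is a hypothetical helper; adjust both to your own build.

```php
<?php
// Returns the names in $required that are not loaded in the running php binary.
function missingExtensions(array $required)
{
    $missing = array();
    foreach ($required as $name) {
        if (!extension_loaded($name)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

// Extension names assumed to correspond to the ./configure flags of Step 1.
$required = array('zlib', 'bcmath', 'exif', 'ftp', 'gd', 'mbstring', 'pdo_mysql', 'zip');

echo 'PHP ', PHP_VERSION, "\n";
echo version_compare(PHP_VERSION, '5.3.0', '>=') ? "5.3 or later\n" : "older than 5.3\n";

$missing = missingExtensions($required);
echo $missing
    ? 'Missing: ' . implode(', ', $missing) . "\n"
    : "All listed extensions are loaded\n";
```

Save it to a file and run it with the php binary you have just installed; php -m prints the full list of loaded modules if you prefer to check by hand.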
If you use php with apache, you need to set up the loading of mod_php. Put two files in /etc/apache2/mods-available. The first is php5.conf:

AddType application/x-httpd-php .php .phtml .php3
AddType application/x-httpd-php-source .phps
and php5.load:
LoadModule php5_module /usr/lib/apache2/modules/libphp5.so
The content could differ slightly, and the files may already be present (from your previous installation).
Now Apache can use the php you compiled:
sudo a2enmod php5
sudo /etc/init.d/apache2 restart
This enables the module and restarts the web server. Have fun with your shiny new php!

Tuesday, July 21, 2009

Naked objects in Php

The Naked Objects pattern, if implemented successfully, can add value to the php scene and help to answer the "Is Php ready for the enterprise?" question. Currently there is no framework that supports this approach in the php world.

Php in the enterprise
The enterprise world has several architectural choices for building complex applications: Model-View-Controller frameworks are currently in vogue, especially in the php world, where Zend Framework, Symfony, Code Igniter, CakePhp and many others all implement this paradigm. But there are also examples in different languages: Ruby On Rails, Django for Python, Spring and Stripes for Java.
There is also a niche where developers are tired of bloated controllers and views that contain logic; they're also tired of dumping their own objects and getting some megabytes of data because they have dependencies on all the instantiated objects of the chosen framework: this is where Naked Objects comes in.
As I wrote last week, the Naked Objects pattern lets the framework take care of 3 of the 4 layers of an enterprise application: infrastructure, controllers and views are provided as generic or generated objects. The developer only has to write a Domain Model layer that follows some conventions.
Two frameworks named Naked Objects exist: one is written in Java and is open source, while the other runs on the .NET platform; they are provided by the same people behind nakedobjects.org. Domain-Driven Design is really possible when using these tools.
I couldn't find a similar possibility for the average php programmer: it's a pity, because php has the potential to bring more and more cloud computing into enterprise applications, letting you use only a browser as a client.
There is also JMatter available, for Java.

Why wait?
So I decided to write a new one, a port of the Naked Objects for Java framework. It will have only a web interface, since that is the best fit for php (a local GUI does not make sense).
Since I don't want to reinvent the wheel, it will incorporate Doctrine 2 (which I am contributing to) in the persistence part of the infrastructure layer, and Zend Framework with its MVC implementation in the upper ones. Since the Naked Objects pattern has to be followed, the Views and Controllers will be provided by this framework, which I named NakedPhp. Obviously I chose an Open Source license, the bulletproof LGPL version 2; supporting DDD is also a key feature which I want to include.
Here are some links that point to the main resources set up on SourceForge to support the project:
Main page
Naked Php wiki
In-browser view of the Subversion repository
I will blog often about the architectural decisions of NakedPhp and its improvements, and about when an alpha downloadable package will be available. Of course the repository is open to anyone who wants to take a look at the source code.
If you're tired of writing controllers and views, consider the choice of a Naked Objects approach. And if you are a php developer and you want to participate, contact me!

Thursday, July 16, 2009

Lazy loading of objects from database

What is lazy loading? A practice in object/relational mapping which simulates the presence of the whole object graph in memory. There are some techniques to produce this illusion, which I explain here.

An example
Suppose we have the classic User and Group objects in a php application. The same process is valid for any language that uses a relational database as a backend, as in the Java/Hibernate case, but I will include php code snippets.
Typically User and Group have a many-to-many relation: every user can subscribe to many groups and every group has a bunch of users as its members. This means that if all objects are loaded from the database you can navigate like this:

$user = find(42); // find the user with id == 42
echo $user->groups[3]->users[2]->groups[4]->name;

Having potentially infinite navigability in an object graph is not a great practice, but sometimes many-to-many relationships are needed and the simplest approach is to incur some additional overhead (why there is overhead will be explained later) and provide this simple Api.
The main problem is that we cannot load the whole object graph, because it will not fit in the memory of the server and it will take a long time to build, depending on the size of the database. Nor can we load an arbitrary subset of it, because the user has the freedom to navigate groups and users to the depth he needs: if he arrives at the edge of the loaded graph, he will see null values/null pointers where objects should be.

Solution: Lazy loading
The Proxy pattern comes to our aid:
A proxy, in its most general form, is a class functioning as an interface to something else. The proxy could interface to anything: a network connection, a large object in memory, a file, or some other resource that is expensive or impossible to duplicate.

In the first navigation the proxy will be a subclass of Group. The Data Mapper, without further instructions, will provide this type of object graph:

var_dump($user); // User
var_dump($user->groups[3]); // Group_SomeOrmToolNameProxy
var_dump($user->groups[3] instanceof Group); // true

As said previously, an Orm that provides lazy loading will produce a proxy class to substitute the original one. The code for this class is generated on the fly and will look like this:

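class Group_SomeOrmToolNameProxy extends Group
{
    private $mapper;
    private $identifier;

    public function __construct(DataMapper $mapper, $identifier)
    {
        // saves the arguments as field references
        $this->mapper = $mapper;
        $this->identifier = $identifier;
    }

    private function _load()
    {
        // asks the Data Mapper to fill this object with the row identified by $identifier
        $this->mapper->load($this, $this->identifier);
    }

    public function sendMessageToAllUsers($text)
    {
        $this->_load();
        parent::sendMessageToAllUsers($text);
    }
}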

The new class proxies to the original methods but calls _load() first, giving the object a usable state. Before a call to _load(), or to one of the proxied methods, the domain object has only the identifier fields in its internal data structure.
Since it is a subclass of Group, it provides the same interface to the user, who will not even notice that it is not his clean, infrastructure-free Group class.

What does it mean?
It means that the first level objects are fully loaded, while second level ones are placeholders that contain only the information needed to load themselves. Only when you access them will they go to the database to fetch all their fields:

$user = $em->find(42); // a query on the user table
echo $user->groups[3]->name; // another query is executed on the groups and user_groups tables
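
To make the "query only on first access" behavior concrete, here is a minimal, self-contained sketch. Everything in it is hypothetical and written by hand for illustration: FakeGroupMapper stands in for a real Data Mapper and just counts the queries it would run, and GroupProxy is the kind of class an Orm would generate. Note that the proxy intercepts method calls (getName()) rather than public property access.

```php
<?php
// Hypothetical domain class: it knows nothing about persistence.
class Group
{
    public $id;
    public $name;

    public function getName()
    {
        return $this->name;
    }
}

// Hypothetical stand-in for a Data Mapper; it counts the "queries" it runs.
class FakeGroupMapper
{
    public $queries = 0;

    public function load(Group $group, $id)
    {
        $this->queries++;
        $group->id = $id;
        $group->name = "Group #{$id}"; // pretend this row came from the database
    }
}

// Hand-written version of what an Orm would generate on the fly.
class GroupProxy extends Group
{
    private $mapper;
    private $identifier;
    private $loaded = false;

    public function __construct(FakeGroupMapper $mapper, $identifier)
    {
        $this->mapper = $mapper;
        $this->identifier = $identifier;
    }

    private function load()
    {
        // Load at most once, the first time a proxied method is called.
        if (!$this->loaded) {
            $this->loaded = true;
            $this->mapper->load($this, $this->identifier);
        }
    }

    public function getName()
    {
        $this->load();
        return parent::getName();
    }
}

$mapper = new FakeGroupMapper();
$group = new GroupProxy($mapper, 3);
$queriesBeforeAccess = $mapper->queries; // nothing has been fetched yet
$name = $group->getName();               // first access triggers the load
$group->getName();                       // second access reuses the loaded state
$queriesAfterAccess = $mapper->queries;
```

The $loaded flag is a design choice of this sketch: without it every proxied call would hit the mapper again.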

We can complicate this pattern as far as we want:
  • We can specify with join() commands on the query objects or 'join' options for the methods of the Data Mapper how far it should go with the initial loading. This way if we know that we need to access the second level of the graph starting from user, only one query is executed. Still, if we go to the third level ($user->groups[3]->users[2]->role) without specifying that at the reconstitution of $user, additional queries will be sent to the database and performance will suffer.
  • We can activate or deactivate lazy loading, or log it to see where it is executed and where it impacts performance.
Hibernate for Java uses this approach to provide lazy loading of object properties and relations. Doctrine 1.x uses a simpler approach because it is based on Active Record and the code is put in the models' base class Doctrine_Record.

Today I contributed to the ORM\Proxy namespace of Doctrine 2, the component that generates the proxy classes and objects based on metadata about the original classes, and non-invasive lazy loading will soon become a reality.

Wednesday, July 15, 2009

A look at technical questions on Naked Objects

The Naked Objects pattern strips down the layering of your application to a single layer, the Domain Model, and poses a lot of questions about how to implement the functionality that was present in the other layers.
As explained before, the Naked Objects pattern implies a framework that takes care of providing the other layers of an application in a completely generic manner, leaving you with only the responsibility to write a complete Domain Model. This raises immediate issues which are present in the original Pawson thesis and which I explain here with a language close to the developer who wants to use a Naked Objects approach, but is scared that it is only a buzzword for braindead, scaffolded user interfaces.
Here are the questions; the text in bold is quoted from the thesis, while the answers are mine.

How can the user create a new object instance, or perform other operations that cannot naturally be associated with a single object instance?
In a rich Domain Model there are Factories for the creation of objects when it is a complex procedure; otherwise, if the class has a no-arguments constructor, the object can be created directly by the framework. Thus, the new operator is wrapped in a method of another object, the Factory itself.
We also have Services, which connect business objects by providing the operations that would cause coupling if placed on them. For example, a method called searchBooks(Author a) will be placed in a SearchingService class that depends on Book and Author, but does not couple them to each other. In this case SearchingService could be a Repository.
However, the Naked Objects framework for Java takes care of this and provides automatic injection of services into the business objects. Injecting a service into a newable object sounds strange, since a common rule is that an entity should hold a reference to a service only on the stack (that is, passed as a parameter), to allow the programmer to call new for it wherever he wants in the application.
Leaving the business object free of the burden of a service is also possible, because by default every method of a service that has a particular domain object as a parameter will be listed in the user interface as if it were on the object itself. In our example, the Author list of operations will also show searchBooks(), automatically passing the selected object as the actual parameter.
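
As a rough idea of how a framework could discover such service methods, here is a sketch based on reflection. It is not the Naked Objects implementation: contributedActions() is a hypothetical helper of mine, the classes are empty placeholders, and the code assumes PHP 7.1 or later for ReflectionNamedType.

```php
<?php
class Author {}
class Book {}

// Hypothetical service: it depends on Author and Book without coupling them.
class SearchingService
{
    public function searchBooks(Author $a)
    {
        return array(); // a real implementation would ask a Repository
    }

    public function countAllBooks()
    {
        return 0;
    }
}

// Hypothetical helper: lists the public methods of $service that accept
// an instance of $className as one of their parameters.
function contributedActions($service, $className)
{
    $actions = array();
    $reflection = new ReflectionClass($service);
    foreach ($reflection->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        foreach ($method->getParameters() as $parameter) {
            $type = $parameter->getType();
            if ($type instanceof ReflectionNamedType && $type->getName() === $className) {
                $actions[] = $method->getName();
                break; // one matching parameter is enough
            }
        }
    }
    return $actions;
}

$authorActions = contributedActions(new SearchingService(), 'Author');
$bookActions = contributedActions(new SearchingService(), 'Book');
```

Here searchBooks() would be shown among the operations of an Author, while countAllBooks() would not be attached to any object.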

How does the concept of a generic presentation layer permit alternative visual representations of an object?
A unique visual representation is sometimes the best approach as it makes the application coherent. If an alternative is really needed, the domain object can implement some standard interfaces of the framework that the generic user interface will recognize and use at its best.

How is the concept of a generic presentation layer compatible with the requirement to support multiple forms of user platform?
As long as the framework does the reflection work and provides metadata on the domain model, it is possible to implement many different generic presentation layers. The NO Java framework already provides a DragNDrop interface for local use and a web one, while there are many independent projects that offer other interfaces to plug in.
If you really want, you can write a user interface manually: a side effect of working with the Naked Objects pattern is that the model is forced to be behaviorally complete, which simplifies the work of coding a user interface. In fact, usually a program does it for you.

With no use-case controllers permitted, how can naked objects support the idea of business process?
A REST guru will tell you: expose more resources, and instead of writing a controller to send lost passwords via mail, you will produce a /lost-passwords/username resource that the user will POST or DELETE to.
There is no difference in a Naked Objects approach: you will produce a LostPasswordRequest object with some methods that, once invoked (with the aid of a mail service), will provide the desired behavior. The advantage is that the process will reside in the domain layer and it will be simpler to test; the skinny controller remains skinny as you cannot write it.
Objects like LostPasswordRequest are also called purposeful objects.
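
A possible shape for such a purposeful object is sketched here. Only the LostPasswordRequest name comes from the text above; the MailService interface, the RecordingMailService test double, the fulfill() method and the sample data are assumptions made up for the example.

```php
<?php
// Hypothetical mail service; a real one would wrap an SMTP library.
interface MailService
{
    public function send($to, $subject, $body);
}

// Test double that records messages instead of sending them.
class RecordingMailService implements MailService
{
    public $sent = array();

    public function send($to, $subject, $body)
    {
        $this->sent[] = array('to' => $to, 'subject' => $subject, 'body' => $body);
    }
}

// The business process lives in a domain object, not in a controller.
class LostPasswordRequest
{
    private $username;
    private $email;
    private $fulfilled = false;

    public function __construct($username, $email)
    {
        $this->username = $username;
        $this->email = $email;
    }

    // The service is passed as a parameter rather than injected in the object.
    public function fulfill(MailService $mailer, $newPassword)
    {
        if ($this->fulfilled) {
            throw new LogicException('This request has already been fulfilled.');
        }
        $body = "Hello {$this->username}, your new password is {$newPassword}";
        $mailer->send($this->email, 'Your new password', $body);
        $this->fulfilled = true;
    }

    public function isFulfilled()
    {
        return $this->fulfilled;
    }
}

$mailer = new RecordingMailService();
$request = new LostPasswordRequest('giorgio', 'giorgio@example.com');
$request->fulfill($mailer, 's3cret');
```

Since the whole process is an ordinary object, it can be unit tested with a recording mailer and without dispatching any http request.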

If core objects are exposed directly to the user, how is it possible to restrict the attributes and behaviours that are available to a particular user, or in a particular context?
Simply by conforming to an implicit interface, via duck typing. Where there is a setName() method that sometimes should not be used, the framework asks you to provide an allowName() method that returns a boolean. This is an implicit interface in the sense that the developer is not forced to implement a specific interface but only to write methods with signatures that follow a standard pattern. When the method is not present, a default behavior is provided.
Another option is to configure the access level of single methods via an acl, reserving some methods for certain admin roles.
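
The implicit interface can be sketched in a few lines. The allowName() convention is the one described above; the isEditable() helper, the Customer class and the choice of "allowed" as the default behavior are assumptions of this example, not the framework's actual code.

```php
<?php
class Customer
{
    private $name = 'Alice';
    private $email;
    private $locked = true;

    public function setName($name)
    {
        $this->name = $name;
    }

    // Implicit interface: allowName() guards setName().
    public function allowName()
    {
        return !$this->locked;
    }

    // No allowEmail() here, so the default behavior applies.
    public function setEmail($email)
    {
        $this->email = $email;
    }
}

// Hypothetical framework-side check, based only on method names.
function isEditable($object, $property)
{
    $guard = 'allow' . ucfirst($property);
    if (method_exists($object, $guard)) {
        return (bool) $object->$guard();
    }
    return true; // assumed default when no guard method is present
}

$nameEditable = isEditable(new Customer(), 'name');
$emailEditable = isEditable(new Customer(), 'email');
```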

How is it possible to invoke multiple parameter methods from the user interface?
Talking of the NO Java framework, the DragNDrop interface will not allow you to do this, suggesting that you narrow the parameters down to one.* I think that a service method with two parameters can be viewed as a single-parameter method on either of the two business objects involved, as I explained previously.
Not allowing methods with many parameters to be called in the user interface (they can be used internally but not exposed) will help keep the model simple, and encourage the introduction of a Parameter Object. However, the Web interface allows these calls, providing a form to fill in with the various parameters.
* Update: this is no longer true as the thesis was about the first version of the framework. So this point is not an issue anymore.

Naked Objects could be the next paradigm shift in object-oriented programming. Why write four layers when you could write only one?

Monday, July 13, 2009

Beppe Grillo's candidature

Note: this short post is written in Italian and in English to benefit citizens of Italy also.

Beppe Grillo, a comedian, has announced his candidacy for the position of Segretario del Partito Democratico; the elections that are taking place are the Italian equivalent of the Obama/Hillary Clinton primaries.
My thought: well, we should vote for someone that believes that phones cook eggs and plastic balls wash clothes.

Nota: questo post è scritto in italiano ed inglese per beneficiare anche i cittadini italiani.

Beppe Grillo, un comico, ha annunciato la sua candidatura per la posizione di Segretario del Partito Democratico; le elezioni che avverranno sono l'equivalente italiano delle primarie fra Obama e Hillary Clinton.
Il mio pensiero: ma certo, dovremmo votare per uno che crede che i cellulari cuociano le uova e le palle di plastica lavino i vestiti.

Saturday, July 11, 2009

Naked objects, DDD and the user interface

The SeparatedPresentation pattern is used to abstract the user interface from the domain model of an application. But isn't it more useful to enrich the user with a view of the domain itself?
The MVC pattern is all about separating the presentation of data. The major frameworks like Zend Framework and Rails are built to take advantage of an MVC stack, which means writing your models, controllers and views.
Typically we have a Post model with a PostController class that governs the operations that can be performed on the model and which views to show.
But there's more than MVC in the world.
What I'm talking about is the Naked Objects pattern, which is currently implemented in the Java and .Net world by a framework with the same name. This pattern can be the next paradigm shift in development: in short, the V and C from MVC are completely generic and are built from the M part, the model classes. Also the infrastructure part, like database persistence of entities, is taken care of by the framework, which includes an Orm in its libraries.
Let's discuss the advantages of this approach:
  • there's no mapping between concepts, as the user interface is automatically created on-the-fly or generated and the user gains insight on the model (and thus on the domain). A driver aware of the fact that there is an engine will use the accelerator at its best. This helps a Ubiquitous Language to take shape.
  • there are no views and controllers to write, obviously.
  • there's no infrastructure coupling, also obviously if we exclude the annotations or xml metadata needed to make Hibernate or another Orm work. ActiveRecord is a reminiscence of the past.
  • there's no logic in the controllers: they are automatically generated and delegate to the model. Fat model, skinny controller is achieved by force.
  • the domain model will be built without caring much about the other parts because it is the only thing to write. This allows a DDD approach, and the only thing to pay attention to is conforming to the conventions of the framework.
  • no more repeated validation on the ui: it is simply delegated to the model, where it should reside.
  • if the generated user interface does not satisfy you, what you are left with is a powerful and complete domain model that can be used to write an MVC application (thus using the naked objects pattern only to prototype).
There are also issues in providing a complete user interface that cannot be edited by hand: this is not scaffolding. In the next posts I will talk about my journey in learning to use the Naked Objects framework for Java, which is distributed as open source software.

Monday, July 06, 2009

Becoming a Doctrine 2 committer

In the last weeks I started to contribute code to the Doctrine project. Doctrine is an Orm and database abstraction layer for php: it is the default orm for Symfony but it's not coupled to it and I integrated it successfully with Zend Framework. It is a good tool for abstracting the database layer and managing the persistence of the Model part of an MVC paradigm, something that in my opinion Zf does not do very well with its Zend_Db component.

After opening tickets and uploading patches for Doctrine 1.x for some months, I caught a glimpse of the potential of the 2.x branch (which currently is not even in alpha). The code is in the trunk of the Doctrine repository and borrows concepts heavily from Hibernate:
http://trac.doctrine-project.org/wiki/Doctrine2.0
While Doctrine Query Language, based on Hql, was already present in the currently stable 1.x branch, the new Doctrine infrastructure is based on the Unit Of Work pattern instead of the old Active Record one.
The Active Record pattern is very famous and is used for example in Rails. It is the simplest solution for object relational mapping, defining a class for every table of the database. A row is represented by an object of that class, which subclasses Doctrine_Record/ActiveRecord depending on the framework and language you are using.
The Unit Of Work pattern does not by itself substitute the Active Record one, but it's part of the solution for keeping the domain objects from depending on the infrastructure, in this case the mapping classes. This is very clean from a design and testing point of view as the most complex logic of an application resides in the domain classes (the User and Phonenumber ones for instance), and Doctrine 2 lets the developer concentrate on writing and testing these models without (almost) taking into account where they will be persisted; how to map these classes is specified in annotations or via other metadata.
If you have written a bit of code you will probably find this solution very interesting; it is already used in Hibernate, for the joy of Java programmers.
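
To give an idea of the difference, here is a heavily simplified sketch of a Unit Of Work. It is not Doctrine 2's API: the UnitOfWork class, its registerNew() and commit() methods and the $executed log (which stands in for the Sql sent to the database) are all invented for illustration. The point is that User extends nothing and contains no persistence code.

```php
<?php
// Plain domain class: no base class, no persistence code.
class User
{
    public $name;

    public function __construct($name)
    {
        $this->name = $name;
    }
}

// Hypothetical, heavily simplified Unit Of Work.
class UnitOfWork
{
    private $new = array();
    public $executed = array(); // stands in for the statements sent to the database

    // Remember the object; nothing is written yet.
    public function registerNew($entity)
    {
        $this->new[] = $entity;
    }

    // Write all the pending objects in one go.
    public function commit()
    {
        foreach ($this->new as $entity) {
            $this->executed[] = 'INSERT ' . get_class($entity) . ' ' . $entity->name;
        }
        $this->new = array();
    }
}

$uow = new UnitOfWork();
$uow->registerNew(new User('giorgio'));
$uow->registerNew(new User('roman'));
$executedBeforeCommit = count($uow->executed); // nothing written before commit()
$uow->commit();
```

With Active Record the equivalent would be a save() method inherited by User itself, which is exactly the dependency this approach avoids.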

That said, I really want to help the project as I dream of doing Domain Driven Design in php, and persistence ignorance is a key point of DDD. So I asked how to start contributing to the project and last week I wrote eight test case classes for phpunit plus the example model classes that they use. I was helped by Roman Borschel, the lead developer for the 2.x branch, who taught me how to use annotations properly and fixed my faults in the model classes. Keep in mind that contributing to an open source project you use or will use is a win-win situation, because it helps the project go forward and obtain feedback, while it helps you learn to use the tool well. And I will certainly use Doctrine 2 because I think that it, along with Zend Framework, will kick Rails in the ass.

This Monday, I successfully closed ticket #2276 and I look forward to learning more and going further. If you are a php developer, come to #doctrine-dev on the irc server freenode.net, or stay tuned to try the alpha releases of Doctrine 2 that will come in the future. Having a real domain model, free of dependencies on concrete orms, is a dream of the php world that will be realized.

Saturday, July 04, 2009

Keep growth under control

Systems grow and so do classes, but we should respect the Single Responsibility Principle. Here is a rule of thumb for noticing potential God classes.
In a well designed oop application, every class should do one thing and do it well, paraphrasing the unix philosophy.
The Single Responsibility Principle (SRP) is one of the five SOLID principles that govern object oriented programming.

A class should have one, and only one, reason to change.

Real world
But in reality, when you open .java and .php files or whatever else, you often find yourself scrolling up and down the class code to fix disparate things. What can be done to keep classes navigable and simple?
I have my quick and dirty approach, which uses the unix shell; it takes advantage of the one-to-one correspondence between class and source file, and also of the fact that:
Measuring programming progress by lines of code is like measuring aircraft building progress by weight.
-- Bill Gates
In my opinion, lines of code are a metric, but not in the way many think: the more lines you commit, the more bloat you introduce. If you absolutely need hundreds of lines of code, you should at least divide them into a bunch of methods, functions, files and classes to keep them manageable.
This command line interface is available on ubuntu and every other linux distribution, since these commands are typically built in.

$ wc `find library/Otk/ tests/ -name '*.php'`| sort -n

Obviously you can replace library/Otk and tests/ with folders you like, and .php extension with .java or .c one.
What this command outputs is the full list of source files ordered by line count:
3 6 66 tests/TestHelperMysql.php
3 6 67 tests/TestHelperSqlite.php
8 21 202 tests/stubs.php
273 766 7975 library/Otk/Image.php
274 641 8949 library/Otk/Controller/Scaffolding.php
281 787 9713 library/Otk/Form/Generator.php
10298 24649 315267 total

The output of wc is defined as:
linesCount wordsCount charsCount fileName
so you will notice in the last lines of the output the files that contain the highest number of lines. All lines are counted: blanks and comments are included. You might want to use cloc or other more specialized (but not standard) programs to count only lines of real code, but as I said before this is a quick'n'dirty approach.

Where should I start refactoring?
Now observe the last lines of this output: if the biggest classes in your project are the ones you are frequently working on, and you struggle to move up and down in their source code, it can be a good sign of where refactoring is needed. I started with some classes for forms and repositories that had 500+ lines and now I'm down to half of that for my biggest class, which is Otk_Form_Generator. This has also improved my testcases, which went from a full integration test of the form class to the unit testing of the form generator and its collaborator classes.
The typical tdd workflow is Red - Green - Refactor, but when Refactor is executed after some time, where should you start? Here's a brutal indicator.

Zend_Test and captchas

Do you want test automation for a Zend Framework application that contains captchas?
Zend_Test is a useful component that permits stubbing of http request parameters, methods and headers to allow integration testing of an application. It works by dispatching urls and requests and asserting that the response contains the necessary data and http headers.
This testing automation tool shows problems when it encounters a captcha field. Since a captcha is built to prevent automatic submission of forms, not being able to automate the test is a good sign that the captcha generation code is well written.
However, no matter which captcha adapter a form is using, there's a workaround that permits a test to access the captcha value while hiding it from an end user. This trick accesses the $_SESSION variable.

Architecture of Zend_Captcha
A Zend_Form_Element_Captcha uses a Zend_Captcha_Adapter_* instance that, when generating an (id, input) pair, saves the input in a namespace of the session superglobal and sends only the id to the element for rendering; an empty text input is added by the element.
So the workflow is viewing a form, extracting the captcha id, pulling the right input value out of $_SESSION and submitting the form as if we were humans capable of deciphering it - err, we are human, but phpunit's not.

Here's the code
Add this method in your test case class, or wherever you want, since it has no dependencies. Only requirement is that it must be callable from a Zend_Test_PHPUnit_ControllerTestCase instance. The argument is the html of the page containing the form.
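public function getCaptcha($html)
{
    $dom = new Zend_Dom_Query($html);
    // the hidden field rendered by the captcha element carries the id
    $id = $dom->query('#captcha-id')->current()->getAttribute('value');

    // the word is stored in a session namespace named after the captcha id
    foreach ($_SESSION as $key => $value) {
        if (preg_match('/Zend_Form_Captcha_(.*)/', $key, $regs)) {
            if ($regs[1] == $id) {
                return array(
                    'id' => $id,
                    'input' => $value['word']
                );
            }
        }
    }
    return null; // no matching captcha found in the session
}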

Typical usage. In this example I assume the element is named 'captcha'.
// viewing form for adding comment via ajax
$this->dispatch("/content/article/{$articleSlug}/add?format=html");
$this->assertQuery('form#otk_content_form_comment');

// adding comment
$html = $this->response->getBody();
$this->newRequest();
$this->request
->setMethod('POST')
->setPost(array(
'author' => uniqid(),
'mail' => 'integration_mail@example.com',
'text' => $text,
'suscription' => true,
'captcha' => $captcha = $this->getCaptcha($html)
));
$this->dispatch("/content/article/{$articleSlug}/add");
$this->assertRedirectTo("/content/article/{$articleSlug}/comments");

And now you can also cover forms with captchas with integration testing.
Hope you like it! You know, they say if it ain't tested, it's broken...

Domain model is everything

What is a good domain model? The one that bridges together domain and infrastructure...
According to Wikipedia, a Domain Model is a conceptual model of a system which describes the various entities involved in that system and their relationships. Note that it is written capitalized, because it is recognized as a pattern.
A Domain Model is typically complemented by infrastructure: the set of all the data and code needed for an application to work which are not part of the model.
An important part of the Domain Model concept is that it is a model: it does not take into account all the variables of the world but only the ones interesting for our application. If you need a home banking system, you don't need to persist in a relational database the eye color of the customers...

A practical approach
Producing a Domain Model is typically achieved with classes (business objects, a very overloaded term as Fowler says), while in the past a domain model was constructed using data structures and functions which process them. OOP encapsulates data and the methods that act on them in the same place: that's why it's so natural to produce object oriented code.
Not all the classes in an OO project are part of the model: the majority of them constitute infrastructure. Collection classes, persistence mechanisms for objects, file writers and readers, serialization systems, controllers, java servlets, iterators, view helpers, image manipulators - all part of infrastructure, the low level of an application.
The most important level is the model, which sits on the infrastructure (but this does not mean that it should be aware of every implementation detail of infrastructure classes) and provides the functionality needed to run most of your user stories. It is the layer that unit tests absolutely have to cover. All the business logic should be encapsulated by the model.
Practically speaking, the User, Group and Article classes which you've seen in many posts and blogs as examples of code are at the heart of a Domain Model. Controllers and views in a framework (such as Zend Framework for php developers) are already present, as the job of a framework is to provide infrastructure: it's your model that differentiates your application from another.

Iterative work
Choosing the right model can lead to simple development; failing to do so will complicate subsequent actions. Because it's not possible to write code for a complete model from the start, a developer must be open to further model refinements. For instance, in Ossigeno there is a single script that rebuilds an environment (a particular combination of database and php files) from scratch, updating the tables that persist the domain objects. Adding a field to an object means only writing a line of yaml and pushing the regenerate button.
Requirements gathering is not a definite phase: it's iterative. Whenever further requirements and modelling occur, it's useful to have a simple way to refine the model code with a push-the-button script. The more you refine and regenerate, the more the Domain Model adapts to the real world, or rather to the part of the world that interests us for our application, bridging the gap between code and domain.

A Linear Programming example
In math, the right model can make the difference between solving a problem and not being able to solve it in a zillion years.
Suppose you have to maximize a function of some variables (real numbers or integers): suppose also you have some constraints on the variables that cannot be violated; the solution to the problem should be a bunch of numbers that gives the top value for the objective function and does not infringe the constraints.
Well, if you choose particular variables that let you write linear constraints, there's good news for you. You can apply Linear Programming and have the problem solved by a computer in polynomial time (at least when the variables are real numbers; requiring integers generally makes the problem much harder). If your model is not so well-crafted, and you have to resort to writing constraints where two variables are multiplied or divided, Linear Programming can't help you. And very bad things happen, such as a numeric solution taking weeks to find.
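
As a purely illustrative formulation (the symbols c, a and b are generic coefficients, not taken from a real problem), a well-modelled problem with two variables looks like this:

```latex
\begin{align*}
\text{maximize}\quad   & c_1 x_1 + c_2 x_2 \\
\text{subject to}\quad & a_{11} x_1 + a_{12} x_2 \le b_1 \\
                       & a_{21} x_1 + a_{22} x_2 \le b_2 \\
                       & x_1 \ge 0,\; x_2 \ge 0
\end{align*}
```

Every term is a constant times a single variable. A constraint such as $x_1 x_2 \le b_3$, where two variables are multiplied, does not fit this form, and with it the problem is no longer a linear program.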
This example shows us that having a good Domain Model means not only drawing a correct painting of the business domain, but also using the right colors and lines to fit in the canvas. Programmatically speaking, the painting is our model while our infrastructure constitutes the canvas or panel where we draw.
A good model is one that can solve our problems, mapping the application entities into the code, while not overengineering the infrastructure. It can depart from the real world, but the correct naming (we'll see an example right now) can light the path...

Bridging the gap
We said that a good model stands in between the infrastructure and the business, so we can give an example of what is not a good model.
A bad model is too close to the infrastructure: for example, if we model our users, groups and articles in a unique big, multidimensional array, the infrastructure used is very simple and is not subjected to particular stress, but the code says nothing about the domain.
A bad model is also one that is too tight to the business domain: if we take a picture of the users, groups and articles world with static classes, deep inheritance trees and generated fields in the objects, it will be very hard to save and restore our application state.
However, the need for abstracting away the persistence concerns of objects from a model (for example in Domain Driven Design) has led developers to find a way to stay close to the domain. This is done by extending the infrastructure using ORMs. An Orm is a superset of the common infrastructure of a framework that cares about persistence of the objects involved in an application: it adds to the allowed infrastructure the typical object composition and inheritance features. Persistence ignorance is a wide topic and it is not treated here.

Sql relationships as an example
A common example of a model where the mapping to the real world is not immediate, and is sacrificed to performance and cleanliness, is the handling of many-to-many relationships in relational databases.
A relational database has long been the state-of-the-art tool for persisting the state of an application, and today it's still a widely used one. However, a relational database in its simplest form consists of tables, columns and rows, so it has no notion of how to handle a many-to-many relationship between our User objects and Group objects. If a user could belong to a single group, we would persist the relationship by a field group_id in the User table; but a User can belong to many groups, and conversely a particular group could have any number of users.
The standard solution is to create a new table, where there are only two columns: group_id and user_id, containing the numerical index of the Group and User rows. A User belonging to a particular group is described by an entry in this table, so we'll name it UserSubscription.
Our modelling has introduced another entity that does not exist in the real world, but is useful to maintain a homogeneous infrastructure. This is the best crafted model at this stage of development (not using ORMs).
This solution works particularly well in a rdbms context, so why not abstract it away? This way we'll have a model without UserSubscription that maps even better to the real world. This is the next step taken by the Orm family (Hibernate, Doctrine, ...), where you can define any type of relationship and let the Orm do the dirty job of persisting it in the database, with the aid of developer metadata, while providing a language similar to Sql to query the infrastructure layer and obtain the stored User and Group objects, correctly composed.
This is stretching the gap between domain and infrastructure, letting your good model (that sits in the middle) move even nearer to the domain side. That's why OOP is so successful, that's why Orms are a complex answer to a simple problem, but still so successful.
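
To show what the model looks like once UserSubscription disappears from it, here is a small sketch. The subscribeTo() method and the joinTableRows() function are hypothetical: the latter only imitates what an Orm would derive from the object graph when it fills the two-column table, and uses names instead of numerical ids to keep the example short.

```php
<?php
class Group
{
    public $name;
    public $users = array();

    public function __construct($name)
    {
        $this->name = $name;
    }
}

class User
{
    public $name;
    public $groups = array();

    public function __construct($name)
    {
        $this->name = $name;
    }

    // Keeps both sides of the relation in sync; no UserSubscription class.
    public function subscribeTo(Group $group)
    {
        $this->groups[] = $group;
        $group->users[] = $this;
    }
}

// Hypothetical imitation of the Orm's job: one row per (user, group) pair.
function joinTableRows(array $users)
{
    $rows = array();
    foreach ($users as $user) {
        foreach ($user->groups as $group) {
            $rows[] = array('user' => $user->name, 'group' => $group->name);
        }
    }
    return $rows;
}

$admins = new Group('admins');
$editors = new Group('editors');
$alice = new User('alice');
$bob = new User('bob');
$alice->subscribeTo($admins);
$alice->subscribeTo($editors);
$bob->subscribeTo($editors);

$rows = joinTableRows(array($alice, $bob));
```

The join table still exists in the database, but it has become an infrastructure detail instead of a class of the model.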

Productivity by forcing yourself

It's hard to get work done when you're not in the zone and distraction comes in. To help productivity, let's enforce good practices instead.
Focusing on problems in a hard task such as software development can be a difficult process. I have previously blogged about idea refactoring as a measure to help you get in the zone as quickly as possible, by having a refined, written path to follow when you run into unexpected issues.
There are other techniques that can help improve the process. And as Quality Assurance people say, a good process is mandatory for a good product, which from the software perspective is the code you write plus the other artifacts like documentation, and the design that they implement.

Premise
There's a lot of talk on software engineering in the blogosphere; the web is biased towards the technologies and practices it is built with. So a developer like you and me reads many tips on coding for his particular platform, language and framework, and normally says "it is reasonable" after learning of the existence of practices like Tdd and build automation. Where he fails is in applying these techniques consistently: if only half of your code is written in a tdd fashion, the remaining part will suffer; if you automate the compilation of your application but not its archiving and compression (tar/gzip), you will find yourself repeating the same daunting task that you could easily have factored out earlier, without having the time or stamina to bring it to an end.
Why does it happen? Because as humans, we are not always in the same state of mind (or level of awareness if you prefer). When we are in the zone (aka flow) it's straightforward to build beautiful features from scratch. When we are a bit less concentrated, the temptation to hack some code together comes, like death and taxes. It's nice to say "I have to get up at 6 am tomorrow", but it's another problem to keep the 6-am-yourself from hitting snooze and turning away from the dawn light. Many of us lack the discipline, and even the ones that have it are not always in that higher level of awareness where discipline resides.

Constraints as aids
That's why the way to go is to force yourself to do the right thing, the one you would choose in a moment of higher awareness. You set the alarm clock in the evening, knowing from experience that rising early will be useful and very productive: the same applies to software. You know test-driven development will lead to less coupled and less error-prone code. You just have to start writing test cases before domain classes, even when it seems overkill, because there's always some complication that will arise and that TDD clears up.
So, I wrote a list of habits that enforce best practices. Some may seem exaggerated, but I assure you that they have led to much satisfaction, repaying themselves in a short time.
  • some organizations have Subversion commit hooks that run a language linter against the checked-in data, rejecting the commit of source files that contain syntax errors. Obviously logic bugs are not caught, but this is a nice procedure.
  • I often turn off my web server to do most of my work on the command line with PHPUnit instead of testing in the browser; the same applies to other xUnit toolkits and to everything that forces TDD instead of a top-down approach.
  • turning off the net connection to avoid being distracted by various websites is a classic: I personally feel that closing IM programs and not using applets that pop up when you receive mail is enough. However, I removed my Facebook bookmark from the browser bar.
  • my phone is kept far from the workstation. Since it lights up when there are unread messages, it is also kept out of view. Speakers are turned off to avoid annoying sounds.
  • Steve Pavlina says he has a Writing troll - Get back sign that he places on the door so as not to be disturbed while writing. Would locking the door be enough?
  • Rubberducking: a teddy bear or a rubber duck is placed in some university help desk centers, and before being allowed to speak to a human, students are forced to explain their problem out loud to the teddy bear. Half of the problems are solved during this process, by the students themselves. It has happened to me that, while writing to a mailing list, I reached the solution to my issue before completing the email. And if you're being watched by a duck, you'll be more composed. :)
  • leaving a laptop on battery power only can put you under pressure and motivate you to get a job done in the time remaining before the charge runs out.
These are only examples of what works for me. Feel free to share your productivity tips!

Code introspection and how to fool PHPUnit

PHPUnit is a wonderful tool, but this time it was being a bit too smart. Here's a workaround to fix its mocking capabilities.
When it comes to writing unit tests for a PHP project, PHPUnit is the obvious choice. And since I practice TDD, that moment comes before writing any other code.
PHPUnit is much more than a clone of JUnit, and it takes advantage of the dynamic aspects of the PHP language to do amazing things. It even saves and restores global variables to test legacy code. However, I ran into an issue, ironically involving cloning: not of a testing framework, but of Plain Old PHP Objects. I know, POPO is a cloned term too; but some aspects of statically typed languages are very useful even in the PHP context, where you call $this->$method with no hesitation.

PHPUnit sometimes is too smart
Before talking about the issue and the workaround that I produced, let's cover some background: PHPUnit integrates a simple mocking framework that, with 3-4 lines of code, lets you generate mock objects to inject into the system under test. This is called testing in isolation, and it's the point of the whole unit testing strategy: isolate an element from its dependencies and strictly test its conformance to a contract: it accepts some parameters and it calls some methods.
So I was setting up a Doctrine_Query object, which represents a SQL query, and passing it to my Repository (the object under test) to have it filled with where conditions. The details are not important, but I was putting it in a mocked QueryCreator factory that sits in the repository. The ParametersProcessor was responsible for putting conditions in the query and was mocked as well. What is under test is the behavior of Repository, which pulls a query, has it filled with wheres and executes it, returning a Doctrine_Collection.
I discovered that PHPUnit's mock implementation clones the parameters that are passed to a mocked object. This makes sense, as the assertions on the parameters passed are executed in the teardown, when the test is finished (and it can't be otherwise for constraints like 'this method is called once or more times'). The framework is only trying to be helpful, as cloning parameters insulates the copy recorded in the mock from the object that travels around the test case, whose state may be modified after it was passed to the mock:
$parametersProcessor = $this->getMock('Otk_Model_Repository_ParametersProcessor');
$parametersProcessor->expects($this->any())
    ->method('injectParams')
    ->with($this->identicalTo($query), $this->equalTo($params))
    ->will($this->returnCallback(array($this, 'mockQuery')));

When my repository calls $parametersProcessor->injectParams($query, ...) internally, $query is cloned and the clone is passed to the callback defined:
public function mockQuery(Doctrine_Query $q, array $params)
{
    $this->assertEquals(array('Stub_Book b'), $q->getDqlPart('from'));
    $q->where('b.title = ?', 'The Bible');
}

And instead of having a query with the title The Bible condition, my repository still has the original one. And obviously, the assertion on the number of objects it retrieves fails.
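The effect can be reproduced without PHPUnit or Doctrine. The sketch below is only an illustration: FakeQuery, callWithClonedArgument() and addTitleCondition() are hypothetical names, and the helper merely imitates a mock that clones its argument (it is not PHPUnit's code). The callback adds its condition to the clone, so the caller's instance never sees it.

```php
<?php
// Hypothetical stand-in for Doctrine_Query: it only collects where conditions.
class FakeQuery
{
    public $wheres = array();

    public function where($condition)
    {
        $this->wheres[] = $condition;
        return $this;
    }
}

// Imitates a mock that clones its argument before invoking the callback.
function callWithClonedArgument($callback, $argument)
{
    return call_user_func($callback, clone $argument);
}

// Plays the role of mockQuery(): it modifies whatever instance it receives.
function addTitleCondition(FakeQuery $q)
{
    $q->where('b.title = ?');
}

$query = new FakeQuery();
callWithClonedArgument('addTitleCondition', $query);
// The condition went to the clone, so the original is still empty.
var_dump(count($query->wheres)); // int(0)

// Passing the object itself keeps the change, since objects are handles in PHP.
addTitleCondition($query);
var_dump(count($query->wheres)); // int(1)
```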

What can we do to bypass cloning?
Looking at the PHPUnit_Framework_MockObject_Invocation::cloneObject() method, which is used to duplicate the parameters of mocked methods to preserve them for evaluation at the end of the test, I saw that if clone $object throws an exception, the object is not cloned. This still makes sense, because if an object defines a __clone magic method that throws an exception it essentially states "I am not cloneable. Maybe I am a big god object or singleton or other bad stuff that does not want to be duplicated, or the world will end". The method catches the exception and saves the original parameter, considering it still better than nothing.
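The fallback just described can be sketched in a few lines. This is not PHPUnit's source, only a minimal reconstruction under the assumption that the clone is wrapped in a try/catch; cloneOrOriginal() and the two classes are hypothetical names.

```php
<?php
class Cloneable
{
    public $value = 1;
}

class NotCloneable
{
    public function __clone()
    {
        throw new Exception('I am not cloneable.');
    }
}

// Returns a copy when possible, otherwise falls back to the original object.
function cloneOrOriginal($object)
{
    try {
        return clone $object;
    } catch (Exception $e) {
        return $object;
    }
}

$a = new Cloneable();
$b = new NotCloneable();
var_dump(cloneOrOriginal($a) === $a); // bool(false): a distinct copy
var_dump(cloneOrOriginal($b) === $b); // bool(true): the same instance
```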
So I wrote a small subclass of Doctrine_Query for the purpose of this test that will throw an exception if cloned:

class Stub_Doctrine_Query extends Doctrine_Query
{
    public function __clone()
    {
        throw new Exception('PHPUnit, you cannot clone me: I need to be passed by reference so that mockQuery() can modify me.');
    }
}

... and watched the test fail anyway. Why? Because Doctrine_Query clones itself when the query is executed, for some wacky reason. I have no idea why, but if it wants to, it has to be able to. It still... makes sense. Why the hell would I want to pass only parameters that can't be cloned? If I needed to clone one in a mocked callback, what would I do?
So I proceeded with a little introspection on the call stack, and rewrote the query class like this:

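class Stub_Doctrine_Query extends Doctrine_Query
{
    public function __clone()
    {
        $backtrace = debug_backtrace(false);
        // $backtrace[0] is the current function; $backtrace[1] is its caller,
        // which is missing when the clone happens outside any function.
        $previousCall = isset($backtrace[1]) ? $backtrace[1] : array();
        // Plain functions have no "class" key, so check it before comparing.
        if (isset($previousCall["class"])
            && $previousCall["function"] == "cloneObject"
            && $previousCall["class"] == "PHPUnit_Framework_MockObject_Invocation") {
            throw new Exception('PHPUnit, you cannot clone me: I need to be passed by reference so that mockQuery() can modify me.');
        }
    }
}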

debug_backtrace() is a useful PHP function that returns the stack of calls made to reach the point in execution where it is called. The PHP manual explains it better: it is like throwing an exception, catching it immediately and reading the stack trace.
What the __clone override does is essentially "if cloneObject of that class called me, throw an exception so it will stop and use the correct instance".
Before screaming about the bad practice of ugly code that behaves differently in testing, please note that in the context where it works differently from normal (being passed as a parameter to a mock) it was already behaving differently, as it was an object passed by copy (this is really ugly). I am just restoring its normal behavior: if I pass a query to a parameters processor, I expect it to work on the same, identical instance I have (something for which === would return true).
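To see in isolation what debug_backtrace() reports, here is a small sketch with two hypothetical functions, inner() and outer(). It relies only on the documented layout of the returned array: index 0 describes the current call, index 1 the call to the function that invoked it.

```php
<?php
// Hypothetical function that reports the name of the function that called it.
function inner()
{
    $backtrace = debug_backtrace(DEBUG_BACKTRACE_IGNORE_ARGS);
    // $backtrace[0] describes this call to inner(); $backtrace[1], when it
    // exists, describes the call to the function that invoked inner().
    return isset($backtrace[1]) ? $backtrace[1]['function'] : null;
}

function outer()
{
    return inner();
}

var_dump(outer()); // string(5) "outer"
```

When inner() is called from the top level of a script there is no second frame, which is why the isset() check is there.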

When designing an API, a library, or anything else, try to be helpful and useful, but not too smart. In this case a simple option in the construction of the mocked objects could decide whether to clone parameters, instead of always cloning.
And that's why I'm fooling PHPUnit. :)

The world diffusion of Php

This article was updated on Sep 14, 2009.

PHP, the language of choice for many web applications, powers many of the websites that you visit every day: for instance, Yahoo uses PHP. A quick search on Wikipedia tells us that Facebook, WordPress and Digg are developed in PHP; Wikipedia itself runs MediaWiki, software written in PHP. And, obviously, Ossigeno is pure PHP.

A PHP-based application, if backed by enough processing power, can serve millions of users, and Facebook and Yahoo stand as live examples of what can be done in this language. The classic take on scalability investigates these case studies for know-how on maintaining performance and service availability above a minimum threshold, even when thousands and thousands of requests per second hit the servers. The scalability wall is sometimes hit by other web applications, such as Twitter. However, Twitter does not run on a PHP platform.
From now on, I'll call the problem scaling up: facing a growing user base which will increase traffic, HTTP requests and bandwidth usage.
No application, whether written in PHP or Ruby on Rails, can scale up beyond a certain number of users without completely revisiting parts of its architecture. Typically, complements to the stateless nature of PHP scripts such as memcached and APC are integrated in this phase. Netlog went from using a single database to multiple databases and hosts, separating access to static and dynamic pages and sharding user data over various servers: extreme measures which are used in very high traffic websites, while for small ones throwing more hardware at the application can solve the issues more easily.
PHP has already proven capable, with a proper architecture, of scaling up to the most visited websites in the world. Nowadays there's so much talk about it, with everyone thinking of building the next Twitter, which despite its frequent outages has kept growing. But the real life of the average web developer is different, and the majority of the PHP community is made up of average Joes of the web.

What if we don't own Yahoo? Or we are not Facebook engineers? PHP would still be the right choice, because PHP can also scale down. Let's see an example of what these two words mean.
A while ago, I was working on a questionnaire that would be used by small firms to see if they are affected by new Italian business laws, and so need to outsource waste disposal to my customer. Small and medium enterprises (PMIs) in my country, where the legislative landscape changes every five years, frequently outsource some processes so as not to infringe laws and not to have to worry at all.
I made a simple application that, based on a handful of questions, produces a basic proposal and gives out my customer's contacts: a single PHP script that I deployed on my customer's website in five minutes.
- No compiling needed: besides the application setup, PHP scripts do not need any particular treatment to become executable.
- Many compatible web servers, also available as free hosting; the ability to deploy example projects or prototypes on outsourced, free web servers is indeed valuable.
- Easy deployment with ftp, svn or scp: all that is needed is a way to place files on the web server.
- Easy editing if the customer wants a new revision: even if you don't go through svn, simple hotfixes can be performed directly on the web server with editors like Coda.
Scaling down is the ability to maintain a constant features-to-time ratio even when the project becomes smaller and smaller.
If you want to build a full-stack web application, we can discuss which technology should be used. But if you are developing a two-hour project, the choices are limited. If I were a Java programmer I would likely have spent an entire day on it, compiling and debugging servlets. Whereas if I notice an error in my single-file PHP script, I correct it and refresh the page. Be pragmatic and use the simplest thing that could possibly work.

Make sure you use the right tool for the job: a demagogic and simplistic rule, but still a very helpful one. Your productivity is certainly affected by technology choices, and small applications should consider PHP so that they do not become big and complex ones.

Factory for everything

Why use a factory to produce simple objects? Because the new keyword has to be isolated, even if it is only new Form();
We all know (or I hope so) that from a testability point of view, we should write classes that ask for dependencies in their constructor, and have factories that build the object and return a complete instance, ready to do the work. In the typical PHP MVC application workflow, this is done in the controller (or in helpers).
What I have produced are action and view helpers that are created during the startup of the script and abstract away the creation of scaffolding forms, all done in the context of the Zend Framework MVC implementation.
This approach is particularly useful as some business objects like repositories depend on the database connection and some other components: for instance, the mailer that sends emails on new comments is required by the comments repository.
The design described is classic test-driven development: ask for dependencies in the constructor, so you can test the class in isolation by passing in mock objects; then build factories that have methods for creating the top-level objects that you use in your script (or controllers, or views). We're not going to complicate it by talking about dependency injection containers.

Dive into the code
Here is an example of what I'm talking about.

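class My_CommentsRepository
{
    public function __construct(Doctrine_Connection $conn, My_Mailer $mailer)
    {
        // ...
    }
}

class My_Factory
{
    private $_conn;

    // Assumption: the factory receives the connection once, at startup.
    public function __construct(Doctrine_Connection $conn)
    {
        $this->_conn = $conn;
    }

    public function createCommentsRepository()
    {
        return new My_CommentsRepository($this->_conn, new My_Mailer());
    }
}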

Only top-level objects need a factory method: supposing that I'm not using My_Mailer directly in my controllers or scripts, there is no need for a public method to create it. You can consider a private one if it is used multiple times, or a lazy creation method in the factory that creates only one My_Mailer object to share among the multiple dependent objects.
Given that the factory is created at startup, this kills the singleton: only one instance of the My_Mailer class is present in the application, but this uniqueness is controlled by the factory, as it is not strictly a responsibility of My_Mailer. And it lets you test the mail sending on certain repository events by mocking out My_Mailer with PHPUnit.
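A lazy creation method of this kind might look like the sketch below. It is a simplification under stated assumptions: the classes are empty stand-ins so that the example runs on its own, the repository takes only the mailer (the database connection is left out), and the private method name _getMailer() is hypothetical.

```php
<?php
// Minimal stand-ins so the sketch runs on its own; the real classes do more.
class My_Mailer
{
}

class My_CommentsRepository
{
    public $mailer;

    public function __construct(My_Mailer $mailer)
    {
        $this->mailer = $mailer;
    }
}

class My_Factory
{
    private $_mailer;

    public function createCommentsRepository()
    {
        return new My_CommentsRepository($this->_getMailer());
    }

    // Lazy creation: the mailer is built on first use and then shared.
    private function _getMailer()
    {
        if ($this->_mailer === null) {
            $this->_mailer = new My_Mailer();
        }
        return $this->_mailer;
    }
}

$factory = new My_Factory();
$first = $factory->createCommentsRepository();
$second = $factory->createCommentsRepository();
var_dump($first->mailer === $second->mailer); // bool(true)
```

The uniqueness lives in the factory instance: a test can still build a My_CommentsRepository by hand and pass it any mailer it likes.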

What's new?
Now I get to the point of this article: what about an object with no dependencies?

class My_ScaffoldingForm
{
    public function __construct(Doctrine_Record $model)
    {
        // ...
    }
}

My_ScaffoldingForm is an object that builds inputs and selects to reflect an active record object. The focus of this example is that its only dependency - the record - is necessarily passed in at the application level. There is no point in making a factory method that hides the Doctrine_Record instance, as it is not the factory's responsibility to find the right model to edit.
So we can do something like this in our application script:

// find the model
$record = $someRepository->find($_GET['id']);
// instantiate the form
$form = new My_ScaffoldingForm($record);

Wrong. What I suggest is to always wrap the creation of a high-level object in a factory method, even if the creation process is dumb because the class has no dependencies, or because its dependencies are not obvious and cannot be generated by an application-wide factory.

class My_Factory
{
    // ..other factory methods
    public function createScaffoldingForm(Doctrine_Record $model)
    {
        return new My_ScaffoldingForm($model);
    }
}

// in the script or controller
$form = $myFactory->createScaffoldingForm($record);

Why? Because the point is in abstracting away the process, much like the Law of Demeter suggests doing with the methods of composed/aggregated objects. If the constructor is going to change, just one line of code has to be edited to accommodate new dependencies.
My_ScaffoldingForm (in reality it is named Otk_Form_Doctrine) was becoming larger and larger, and as it reached 500 lines of code I felt the need to refactor away some of its responsibilities (the strategy for creating form elements based on column types and for populating selects with options), and so the constructor became:

public function __construct(Doctrine_Record $model, Otk_Form_Doctrine_Strategy $elementStrategy, Otk_Form_Doctrine_OptionsManager $manager)
{
    // ...
}

but new Otk_Form_Doctrine() was in all my controllers, and I had to grep for it and replace it with a call to a new factory method for the form. Imagine what would happen if controllers using this scaffolding form were spread across various applications over the internet. This change would break compatibility with previous versions and it would be a disaster.
I have now learned that new not only must not be in models and library classes, but it should not be in controllers either. Put all your news in factories. It will soon pay off.
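To make the payoff concrete, here is a minimal sketch of the factory after the constructor change. The class bodies are stand-ins, and Record, Form_Strategy and Form_OptionsManager are hypothetical simplifications of the Doctrine and Otk classes mentioned above; the point is only that the calling code stays the same while the factory absorbs the new dependencies.

```php
<?php
// Hypothetical stand-ins for the record and the two extracted collaborators.
class Record
{
}

class Form_Strategy
{
}

class Form_OptionsManager
{
}

class My_ScaffoldingForm
{
    public $model;
    public $elementStrategy;
    public $manager;

    public function __construct(Record $model, Form_Strategy $elementStrategy, Form_OptionsManager $manager)
    {
        $this->model = $model;
        $this->elementStrategy = $elementStrategy;
        $this->manager = $manager;
    }
}

class My_Factory
{
    public function createScaffoldingForm(Record $model)
    {
        // The only line that had to change when the constructor grew.
        return new My_ScaffoldingForm($model, new Form_Strategy(), new Form_OptionsManager());
    }
}

// In the script or controller: the same call as before the refactoring.
$myFactory = new My_Factory();
$record = new Record();
$form = $myFactory->createScaffoldingForm($record);
var_dump($form->model === $record); // bool(true)
```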