Tuesday, December 29, 2009

Packing for Florence

Tomorrow I am leaving for Florence for New Year's Eve. I will be back in 2010 (which is just a few days away :) and I'll return to posting regularly in the week of January 4.
Happy partying!

Monday, December 28, 2009

Practical Php Testing exercises

As you probably know, there is a Creative Commons licensed ebook available on this blog, Practical Php Testing.
Tomaž Muraus wrote to me yesterday about solving the TDD exercises contained in the various chapters:
I had some time during the past few days so I read your book and solved the exercises found in the book.
I'm pretty sure not all of my solutions are the best and some could probably be improved / changed so I decided to create a Github repository with my solutions (http://github.com/Kami/practical-php-testing-exercise-solutions) so others can fork it and improve / refactor my solutions.
I hope you find these examples useful, but try not to read a solution before actually attempting an exercise. I have not proofread this code, but its quality seems good. There is always room for improvement, especially in the stubbing sections of the tests and in the refactoring of the production code.

Friday, December 25, 2009

Merry Christmas

To say it with John Lennon: merry Christmas and a happy new year to all the Invisible to the eye readers!

Tuesday, December 22, 2009

Asking the community: a standard for @return array

You have certainly found yourself defining a phpDocumentor annotation on a function or a method:
<?php
/**
 * @param string $a   name your parameter better than this one
 * @return boolean
 */
function doSomething($a)
{
    // code... 
}
These annotations are parsed by phpDocumentor to automatically produce Api documentation in various formats, such as html and pdf.
It is also true that you can specify a class name as a data type:
<?php
/**
 * @return Zend_Form_Element
 */
function doSomething()
{
    // code... 
}
Since this is a widely employed standard for php frameworks, I decided to rely on @return annotations as the means to define domain model relationships in my project NakedPhp. This is no different from the relationships phpDocumentor infers to generate links between html documents: for instance, the Zend_Form_Element return type would be printed as a link to its Api documentation page, to allow fast navigation.
But what happens when you want to specify that a method returns an array or a collection of elements?
<?php
/**
 * @return array
 */
function doSomething()
{
    // code... 
}
Not very clear, as the question that arises is "What is the type of array elements?"
Note that Php is a dynamic language and arrays can be heterogeneous, but very often they contain elements of the same type for the sake of consistency; consider for example an array of Zend_Form_Element instances: even if they are different elements, they share a common superclass whose methods you can call without fear.
Note also that Php lacks a real collection class or interface, and even if a generic one is provided by a framework, the annotation would not be much clearer.
/**
 * @return Doctrine\Common\Collections\Collection
 */
or:
/**
 * @return ArrayObject
 */
At least in the former case you know that the returned collection contains homogeneous elements, but the situation is otherwise the same.
Since arrays and collections are used as dumb containers, the first thing you will do with an array is iterate over it, and then you will need to know the class of the contained elements to find out which methods to call on them, or which methods accept them as arguments.
Of course you can do something like this:
/**
 * @return array   of @see Zend_Form_Element
 */
But this is not a standard, and different developers would use different annotations:
/**
 * @return array   this contains Zend_Form_Element instances
 */
/**
 * @return array   of Zend_Form_Element 
 */
These annotations would be parsed by phpDocumentor, but the class name would be mangled into a string and no longer machine-readable. It's like scraping a blog versus using a feed.
PHPLint documentation says it recognizes annotations like array[K]E, as in this example:
/**
 * @return array[string]Zend_Form_Element 
 */
They also say that phpDocumentor already supports it, but there is no trace of that in its own documentation:
The datatype should be a valid PHP type (int, string, bool, etc), a class name for the type of object returned, or simply "mixed".
The original Naked Objects implementation is written in Java and takes advantage of generics (not available in Php):
/**
 * @return List<FormElement>
 */
When javadoc or Naked Objects parse annotations, they instantly know the type of the collection's elements, thanks to a reasonable standard that imitates the language syntax: I would be glad to do the same in Php, but there is no syntax to refer to.
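To show what machine readability would buy us in practice, here is a minimal sketch of how a tool could extract the element type from an array[K]E-style annotation; the function name and the regex are mine, not part of phpDocumentor or PHPLint:
<?php
// Hypothetical sketch: extract key and element types from an
// array[K]E-style @return annotation (names and regex are mine).
function parseArrayReturnType($docblock)
{
    if (preg_match('/@return\s+array\[(\w+)\](\S+)/', $docblock, $m)) {
        return array('keyType' => $m[1], 'elementType' => $m[2]);
    }
    return null;
}

$docblock = '/**
 * @return array[string]Zend_Form_Element
 */';
print_r(parseArrayReturnType($docblock));
// Array ( [keyType] => string [elementType] => Zend_Form_Element )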
I turn thus to the community, which comprises millions of talented developers. My question is: how would you specify @return annotations for containers of elements in a way that includes the element type? I hope to grasp a de facto standard, which I can then require NakedPhp applications to follow.

Monday, December 21, 2009

The Advent that became Tribulation

There is a nice initiative in the php community whose goal is to put together a set of best practices for php programmers and to have one of them explained each day of December by experienced developers: the PHP Advent.
However, this year I have been disappointed by the quality of the articles in the advent. First, there is no Add Comment form. What? What advent is this, 1995's? Are the articles stored as static html files? I remember there definitely were comments in 2007.
After having asked the reason on Twitter, I got this response:
@giorgiosironi we've considered adding it, but like the idea of crosslinking with our authors' blogs when they create an entry. /cc @chwenz
The problem is that there is no duplicate post on the author's blog, at least for the majority of them. The only place where leaving a comment makes sense is on the original article, which admits no feedback.
Now we can start to list what I, and probably many other developers, would have commented on some of the articles.

December 8: Testing with PHPT and xUnit
The author suggests abandoning the standard testing framework for php, PHPUnit, in favor of PHPT, the testing harness for the php core itself.
PHPT has a very simple plain text interface where you specify text files containing the code to execute and its expected output. Not as variables, but as text output, usually of var_dump(). This tool is used to test php itself because a real framework with more features, like PHPUnit, would probably break while running the test suite if a regression were introduced in php.
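For readers who have never seen one, a minimal PHPT test might look like this (a sketch; real tests in the php source have more sections, such as --SKIPIF--):
--TEST--
strlen() on a plain string
--FILE--
<?php
var_dump(strlen('php'));
?>
--EXPECT--
int(3)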
Then what are the reasons to dismiss PHPUnit if you are not developing the php interpreter? The author says "with PHPT you can test for expected fatal errors!". Why would I want an expected fatal error in my application, with a nice blank page shown to the end user? I can't even think of renouncing PHPUnit.

December 12: JSON Gotchas
Towards the end of the article, the author says something like: You know what is really cool? Passing an anonymous function in a Json string.
var jsonString = "({name: 'value', toString: function() { return this.name; }})";
What's the problem? That this is not Json, it's plain javascript syntax... Json.org says a value is string|number|object|array|true|false|null and an object is something which is enclosed in {}. Do you see anonymous functions in this specification?
This Json string would probably work if passed to eval(), but not to secure Json parsers. The author recognizes that you cannot avoid eval() in this case, but it's not a mystery why: because it's not Json anymore, and calling it Json causes confusion, much like $array = new stdClass(); and other obfuscated code.

December 18: You Don't Need All That
Classes are slow! MVC is even slower! Real men do not use autoloading! We should stick to require_once(). The best part?
Or, put some effort into writing more general, commonly-used classes rather than lots of little ones.
Never, ever, heard of the Single Responsibility Principle? Cohesion? Decoupling? That design should not be driven by how fast source files can be included? I actually refactor every class that surpasses 400-500 lines of code.
If you're thinking of following the advice to discard autoloading, maybe you'd like to know that PEAR 2 will drop require_once() statements. As for the post, if you do not believe me as a source of information, Roman Borschel, the lead developer of Doctrine 2, says:
@giorgiosironi agreed. complete crap. Another meaningless artificial benchmark. Not even with repro code. that post is a big #fail.
There are very valid posts in this year's PHP Advent, but these hidden "gems" are dangerous and I wonder why no feedback is accepted. Open discussion would benefit the community more than lectures on php practices.

Friday, December 18, 2009

Angry monkeys and other stories

First, a story contained in The Productive Programmer, which I find really interesting and helpful. Neither I nor the author know if the story is real, but it has a powerful moral. Telling a story is often the best means of communicating an interesting concept.

Angry monkeys
Once upon a time, there was a group of scientists who were experimenting on monkeys. They placed some of them in a closed room, along with a ladder that allowed them to grab a bunch of bananas hanging from the ceiling. The catch was: whenever a monkey went near the ladder, cold water was sprayed into the room. What did the scientists get as the result of this experiment? Angry monkeys.
Then they removed one monkey from the group and put in a brand new animal which was not aware of the cold water trap. Its instinct suggested climbing the ladder... only to be suddenly beaten up by the other angry monkeys, which were tired of the cold showers.
Continuing the experiment, they replaced one more monkey, and one more, until the room contained only animals that had never experienced the cold water trap, which was by then turned off. But still, if a monkey approached the ladder, it would be stopped and beaten by its companions.
The moral: why are some practices followed today? Because if they were not, a bunch of angry monkeys would yell at you. Some examples?
  • Primitives in Java, supported because it seemed very strange at the time to create classes for simple values (now recognized as immutable Value Objects).
  • Making every entity class a bean or an Active Record was common practice among angry monkeys, but the situation has changed in recent years.
  • Constructors that perform real work, or that even create other objects, because someone thinks that object-oriented programming means writing programs consisting only of the line new ApplicationInstance().
Sometimes a standard is enforced because it provides consistency and interoperability; other standards are relics of the past, craved by angry monkeys. Dare not to always follow the same road as others.

Here are similar software stories, in the form of fables. The power of metaphor allows us to explain software problems even to laypeople.
How to kill a dragon? This is something many knights would want to know. And what if they could use their preferred programming language to complete the quest?
In Deadline and Technical Debt, a valorous knight attempts to satisfy the requests of the king to marry his daughter, the princess Caroline. Will he be successful?
The Stone Soup story, reported also in the original Pragmatic Programmer book, teaches us that people find it easy to join an ongoing project, and that this is a powerful way to foster cooperation.

Wednesday, December 16, 2009

The object graph

Stefano wrote to me with the intention of expanding the discussion on the object graph concept, which I referred to earlier. As always, I think that sharing my thoughts can help other readers and also provide some feedback on these ideas.

A bit of theory
What is a graph? According to my math courses and to Wikipedia, it is an abstract structure defined by two sets. The variant of graph that interests us is the directed graph, because it resembles the Von Neumann representation of objects better than a non-directed one.
The two sets that define a graph are the vertices and the arcs:
V = {2, 3, 5, 7, 8, 9, 10, 11};
A = {(3, 8), (3, 10), (5, 11), (7, 8), (7, 11), (8, 9), (11, 9), (11, 2)};
The elements of A are ordered pairs whose elements are elements of V.
The term directed means that the graph's arcs present a specified direction (if they hadn't, they would have been called edges and the elements of A would have been two-element sets instead of ordered pairs).

How does it apply to computer science?
Well, suppose you have an object-oriented application in execution. The complete data structure is presented to us, with various abstractions, as an object graph: a graph where the V vertices are objects and the A arcs are their connections by field references. Arcs could actually be represented by pointers in low-level languages like C++, and by more complicated handles in higher-level environments such as the Php interpreter or a Java virtual machine.
For instance, consider the FrontController object of an ordinary php framework application. It has references to the Request and Response objects, and to the chosen Controller instance. The controller may have other references - to connection objects, Repositories, User entities and so on. There can be cycles and links spread all over the graph, which may be very complicated.
Of course to obtain a useful representation we may omit from a graph some objects which are actually reachable, as they are not "pertinent" to the current discussion. In a formal context, however, we ought not to leave out anything.
The first time I heard the term object graph was on Misko Hevery's blog, where it is used to describe the structure of an object-oriented application.

Why talk about an object graph?
Because it is a mathematical abstraction on the raw pointers and memory segments.
Stefano said in his email:
Probabilmente ancora non siamo riusciti a formalizzare una analisi teoretica sopra gli oggetti che descrivono software. Non so nemmeno se la cosa, allo stato attuale sia verosimile o abbia un senso. Tuttavia, cominciare a pensare in questa direzione credo possa essere un punto di partenza proprio per trasformare la Programmazione da "Arte" a "Scienza",  obiettivo perseguito anche dallo stesso Misko.
We have probably not yet managed to formalize a theoretical analysis of the objects that describe software. I don't even know whether, at this point, that is plausible or would make sense. However, I think starting to move in this direction can be a starting point for transforming programming from Art to Science, an objective pursued by Misko too. (my translation)
An abstraction such as the object graph lets us make statements which do not depend on the technology (Java or Php) but only on the object-oriented paradigm, and which will thus hold true in many languages and platforms, or 10 years from now when Php 9.3 and Java 14.0 are released (provided that we maintain the OO paradigm; considering that Smalltalk is from the 1970s, it may last for a long time).
For instance, here is a list of the concepts which involve a generic object graph:
  • object graph building and business logic separation. To produce seams for easy unit testing where we can inject collaborators, classes that build or expand the object graph should be separated from classes which contain logic.
  • Serialization: given an object O, the graph of all the objects reachable from O should be serialized with it to allow its reconstitution (see the sketch after this list).
  • The state of an application which should be stored in a database is an entity graph, composed of User, Group, Post instances; Orms such as Doctrine 2 implement persistence by reachability on a generic object graph. Reachability is a mathematical property.
  • Why should entities not contain field references to service classes? Because such references reach out of the entity graph and complicate the storage process.
  • The Observer pattern can be described as a partitioned graph that improves decoupling between objects of the same partition (observed or observing side). Other patterns are often explained with the help of an Uml class diagram, which is a similar (but more specific) concept.
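As a small demonstration of these ideas, here is a sketch that collects the set of objects reachable from a root by walking field references with reflection; it assumes Php 5.3 for ReflectionProperty::setAccessible(), and real Orms like Doctrine 2 are of course far more sophisticated:
<?php
// Sketch of reachability on an object graph: collects every object
// reachable from $root by recursively following field references.
function collectReachable($root, SplObjectStorage $visited = null)
{
    if ($visited === null) {
        $visited = new SplObjectStorage();
    }
    if ($visited->contains($root)) {
        return $visited; // cycles in the graph stop here
    }
    $visited->attach($root);
    $reflection = new ReflectionObject($root);
    foreach ($reflection->getProperties() as $property) {
        $property->setAccessible(true);
        $value = $property->getValue($root);
        $children = is_array($value) ? $value : array($value);
        foreach ($children as $child) {
            if (is_object($child)) {
                collectReachable($child, $visited);
            }
        }
    }
    return $visited;
}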
Note that if we demonstrate a rule or a theorem for an object graph (or a graph with certain characteristics), it will be valid for every other instance of that graph, even in different applications. That's why mathematicians love abstractions as much as programmers do: they save time for both categories.

Let me know your thoughts. There are many mathematical formulations of the object-oriented paradigm, but talking about a structure such as a graph can help explain advanced concepts, taking advantage of this simple abstraction.

Tuesday, December 15, 2009

Learning how to refactor

Refactoring is the process of improving the design and the flow of existing, working code by applying common patterns, like extracting a superclass or a method, or even introducing new classes as well as deleting existing ones.
Probably, if you are here, you have already experienced the refactoring process, but I want to clarify the common iterative method I use, both to get feedback and to be helpful to developers who are new to this practice.

This is the general process to learn how to refactor code, which wraps the basic refactoring cycle.
Step 1: get the book Refactoring: Improving the Design of Existing Code by Martin Fowler (a classic) or a refactoring catalogue on Wikipedia or Fowler's website. In the book or in a similar guide, there are two lists which are very boring if read sequentially: smells and refactorings. Smells are situations that arise in an architecture, while refactorings are the standard solutions to eliminate smells. It is annoying to keep something that smells in your office.
Since it is very boring to read the Refactoring book if you have even a small experience with the practice, the best way to extract all Fowler's knowledge is to apply it directly.
Step 2: For S in 'smells':
  • Read about S; understand the problem behind a practice that you may have used without worry.
  • Look for S in one of your personal projects, or where you have commit access and responsibility for the code base; get convinced that this smell should be eliminated. If you are not convinced, stop this iteration here and go to the next smell; your existing solution can be pragmatically correct in the context of your style or architecture. Note that there are no absolute reference points, and refactorings often come in pairs: it is up to you to choose whether to refactor in one direction or the opposite one (Extract Method or Inline Method? see the sketch after this list).
  • Find an appropriate refactoring R; there are multiple solutions that can eliminate a smell. Be consistent in your choice in different places.
  • Make sure there are unit tests for the code you're going to edit. No further action should be taken before you are sure functionality is preserved. This is the answer to the question "Why change something that works?"... Because it will still work, but much better.
  • Apply R in small steps, running focused tests every time to ensure you have not broken anything.
  • Once the refactoring is complete, run the entire test suite to find out if anything is not working. Note that failures in points distant from refactored code constitute a smell too: they are a symptom of coupling.
  • svn diff will show a picture of your modifications; ensure that debug statements or workarounds are no longer in place.
  • svn commit (or the git equivalent commands) pushes your improvements to the repository. Using version control is also fundamental in case you get into an unrecoverable state: svn revert -R . is the time machine button (no, Apple has nothing to do with it) that restores the original code.
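To make the cycle concrete, here is a minimal before/after sketch of the Extract Method refactoring mentioned above; class and method names are invented for illustration:
<?php
// Before: a method that mixes levels of abstraction.
class OrderPrinter
{
    public function printOrder(array $items)
    {
        $total = 0;
        foreach ($items as $item) {
            $total += $item['price'] * $item['quantity'];
        }
        echo 'Total: ' . $total . "\n";
    }
}

// After Extract Method: the calculation gets a name of its own
// and can be tested and reused independently.
class RefactoredOrderPrinter
{
    public function printOrder(array $items)
    {
        echo 'Total: ' . $this->_calculateTotal($items) . "\n";
    }

    private function _calculateTotal(array $items)
    {
        $total = 0;
        foreach ($items as $item) {
            $total += $item['price'] * $item['quantity'];
        }
        return $total;
    }
}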
The goal of learning various refactoring techniques is to easily spot smells in the future, improving the efficiency of the Refactor phase in the Red-Green-Refactor cycle. Your bricks (classes) are very malleable when fresh, but when they solidify it becomes harder to introduce further modifications: it is good to refactor as much as possible just after you have finished adding code for functional purposes.

Monday, December 14, 2009

How an Orm works

Some readers have been confused by the terms I often use in reference to Object relational mappers, so I want to describe some concepts of Orms and give some definitions. In particular, I want to focus on how a real Orm works and lets you write classes that do not extend anything (Plain Old Php Objects, or Plain Old <insert oo-language here> Objects).
The persistence-abstraction standard is the Java Persistence Api, which was extracted from Hibernate, and I will refer to it in this post. Doctrine 2 is the Orm which ports the specification to the php world, and it will be the reference implementation of these concepts in the explanation that follows.

The primary classification of Domain Model classes consists in dividing them into two categories: Entities, whose primary responsibility is to maintain the state of the application, and Services, whose responsibility is to perform operations that involve more than one Entity and to link to the outside of the domain model, breaking direct dependencies. This distinction leaves out Specifications, Value Objects, etc., which add richness to a model but are less crucial parts of it. Repositories and Factories are still a particular kind of Service.
I know that primary responsibility of a class sounds bad, since a class should have only one responsibility; however, there is a trade-off between responsibility and encapsulation, and an Entity class should certainly hide the workings of many operations that involve only its private data.
Examples of Entity classes are User, Group, Post, Forum, Section, and so on. Typical Service class names are UserRepository, UserFactory, HttpManager, TwitterClient, MyMailer. You can often recognize entities by their serializability.
Assuming that you are going to take advantage of an Orm's features, once you have your Entity classes defined it's up to you to define their mapping to relational tables in a format that the Orm understands: xml, yaml, ini files, or simple annotations. The Orm will use this information not only to move objects back and forth from the database, but also to create and maintain your schema, without introducing duplication.
The mapping consists of metadata that describe what properties of an entity you want to store, and how. There are multiple ways to map objects to tables and an Orm should not just invent how to fit them in a database.
Java annotations are objects which provide compile-time checks, while in php they are only comments included in the docblock, due to the lack of native support. This also means that with Doctrine 2 there is no dependency from the Entity class file on the Orm source code.
This is the simplest Entity I can think of, a City class, complete with mapping for Doctrine 2:
<?php
/**
 * Naked Php is a framework that implements the Naked Objects pattern.
 * @copyright Copyright (C) 2009  Giorgio Sironi
 * @license http://www.gnu.org/licenses/lgpl-2.1.txt
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * @category   Example
 * @package    Example_Model
 */

/**
 * @Entity
 */
class Example_Model_City
{
    /**
     * @Id @Column(type="integer")
     * @GeneratedValue(strategy="AUTO")
     */
    private $_id;

    /**
     * @Column(type="string")
     */
    private $_name;

    public function __construct($name)
    {
        $this->setName($name);
    }

    /**
     * @return string   the name
     */
    public function getName()
    {
        return $this->_name;
    }

    public function setName($name)
    {
        $this->_name = $name;
    }

    public function __toString()
    {
        return (string) $this->_name;
    }
}
Private properties are accessed via reflection.

A JPA-compliant Orm presents a single point of access to its functionalities: the Entity Manager, which is a Facade class. You should now understand the meaning of its name.
The Entity Manager object usually has two important collaborators, the Identity Map and the Unit Of Work, plus the generated proxy classes, which serve many purposes:
  • the Identity Map is - as the name suggests - a Map which maintains a reference to every object which has been actually reconstituted from the database, or that the Orm knows somehow (e.g. because it has been told to persist it explicitly).
  • Proxies, whose classes are generated on the fly, substitute a regular object in the graph with a subclass instance capable of lazy loading itself if and only if needed. The methods of the Entity class are overridden to execute the loading procedure before dispatching the call to the original versions.
  • The Unit Of Work calculates (or maintains) a diff between the object graph and the relational database; it commits everything at the end of a request or session, or when the developer requires it.
The shift in the workflow is from the classic ActiveRecord::save() method to the EntityManager::flush() one. It is the developer's responsibility to maintain a correct object graph, but it is the Orm's responsibility to reflect the changes to the relational database. The power of this approach resides in letting you work on the object graph as if it were the (almost) only representation of the Domain Model you know.
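To make the shift concrete, here is a minimal usage sketch, assuming a configured Doctrine 2 EntityManager in $em and the City entity mapped earlier; the id passed to find() is illustrative:
<?php
// Sketch only: $em is assumed to be a configured EntityManager.
$city = new Example_Model_City('Milan');
$em->persist($city);  // the Unit Of Work now tracks this new entity
$city->setName('Milano');
$em->flush();         // computes the diff and issues INSERT/UPDATE

// The Identity Map guarantees that two find() calls for the same id
// return the very same instance within this EntityManager.
$first = $em->find('Example_Model_City', 1);
$second = $em->find('Example_Model_City', 1);
var_dump($first === $second); // bool(true)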

Sunday, December 13, 2009

Php technologies' grades

Last week I was asked by a client:
You used Zend Framework for this small php application. From 1 to 10 [the grading scale in Italian secondary schools], how sophisticated is this technology compared to the other ones in the php world?
I answered:
8 or 9: I can't think of a more advanced php technology (and one with such a steep learning curve), unless I think of Doctrine 2.
Before Symfony and CodeIgniter developers bite me: given the occasion, I would say quite the same of applications built with your frameworks, since I'm making a comparison with the legacy code I had to deal with in the past.

Stimulated by the question, I decided to rank common technologies and practices that I (and many developers) chose, and still choose, for php application architecture. Note that these ranks describe the complexity and inherent power of the different approaches/technologies, but by no means should low-ranked solutions be deprecated: they still get the job done when something more elaborate is not necessary and we are not in the mood to kill a fly with the Death Star.
Here is my evaluation:
  • 1: Welcome, today is <?php echo date("Y-m-d"); ?>. 0 is the same but with <? instead of <?php.
  • 2: Html page with embedded php code. Very useful in the 1990s, and still working sometimes for temporary and corner-case pages because of its simplicity.
  • 3: Set of semi-static php scripts with no code reuse.
  • 4: header.php and footer.php applications; this is the structure of the website my application partners with.
  • 5: header/footer inclusion but with business logic reuse, for instance applications comprising modules, classes and functions.
  • 6: Procedural open-source frameworks and Cms, for instance Drupal 6. They are no longer pretty to the eye, but they do the job.
  • 7: Object-oriented applications that rely, for example, on in-house frameworks.
  • 8: Zend Framework 1.x applications: object-oriented, more or less testable, little duplication when done right. But the inherent singletons prevent them from ranking higher. See you in 2.x...
  • 9: Doctrine 2: Data Mapper for persistence-agnostic domain models.
  • 10: No such technology has been produced in php at the moment, primarily because of the slow adoption of a real object-oriented paradigm.
Or do you think there is already a 10 to assign?

Saturday, December 12, 2009

Saturday question: testing in .NET

A reader wrote to me asking for resources on learning how to practice Test-Driven Development in a .NET environment:
Please pardon me for my unsolicited email, but I saw your blog and I believe that you are one of the best in the software community. My name is [omissis], and I'm a C#/ASP.NET programmer from the Philippines, but I really want to learn and understand Unit Testing and TDD the right way. I didn't take Computer Science or a similar course in college. I really want to learn software design and development, on how to develop an application from ground-up using TDD. I hope you can give me advices, since I'm not able to afford a good book.
I am no particular expert in C#, since I mostly work in php. As you may know, I have written a free Creative Commons-licensed ebook on testing php applications.
For the .Net case, if you are a beginner, there is a book I reviewed which is a good starting point: The Art Of Unit Testing, which has lots of .NET examples included.
It costs $26 on Amazon now, which you can consider an investment since the knowledge contained could make you earn more in the future. It is a very complete book.
You can also obtain the book for free via other means, such as public libraries. I personally make heavy use of my university's library to look up information in technical books like Design Patterns when I am not going to buy a copy at the moment, as such books are not widespread in ordinary libraries. You already pay for libraries with your taxes, so you'd better take advantage of them.

Once you have the basics, the best way to improve is practicing... Someone said that a developer becomes proficient in unit testing after having written 1500 tests.
For general advice, you may also follow this blog and the Google Testing one, although they are focused on technologies different from .NET.
The principles of testable and decoupled design are the same in all object-oriented languages, and the distinction between C# and php resides in how and when an application object graph is created.
I hope you find these references useful to start your journey.

Thursday, December 10, 2009

Who else wants to have free documentation? A readable test code sample

It's fun to TDD classes for your projects because you try out your class as its client will, even before both are fully written: interfaces and abstractions are by far the most important part of an application's design. But test classes often grow, and we should ruthlessly refactor them as we do production code. One of the most important factors to consider in test refactoring is preserving or improving readability: unit tests are documentation for the smallest components of an application, its classes. New developers that come in contact with production classes take unit tests as the reference point for understanding what a class is supposed to do and which operations it supports.
To give an example, I will report here an excerpt of a personal project of mine, NakedPhp. This code sample is a test case that seems to me particularly well written.

The NakedPhp framework has a container for entity classes (for instance User, City, Post classes). This container is saved in the session and it should be ported to the database for permanent storage when the user wants to save his work.
This is the context where the system under test, the NakedPhp\Storage\Doctrine class, has to work: it is one of the first infrastructure adapters I am introducing. In the test, User entities are stored in a container and should be merged into the database based on their state, which can be:
  • new (not present in db);
  • detached (present in db, but totally disconnected from the Orm due to previous serialization; no proxies are referenced from a detached object, and these entities are not kept in the identity map);
  • removed (present in db, but to be deleted because the user decided so).
The NakedPhp\Storage\Doctrine::save() method takes an EntityContainer instance and processes the contained objects, bridging the application and the database with the help of the Doctrine 2 EntityManager.

This test class is also an example of how to test classes which require a database, such as Repository implementations. I usually create throw-away sqlite databases, but Doctrine 2 can port the schema to nearly every platform. Using a throw-away database allows you to write unit tests that run independently of database daemons, and without having to mock an EntityManager, which has a very long interface. Classes that calculate bowling game scores are nice, but classes that store your data are a whole lot more so.
Finally, I warn you that this example is still basic and will be expanded during future development of NakedPhp. What I want to show here is where the Test-Driven Development style leads, with an example of eliminating code duplication and clutter in a test suite.
<?php
/**
 * Naked Php is a framework that implements the Naked Objects pattern.
 * @copyright Copyright (C) 2009  Giorgio Sironi
 * @license http://www.gnu.org/licenses/lgpl-2.1.txt
 *
 * This library is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public
 * License as published by the Free Software Foundation; either
 * version 2.1 of the License, or (at your option) any later version.
 *
 * @category   NakedPhp
 * @package    NakedPhp_Storage
 */

namespace NakedPhp\Storage;
use Doctrine\ORM\UnitOfWork;
use NakedPhp\Mvc\EntityContainer;
use NakedPhp\Stubs\User;

/**
 * Exercise the Doctrine storage driver, which should reflect to the database
 * the changes in entities kept in an EntityContainer.
 */
class DoctrineTest extends \PHPUnit_Framework_TestCase
{
    private $_em;
    private $_storage;

    public function setUp()
    {
        $config = new \Doctrine\ORM\Configuration();
        $config->setMetadataCacheImpl(new \Doctrine\Common\Cache\ArrayCache);
        $config->setProxyDir('/NOTUSED/Proxies');
        $config->setProxyNamespace('StubsProxies');

        $connectionOptions = array(
            'driver' => 'pdo_sqlite',
            'path' => '/var/www/nakedphp/tests/database.sqlite'
        );

        $this->_em = \Doctrine\ORM\EntityManager::create($connectionOptions, $config);
        $this->_regenerateSchema();

        $this->_storage = new Doctrine($this->_em);
    }

    private function _regenerateSchema()
    {
        $tool = new \Doctrine\ORM\Tools\SchemaTool($this->_em);
        $classes = array(
            $this->_em->getClassMetadata('NakedPhp\Stubs\User')
        );
        $tool->dropSchema($classes);
        $tool->createSchema($classes);
    }

    public function testSavesNewEntities()
    {
        $container = $this->_getContainer(array(
            'Picard' => EntityContainer::STATE_NEW
        ));
        $this->_storage->save($container);

        $this->_assertExistsOne('Picard');
    }

    /**
     * @depends testSavesNewEntities
     */
    public function testSavesIdempotently()
    {
        $container = $this->_getContainer(array(
            'Picard' => EntityContainer::STATE_NEW
        ));
        $this->_storage->save($container);

        $this->_simulateNewPage();
        $this->_storage->save($container);

        $this->_assertExistsOne('Picard');
    }

    public function testSavesUpdatedEntities()
    {
        $picard = $this->_getDetachedUser('Picard');
        $picard->setName('Locutus');
        $container = $this->_getContainer();
        $key = $container->add($picard, EntityContainer::STATE_DETACHED);
        $this->_storage->save($container);

        $this->_assertExistsOne('Locutus');
        $this->_assertNotExists('Picard');
    }

    public function testRemovesPreviouslySavedEntities()
    {
        $picard = $this->_getDetachedUser('Picard');
        $container = $this->_getContainer();

        $key = $container->add($picard, EntityContainer::STATE_REMOVED);
        $this->_storage->save($container);

        $this->_assertNotExists('Picard');
        $this->assertFalse($container->contains($picard));
    }

    private function _getNewUser($name)
    {
        $user = new User();
        $user->setName($name);
        return $user;
    }

    private function _getDetachedUser($name)
    {
        $user = $this->_getNewUser($name);
        $this->_em->persist($user);
        $this->_em->flush();
        $this->_em->detach($user);
        return $user;
    }

    private function _getContainer(array $fixture = array())
    {
        $container = new EntityContainer;
        foreach ($fixture as $name => $state) {
            $user = $this->_getNewUser($name);
            $key = $container->add($user);
            $container->setState($key, $state);
        }
        return $container;
    }

    private function _assertExistsOne($name)
    {
        $this->_howMany($name, 1);
    }

    private function _assertNotExists($name)
    {
        $this->_howMany($name, 0);
    }

    private function _howMany($name, $number)
    {
        $q = $this->_em->createQuery("SELECT COUNT(u._id) FROM NakedPhp\Stubs\User u WHERE u._name = '$name'");
        $result = $q->getSingleScalarResult();
        $this->assertEquals($number, $result, "There are $result instances of $name saved instead of $number.");
    }

    private function _simulateNewPage()
    {
        $this->_em->clear(); // detach all entities
    }
}
Do you have other suggestions to further refactor this code?

Wednesday, December 09, 2009

What everybody ought to know about storing objects in a database

Probably during your career you have heard the term impedance mismatch, which commonly refers to the problems that arise in converting data between different models (or between different cables, if you are into electrical engineering).
Usually the complete expression is object-relational impedance mismatch, which indicates the difficulties of the translation process between two versions of the same domain model: the former resides in memory and consists of an object graph, while the latter is used for storage and is a relational model kept in a database. The conversion between parts of the two models happens many times while an application runs, and in php's case at least once for every http request.

Object-relational mappers like Hibernate and Doctrine are infrastructure applications which deal with the mismatch, doing their best to implement a transparent mechanism and to provide the abstracted illusion of an in-memory model, as in the Repository pattern. These particular Orms are the best of breed because they do not force your object graph to depend on infrastructure classes like base Active Records.
The connection between the two models is defined by the developer, by providing metadata about the classes' properties: for instance, you can annotate a private field specifying the column type you want to use to store its value. But what are the translation rules the developer provides configuration for? Here is a basic set of the tasks an Orm performs for you.
  • Entity classes are translated to single tables as a general rule, with a one-to-one mapping. The class private or public fields which are configured for storage define the columns of a particular table.
  • Objects which you pass to the Orm to be stored become rows of the corresponding table. A User class becomes a User table containing one row for every registered user of your website.
  • A primary key is defined by choosing among the existing fields, or added ex novo. Often the developer is required to define such a field explicitly.
  • Repeated single (or multiple) class fields become new tables, and the problem of representing them is shifted to representing relationships; in the domain model these objects are Value Objects, which are semantically different from Entities, but databases only care about homogeneous data and such objects receive no special treatment.
  • One-to-one and many-to-one relationships can be represented with a foreign key on the source entity that resembles the original pointer to a memory location.
  • One-to-many relationships are a bit trickier because they require a foreign key on what is called the owning side, in this case the target entity. What can seem strange at first glance is that even if in the domain the relationship is unidirectional (a pointer to a collection), the elements of the collection need a reference back to the owner to identify it unequivocally. The mutual registration pattern can be used to build a correct Api starting from this constraint; I will write about it in one of the next posts.
  • Many-to-many relationships are managed by creating an association table that references the participating entities with foreign keys. Every row constitutes a link between different objects; sometimes it makes sense to use such a table also for one-to-many associations, to avoid having a back-reference field on collection elements.
  • Inheritance is by far the most complex semantic to maintain as it is not supported at all by relational databases: Single/Class/Concrete table inheritance are three famous patterns which organize hierarchical objects in tables, but I prefer to avoid inheritance altogether if not strictly necessary.
And this list is one of the reasons why in your college courses they told you that a model should be as small and simple as possible: a simple model can undergo much simpler transformations for storage and transmission than a complex one.
Note that some contaminations leak from the database side to the object graph, such as the bidirectionality of one-to-many relationships, present even when it is not required by the domain model.
Orms take care of this translation process for you, and can even generate the tables from the classes' source code, but they only perform the tedious part of object-relational mapping automatically. You should know very well how the mapping works if you plan to use such powerful tools without reducing your database to a list of key/value pairs.
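As an illustration of the relationship rules above, here is a hedged sketch in Doctrine 2-style annotations; the entity and field names (User, Phonenumber, Group) are invented for the example:
<?php
/** @Entity */
class User
{
    /** @Id @Column(type="integer") @GeneratedValue(strategy="AUTO") */
    private $_id;

    /**
     * One-to-many: the foreign key lives on the phonenumber table
     * (the owning side), even if the domain only navigates from the
     * User to its numbers.
     * @OneToMany(targetEntity="Phonenumber", mappedBy="user")
     */
    private $_numbers;

    /**
     * Many-to-many: stored via an association table holding two
     * foreign keys, one per participating entity.
     * @ManyToMany(targetEntity="Group")
     */
    private $_groups;
}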

The image at the top is an Entity-Relationship model used to design database schemas. I no longer find it useful, as I now prefer to think in terms of classes, with Uml diagrams.

Tuesday, December 08, 2009

5 reasons to be happy in a terminal

I am not talking about an airport terminal, but about one of the terminal emulators which are provided by modern window managers, like gnome-terminal for Gnome and the similar Konsole for Kde, along with the minimal xterm. These are all unix applications but equivalent applications exist for other platforms like Windows, although their integration with the underlying operating system and with specific programs can be tricky.
Why should you, a software developer/engineer, want to spend most of your time in a dumb terminal instead of in a powerful and costly IDE? I have five reasons to convince you.
  • instant access to unix programs. GUIs facilitate the job of naive users but the real power resides in the command line tools which perform the real work; moreover, cli programs can be chained in infinite ways using their universal plain text interface.
  • the classic 80x25 terminal has short lines and a small number of them: too much logic in a line stands out because the line wraps onto the subsequent one. Overly long methods are spotted as well, because they don't fit in one or two screens and require repeated scrolling.
  • transparent remoting with ssh. The same can be said of VNC, but it can be very slow and is not always supported, while many servers have an ssh daemon. Ssh is so fast that I do not notice the latency between my local machine and other boxes in my Lan, so I have given them different colored prompts to easily distinguish between environments.
  • uninterrupted flow; not using the mouse makes you move very quickly in the cli environment, once you know what to write and how to leverage the text-based tools.
  • every executed command is registered for possible future repetition and modification. Try recording a procedure of 90 control panel clicks instead.
As a side note, thanks to history, it's also very simple to calculate statistics on the most popular commands you type on your development box:
[12:29:32][giorgio@Indy:~]$ history | awk '{print $2;}' | sort | uniq -c |
> sort -nr | head -n 10
   4924 vim
   1326 svn
    879 nakedphpunit    // it's an alias for phpunit --bootstrap=...
    616 sudo
    438 cd
    266 ls
    238 osstest_sqlite
    207 phing
    135 ./scripts/regenerate
    127 grep
Of course most of them were only typed the first time and then recalled. From these data you can infer that I use the command line interface a lot, and I've never been more productive. This statistic is a typical example of leveraging command line tools: a construct that took me less than a minute to write and that I can repeat whenever I want in a few seconds.

Sooner or later, the time comes when a developer feels constrained by his graphical interfaces and resorts to using the command line directly. If he avoids the command line, it's probably because he does not know how to work with it. Don't be as proud as this developer: take some time to learn, and the cli will repay you soon.

Monday, December 07, 2009

PHPUnit and Phing cohabitation

During the publication of Practical Php Testing some readers asked me to include information on how to make PHPUnit and Phing work together. It was not possible due to time constraints to include an appendix on this topic, so I will talk about it here.

First, some background:
  • PHPUnit is the leading testing harness in the php world: it consists of a small, powerful framework for defining test cases, making assertions and mocking classes.
  • Phing is an Ant clone written in php that should become the standard solution for automating php application targets such as deployment, running different test suites at the same time, and generating documentation. Why use Phing instead of Ant? Because it interfaces well with php applications.
Integrating these two tools means giving Phing access to a PHPUnit test suite and letting the phing build files, which manage configuration, also contain information on how to run the test suite. In the build.xml file of an application you should find different targets like generate-documentation, test-all, compile-all (if php were a compiled language), and so on.

There are two ways to access PHPUnit test suites via phing: the exec and phpunit tasks.
At the time of this writing, the phpunit task bundled in stable releases of Phing lacks functionality, primarily the ability to define a bootstrap file to execute before the test suite is run. I can't live without --bootstrap, and I look forward to a release of Phing that lets me specify this file in the configuration.
This release will be Phing 2.4.0 (at least a Release Candidate 3 version; as of December 2009 it is at RC2). There are two things being fixed that would annoy the average developer a lot:
  • There was a bug in the last RC release affecting the bootstrap parameter: the inclusion took place too late in the process, producing fatal errors when the suite relies, for instance, on autoloading. This bug is fixed in the repositories and will be gone in the next RC release. I downloaded the simple patch and applied it manually to try out the bootstrap functionality, and it works very well. (http://phing.info/trac/ticket/378)
  • The summary formatter is not a summary: it uses the wrong hook method, producing a report for every single test case and resulting in an output hundreds of lines long. I opened a ticket to tackle this issue. (http://phing.info/trac/ticket/401)
What we will be able to do
<target name="test">
    <phpunit bootstrap="tests/bootstrap.php">
        <formatter type="summary" usefile="false" />
        <batchtest>
            <fileset dir="tests">
                <include name="**/*Test.php"/>
            </fileset>
        </batchtest>
    </phpunit> 
</target> 
When you push the big test button on your desktop (from the cli type phing test), this xml configuration will hopefully produce a report while your test suite runs.
The problems with this approach are that it does not work yet, due to the bugs I listed earlier, and that it eats quite a bit of memory, forcing me to raise the limit to 128 megabytes for a suite composed of 144 unit tests.

What we do now
Until a stable version of Phing 2.4 is released, we should rely on exec commands, which directly call the phpunit binary executable (not so binary: it is in fact a php script):
<target name="test">
    <exec command="phpunit --bootstrap tests/bootstrap.php --configuration tests/phpunit.xml --colors"
          dir="${srcRoot}" passthru="true" />
    <exec command="phpunit --bootstrap=example/application/bootstrap.php --configuration example/application/tests/phpunit.xml --colors"
          dir="${srcRoot}" passthru="true" />
</target>
${srcRoot} is a property that specifies the working directory to run the phpunit command in. passthru makes the task echo the output of the command.
This approach is sometimes more flexible than using the specialized phpunit task: you don't have to wait for phing to include options for new phpunit features in its tasks, because you can use them just as they become available from the command line. On the other hand, it may be difficult to perform different actions (like lighting up a red semaphore in your office) based on the last build state (red or green).

So I'm relying on exec tasks for now. By the way, the result is pretty and colors are even preserved, but I have to wait for the exec command to finish before seeing any output (no dots slowly piling up on the screen).
If you enjoy using Phing and PHPUnit, please provide feedback and contribute to the projects, especially in the case of Phing. It is a project that deserves more attention from the community due to its integration tasks.
UPDATE: Phing 2.4.0 was released on January 17, 2010.

Friday, December 04, 2009

Evolution of inclusion

Once upon a time, there was the php include() construct. Reusing parts of pages and other code was as simple as:
<?php
include 'table.php';
Included files gain the same scope as the parent script, with no need to look up global variables somewhere.
The problem came when there were parameters that influenced the final result, and the included html was a template expecting its variables to assume a value:
<?php
$user = new User();
$entries = array(...);
$showCaption = true; 
include 'table.php';
This programming style is also known as Accumulate and fire, and it exposes a rather poor Api. There is no way to learn which variables the template needs, nor to signal errors in populating them immediately: if I comment out the $user assignment or set it to an array, the script will not notice until the variable is used deep inside table.php.

So the php programmers' approach evolved into writing functions and classes whose only responsibility is generating html. These classes are called View Helpers.
Since the generation process involves calling a method, the method's signature takes care of exposing the list of parameters and of validating them.
Some dependencies can be injected once via the constructor of a view helper (or via setters), but not every piece of code lends itself to being put in a view helper, because not every line of code is actually reused. Often this logic-poor php code is placed in View classes (usually view scripts in php). View scripts are very simple for designers to modify, although some people think a layer should absolutely be interposed between the php code and the front-end designers.
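A minimal sketch of a view helper, where the signature now documents and validates what the old include-based template left implicit (the class and its output are invented for illustration):
<?php
// Sketch of a view helper: html generation is its only responsibility.
class TableHelper
{
    /**
     * @param array $entries       row contents
     * @param bool  $showCaption   whether to prepend a caption
     * @return string   the rendered html table
     */
    public function table(array $entries, $showCaption = true)
    {
        $html = '<table>';
        if ($showCaption) {
            $html .= '<caption>Entries</caption>';
        }
        foreach ($entries as $entry) {
            $html .= '<tr><td>' . htmlspecialchars($entry) . '</td></tr>';
        }
        return $html . '</table>';
    }
}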

View scripts can include a header.php or footer.php to avoid duplication of common code, but this solution does not remove duplication: it only reduces it. Try changing the filename of header.php and you will see.
Thanks to url rewriting, the single point of entry pattern became trivial to implement, and now every serious framework has only one or two top-level php files which are loaded in the browser with different parameters, and which determine which action to perform and which view script to show to the end user (the MVC paradigm in php applications).

Still, there was the problem of configuration: sometimes we need pages with the menu on the right and sometimes not (maybe a forum index is too large). In printable versions no navigation has to be shown; in other pages some submenu can be open or closed depending on the context.
And so the famous Two Step View pattern was implemented; its process now minimizes code duplication (a sketch of the flow follows the list):
  • the action chosen by the url parameters is executed, and its output consists of some variables, which populate the view. There is still no Api to refer to, but if you keep view scripts small the problem does not arise often, and there is no mandatory scope mixing by direct include() usage.
  • the chosen first view is rendered and its generated content is saved in a variable, usually via output buffering. In Zend Framework, Zend_View is the object that manages the rendering of a script and which acts as a generic view class. Using view scripts instead of view classes eliminates the need for a templating language: imagine web designers modifying a php class.
  • then the chosen second view is rendered, passing the first result as a variable named by convention, for example $content. In Zend Framework, the second view is a script which is managed by Zend_Layout.
  • both view scripts, which we shall call the view and the layout from now on, have access to view helpers, objects injected in the way and in the form you prefer, so that different kinds of tedious html generation can be kept in their cohesive classes and tested independently.
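Here is a hedged sketch of the two-step flow; a generic View class with a render() method that buffers a view script's output is assumed, in the spirit of Zend_View and Zend_Layout:
<?php
// Sketch only: View::render() is assumed to buffer the output of a
// view script and return it as a string.
$view = new View();
$view->entries = array('first', 'second'); // the action populates the view
$content = $view->render('table.php');     // first step: the view script

$layout = new View();
$layout->content = $content;               // conventionally named variable
echo $layout->render('layout.php');        // second step: the layout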
Sometimes a view script still include()s another... But if the code gets complex, the latter view script can usually be refactored and transformed into a view helper.

Thursday, December 03, 2009

Sequels and reboots

In The Mythical Man-Month, one of the most famous books about software engineering, Fred Brooks says, referring to software projects:
Chemical engineers learned long ago that a process that works in the laboratory cannot be implemented in a factory in one step. An intermediate step called the pilot plant is necessary [...] Hence, plan to throw one away; you will, anyhow.
I see this as the perfect reverse of the situation in movie production.
It is indeed true that in the motion picture industry sequels often ruin the feeling of the original movie, or at least do not come close to its perfection (though they can surpass it in success thanks to advertising campaigns and the public's expectations).
Consider these science-fiction movies as an example:
  • Star Wars: it is widely believed that no sequel or prequel can measure up to the original A New Hope. The finale, which I don't want to spoil here, is probably the most famous scene in the history of the genre.
  • The Terminator: although every sequel contains the catch-phrase Come with me if you want to live, the original 1984 movie is still the most revolutionary.
  • The Matrix: should I say anything?
In software projects, instead, the 1.x version is usually the one to throw away before upgrading to one of the subsequent world-changing releases:
  • The Linux kernel: today at version 2.6, it finally supports the majority of devices and will never require you to insert a driver cd (in the worst case it requires you to compile drivers, which is really annoying).
  • OpenOffice.org, whose 3.x version is becoming more and more widespread and has recently reached 100 million downloads.
Open source projects are often reluctant to mark the 1.x release of a piece of software before it is really complete and tested. Despite this humility, they often find success in the releases that come later, maybe because it is the wide adoption of the first stable version that exposes architectural problems and other issues in a large installed base for the first time. The developers' effort is also minimized, since a 2.x version sees the light only if there is enough momentum and following from the first release.

So why not cite php's own sequels, which have reached version 5.3?
Extending the cinematographic comparison, often we watch reboots instead of sequels; this kind of movie is very fashionable nowadays.
Rewrite is the right word, since a movie franchise reboot corresponds to a major code rewrite. A rewrite is different from an upgrade: not only does it break binary and Api compatibility, it also consists in throwing away entangled and coupled code to write a new solution from scratch. The border between the two is not strictly marked, and many major releases are in fact rewrites which maintain the same name for brand popularity reasons.
Such a rewrite differs from a movie reboot in that it maintains some continuity between the old and the new version, for example in the Ubiquitous Language, but it is much more dangerous and can lead to a never-ending development phase (everyone knows the Mozilla suite's fate).
Ports can also be considered reboots when they start from empty source files, as the original source code is often unavailable: today the most famous ports take a commercial application and reimplement it as an open source one. Forks and ports of open source applications instead recycle code and remain connected with the original project.
Despite the issues of rewriting from scratch, there are successful reboots in the open source software ecosystem:
  • Apache 2 is a substantial rewrite according to Wikipedia; Apache's original duct-taped version gained its name from the phrase a patchy web server.
  • The Php rewrites before versions 3 and 4, the latter of which saw the introduction of the Zend Engine.
  • Grub, the bootloader for GNU/Linux machines, has recently been replaced by the completely rewritten Grub 2 in Ubuntu, yet no one I have seen using Karmic Koala noticed the difference.
Are you writing a reboot, a sequel or an original movie? :)

Wednesday, December 02, 2009

Practical Php Testing is here


Practical Php Testing, my ebook on testing php applications, is finally here as promised, in the first days of December.
How many times in the last month have you seen a broken screen in the browser? How many times did you have to debug in the browser, by looking at the output, inserting debug statements and breaking redirects? How many times did you perform manual testing, by loading a staging version of your application and tried out different workflows in the browser?
If the answer to these questions is more than very few, it's likely that you should give automated testing a chance.
This book is aimed at php developers; half of it features the articles from the Practical php testing series, while the other half is composed of new content:
  • a bonus chapter on TDD theory;
  • a case study on testing a php function;
  • working code samples, some of which were originally kept on pastebin.com;
  • sets of TDD exercises at the end of each chapter;
  • a glossary that replaces external links to wiki and other posts, so your reading is not interrupted by term lookups.
The book comes for free and is licensed under Creative Commons, which means you are free to copy it and give it to anyone. If you find my work useful and want to be supportive, you can make a donation with the link in the right menu or with the one provided in the book.

Tuesday, December 01, 2009

Practical Php Testing errata

This is the errata page of the Practical Php Testing ebook, where typos and other errors will be listed and corrected. This page is linked from the Errata section of the book, to provide updates when errors are found without releasing different and confusing versions of the book.

Practical Php Testing will be published on December 2, 2009.

Monday, November 30, 2009

Asserting out of tests

In programming, assertions are statements that should always evaluate to true, being invariant assumptions with respect to the input data of a program. From the mathematical point of view, assertions are tautologies for the implementation they are embedded in.
For instance, you can code assertions which verify that the input data consist of strings, or that the result of a calculation is coherent with the program flow. A failed assertion usually marks a logical bug.
Unit tests are filled with assertions, with the advantage that xUnit assertions are contained in test cases and thus kept separate from production code. However, there are other places where assertions are used; many compiler or interpreter checks in modern programming languages are implicit assertions that provide type safety or other automatic controls:
  • As I said earlier, assert*() methods like assertEquals() and assertTrue() are provided by test case instances to allow the specification of behavior. These assertions are treated in detail in the related testing series article.
  • The assert() function (sometimes implemented as a macro in languages like C) is the fastest way to check the correct flow of the program in production code. The php assert() function takes as an argument a php expression (code that evaluates to a boolean) encoded in a string; the encapsulation in a string allows the assertions to be skipped when particular flags are set, as in the sketch shown after this list.
  • Type hinting on function parameters is actually a masqueraded assert(). In php, the assertion code would be assert('$param instanceof MyClass');
  • Database constraints are commonly declared in the form of assertions in Sql code: for example, the result of some queries should be invariant, or the value entered in a column should match a restriction on the field's domain.
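To make the second point concrete, here is a minimal sketch of assert() guarding a production function, assuming php 5's string-based expressions; the function and its contract are hypothetical.
<?php
// A hypothetical calculation guarded by assertions; the string
// expressions are only evaluated while assertions are active.
function applyDiscount($price, $rate)
{
    assert('is_numeric($price) && $price >= 0');
    assert('$rate >= 0 && $rate <= 1');

    return $price * (1 - $rate);
}

echo applyDiscount(100, 0.2); // prints 80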
Given the various assertions you can make in different parts of your application, you should be cautious about leaving them enabled in production environments. While in unit tests assertions are fundamental (though you shouldn't exaggerate their number, to simplify maintenance), explicit assertions are typically deactivated in live deployments. This is preferred over leaving the checks in place, both to avoid exposing errors to the end user and to speed up code execution.
Caution should also be exercised with database constraints if you work with an object model and an Orm, as they may result in logic duplication.
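Deactivating assertions is a one-liner, assuming the standard assert configuration flags; the same setting is also available as assert.active in php.ini.
<?php
// Skip all assert() checks in a live deployment; since the
// expressions are strings, they are not even evaluated.
assert_options(ASSERT_ACTIVE, 0);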

Failed assertions should be handled somehow. Php lets you declare an assertion handler, which I usually set to a small function that throws a special exception whose message contains the error generated by the assertion code. I see a failed assertion as a very serious problem which may indicate a bug, while normal exceptions are often used to signal incorrect inputs or state conditions that cause an error.
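A minimal sketch of such a handler, assuming php 5's assert_options() API; the exception class name is hypothetical.
<?php
class FailedAssertionException extends Exception
{
}

// The callback receives the file, line and code of the failed assertion.
function handleAssertion($file, $line, $code)
{
    throw new FailedAssertionException(
        "Assertion failed at $file:$line: $code"
    );
}

assert_options(ASSERT_WARNING, 0); // suppress the default php warning
assert_options(ASSERT_CALLBACK, 'handleAssertion');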
Some assertions get in the way of tests too: when we encounter them we should ask why they exist in the first place, and whether they should be deactivated in testing as well as in production. A common example is the null check/type hint:
<?php
class MyClass
{
    public function __construct(MyClass $param)
    {
        // ...
    }
}
While Java would allow the client code to pass null as the value of $param, php raises a catchable fatal error which stops the constructor execution. This means that if we don't need some collaborators when testing a particular method, we are forced to subclass MyClass to override the constructor, or to create fake collaborators only to fill the parameter list. If these collaborators require non-null parameters in their own constructors, the problem becomes recursive.
So I prefer not to make unnecessary assumptions on the input parameters of constructors:
<?php
class MyClass
{
    public function __construct(MyClass $param = null)
    {
        // ...
    }
}
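In a test, this lets us instantiate the class without building a full object graph; a hypothetical PHPUnit test case:
<?php
class MyClassTest extends PHPUnit_Framework_TestCase
{
    public function testCanBeInstantiatedWithoutCollaborators()
    {
        // No fake collaborator is needed just to satisfy the signature.
        $object = new MyClass();
        $this->assertTrue($object instanceof MyClass);
    }
}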
The same problem arises in a different form with scalar type hinting, which is not available in the language syntax but can be implemented by the programmer:
public function __construct($config)
{
    assert('is_string($config)');
    // ...
}
Just do not assume $config is a string if there is even a remote possibility that tests will exercise the class without a config variable. Only if $config is invariably needed for the class to work should we check its type and structure.
Of course, we should also test in some way that the class is correctly instantiated, but this part should be covered by integration tests, or by unit tests for the factory or the container. Integration errors are easy to spot, since calling a method on null is not allowed: if the construction process is incomplete, the first access to the missing collaborator stops the execution of the entire suite.
Now that you know the power of assertions, try to get the best out of them: they are not substitutes for separate unit tests, but in many cases they can be replaced by them.