Wednesday, December 16, 2009

The object graph

Stefano wrote to me with the intention to expand the discussion on the object graph concept, which I referred to earlier. As always, I think that sharing my thoughts can help other readers and also provide some feedback about these ideas.

A bit of theory
What is a graph? According to my math courses and to Wikipedia, it is an abstract structure defined by two sets. The variant of graph that interests us is the directed graph, because it resembles the Von Neumann representation of objects better than a non-directed one.
The two sets that define a graph are the vertices and the arcs:
V = {2, 3, 5, 7, 8, 9, 10, 11};
A = {(3, 8), (3, 10), (5, 11), (7, 8), (7, 11), (8, 9), (11, 9), (11, 2)};
The elements of A are ordered pairs whose elements are elements of V.
The term directed means that the graph's arcs present a specified direction (if they hadn't, they would have been called edges and the elements of A would have been two-element sets instead of ordered pairs).

How does it apply to computer science?
Well, suppose you have an object-oriented application in execution. The complete data structure is presented to us with various abstractions as an object graph, a graph where the V vertices are objects and the A arcs are their connections by field references. Actually, arcs could be represented by pointers in low-level languages like C++, and by more complicate handlers in higher-level environments such as the Php interpreter or a Java virtual machine.
For instance, consider the FrontController object of an ordinary php framework application. It has references to the Request and Response objects, and to the chosen Controller instance. The controller may have other references - to connection objects, Repositories, User entities and so on. There can be cycles and links spreaded all over the graph, which may be very complicated.
Of course to obtain a useful representation we may omit from a graph some objects which are actually reachable, as they are not "pertinent" to the current discussion. In a formal context, however, we ought not to leave out anything.
The first time I heard the object graph term was on Misko Hevery's blog, used to describe the structure of an object-oriented application.

Why talking about an object graph?
Because it is a mathematical abstraction on the raw pointers and memory segments.
Stefano said in his email:
Probabilmente ancora non siamo riusciti a formalizzare una analisi teoretica sopra gli oggetti che descrivono software. Non so nemmeno se la cosa, allo stato attuale sia verosimile o abbia un senso. Tuttavia, cominciare a pensare in questa direzione credo possa essere un punto di partenza proprio per trasformare la Programmazione da "Arte" a "Scienza",  obiettivo perseguito anche dallo stesso Misko.
Maybe we have not yet formalized a theorical framework on software objects. I don't even know if this would make sense at this point. However, I think beginning to move in this direction can be a starting point to transform programming from Art to Science, an objective pursued from Misko too. (translation of mine)
An abstraction such an object graph let us make statements which do not depend on the technology (Java or Php) but only on the object-oriented paradigm, and that thus will be true in many languages and platforms, or 10 years from now when Php 9.3 and Java 14.0 will be released (provided that we maintain the OO paradigm; considering that Smalltalk is from the 1970s, it may last for a long time).
For instance, here is a list of the concepts which involve a generic object graph:
  • object graph building and business logic separation. To produce seams for easy unit testing where we can inject collaborators, classes that build or expand the object graph should be separated from classes which contain logic.
  • Serialization; given an object O, the graph of all the objects reachable from O should be serialized with it to allow its reconstitution.
  • The state of an application which should be stored in a database is an entity graph, composed of User, Group, Post instances; Orms such as Doctrine 2 implement persistence by reachability on a generic object graph. Reachability is a mathematical property.
  • Why entities should not contain field references to service classes? Because they reach out of the entity graph and complicate the storage process.
  • The Observer pattern can be described as a partitioned graph that improves decoupling between objects of the same partition (observed or observating side). Other patterns are often explained with the help of an Uml class diagram, which is a similar (but more specific) concept.
Note that if we demonstrate a rule or a theorem for an object graph (or a graph with certain characteristics), it will be valid for every other instance of that graph even in different applications. That's why mathematicians love abstractions as much as programmers: they save time to both categories.

Let me know your thoughts. There are many mathematical formulations of the object-oriented paradigm, but talking about a structure such as a graph can help explaining advanced concepts, taking advantage of this simple abstraction.

1 comment: