Are We Having Fun Yet?
After about eight years of studying at two universities, I'm glad the courses have not succeeded in killing my love for computer science. They almost did – in spite of really encouraging grades. Over the past months I reanimated my drive to think freely and take the time to follow ideas through to the bitter end. Am I the only one who finds it awkward that this is not exactly what they're doing at universities these days?
Anyway, it's awesome fun to research what my mind brings up and what I care about instead of what is assigned to me. And it's even more awesome to find that, all those times when I was sceptical of what (or how) we were taught, my gut feeling pointed in a fruitful direction. Gut feeling: That's something I should listen to more often (it's amazingly fast).
What surprised me the most are the things university didn't teach us at all. In particular, I felt that process- and architecture-related courses did not, in any way, reflect modern reality. Software engineering as it was presented to us is kind of dead. So, I went on my own journey, starting at architectural basics, and came up with a theoretical framework that I'm applying to all my software development: the Diamond Tree. It's a little abstract but simple, and it helps me (re)factor my projects. Feel free to let me know what I'm missing or what else you think.
What probably comes to mind when hearing of "software architecture" is OOD with its principles and patterns. But even the omnipresent MVC pattern is not expressed at a level of architectural foundation. The driving force behind all design patterns is dependence. In fact, dependence is the underlying ordering principle of everything. Therefore, it should be the starting point of any theoretical or practical trip into the depths and heights of serious coding.
This is what dependence between classes typically looks like: Class A depends on class B when the header or implementation file of A imports (includes) some file of B. For the fact of dependence, it doesn't matter what kind of relation A and B have. A might extend, subclass, contain or just reference B. The important thing is that any usage of A depends at least on the availability, but most often also on the interface and implementation of B. In the dependence graph, we draw an arrow from A to B. Not very exciting, is it? Now, what do you imagine the class dependence graph of a whole project looks like? I saw (and drew) a lot of UML diagrams and many looked like piles of spaghetti:
- They lack an obvious system. Everything seems to talk to everything.
- There is no obvious hint about program flow. Who starts the talking?
- All classes seem to be of equal importance.
Breaking it Down
I mentioned a class that depends on another and thereby assumed that it makes sense to have different classes in the first place. This is self-evident to any software developer but try to question it for a moment: Why not put all code in one class? Here begins software architecture. We structure code to enable change. The ability to change code enables us to write more. Good structure (design) is the precondition of software development.
The structure that enables change is one in which interdependent (coupled) parts are closer together than independent parts. This is what the 'S' and 'I' in SOLID are about. A class subsumes heavily interdependent methods. Classes are also the most important structural entity since they're typically stored in exclusive files. If you don't structure enough you get some large class containing unrelated methods.
Worse than not structuring enough is structuring in the wrong way and spreading interdependent methods over different classes. The clearest symptom of that is a cycle in the dependence graph. If A depends on B and B on A, it makes no sense to have two different classes. In contrast, if the dependence goes only from A to B, the implementation of B and the interface of A can be changed more freely and B can be used in isolation.
Cycles in the dependence graph are the purest design evil. Many patterns and language features were mainly invented to avoid those cycles. The good news is that cyclic dependencies are among the few design evils that some compilers can warn us about. We only have to make all dependencies explicit and be aware that forward declarations may suppress compiler warnings.
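Where the compiler doesn't help, a cycle check is easy to run yourself. Here is a minimal sketch in Python, assuming you have already extracted the class-level dependence graph into a dictionary (the class names and the `find_cycle` helper are made up for illustration). It does a depth-first walk and reports the first cycle it closes:

```python
from typing import Dict, List, Optional

# Hypothetical dependence graph: each key depends on the classes listed for it.
Graph = Dict[str, List[str]]


def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one dependency cycle as a list of class names, or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so we closed a cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


acyclic = {"Main": ["A"], "A": ["B"], "B": []}
cyclic = {"Main": ["A"], "A": ["B"], "B": ["A"]}

print(find_cycle(acyclic))  # None
print(find_cycle(cyclic))   # ['A', 'B', 'A']
```

How you obtain the graph (parsing imports, asking a build tool) depends on your language and is left out here.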
Side Note: Property Rights
Related to the overarching principle of acyclic dependence between classes (files) is the principle of ownership between class instances. An object B should have exactly one owner A. A depends on B and is responsible for it. Only A can initiate the destruction of B and so only A can know in advance whether B still exists. Other objects can reference B but not own it.
Acyclic dependence does not enforce acyclic ownership, let alone single ownership. Let's say the class of A is a subclass of class C and B has a reference to an object of class C. This object can actually be A, so B can theoretically own A although its class does not depend on the class of A. Therefore, an OOP language that allows you to make ownership explicit has a fundamental advantage there. Objective-C, for example, distinguishes weak and strong properties.
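The same distinction can be sketched outside Objective-C. The following is an illustration in Python, using the standard `weakref` module as a stand-in for a weak property; the `Car`, `Engine` and `Dashboard` classes are hypothetical. The owner holds the only strong reference, and everyone else has to ask whether the object still exists:

```python
import gc
import weakref


class Engine:
    """The owned object (B in the text)."""


class Car:
    """The owner (A): holds the only strong reference to its engine."""

    def __init__(self) -> None:
        self.engine = Engine()

    def scrap(self) -> None:
        # Only the owner initiates the destruction of what it owns.
        self.engine = None


class Dashboard:
    """References the engine without owning it."""

    def __init__(self, engine: Engine) -> None:
        self._engine = weakref.ref(engine)  # weak: does not keep the engine alive

    def engine_alive(self) -> bool:
        return self._engine() is not None


car = Car()
dashboard = Dashboard(car.engine)
print(dashboard.engine_alive())  # True

car.scrap()
gc.collect()  # not needed in CPython, but harmless on other interpreters
print(dashboard.engine_alive())  # False
```

Python does not enforce single ownership either; the weak reference only makes the intent explicit in the non-owning class.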
Wisdom in the Tree
Accepting the fact that most class models and all good ones have no (strong) cycles gets us out into the greenery: The class model should be a tree. (Strictly speaking, an acyclic dependence graph is a directed acyclic graph, because one class may be used by several others. I keep calling it a tree for the hierarchy it implies.) Now that's something. In spite of all the design patterns, especially callback- and other communication mechanisms, we're still on a tree! That demystifies UML-, entity-relationship- and other spaghetti diagrams: If it's not laid out as a tree, it either reflects a bad design or fails its purpose to convey the most important information about the class model.
So, instead of asking "How does it work?" one should ask "Where is the root?". The root is necessarily the entry point, which is typically the main function. No class invokes it or depends on it, so it's the top of the hierarchy. The main function creates and uses the first objects and, therefore, depends on them. It is the starting point for understanding a class model. From there you can read the application in a top-down fashion, following control and program flow.
Ensuring a dependence hierarchy (as opposed to a possibly cyclic dependence graph) is a fundamental task of any serious software engineering. It does not limit the applicability or power of other development activities like OOD, domain modelling or coding.
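To make "Where is the root?" concrete, here is a small sketch in Python. It assumes a made-up dependence graph and uses `graphlib.TopologicalSorter` from the standard library (Python 3.9 or later) to derive a top-down reading order; `roots` and `top_down_order` are hypothetical helper names:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical graph: each key depends on the classes listed for it.
deps = {
    "main": ["AppController"],
    "AppController": ["TableView", "Invoice"],
    "TableView": [],
    "Invoice": ["Money"],
    "Money": [],
}


def roots(graph):
    """Nodes that nothing else depends on: the candidates for the entry point."""
    depended_on = {dep for targets in graph.values() for dep in targets}
    return [node for node in graph if node not in depended_on]


def top_down_order(graph):
    """A reading order in which every class comes before its dependencies."""
    # static_order() yields dependencies before dependents, so reverse it.
    # It raises graphlib.CycleError if the graph is not a hierarchy.
    return list(reversed(list(TopologicalSorter(graph).static_order())))


print(roots(deps))           # ['main']
print(top_down_order(deps))  # starts with 'main'
```

If `roots` returns more than one node, the graph has several entry points (or dead code) and is worth a second look.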
Trapped Between User and Machine
We enforce the independence of building blocks so that we can change even the large-scale structure of what we build. The building blocks are useful in different contexts, like Lego bricks. We might say they are reusable, but note that the concept of reusability is more general than the idea of using the same class in different projects. Still, I choose to illustrate my points talking about classes.
Back to the minimal hierarchy: class A depends on class B. In that case, B can be used in different contexts by different classes, independently of A. We say B is on a lower level, more general or abstract and less specific. Now, what does a general class not depend on? What can a specific class be specific about? I want to distinguish two kinds of independence that give rise to different meanings of reusability:
- system independence … is the independence from specific tools like the programming language and any API that you didn't write yourself. In the most extreme case, the code is independent of any library, platform or device. A programming language is more system independent than the APIs on top of it.
- application independence … is the independence from the specific application, project, domain, user or customer. The code can be used beyond the application you're currently working on. A general table view class (no matter how system specific) is more application independent than the controller that uses it.
When speaking of reusability, most people mean application independence because the application for which we develop changes more often than the system. But don't underestimate the meaning of system independence. Portability may be for canoes, but designing for portability is good for any software:
- System dependence is not just platform dependence. It often comes down to the dependence on libraries, and libraries are often less reliable than we expect and under-documented. They also change and may need to be exchanged completely. Ironically, the use of powerful libraries makes your code more dependent because "powerful" also means "high-level" and "specific". In contrast, your low-level programming language is generally applicable, highly reusable and always there for you.
- Clean interfaces do not just enable reuse of what they hide below them, they also improve all the code above them. To write better code, you want to use clean interfaces that encapsulate lower levels. Basically, it's more about good design and less about cross-platform code.
- System independent code is closer to the application domain, easier to understand and easier to unit test.
Digging for Diamonds
The degrees of system- and application independence do not depend on each other. A piece of code can have all combinations of high or low application- and system independence. Therefore it can be located in a 2-dimensional "dependency space":
In this diagram, dependencies can only go from top to bottom and from right to left. Also, diagonal dependencies from top right to bottom left are usually a sign of bad design because the dependent entity would add application and system specifics at once. Both can usually be pulled out into separate lower-level classes. If we'd like to see the hierarchy tree laid out with the root on top and all dependency arrows going strictly down, we just turn the diagram and get a diamond-shaped tree:
Portfolio code can be used for different customers and applications, because it is application independent. This is where a developer (company) builds his profile and expertise. The model comprises all code that is system independent. The communication with the user is always system dependent and gets done by controllers and views.
Whether one designs for it or not, the dividing lines between those layers run through the code of every piece of software. Unfortunately, if only one part of a class is system- or application-dependent, it makes the whole class dependent. Therefore, a class should be explicitly dedicated to a set of dependencies. If, while coding, you suddenly require another dependency, you'll know that the particular code belongs in a different class. Also, the potential independence of a piece of code should match the independence of its class. So if code can be moved down to a lower dependency level, it should be moved. Where these principles are ignored, the portfolio and model layers don't get all the code that belongs to them, causing all sorts of problems.
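The rules of the dependency space can be written down as a check. The sketch below is a simplification in Python: it reduces each dimension to a yes/no flag, although the text speaks of degrees, and the class names, the `Placement` type and the `check` function are all invented for illustration. A dependency is "forbidden" if the target is more specific than the source in either dimension, and "diagonal" if the source adds application and system specifics in one step:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where a class sits in the two-dimensional dependency space."""
    app_independent: bool
    system_independent: bool


def check(source: Placement, target: Placement) -> str:
    """Classify the dependency source -> target as ok, diagonal or forbidden."""
    if (source.app_independent and not target.app_independent) or (
        source.system_independent and not target.system_independent
    ):
        return "forbidden"  # general code must not depend on specific code
    adds_app = target.app_independent and not source.app_independent
    adds_sys = target.system_independent and not source.system_independent
    if adds_app and adds_sys:
        return "diagonal"  # allowed, but usually a sign that a class is missing
    return "ok"


placements = {
    "InvoiceController": Placement(app_independent=False, system_independent=False),
    "Invoice": Placement(app_independent=False, system_independent=True),
    "TableView": Placement(app_independent=True, system_independent=False),
    "Matrix": Placement(app_independent=True, system_independent=True),
}

dependencies = [
    ("InvoiceController", "Invoice"),    # ok
    ("InvoiceController", "TableView"),  # ok
    ("Invoice", "Matrix"),               # ok
    ("Invoice", "TableView"),            # forbidden: model depends on system code
    ("InvoiceController", "Matrix"),     # diagonal
]

for source, target in dependencies:
    print(source, "->", target, ":", check(placements[source], placements[target]))
```

Deciding the flags for a real class is the actual design work; the check only keeps you honest afterwards.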
The Emancipation of Models
I found it hard to wrap my mind around how I would achieve such a clean architecture. The thing that bugged me the most is how to create system independent classes. I'm used to the fact that controllers depend on models - never the other way around. But mostly, I wasn't creating true domain model classes because they still depended on system specifics, which, by definition, they shouldn't.
What I intuitively aimed for was to satisfy the Dependency Inversion Principle, which is the most discussed one from SOLID. Only I couldn't implement it consistently. The DIP demands that details (system-specific implementation) ought to depend on abstractions (the domain). This requires some thought because a classic top-down decomposition of a problem yields quite the opposite.
Inheritance obviously plays well with the DIP, but I found it counter-intuitive at first that containment can also implement it. I thought of the contained element or class as being automatically more specific than its "general container" but that's wrong. A wheel is more abstract than the car that happens to contain four wheels. Composed structures like classes are most often more specific than the types they contain. The primitive types of your language are the most abstract ones you have. You might wrap them in domain-oriented interfaces but those wrappers would be more application specific.
Here is another observation: Dynamic code like a method implementation is more specific than static code like data structure and type declarations. That sounds trivial but, as object-oriented programmers, our types are classes and we're used to making their general interfaces depend on their specific implementations. That is problematic because classes also act as data types with an internal state. When implementation code becomes more specific than the concept represented by that state, we don't want the class to depend on that code anymore. We need to extract that code in a way that makes the class completely independent of it, so pure encapsulation is not an option. This problem is, of course, most relevant to the layer that holds the state of the domain: the model layer.
What helps with designing true model classes is to first think of them as low level data structures that have hardly any dynamic behaviour except for state validation. Following that, you nicely wrap system specific services in domain-oriented interfaces and inject them into the model objects. This is called Dependency Injection. It's a typical technique to satisfy the DIP and often a precondition of unit testing.
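As a minimal sketch of that recipe in Python: the model class below is mostly state plus validation, the domain-oriented interface is a `typing.Protocol`, and the system-specific service is passed in from outside. All names (`Invoice`, `RateSource`, `FixedRates`) are hypothetical, and `FixedRates` merely stands in for something that would wrap a real system service:

```python
from typing import Dict, Protocol


class RateSource(Protocol):
    """Domain-oriented interface, declared alongside the model."""

    def rate(self, currency: str) -> float: ...


class Invoice:
    """Model class: state and validation, no system-specific imports."""

    def __init__(self, amount: float, currency: str, rates: RateSource) -> None:
        if amount < 0:
            raise ValueError("amount must not be negative")
        self.amount = amount
        self.currency = currency
        self._rates = rates  # injected; the model never constructs it

    def amount_in_base_currency(self) -> float:
        return self.amount * self._rates.rate(self.currency)


class FixedRates:
    """Stand-in for a system-specific service, e.g. one wrapping a web API."""

    def __init__(self, table: Dict[str, float]) -> None:
        self._table = table

    def rate(self, currency: str) -> float:
        return self._table[currency]


invoice = Invoice(100.0, "USD", FixedRates({"USD": 0.5}))
print(invoice.amount_in_base_currency())  # 50.0
```

In a unit test you would inject exactly such a fixed table, which is why the technique and testability go hand in hand.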
Note that a pure domain model class is considered "low level" because it is a leaf in the dependence hierarchy (left bottom edge of the diamond). But, regarding the domain, it represents some rather "high level" concept. That is no coincidence because these abstract domain policies should be most independent of any specific realisation in the domain or in software. That is why, in the original formulation of DIP, Robert C. Martin wrote: "High-level modules should not depend on low-level modules." He applies the term "high-level" to the domain and not to a dependence hierarchy in software. Explicating this distinction is necessary to any discussion on the subject because people tend to mean very different things when speaking of "high-" and "low level" code.
The important thing to remember about DIP is that it's not just about making the interface of a dependency more domain-conformant and hiding system specifics. That would make the dependency more indirect but would not reverse it. What we frequently need instead is a kind of plugin pattern. Delegates, for example, let us plug details (the delegate) into a more general framework (the delegator). Delegate protocols are often declared in the file of the delegator class, making the dependency direction very clear: The delegate depends on the delegator. Details depend on abstraction.
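A delegate can be sketched in a few lines of Python. The names here (`Counter`, `CounterDelegate`, `Recorder`) are invented for illustration. Note that the protocol is declared next to the delegator, so the detail is the one that has to conform:

```python
from typing import List, Optional, Protocol


class CounterDelegate(Protocol):
    """Declared with the delegator: what any plugged-in detail must offer."""

    def counter_did_change(self, value: int) -> None: ...


class Counter:
    """Delegator: general code that knows only the protocol above."""

    def __init__(self) -> None:
        self.value = 0
        self.delegate: Optional[CounterDelegate] = None

    def increment(self) -> None:
        self.value += 1
        if self.delegate is not None:
            self.delegate.counter_did_change(self.value)


class Recorder:
    """Delegate: a specific detail plugged into the delegator."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def counter_did_change(self, value: int) -> None:
        self.lines.append(f"counter is now {value}")


counter = Counter()
recorder = Recorder()
counter.delegate = recorder
counter.increment()
counter.increment()
print(recorder.lines)  # ['counter is now 1', 'counter is now 2']
```

`Counter` can be moved to another project without `Recorder`; the reverse is not true, and that asymmetry is the inverted dependency.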
Wrapping it Up
The Diamond Tree reflects my perspective on good architecture. I think it stems from the same philosophy as Domain Driven Design although I'm not an expert on the latter. The blue book on DDD goes into depth on how to translate a domain into model classes and stresses the importance of decoupling the model (domain-) layer from everything else. There's also a free downloadable book that summarises the blue one on about 100 short pages. It still contains enough repetition and gives a nice overview.
A classic temptation is to let controllers do business logic, but the DDD approach reminds us that the important part of an application is actually the model. You should move as much code as possible there by disentangling it from system specifics.
The same is true for the other dimension: Make code as application independent as possible, even if you don't plan to reuse it in the future. Getting that general code out of the way lets you (and others) see the actual application more clearly. But anyway, you should seize every project as an opportunity to extend your code portfolio.
I conclude: The Diamond Tree, although quite abstract at first glance, promotes a pragmatic way of thinking about software architecture and design. It is, of course, not entirely new. Instead, it relates to ideas like agile development, domain driven design, MVC and DCI.
Update: June 6, 2013
I'm delighted to see that Uncle Bob Martin is about to publish his book "Clean Architecture". He has been advocating the separation of the application from its frameworks and delivery mechanisms for some time now and referenced the Onion Architecture, which is quite similar to the Diamond Tree. However, like all architecture models I've seen, it doesn't distinguish between the application and system dimensions. For instance, Martin's use cases layer is roughly equivalent to the application layer in the DDD literature or the context and interaction layers of DCI. Those layers add application specifics. Then, the outer (higher) layers add system specifics. These kinds of architectural models neglect the right side of the diamond because they don't have a place for system-specific code that is application independent, which I find overly simplifying or just not very clean. As long as dependency direction is preserved, there can be more combinations of system and application independence. Also, the Diamond Tree allows you to actually lay out a concrete application in a diagram with both dimensions being meaningful. Now, I'm excited about what is to come in Uncle Bob's new book.
Here are some sources that I found particularly interesting in one way or another:
- The Big Ball of Mud and Other Architectural Disasters
- Software Engineering: Dead?
- The M in MVC: Why Models are Misunderstood and Unappreciated
- Interactive Application Architecture Patterns
- The Onion Architecture
- The Clean Architecture
- Keynote: Architecture - the Lost Years (by Robert Martin)
- Martin Fowler's (popular) Articles
- Domain-Driven Design Quickly (free downloadable book)
- The Dependency Inversion Principle
- The DCI Architecture: A New Vision of Object-Oriented Programming
- Jim Coplien: Why DCI is the Right Architecture for Right Now