The Repository Pattern

Today’s post is about more than some small, isolated matter that I’ve come across in my day-to-day activities or on my restless nights of endless hacking.

Despite my overall remarkable programmer geekiness, I am first and foremost a philosopher. As a consequence of that, I love to I/O - big essays - so if you want to skip the background story and get right into the fun part, just jump straight to the last section below, ok? If you want to be even more hasty, you can go straight to the GitHub project page and check it out there.

Historical Motivation

I’ve been doing software for about 5 years now - be it professionally or academically. My first professional endeavor dates from early 2005: a freelance PHP job while I was still pursuing my Computer Science bachelor’s degree at UFSCar. Professionally, I’ve had two years at Accenture, doing mainly Java and Pro*C projects, and I’m now quickly approaching a full year (in October) at Locaweb.

In all that short but quite diverse experience, one major concern (maybe even the biggest concern of all) that often showed up while developing software is what kind of persistence engine the application will make use of. The vast majority of real world applications you will see out there use a relational database (name it: Oracle, SQL Server, MySQL or whatever else) as their main persistence engine. Even if I try to remember, I can barely recall half a dozen applications that I worked with which didn’t use relational databases (one good example of these exceptions would be LDAP authentication applications - although even those might end up using some relational back-end).

Given that, how you will deal with your persistence engine should be one of the top priorities on your design plan for virtually any application. A bad choice on this matter might very easily lead to catastrophic results, no matter how good the rest of the overall design is. Since the software development niche is a wide, boundless world, it should be expected that such a crucial concern has some good and fairly established frameworks to deal with it. And there surely are, quite a bunch indeed - if you pick a widespread popular language like Java, it even becomes a pain to decide among so many options.

[Be advised that this specific paragraph below is a very outdated opinion (I don't do drug… erm Java anymore)]
At Accenture I worked a little bit with two of the most widely accepted Java options out there: Hibernate and Spring (there’s also JPA, but I’ve never tried it). But neither of these seemed to be very pleasant to work with [reinforcing - that was the situation two years ago] - especially in a situation where you have no control over the database: it is just there, and you have to live with it. There were tons of configuration, and tons of giving up control of what the application does to the framework. That last part always sounded pretty bad, especially in the kind of project I was working on: tons of data, where if a select took one millisecond more than necessary to complete, it would add up over the iterations, making a huge difference (sometimes hours) in the final execution time. So, for me that was always a no go.

Ever since I first saw the idea of Active Record while studying Ruby on Rails last year (I’ve heard that you can do something quite similar with the latest versions of Hibernate too), I’ve been in love with it. No pollution in your model, no endless configuration, straightforward persistence on the fly. Seems too good to be true. And actually it is: it won’t solve half of the persistence problems of any real business application. If you have a beautiful database, with IDs in all tables, and good, ubiquitous naming conventions - you’re good. But just how many real world business projects can say that their databases are like that? Most of today’s software was written quite a while ago, with those big badass monolithic databases - with tables and columns using the legendary Hungarian notation and all that. In those cases, you’re back to endless configuration hell. I’ve put together a very quick "hello world" with the Castle implementation of Active Record for C#, using NHibernate as a back-end, and it seemed to work fine, although it fell into the same problems mentioned above.

One very common saying amongst fellow developers is that "there is no silver bullet" in this activity. That means no absolute rule, no solve-all-your-problems idea. But I would say that although there is nothing that can guarantee your success, there are certainly some things that you should never do. One of those things that will almost always result in catastrophic failure is fighting your framework: never, ever do it.

That’s basically the background story and historical motivation behind the whole idea that will be shown in the following (many) paragraphs.

Understanding the Repository pattern

For each type of object that needs global access, create an object that can provide the
illusion of an in-memory collection of all objects of that type. Set up access through a
well-known global interface. Provide methods to add and remove objects, which will
encapsulate the actual insertion or removal of data in the data store. Provide methods
that select objects based on some criteria and return fully instantiated objects or
collections of objects whose attribute values meet the criteria, thereby encapsulating
the actual storage and query technology (…)

Evans, Eric (2003). Domain-Driven Design: Tackling Complexity in the Heart of Software

The above quotation summarizes the main responsibility attributed to the repository pattern. Oversimplifying maybe, a repository is a mechanism for dealing with the lifecycle of an entity (or aggregate root) - providing a clear, centralized interface to access, modify, or destroy it.

One of the main reasons to use a repository is to clearly decouple the persistence logic from the model and application layers. A repository isn’t a DAO: it contains business logic about how each type of object in your model can be dealt with, and it might even use a DAO implementation behind the scenes to do the actual queries. It will also most likely make use of factories to create the fully instantiated entities, once the necessary data has been retrieved from the database.
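
To make the DAO / factory / repository split a bit more concrete, here is a minimal sketch. Every name in it (CustomerRow, Customer, CustomerDao, CustomerFactory, CustomerRepository) is hypothetical and made up for illustration, an in-memory list stands in for the database, and the "inactive customers are invisible" rule is just an assumed example of business logic living in the repository.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Raw persistence shape: what a DAO would hand back (hypothetical).
public class CustomerRow
{
    public int Id;
    public string FullName;
    public bool IsActive;
}

// Domain entity that the rest of the application works with (hypothetical).
public class Customer
{
    public int Id { get; private set; }
    public string Name { get; private set; }

    public Customer(int id, string name)
    {
        Id = id;
        Name = name;
    }
}

// DAO: only knows how to store and fetch rows.
// An in-memory list stands in for the real database here.
public class CustomerDao
{
    private readonly List<CustomerRow> rows = new List<CustomerRow>();

    public void Insert(CustomerRow row) { rows.Add(row); }
    public IEnumerable<CustomerRow> SelectAll() { return rows; }
}

// Factory: turns raw data into a fully instantiated entity.
public static class CustomerFactory
{
    public static Customer FromRow(CustomerRow row)
    {
        return new Customer(row.Id, row.FullName.Trim());
    }
}

// Repository: applies a business rule (assumed: inactive customers are
// invisible) and hides both the DAO and the factory from its callers.
public class CustomerRepository
{
    private readonly CustomerDao dao;

    public CustomerRepository(CustomerDao dao) { this.dao = dao; }

    public IList<Customer> FindAllActive()
    {
        return dao.SelectAll()
                  .Where(row => row.IsActive)
                  .Select(row => CustomerFactory.FromRow(row))
                  .ToList();
    }
}
```

The caller only ever sees Customer objects; whether they came from a DAO, straight SQL or anything else stays behind the repository.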

As a result, while developing the application and model layers, the developer can focus their energy on the domain itself, since the entire persistence concern has been delegated to the repositories.

Any big enough application in which you are not planning on using a full-fledged persistence framework will most likely have some sort of implementation of a repository. Otherwise the domain would be very polluted with infrastructure and persistence logic. So, the idea here is to show an implementation of this pattern that has been working quite well for us in a real business scenario - since it might help other people too, if they happen to be working on a similar platform.

Implementing the Repository

Okay, this is going to be very specific. On our current projects, for various reasons, we’re using the following platform:

  • Language: Microsoft .NET Framework 3.5 SP1 (C#)
  • Database: Microsoft SQL Server 2005
  • IDE: Microsoft Visual Studio 2008 Team System
  • Source Control: Git

At some point during a sprint a couple of months ago, one of my teammates (kitaly) came up with an example of a repository implementation which seemed to be heading in a very interesting direction. So, on our first brand new project after that, we decided to try it out and see whether we could make it work for our business scenario.

The main idea was to provide the methods "save", "find" and "delete". To get that, each specific repository should have to implement as few methods as possible: data mapping, generation of custom fields and, where needed, logic for business access constraints.

Even though our brand new application had a pretty good, expertly crafted database, some parts of it still had to deal with the legacy tables. So, we chose to go with LINQ to SQL, since it is very close to doing pure SQL (but not quite), and gave us all the flexibility (and customizability for performance) we would need. After an incredibly productive and rather enjoyable session of massive brainstorming, we managed to put together a working demo.

After being pleased by its elegant and succinct implementation and seeing it work well in a production environment, I decided to dedicate some time to improving it even more, providing more functionality "out of the box" and therefore requiring less code in each specific repository.

There is, however, a slightly upsetting tradeoff we had to accept while developing the repository. Since LINQ relies on expressions going straight to queries, we decided to expose the actual auto-built DTO objects from LINQ’s data context. The problem here is that we broke one of the main tenets of the repository pattern: decoupling from the database. What we gained, though, was tons of flexibility and much less code than the same functionality would take if it were done using some sort of query object implementation. After all, programming is all about tradeoffs, and picking the ones that are better for your specific case. That’s why technical books can never be read as recipes, but as sources of knowledge.

Enough with the small talk, time to see some actual code. Please note that I won’t post written code here (just pictures, since they are a lot easier to manage) - but don’t worry, you can download the entire solution on GitHub, which includes the repository implementation and an extra project with a very easy to follow usage example.

For the forthcoming sections I’ll assume you have a good understanding of the C# language, its functional constructs, lambda expressions, and template (aka Generics) implementations - besides solid OO concepts, of course.

The heart of this repository implementation is its main interface:

[image: the main repository interface]

The idea here is pretty simple:

  • Call "Find" passing an expression to get whichever entries you want. Note that you refer to the DTO in the expression, but you get back a list of the corresponding entity or aggregate root from your model. For the sake of keeping the repository’s client code as simple as possible, there are methods to find a list of records and a single specific record. If you call "FindSingle" with a condition that matches more than one entry in your database, an exception will be thrown.
  • Call "Save" passing an entity to persist it. If it isn’t already in the database, it’s going to be created; otherwise the existing entry will be updated. Note that you can also pass an optional transaction - in this case, the operation will occur inside the transaction, and you can commit it or roll it back later.
  • Call "Delete" to remove an entity from the database. Note that, similarly to the "Save" method, you can pass an optional transaction.
  • The repository is defined using two template parameters (I know they should be called Generics in C#, but I got so used to calling them Templates back in the C++ days…): "TRow" should be mapped to a LINQ DTO, while "TEntity" should be mapped to an entity of your model.
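
Since the picture of the interface may not be at hand, here is a rough reconstruction from the description above. It is an assumption, not the project's actual code: the names IRepository, ProductRow, Product and InMemoryProductRepository are made up, the "optional" transaction is modeled with overloads, and a plain in-memory list replaces the LINQ to SQL data context so that the sketch runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;
using System.Linq;
using System.Linq.Expressions;

// Reconstructed from the description in the text (assumed shape).
public interface IRepository<TRow, TEntity>
{
    // Conditions are written against the row (DTO); results are entities.
    IList<TEntity> Find(Expression<Func<TRow, bool>> condition);
    TEntity FindSingle(Expression<Func<TRow, bool>> condition);

    void Save(TEntity entity);
    void Save(TEntity entity, DbTransaction transaction);
    void Delete(TEntity entity);
    void Delete(TEntity entity, DbTransaction transaction);
}

public class ProductRow { public int Id; public string Name; public decimal Price; }
public class Product { public int Id; public string Name; public decimal Price; }

// In-memory stand-in; the transaction arguments are ignored on purpose.
public class InMemoryProductRepository : IRepository<ProductRow, Product>
{
    private readonly List<ProductRow> table = new List<ProductRow>();

    public IList<Product> Find(Expression<Func<ProductRow, bool>> condition)
    {
        // LINQ to SQL would translate the expression into SQL;
        // here it is simply compiled and run against the list.
        return table.Where(condition.Compile()).Select(r => ToEntity(r)).ToList();
    }

    public Product FindSingle(Expression<Func<ProductRow, bool>> condition)
    {
        // Single() throws InvalidOperationException when several rows match
        // (and, in this sketch, also when none does).
        return ToEntity(table.Single(condition.Compile()));
    }

    public void Save(Product entity) { Save(entity, null); }

    public void Save(Product entity, DbTransaction transaction)
    {
        // Create the row if it is not there yet, otherwise update it.
        var row = table.SingleOrDefault(r => r.Id == entity.Id);
        if (row == null)
        {
            row = new ProductRow { Id = entity.Id };
            table.Add(row);
        }
        row.Name = entity.Name;
        row.Price = entity.Price;
    }

    public void Delete(Product entity) { Delete(entity, null); }

    public void Delete(Product entity, DbTransaction transaction)
    {
        table.RemoveAll(r => r.Id == entity.Id);
    }

    private static Product ToEntity(ProductRow row)
    {
        return new Product { Id = row.Id, Name = row.Name, Price = row.Price };
    }
}
```

A call such as `repository.Find(row => row.Price > 10m)` shows the tradeoff discussed earlier: the lambda talks about ProductRow (the database shape), while the result is a list of Product.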

The repository project defines a basic abstract implementation of this interface, so that, if you have a decent enough database (each entity mapped to a table) you gain some pretty nice functionality out of the box. I’m not going to get into the details of the basic repository implementation since it’s meant to just be used as is.

What’s next is an example of what you would need to do to create a repository specific for one of the entities on your model:

[image: an example repository for a specific entity]

I believe that the "BeforeSave" and "AfterSave" methods should be self-explanatory. But "FindEntity" probably is not - the idea behind it is to establish a way to find the specific record in the database that represents an in-memory model entity - so that the "Save" and "Delete" template methods can do their jobs.
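
The way these hooks could fit into the "Save" and "Delete" template methods can be sketched as follows. This is only a guess at the shape, not the project's real base class: the signatures of FindEntity, BeforeSave and AfterSave are assumptions, the Map hook and the RepositoryBase, OrderRow, Order and OrderRepository names are hypothetical, and an in-memory list again stands in for the LINQ to SQL data context.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Simplified base class (assumed shape): only Save and Delete.
public abstract class RepositoryBase<TRow, TEntity> where TRow : class, new()
{
    private readonly List<TRow> table = new List<TRow>();

    // Tells the base class how to locate the row behind an entity.
    protected abstract Expression<Func<TRow, bool>> FindEntity(TEntity entity);

    // Hypothetical data mapping hook: copy entity values onto the row.
    protected abstract void Map(TEntity entity, TRow row);

    protected virtual void BeforeSave(TEntity entity, TRow row) { }
    protected virtual void AfterSave(TEntity entity, TRow row) { }

    public int Count { get { return table.Count; } }

    public void Save(TEntity entity)
    {
        var row = table.SingleOrDefault(FindEntity(entity).Compile());
        if (row == null)
        {
            row = new TRow();
            table.Add(row);
        }
        Map(entity, row);
        BeforeSave(entity, row);
        // A real implementation would submit the changes to the database here.
        AfterSave(entity, row);
    }

    public void Delete(TEntity entity)
    {
        var row = table.SingleOrDefault(FindEntity(entity).Compile());
        if (row != null) table.Remove(row);
    }
}

public class OrderRow { public int Id; public string Code; public DateTime UpdatedAt; }
public class Order { public int Id; public string Code; }

public class OrderRepository : RepositoryBase<OrderRow, Order>
{
    private int nextId = 1; // simulates an identity column

    protected override Expression<Func<OrderRow, bool>> FindEntity(Order order)
    {
        return row => row.Id == order.Id;
    }

    protected override void Map(Order order, OrderRow row)
    {
        row.Code = order.Code;
    }

    protected override void BeforeSave(Order order, OrderRow row)
    {
        // Generate the fields that the model itself does not carry.
        if (row.Id == 0) row.Id = nextId++;
        row.UpdatedAt = DateTime.UtcNow;
    }

    protected override void AfterSave(Order order, OrderRow row)
    {
        order.Id = row.Id; // hand the generated key back to the entity
    }
}
```

With something like this in place, a concrete repository only states how to find its row, how to map it, and what to do around a save; the create-or-update decision stays in the base class.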

An example of how to actually use the repository should be worth a couple thousand words, so here it goes:

[image: sample usage of the repository, written as a unit test]

Please note that the code shown above is just for the sake of exemplifying what can be accomplished with the proposed repository implementation - I mean, don’t ever write unit tests like that on your projects; each unit test should test one specific piece of functionality.

And that’s pretty much it. Feel free to download the source code on GitHub and check the whole sample usage project (including all the snippets used in this article).

Feel free to criticize and leave your remarks in the comments - we’d love to hear your opinion.

3 People have left comments on this post



» Koiti Takahashi said: { Jul 4, 2009 - 09:07:06 }

Although it breaks an important characteristic of repositories (complete separation of data and model), the gain from using lambda expressions for querying objects is really worth it. As long as you have a well defined data mapping with LINQ to SQL and you are sure that you won’t change your DBMS, this solution is an option worth considering if you’re looking at NHibernate or Castle ActiveRecord.

Hey, MACSkeptic, we should name it!

» macskeptic said: { Jul 4, 2009 - 05:07:12 }

If it was my own project I would surely name it some GUNDAM acronym - maybe something like “General Unified Neat Database Active Mapping”.

Since that’s not the case, we surely need to think of a name, too bad we all seem to suck at that.

» Narazana said: { Nov 28, 2009 - 10:11:52 }

Is there any VB.NET version of this? Thank you.