Logic in the database, to be or not to be?
Recently I have participated to a couple of discussions about whether the
stored procedures should be used or not in a “real” project, so I decided to write this post to express my point of view. This said, I warn you, there is no a “right” answer, but most probably a “pro” and “cons” about anything written in this post.
An old problem
There are more than a handful of developers who are not really clear (from the architectural point of view) when to use triggers, views, stored procedures,etc… let’s just call this “database logic”.
Being so easy to change, and so close to the data, one is really tempted to go to the “source” and start coding.
I think that conceptually all developers agree with “using the right tool for the right job” rule, but when it comes to put this law into a practice, then is where all the problems begin.
I think that in general there are three types of developers:
- Database oriented developers: Usually this is someone that is very familiar with the database and thinks that the database is not merely a repository, but as well the place where to put all the logic.
- Code oriented developers: Would place everything outside the database (code, ORM, etc..). Usually this is a developer that is in love with ORM mappers, either because this is a trend or simply because don’t like working with SQL, because SQL is an old thing, isn’t it?
- Pragmatic developers: A thinker that applies the knowledge in order to get the best out of the two worlds.
Which of the three groups best describes you?
Layered architecture to the resque
Before start answering the original question, I have to mention the
layered architecture. There are many types of applications out there, but it’s possible to have some sort of a formally accepted layers, otherwise we are going to have a monolithic application.
There are many ways of naming those layers depending to who are you talking too, but in general we may distinguish the following:
A multilayered (software) architecture is using different layers for allocating the responsibilities of an application. share this quote on twitter
The presentation layer provides the application’s user interface (UI). Typically, this involves the use of Desktop/Mobile GUI (Windows Forms, IPhone/Android Shells etc..) or Web UI (ASP.NET, ASP.NET MVC, HTML,..) technologies for browser-based interaction.
The business layer implements the business functionality of the application. The domain layer is typically composed of a number of components implemented using one or more programming languages. These components may be augmented with distributed component solutions other kind of workflow orchestration.
The data layer provides access to external systems such as databases. When talking about Microsoft.NET the primary .NET technology involved at this layer is ADO.NET, but there are many other possibilities like using ORM mappers (Microsoft Entity Framework, NHibernate, etc..) or some other kind of “lightweight” database access libraries like PetaPoco, Massive, or others…
A repository of data.
Various types of architectures.
As we have seen above, there are several layers usually used within one application, and each layer has its own responsibility. The hearth of the application is the “Business Layer” because this is something that gives the meaning to the application.
Unfortunately things are not that easy, because, yes, we may layer our application, but not every application is layered in the same way.
Let’s simply name a few typical kind of applications:
- Client-Server applications: Usually this kind of application should contain all of the layers described above, but there are several variations depending on the complexity of the application. The point is that the database is running on a separate server from the GUI itself.
- Reporting applications: Usually simply display the data to the user in forms of reports. Typically here we can find the Presentation layer, but the business layer usually would stay in the database directly
- Web Application: A web application could be seen as a Client-Server application because at it’s heart is executing the code on the server and the database would be typically on another server.
The definition of the scalability is:
A characteristic of a system, model or function that describes its capability to cope and perform under an increased or expanding workload. A system that scales well will be able to maintain or even increase its level of performance or efficiency when tested by larger operational demands.
I really like this topic, and I think that is really important when it comes to the decision of putting the important part of the logic in the database.
In general, RDBMS databases are not scalable!
Why RDBMS are not scalable?
Let’s put it this way: The CAP theorem states that a distributed (i.e. scalable) system cannot guarantee all of the following properties at the same time:
- Partition tolerance
RDBMS systems guarantee consistency. That’s why a standard RDBMS cannot scale very well: it won’t be able to guarantee availability. And what good is a database if you can’t access it?
Database pioneer and researcher Michael Stonebraker co-wrote a paper that discusses the limitations of traditional database architectures. RDBMS scale up with more expensive hardware, but have difficulty scaling out with more commodity hardware in parallel. He contends that the BigData era requires multiple new database architectures that take advantage of modern infrastructure and optimize for a particular workload.
You may see straightforward one big limitation of the databases itself, as in some larger applications those can become a bottleneck, especially in the newest era of computing (see big data, social platforms like twitter, Facebook, google, that are mainly not using RDBMS’s for keeping their data)…
What is the alternative?
The alternative for some enterprise applications, is to have very well separated layers, and well separated tiers, that would be designed for scalability.
Adding new “Web services” by increasing a number of servers where the most of calculations (business logic) would be placed, we remove the burden from the database that would be then able to do what it does best, collect and store the data in an consistent way, otherwise if we would put the calculation logic in the database, that would need to server thousands of concurrent transaction, this would soon be unusable.
That’s why it is very common to have the “reading” database separated from the “writing” database as shown in the image below.
I think I just gave you an idea of why I think that the database should not contain business logic. This is based on my experience working on some large applications, where i see that the database becomes more increasingly a bottleneck and after some time it is really hard to detach and refactor the logic to some other layers, as there is another not written rule of “don’t change it if it works”. I am sure there is ton of people that doesn’t think it in this way (please comment this post then!), and there are tons of other reasons that would go against what described here. You decide ultimately!
Let me know what do you think?