MongoDB – Thinking in Documents
As we all know, document databases (such as MongoDB) are very different from the relational databases (RDBMS) we have all grown used to.
When designing a relational database, one goes through the process of so-called data normalization. Normalization involves arranging attributes in relations based on the dependencies between them, and ensuring that those dependencies are properly enforced by database integrity constraints. It is accomplished by applying formal rules, through a process of either synthesis or decomposition:
- Synthesis creates a normalized database design based on a known set of dependencies.
- Decomposition takes an existing (insufficiently normalized) database design and improves it based on the known set of dependencies.
The result is a tidy set of tables, interlinked through foreign keys, from which redundant data is banned unless it is needed for some edge case. (At least this is how it should be :))
In MongoDB (or any other document database) these rules need not be strictly followed, because even very complex data can be “packed” into a single document. A document can itself contain a set of sub-documents, and it all works seamlessly. What takes many inter-related tables to express in a relational database can simply be one type of document in a document database.
MongoDB supports two main ways of representing relationships between documents: referencing documents (a bit like in a relational database) and embedding documents.
Referencing Documents in MongoDB
Referencing documents in MongoDB is very similar to data normalization in an RDBMS, where tables are linked by foreign keys. In this sense MongoDB is no different.
What is very different is that the relationship is not enforced in any way by the database; it is handled entirely by the application code.
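To make this concrete, here is a minimal sketch of referencing, using plain Python dictionaries in place of real collections. The `users` and `addresses` collections, their field names and the `addresses_for` helper are all hypothetical and chosen only for illustration; the point is that the "join" is done by hand in application code.

```python
# Two hypothetical "collections", modelled as plain lists of dicts.
users = [
    {"_id": 1, "name": "Ana", "address_ids": [10, 11]},
]
addresses = [
    {"_id": 10, "city": "Zagreb", "country": "HR"},
    {"_id": 11, "city": "Vienna", "country": "AT"},
]


def addresses_for(user, address_collection):
    """Resolve the references by hand, the way application code has to."""
    by_id = {doc["_id"]: doc for doc in address_collection}
    # A dangling id is silently skipped here: nothing in the database
    # would have prevented it from being stored in the first place.
    return [by_id[i] for i in user["address_ids"] if i in by_id]


resolved = addresses_for(users[0], addresses)
print([a["city"] for a in resolved])
```

Against a real database the lookup would be a second query on the referenced collection rather than a dictionary lookup, but the responsibility stays in the same place.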
Embedding Documents in MongoDB
When embedding documents, we are de facto combining all of the parts into one bigger unit: the document itself.
The same example as used above can be represented in a different way.
We can see that the document now contains all the data aggregated.
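As an illustration, a hypothetical user with two addresses could be embedded as a single document like the one sketched here. The field names are assumptions made for the example, and a plain Python dictionary stands in for the stored document.

```python
# One self-contained document: the addresses live inside the user.
user = {
    "_id": 1,
    "name": "Ana",
    "addresses": [
        {"city": "Zagreb", "country": "HR"},
        {"city": "Vienna", "country": "AT"},
    ],
}

# Everything needed is already in hand after a single read;
# no second lookup is required to list the cities.
cities = [address["city"] for address in user["addresses"]]
print(cities)
```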
Document Design Strategy
As we have seen, there are two main ways of representing relationships. The choice between them needs to be carefully weighed, as each has side effects. As a rule of thumb, the following can be recommended:
Embed as much as possible
A document database should eliminate quite a lot of joins, so the best option we have is to put as much as possible into a single document. The great advantage is that saving and retrieving a single document is atomic and very fast (see Consistency below). There is no need to normalize the data. Therefore “embed” as much as possible, especially data that is not used by other documents.
Normalize data that can be referred to from multiple places into its own collection. In other words, create reusable collections (e.g. country, user, etc.). This is definitely a more efficient way to handle shared values, because each one is maintained in only one place.
Keep in mind that the maximum document size in MongoDB is 16 MB. The limit is imposed mainly to ensure that a single document cannot use an excessive amount of RAM or bandwidth. 16 MB is quite a large quantity of data (just think how much data is usually displayed on a single web page). In most cases this limit is not a problem; however, it is good to keep it in mind and avoid premature optimizations.
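If you want a quick sanity check on how close a document is to the limit, something like the following sketch can help. It is only a rough approximation: it measures the JSON text of the document, whereas MongoDB stores BSON, so the real size will differ. The helper names and the `safety_margin` parameter are hypothetical.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB document size limit


def approx_size_bytes(document):
    """Rough estimate: length of the UTF-8 encoded JSON text.

    BSON encodes values differently, so treat this as a ballpark figure.
    """
    return len(json.dumps(document).encode("utf-8"))


def fits_in_one_document(document, safety_margin=0.5):
    """True if the estimate stays below a fraction of the limit."""
    return approx_size_bytes(document) <= MAX_DOCUMENT_BYTES * safety_margin


# A hypothetical blog post with a thousand short embedded comments.
post = {
    "title": "Thinking in Documents",
    "comments": [{"text": "x" * 200} for _ in range(1000)],
}
print(approx_size_bytes(post), fits_in_one_document(post))
```

Even a thousand embedded comments stay far below the limit in this example, which is why the limit rarely matters in practice.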
Complex data structures and queries
MongoDB can store deeply nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document.
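One common way to do this is to give every node its own document holding a reference to its parent. The sketch below assumes a hypothetical `nodes` collection with `_id` and `parent` fields and walks the tree in application code; with a real database, each step would be a query on the `parent` field instead of a dictionary lookup.

```python
# One document per node; the edge is the reference to the parent.
nodes = [
    {"_id": "root", "parent": None},
    {"_id": "a", "parent": "root"},
    {"_id": "b", "parent": "root"},
    {"_id": "a1", "parent": "a"},
]


def descendants(node_id, collection):
    """Collect all descendants of a node by repeatedly finding children."""
    children_of = {}
    for doc in collection:
        children_of.setdefault(doc["parent"], []).append(doc["_id"])

    found, stack = [], [node_id]
    while stack:
        current = stack.pop()
        for child in children_of.get(current, []):
            found.append(child)
            stack.append(child)
    return found


print(sorted(descendants("root", nodes)))
```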
Consistency
MongoDB makes a trade-off between efficiency and consistency. The rule is that changes to a single document are atomic, while updates to multiple documents should never be assumed to be atomic. When designing the schema, consider how to keep your data consistent! Generally, the more you keep in a single document the better, as noted in the first point of this list.
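As a small illustration of why embedding helps here, consider a hypothetical order document that embeds its items and keeps a running total. Both fields can be changed by one update on one document, so they cannot drift apart. The sketch only builds the filter and update documents; the `orders` collection and all field names are assumptions, and the actual call (shown in a comment) would need pymongo and a running server.

```python
from typing import Any, Dict


def add_item_update(sku: str, qty: int, price: float) -> Dict[str, Any]:
    """Build ONE update document that changes two fields of the same order."""
    return {
        "$push": {"items": {"sku": sku, "qty": qty, "price": price}},
        "$inc": {"total": qty * price},
    }


order_filter = {"_id": "order-1"}
update = add_item_update("A-42", 2, 9.5)

# With pymongo this would be sent as a single operation:
#   orders.update_one(order_filter, update)
# Because it touches a single document, the new item and the new total
# are applied together or not at all.
print(update)
```

Had the items lived in a separate collection, the same change would need two writes to two documents, and the application would have to cope with the first succeeding and the second failing.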