Apr 09, 2017
 

As we all know, document databases are very different from the relational databases (RDBMS) most of us are used to.

In an RDBMS, designing the database involves a process called data normalization. Normalization arranges attributes into relations based on the dependencies between them and ensures that those dependencies are properly enforced by database integrity constraints. It is accomplished by applying formal rules through a process of either synthesis or decomposition.

  • Synthesis creates a normalized database design based on a known set of dependencies.
  • Decomposition takes an existing (insufficiently normalized) database design and improves it based on the known set of dependencies.

The result is a tidy set of tables, interlinked through foreign keys, in which redundant data is eliminated unless it is needed for some edge case. (At least this is how it should be :))

In MongoDB (or any other document database), the above does not need to be followed strictly, because very complex data can be “packed” into a single document. A document can itself contain sub-documents, and it all works seamlessly. What takes many interrelated tables to express in a relational database can simply be one type of document in a document database.

MongoDB supports two main ways of representing related data: referencing documents (a bit like in a relational database) and embedding documents.

Referencing Documents

Referencing documents in MongoDB is very similar to data normalization in an RDBMS, where tables are linked by foreign keys. In this sense MongoDB is no different.

What is very different is that the database does not enforce this relationship in any way; the relationships are handled entirely by the application code.
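
A minimal sketch of referencing, written in Python with plain lists of dictionaries standing in for collections. The `authors` and `posts` names and all field names are hypothetical and only serve the illustration. It shows a post pointing at its author by `_id`, with the application doing the lookup and having to cope with a dangling reference on its own.

```python
# Hypothetical data: two "collections" modeled as lists of dicts.
authors = [
    {"_id": 1, "name": "Ada", "country": "UK"},
]
posts = [
    # author_id plays the role of a foreign key, but nothing enforces it.
    {"_id": 100, "title": "Thinking in Documents", "author_id": 1},
    {"_id": 101, "title": "Orphaned post", "author_id": 42},
]


def find_author(author_id):
    """Application-side lookup of the referenced author, if it exists."""
    for author in authors:
        if author["_id"] == author_id:
            return author
    return None  # dangling reference; no constraint prevented it


def post_with_author(post):
    """Return a copy of the post with the referenced author resolved."""
    resolved = dict(post)
    resolved["author"] = find_author(post["author_id"])
    return resolved


resolved_posts = [post_with_author(p) for p in posts]
```

The second post references an author that does not exist, and only the application code notices.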

Embedding Documents

When embedding documents, we are de facto combining all of the parts into one bigger unit, the document itself.

The same example as used above can be represented in a different way.

We can see that the document now contains all the data aggregated.
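
As a rough sketch of what such an aggregated document might look like, here is a hypothetical blog post in Python with its author and comments embedded as sub-documents. The field names are assumptions made for the illustration.

```python
# Hypothetical post with its author and comments embedded as sub-documents.
post = {
    "_id": 100,
    "title": "Thinking in Documents",
    "author": {"name": "Ada", "country": "UK"},
    "comments": [
        {"user": "Bob", "text": "Thanks for the overview."},
        {"user": "Eve", "text": "Very useful."},
    ],
}


def comment_authors(document):
    """Read everything from the one document; no second lookup is needed."""
    return [comment["user"] for comment in document["comments"]]


author_name = post["author"]["name"]
commenters = comment_authors(post)
```

Everything needed to display the post arrives in a single read, at the cost of duplicating the author data in every post that embeds it.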

Document Design Strategy

As we have seen, there are two main ways of representing relationships. The choice between them needs to be weighed carefully, as each can have side effects. As a rule of thumb, the following can be recommended:

Embed as much as possible

A document database should eliminate most joins, so the first option is to put as much as possible into a single document. The great advantage is that saving and retrieving a document is atomic and very fast (see Consistency below). There is no need to normalize data. Therefore “embed” as much as possible, especially data that is not used by other documents.

Normalize Data

Normalize data that can be referred to from multiple places into its own collection. In other words, create reusable collections (e.g. country, user, etc.). It is definitely more efficient to maintain such shared values in only one place than to duplicate them.

Document size

Keep in mind that the maximum document size in MongoDB is 16 MB. The limit exists mainly to ensure that a single document cannot use an excessive amount of RAM or bandwidth. 16 MB is quite a large quantity of data (just think how much data is usually displayed on a single web page). In most cases this limit is not a problem; however, it’s good to keep it in mind and avoid premature optimizations.

Complex data structures and queries

MongoDB can store arbitrarily deep nested data structures, but cannot search them efficiently. If your data forms a tree, forest or graph, you effectively need to store each node and its edges in a separate document.
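
One possible way to store a tree with one document per node is to keep the parent's `_id` on each node. The sketch below uses plain Python dictionaries in place of a collection; the category names and the `parent` field are hypothetical.

```python
# Hypothetical category tree: one document per node, each holding its parent's _id.
nodes = [
    {"_id": "root", "parent": None},
    {"_id": "databases", "parent": "root"},
    {"_id": "document", "parent": "databases"},
    {"_id": "relational", "parent": "databases"},
]


def children_of(node_id):
    """Similar in spirit to a query that filters on the 'parent' field."""
    return [n["_id"] for n in nodes if n["parent"] == node_id]


def path_to_root(node_id):
    """Walk parent references upward, one lookup per level."""
    by_id = {n["_id"]: n for n in nodes}
    path = []
    current = node_id
    while current is not None:
        path.append(current)
        current = by_id[current]["parent"]
    return path
```

Finding direct children is a flat lookup on one field, while walking up to the root costs one lookup per level.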

Consistency

MongoDB makes a trade-off between efficiency and consistency. The rule is that changes to a single document are atomic, while updates to multiple documents should never be assumed to be atomic. When designing the schema, consider how to keep your data consistent! Generally, the more you keep in one document the better, as noted in the first point of this list.
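
To illustrate the single-document rule, here is a small sketch that builds one update touching two fields of the same document. The `orders` collection, the `add_line_item` helper and the field names are hypothetical; `$push` and `$inc` are standard MongoDB update operators. Because both changes travel in one update against one document, they are applied together.

```python
def add_line_item(order_id, item):
    """Build the filter and update for one single-document write."""
    query = {"_id": order_id}
    update = {
        "$push": {"items": item},          # append the embedded line item
        "$inc": {"total": item["price"]},  # adjust the running total
    }
    return query, update


query, update = add_line_item(7, {"sku": "A-1", "price": 25})
# With a pymongo collection named orders (an assumption), this would be
# sent as one operation: orders.update_one(query, update)
```

Had the line items lived in a separate collection, the same change would need two writes, and the application would have to handle the case where only one of them succeeds.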


    I'm a Software Developer and Solution Architect interested in Software Development, Object-Oriented Design and Software Architecture, all of this especially bound to the Microsoft .NET platform. Feel free to contact me or find out more in the about section.

      2 Responses to “MongoDB – Thinking in Documents”

    1. Hi Zoran, thanks for the overview. When would you recommend using a document-based DB over an RDBMS and vice versa?

      • Hi Jon,
        thanks for posting your comment!

        I don’t think there is a straight answer. As usual, “it depends” :)
        Choosing a database system involves expertise and careful consideration of a vast amount of highly technical details and factors, any of which can be an article of its own.

        Some of the considerations:
        – Consistency, availability, and partition tolerance (CAP Theorem)
        – Robustness and reliability
        – Scalability
        – Performant and highly available functioning regardless of concurrent demands on the system
        – Performance and speed
        – Partitioning ability
        – Distributed data partitions of a complete database across multiple separate nodes in order to spread load and increase performance
        – Horizontal (sharding) and/or vertical partitioning
        – Distributability
        – In-database analytics and monitoring
        – Operational and querying capabilities
        – Storage management
        – Talent pool and availability of relevant skills
        – Database integrity and constraints
        – Data model flexibility
        – Database security
        – Database vendor/system funding, stability, community, and level of establishment
        – Familiarity with the technology.

        Some hints:
        SQL vs NoSQL : How to choose: https://www.sitepoint.com/sql-vs-nosql-choose/
        Rdbms vs nosql how do you pick: http://www.zdnet.com/article/rdbms-vs-nosql-how-do-you-pick/
        Microsoft answer to this: https://docs.microsoft.com/en-us/azure/documentdb/documentdb-nosql-vs-sql

        I hope it helps!
        Zoran
