Saturday, 28 January 2017

MongoDb Schema Design

MongoDB is NoSQL Open Source database. It's document based database.

Let's take an example to store cricket bat object:

Cricket bat table can have following columns:

category, model, name, brand, type, price, length, width, matrial_used etc.

The name of the column equals to the property of bat. Suppose, we have to store gloves. Then we have another table gloves

The gloves table can have following columns:
category, model, name, brand, price, length, width, matrial_used etc

If we want to find for which type of bats which gloves are suitable. We need to identify the relationship between this table. We can add bat_id in gloves table. So Now we have added reference key relationship. Each time, we have to find gloves for a bat, we have to take a Join. Joins are expensive operations.

To Avoid Joins we can duplicate data of gloves into bat column (demoralization), It leads to a nasty sparse table, Where many columns value potentially null.

There have to be better Solutions!

In a Document database, We can do as following:

Cricket_bat = {
                 category: "10KgBat",
                 model: "ImVirat",
                 name: "MRF",
                 .......,
                 gloves:{
                 category:"SoftFabric",
                 model:"Gloves012B",
                 name:"imVirat",
                 ...
            }
}

As we can store bat in form of document(JSON).

Let's take One to One relationship:

Customer entity:

customer = {
    id = 1,
    name: "Prashant",
        ....
}

Address Entity:

address = {
    street_no: "6(A)/2",
    "District": "Aurangabad",
    "State": "Maharashtra",
         ....
}

We can store address into the customer, Assuming each customer can have at most one address:


customer = {
    id = 1,
    name: "prashant",
        ...
    address:{
        street_no: "6(A)/2",
        "District": "Aurangabad",
        "State": "Maharashtra",
                ....
    }
}

One to Many Relationship:

If a customer has much address:

customer = {
    id = 1,
    name: "prashant",
        .....
    address: [{
        street_no: "6(A)/2",
        "District": "Aurangabad",
        "State": "Maharashtra"
                ...
    },
    {
        street_no: "New Dp Road",
        "District": "Pune",
        "State": "Maharashtra",
               ...
    }
 ]
}

Let's take a book management system:
We have two more entities:

publisher = {
    id:"_Oreilly"
    name:"O'Reielly",
    founded:"1980",
    location:"CA"
}

book = {
    id: "_Yevgeniy_Brikman_Hello_startup"
    title: "Hello Startup",
    pages: 216,
    author: "Yevgeniy Brikman",
    language:"English"
}

We have following problem statement:
Book can be checked out by customer one at a time and a customer can check out many books as can

customer = {
    id = 1,
    name: "Prashant",
    address: [{...},{..}],
    checked_out:[
        {
            book_id:"001", checkout_date:"10-11-2016"
        },
        {
            book_id:"002", checkout_date:"10-11-2017"
        }
    ]
}

Alternatively, we can duplicate book data as below:

customer = {
    id = 1;
    name: "Prashant",
    address: [{...},{..}],
    checked_out:[
        {
                   id: "_Yevgeniy_Brikman_Hello_startup"
                   title: "Hello Startup",
                   pages: 216,
                   author: "Yevgeniy Brikman",
                    language:"English"
                   checkout_date:ISODate("10-11-2016")
        },
        {
                    id: "_Joshua_Bloch_Effective_Java"
                   title: "Effective Java",
                   pages: 314,
                   author: "Joshua Bloch",
                   language:"English"
                  checkout_date:ISODate("10-11-2016")
        }
    ]
}

As we can see, It's easy to deal with natural data using MongoDB. In the RDBMS we can't store multiple values in a single row, For that, we have to introduce new table which keeps the track of multiple values with a table. It's easy to store same data in form of document in NoSQL database.

When to use MongoDB

  • For Natural unstructured database.
  • Constantly changing schema
  • Location based database 
    • MongoDB Provides Spatial function which used to filter data by location
  • When your data is really big.(Above 1TB per table) 
  • Scaling: Using mongo Sharding, it's easy to replicate data.
When not use MongoDB
  • When the schema is very tightly coupled.
An example of unstructured schema: Consider, you want to store the product catalog into the database for a e-commerce website.

In mongoDB you can store as below:

product_category:
{
 Name: "Mobile",
 description: "",
 status: "",
sub_category:  [
     {
          name: "Sony",
         description: ""
     },


     {
          name: "Moto",
         description: ""
     } 

 ]

},

{
 Name: "TV",
 description: "",
 status: "",
sub_category:  [
     {
          name: "LG",
         description: ""
     },


     {
          name: "Videocon",
         description: ""
     } 

 ]

}

Now try to store above category into MySQL, your table should contain "parent" column to store nested sub_category. Because of the document structure, MongoDB makes Life Easy!










2 comments:

  1. Dude Nice Article, but why Mongodb over SQL db that was my question ?

    ReplyDelete
  2. Thank you!
    @Sumit Pawar, If you consider, the last example which I have mentioned about category "product_category", It's very easy to store such nested, natural and unstructured data in form of Document. Yes, We can store same data into MySQL with more efforts. As MongoDB don't care about constraints (ACID property) makes MongoDB faster that MySQL. Read about "Sharding" in MongoDB which helps for horizontal scaling. To summarize It's up to application requirement to choose between SQL of NoSQL E-commerce application would be one use case for MongoDB. In a case of Banking related application where the schema is tightly coupled MySQL will be sensible.

    I hope, I have clear your point.

    ReplyDelete