Learning: database

Monday, 5 June 2017

Optimize SQL Query

We generally have dev server and production Server. Usually, Dev server has less data(test data). While working on a Complex application, Simple Query may need to joins of 6 to 8 tables, each table might have 70 to 80 columns.

In such cases, for fewer data, query gives result in less time, When the same query is executed on Production Server, We face performance Issue. To Optimized query, we have to look into depth, how a query is working internally. How many tables getting used? How many columns getting fetch? Does indexing works?

EXPLAIN Keyword is to find all the answer, for all the above question.

While using INNER JOIN, We should follow thumb rule as The Entity (table) which has less volume should be on left side and other should be on the right side.

mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.09 sec)

mysql> select count(*) from dept_emp;
+----------+
| count(*) |
+----------+
|   331603 |
+----------+
1 row in set (0.10 sec)

mysql> select count(*) from employees e JOIN dept_emp ed where e.emp_no = ed.emp_no;
+----------+
| count(*) |
+----------+
|   331603 |
+----------+
1 row in set (1.11 sec)

mysql> select count(*) from dept_emp ed JOIN employees e where e.emp_no = ed.emp_no;
+----------+
| count(*) |
+----------+
|   331603 |
+----------+
1 row in set (1.19 sec)

The reason is obvious as INNER JOIN gives a result only if a condition is matched so the record should be in both tables, So we should keep a table with less volume on Left Side.

So, now we know, How to identify, optimized query! using MYSQL Execution Plan.

Happy Learning!

Saturday, 28 January 2017

MongoDb Schema Design

MongoDB is NoSQL Open Source database. It's document based database.

Let's take an example to store cricket bat object:

Cricket bat table can have following columns:

category, model, name, brand, type, price, length, width, matrial_used etc.

The name of the column equals to the property of bat. Suppose, we have to store gloves. Then we have another table gloves

The gloves table can have following columns:
category, model, name, brand, price, length, width, matrial_used etc

If we want to find for which type of bats which gloves are suitable. We need to identify the relationship between this table. We can add bat_id in gloves table. So Now we have added reference key relationship. Each time, we have to find gloves for a bat, we have to take a Join. Joins are expensive operations.

To Avoid Joins we can duplicate data of gloves into bat column (demoralization), It leads to a nasty sparse table, Where many columns value potentially null.

There have to be better Solutions!

In a Document database, We can do as following:

Cricket_bat = {
category: "10KgBat",
model: "ImVirat",
name: "MRF",
.......,
gloves:{
category:"SoftFabric",
model:"Gloves012B",
name:"imVirat",
...
}
}

As we can store bat in form of document(JSON).

Let's take One to One relationship:

Customer entity:

customer = {
id = 1,
name: "Prashant",
....
}

Address Entity:

address = {
street_no: "6(A)/2",
"District": "Aurangabad",
"State": "Maharashtra",
....
}

We can store address into the customer, Assuming each customer can have at most one address:

customer = {
id = 1,
name: "prashant",
...
address:{
street_no: "6(A)/2",
"District": "Aurangabad",
"State": "Maharashtra",
....
}
}

One to Many Relationship:

If a customer has much address:

customer = {
id = 1,
name: "prashant",
.....
address: [{
street_no: "6(A)/2",
"District": "Aurangabad",
"State": "Maharashtra"
...
},
{
street_no: "New Dp Road",
"District": "Pune",
"State": "Maharashtra",
...
}
]
}

Let's take a book management system:
We have two more entities:

publisher = {
id:"_Oreilly"
name:"O'Reielly",
founded:"1980",
location:"CA"
}

book = {
id: "_Yevgeniy_Brikman_Hello_startup"
title: "Hello Startup",
pages: 216,
author: "Yevgeniy Brikman",
language:"English"
}

We have following problem statement:
Book can be checked out by customer one at a time and a customer can check out many books as can

customer = {
id = 1,
name: "Prashant",
address: [{...},{..}],
checked_out:[
{
book_id:"001", checkout_date:"10-11-2016"
},
{
book_id:"002", checkout_date:"10-11-2017"
}
]
}

Alternatively, we can duplicate book data as below:

customer = {
id = 1;
name: "Prashant",
address: [{...},{..}],
checked_out:[
{
id: "_Yevgeniy_Brikman_Hello_startup"
title: "Hello Startup",
pages: 216,
author: "Yevgeniy Brikman",
language:"English"
checkout_date:ISODate("10-11-2016")
},
{
id: "_Joshua_Bloch_Effective_Java"
title: "Effective Java",
pages: 314,
author: "Joshua Bloch",
language:"English"
checkout_date:ISODate("10-11-2016")
}
]
}

As we can see, It's easy to deal with natural data using MongoDB. In the RDBMS we can't store multiple values in a single row, For that, we have to introduce new table which keeps the track of multiple values with a table. It's easy to store same data in form of document in NoSQL database.

When to use MongoDB

For Natural unstructured database.
Constantly changing schema
Location based database

MongoDB Provides Spatial function which used to filter data by location

When your data is really big.(Above 1TB per table)
Scaling: Using mongo Sharding, it's easy to replicate data.

When not use MongoDB

When the schema is very tightly coupled.

An example of unstructured schema: Consider, you want to store the product catalog into the database for a e-commerce website.

In mongoDB you can store as below:

product_category:

{

Name: "Mobile",

description: "",

status: "",

sub_category: [

{

description: ""

{

description: ""

}

]

{

Name: "TV",

description: "",

status: "",

sub_category: [

{

description: ""

{

description: ""

}

]

}

Now try to store above category into MySQL, your table should contain "parent" column to store nested sub_category. Because of the document structure, MongoDB makes Life Easy!

Scalare

Monday, 5 June 2017

Optimize SQL Query

Saturday, 28 January 2017

MongoDb Schema Design