Database NORMALIZATION is used to reduce data redundancy and eliminate undesirable characteristics like Database Anomalies (Update anomaly, Insertion anomaly, Deletion anomaly). This also divides larger tables into smaller tables and links them using relationships and follows simple data structure design rules to make the data easy to understand, maintain and query.
Rule of Clarity – Clarity is better than cleverness.
Rule of Simplicity – Design for simplicity; add complexity only where you must.
Rule of Transparency – Design for visibility to make inspection and debugging easier.
Rule of Robustness – Robustness is the child of transparency and simplicity.
Here is a list of Normal Forms
The Normal Forms are designed to avoid database anomalies, and they help in following the listed rules seen before.
The database is in First normal form if
EMP NO | LAST NAME | FIRST NAME | DEPT NAME |
1 | Carcia | Cheryl | Maths, Physics |
2 | Aube | Norman | Physics |
3 | Frugman | Leonaid | Botany |
4 | Spivak | Ian | Zoology |
5 | Dohn | Linda | Chemistry |
Here Cheryl Carcia is associated with two departments (Maths & Physics) which is making DEPT NAME non atomic. To bring it to first normal form we need to break it into two tables on basis of Employee Data and Department Data.
Below two tables are holding the atomic values and duplicates removed.
Table: Employee Table
EMP NO | LAST NAME | FIRST NAME |
1 | Carcia | Cheryl |
2 | Aube | Norman |
3 | Frugman | Leonaid |
4 | Spivak | Ian |
5 | Dohn | Linda |
Table: Department table
EMP NO | DEPT NAME |
1 | Maths |
1 | Physics |
2 | Physics |
3 | Botany |
4 | Zoology |
5 | Chemistry |
A table is in 2NF
Since a partial dependency occurs when a non-key attribute is dependent on only a part of the composite key, the definition of 2NF is sometimes phrased as: “A table is in 2NF if it is in 1NF and if it has no partial dependencies.”
EMP NO | DEPT NO | LAST NAME | FIRST NAME | DEPT NAME |
1 | 111 | Carcia | Cheryl | Maths |
2 | 222 | Aube | Norman | Physics |
3 | 111 | Frugman | Leonaid | Maths |
4 | 222 | Spivak | Ian | Physics |
5 | 333 | Dohn | Linda | Chemistry |
Composite key (EMP NO,DEPT NO). LAST NAME and FIRST NAME dependent on EMP NO. DEPT NAME is dependent on DEPT NO. So this table can be broken into two tables as given below
Table: Employee Table
EMP NO | DEPT NO | LAST NAME | FIRST NAME |
1 | 111 | Carcia | Cheryl |
2 | 222 | Aube | Norman |
3 | 111 | Frugman | Leonaid |
4 | 222 | Spivak | Ian |
5 | 333 | Dohn | Linda |
Table: Department Table
DEPT NO | DEPT NAME |
111 | Maths |
222 | Physics |
333 | Chemistry |
A table is in 3NF if it is in 2NF and if it has no transitive dependencies.
A table is in BCNF if it is in 3NF and if every determinant is a candidate key. BCNF does not allow dependencies between attributes that belong to candidate keys. It drops the restriction of the non-key attributes from the third normal form.
EMP NO | RECORD NO | LAST NAME | FIRST NAME | ATTENDANCE |
1 | A001 | Carcia | Cheryl | 100 |
2 | B001 | Aube | Norman | 50 |
3 | C001 | Frugman | Leonaid | 25 |
4 | D001 | Spivak | Ian | 15 |
5 | E001 | Dohn | Linda | 10 |
In the above table EMP NO determines the RECORD NO and RECORD NO determines LAST NAME & FIRST NAME. Hence EMP NO determines LAST NAME & FIRST NAME. RECORD NO determines the ATTENDANCE. This structure is not satisfying the Third Normal Form.
Table: Employee Table
EMP NO | RECORD NO | LAST NAME | FIRST NAME |
1 | A001 | Carcia | Cheryl |
2 | B001 | Aube | Norman |
3 | C001 | Frugman | Leonaid |
4 | D001 | Spivak | Ian |
5 | E001 | Dohn | Linda |
Table: Attendance Table
RECORD NO | ATTENDANCE |
A001 | 100 |
B001 | 50 |
C001 | 25 |
D001 | 15 |
E001 | 10 |
If no database table instance contains two or more, independent and multivalued data describing the relevant entity, then it is in 4th Normal Form.
A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.
6th Normal Form is not standardized, yet however, it is being discussed by database experts for some time. Hopefully, we would have a clear & standardized definition for 6th Normal Form in the near future
A primary is a single column value used to identify a database record uniquely.
It has the following attributes
A composite key is a primary key composed of multiple columns used to identify a record uniquely.
Foreign Key references the primary key of another Table! It helps connect your Tables
Database DE-NORMALIZATION doesn’t mean not normalizing. It’s a step in the process. First we normalize, then we realize that we now have hundreds or thousands of small tables and that the performance cost of all those joins will be prohibitive, and then we carefully apply some denormalization techniques so that our application will return results before the werewolves have overrun the city.
NORMALIZATION | DE-NORMALIZATION |
Database Normalization is the process of dividing the data into multiple tables so that data redundancy and data integrities are achieved. | Database De-Normalization is the opposite process of normalization where the data from multiple tables are combined into one table so that data retrieval will be faster. |
It removes data redundancy i.e. it eliminates any duplicate data from the same table and puts it into a separate new table. | It creates data redundancy i.e.; duplicate data may be found in the same table. |
It maintains data integrity i.e.any addition or deletion of data from the table will not create any mismatch in the relationship of the tables. | It may not retain the data integrity. |
It increases the number of tables in the database and hence the joins to get the result. | It reduces the number of tables and hence reduces the number of joins. Hence the performance of the query is faster here compared to normalized tables. |
Even though it creates multiple tables, inserts, updates and deletes are more efficient in this case. If we have to insert/update/delete any data, we have to perform the transaction in that particular table. Hence there is no fear of data loss or data integrity. | In this case, all the duplicate data are at a single table and care should be taken to insert/delete/update all the related data in that table. Failing to do so will create data integrity issues. |
Use normalized tables where more number of insert/update/delete operations are performed and joins of those tables are not expensive. | Use de-normalization where joins are expensive and frequent query is executed on the tables. |
A well-maintained product backlog is crucial for successful product development. It serves as a single…
Incremental value to the customer refers to the gradual delivery of small, functional parts of…
A Product Market refers to the group of potential customers who might be interested in…
The Professional Agile Leadership - Evidence-Based Management (PAL-EBM) certification offered by Scrum.org is designed for…
The Professional Agile Leadership (PAL I) certification, offered by Scrum.org, is designed to equip leaders…
Choosing the right Scrum Master Certification depends on your current experience and career goals. If…