What Is Database Normalization?

Normalization is a multi-step process used to organize a relational database into tables and columns and to:

  • Eliminate redundant (useless) data.
  • Ensure data dependencies make sense
  • Eliminate undesirable characteristics like insertion, update and deletion anomalies

Why would you want to normalize your database?

The purpose of normalization is to reduce the amount of space a database consumes and ensure that data is logically stored. Database normalization is a manual process of rearranging the data, and it requires time and effort, so why would you do it?

Data with duplicate values leads to anomalies, but when information is stored in one place and one place only, the possibility of inconsistent data is reduced.

Properly normalized database will:

  • require less storage space.
  • be easier to handle and update, without facing data loss.
  • be more informative to users

Are there any disadvantages of normalization?

After normalization, data is not duplicated, so table joins are required. This makes queries more complicated. Since joins are required, queries can take more time - especially if indexes aren't well thought out.

For read-intensive applications it may make sense to denormalize tables in strategic places to ensure performance level we need.

What are normal forms?

Normal forms are rules that transform the data into a certain structure. Every normal form is achieved by following its specific rules. By transforming the data into the higher normal form, it becomes more normalized. The more normalized it is, less duplication we get - but it is usually perfectly fine to allow some duplication instead of complicating our applications and schema for conceptual purity.

Informally, there are 11 normal forms, but first three are usually enough to describe relation database as "normalized."

  • UNF: Unnormalized form
  • 1NF: First normal form
  • 2NF: Second normal form
  • 3NF: Third normal form
  • EKNF: Elementary key normal form
  • BCNF: Boyce–Codd normal form
  • 4NF: Fourth normal form
  • ETNF: Essential tuple normal form
  • 5NF: Fifth normal form
  • DKNF: Domain-key normal form
  • 6NF: Sixth normal form

What are undesirable anomalies you want to avoid by normalization?

Table Invoice: 0NF
  1. An update anomaly is a data inconsistency that results from data redundancy and partial update. In the table, we have two rows for client Liam as he bought two products. If Liam changes his phone number we have to update it in two rows else Liam will appear to have two different phone numbers. Omission to update the Liam's phone number in all rows leads to inconsistent data.
  2. A deletion anomaly is caused by unintended loss of data due to deletion of other data. If Liam doesn't want to be this shop’s client anymore and the owner deletes him in only one instance (one row of the shown table), then deletion is incomplete because it leaves some residual instances. More importantly, if the owner wants to delete all of the Liam's orders for some reason, then all of the Liam's basic data (name, phone) will also be lost.
  3. An insertion anomaly is the inability to insert data to the database due to an absence of other data. Let's suppose that the owner wants to add a new product to the shop's inventory - he or she would not be able to insert the product data into the table until someone buys it first.

First normal form

The table will be in 1NF when you apply the first rule of normalization:

1.An attribute (column) of a table cannot store multiple values. It should store only atomic values.

Any column of your table should not contain multiple values. Non-key attributes must be functionally dependent on the primary key.

2.There are no repeating groups of columns

Avoiding repeating groups means that a table should not contain repeating groups of columns such as [Client_name1],[Client_name2], [Client_name3],...

Table Invoice: 0NF

Invoice (INVOICE_ID, CLIENT_ID, CLIENT_NAME, CLIENT_PHONE, PRODUCT_ID, PRODUCT_NAME, PRODUCT_PRICE, QUANTITY, AMOUNT, TOTAL_AMOUNT)

(*note that the underlined "INVOICE_ID" is a convention for pointing out that the column is a part of the primary key for that table)

Our table satisfies the second rule, but not the first one.

Primary key is "INVOICE_ID", so every other non-key attribute must functionally be dependent on the "INVOICE_ID".

This table is not in first normal form because the "PRODUCT_ID" isn't functionally dependent on "INVOICE_ID".

To satisfy the first normal form,  you have to create a new table with all non-key attributes that don't have functionally dependent on the primary key with the primary key of an original table.

First, you have to be able to uniquely identify a row.

There could not be two invoices with the same ID, so "INVOICE_ID" uniquely identifies a row. "INVOICE_ID" is a primary key for table Invoice.

As well, there could not be two products with the same ID, so "PRODUCT_ID" uniquely identifies a row. "PRODUCT_ID" is a primary key for table Product. Table Product should have "CLIENT_ID"  as a foreign key.

  1. Table "Product " with primary key "PRODUCT_ID":

Product (PRODUCT_ID, PRODUCT_NAME)

Table Product: 1NF

2. Table "Invoice" with primary key "INVOICE_ID" and "PRODUCT_ID"  as a foreign key.

Table Invoice: 1NF

Invoice (INVOICE_ID, PRODUCT_ID, CLIENT_ID, CLIENT_NAME, CLIENT_PHONE, PRODUCT_PRICE, QUANTITY, AMOUNT, TOTAL_AMOUNT)

Entity relationship diagram (ERD)

The ERD diagram represents that one Invoice can have minimum one and maximum many products. And a product can be on minimum one and maximum on many Invoice.



Second normal form

The table will be in 2NF when you apply the second rule of normalization:

1. The table should be in the First Normal Form

2. There should be no Partial Dependency

Take a look at our table "Invoice".

Table Invoice: 1NF

We've already known that table is in first normal form.

There should be no Partial Dependency means that all non-key attributes are fully functionally dependent on the whole primary key, not only on the part of the primary key.

Invoice table has a composite primary key INVOICE_ID and PRODUCT_ID. Non-key attributes are CLIENT_ID, CLIENT_NAME, CLIENT_PHONE, PRODUCT_PRICE, QUANTITY, AMOUNT and TOTAL_AMOUNT.

PRODUCT_PRICE doesn't depend on INVOICE_ID or PRODUCT_ID.
CLIENT_ID, CLIENT_NAME, CLIENT_PHONE only depend on INVOICE_ID,  part of the primary key. Therefore, the Invoice table does not satisfy a second normal form.
QUANTITY, AMOUNT and TOTAL_AMOUNT don't depend on INVOICE_ID or PRODUCT_ID.

To satisfy second normal form,  you have to move CLIENT_ID, CLIENT_NAME, CLIENT_PHONE in a new table with CLIENT_ID as a primary key.

Still, you have to know which client owns which invoice.

You will achieve it by saving the CLIENT_ID into "Invoice" table as a foreign key.

Invoice (INVOICE_ID, PRODUCT_ID, CLIENT_ID, PRODUCT_PRICE, QUANTITY, AMOUNT, TOTAL_AMOUNT)

Table Invoice: 2NF

Client (CLIENT_ID, CLIENT_NAME, CLIENT_PHONE)

Table Client: 2NF

Entity relationship diagram (ERD)

The ERD diagram represents that one Invoice can have minimum one and maximum many products. And a product can be on minimum one and maximum on many Invoice.

One product can be bought by a minimum one and maximum many clients. One Client can buy a minimum one and maximum many products.

While one client can have minimum one and maximum many invoices, one invoice can belong to one and only one client.


Third normal form

For a table to be in the third normal form :

  1. Table has to be in the Second Normal form.
  2. Table should not contain transitive dependency on the primary key.

Table Invoice: 2NF

If column B depend on column A and column C depend on column B, Therefore, column C has a transitive dependency on C.

As all non-primary-key fields depend on a primary key, you have to search do any non-primary-key fields depend on another non-primary key?

As AMOUNT is calculated by PRODUCT_PRICE   and QUANTITY, then it depends on them.

Move the AMOUNT to another table "Invoice details", along with everything it identifies. Use INVOICE_ID and PRODUCT_ID as a composite primary key made up from foreign keys "Invoice" and "Product" tables.

Table Invoice details: 3NF

Now, all data are stored in one place only, and your tables don't have insertion, update or deletion anomalies.

Entity relationship diagram (ERD)

The ERD diagram represents that one Invoice can have
One product can be bought by a minimum one and maximum many clients. One Client can buy a minimum one and maximum many products.

While one client can have minimum one and maximum many invoices, one invoice can belong to one and only one client.

One product can be on minimum one and maximum many invoice details, but one invoice details is written for one and only one product.

One invoice can have minimum one and maximum many invoice details, but one invoice details is written for one and only one invoice.