Data Normalization in Relational Databases

Prakhar S
6 min readFeb 4, 2022

--

https://phoenixnap.com/kb/database-normalization

Data Normalisation is a bottom-up technique for database design, as opposed to Entity-Relationship Diagrams, which are a top-down technique for the same. These two approaches generally complement each other, and a balanced approach to database design will use both these approaches.

Data normalisation is the technique of organising data into multiple related tables, to minimize Data Redundancy.

But what is Data Redundancy and why should one remove it?

Data Redundancy is the repetition of similar data at multiple places. We want to reduce this not just because repetition of similar data increases the size of the databases, but also it creates other issues such as

  • Insertion Anomalies
  • Deletion Anomalies
  • Update Anomalies

Example of Data Redundancy

Consider the below table:

For students id 1, 4 and 5, the Department, Dept_id, Professor name and Professor_id are the same. Lets us see how this leads to the anomalies mentioned above.

  • Insertion anomaly

If we have to insert data for 100 more students having the CS department, we will have to enter the same values for department, dept_id, professor and professor_id 100 times. This repetition is because we are saving different but related data in the same table.

  • Deletion Anomaly

If we delete the data for a student named Tom in the above table, we also end up losing the department and the professor information, if we have not stored this in another table. So all the branch and professor information will also be lost along with the student information, even though we only wanted to delete the particular student record.

  • Update Anomaly

If Professor Moody leaves, and Professor Hagrid is appointed the new head of the CS department, then we will have to update each row for the student in the CS department. If we have 1000 rows for CS, means we need to update 1000 rows, just to change the name of the department head.

How Normalization will solve the problems mentioned above?

Using Normalisation, we will break the above table into 3 different tables as shown below:

Here we see save the student information in the student table, department information in the department table and the Professor information in the Professor table. Lets us now see how this structure helps us tackle the anomalies mentioned above:

  • Insertion anomaly

Now if we want to insert any information for a new student, all we need to provide is the Roll No, Name and Dept_id for the new student in the student table. We do not need to provide repetitive information for department name, Professor id and Professor name

  • Deletion anomaly

If we delete a record for a student from the student table, only the student information is deleted. The department and professor information remains intact in their respective tables.

  • Update anomaly

If we have to update a department head, all we need to do is change the corresponding professor id in the department table, which is only a single row operation and this information will be reflected everywhere else.

So normalization helps us in addressing these anomalies. Now normalisation can be of many forms including :

  • 1st Normal Form
  • 2nd Normal Form
  • 3rd Normal Form
  • Boyes Codd Normal Form
  • 4th, 5th, 6th Normal Form

The last ones are rarely used so I will discuss the first 4 normal forms here.

➢ 1st Normal Form

Every table in a database should at least be in the 1st normal form. To achieve 1st normal form, the database table should follow the following rule:

  • Each column should contain a single value. For example, the following table consists of multiple values in the column for email

which violates this rule

To convert this to 1st normal form, we will transform the above table as below:

2nd Normal Form

For a table to be in 2nd Normal Form :

** The table should be in 1st Normal Form, and

** There should not be any partial dependencies. For example, consider the following table:

Here the employee id and department id can be taken as a composite primary key. But the Location column is dependent only on the department id column. This is an example of partial dependency. To achieve 2nd Normal Form, we need to split the table as below:

Now each non-key column in the above tables has full functional dependency on the primary key.

3rd Normal Form

For a table to be in the 3rd Normal Form :

** It has to be in the 2nd Normal Form, and

** There should not be any transitive dependency for non-prime attributes, which means that all non-prime attributes should depend only on the prime attribute, and X being dependent on Y and Y being dependent on Z should not imply X being dependent on Z. Consider the following table:

Here the subject is dependent on the subject id which in turn is dependent on the student id. To convert this table to the 3rd Normal Form, we need to remove these transitive dependencies as follows:

The resulting 2 tables do not have any transitive dependencies.

Boyce-Codd Normal Form(BCNF)

Boyce-Codd Normal Form, BCNF also known as 3.5 Normal Form, requires that

** The table is in 3rd Normal Form, and

** For any dependency, A ➛ B, A should be a super key.

Consider the following table:

The table above satisfies the 1st NF as every column contains singular values,

It satisfies 2nd NF as there are no partial dependencies and also 3 NF as there are no transitive dependencies.

Here student_id + subject columns can act as composite primary key and the professor column is a non-primary key.

But the subject column can be derived from the professor column, that is, if we know the professor name, we can find the subject name. So the subject is dependent on the professor, even though it is a non-prime key. Hence the above table does not satisfy BCNF. To achieve BCNF, we need to split the table as below:

Here student id will be the primary key for the 1st table and Professor id will be the primary key for the 2nd, and every column is dependent on the super key only.

The above are the normal forms most commonly used in practice. Though 4th and 5th Normal Forms also exist, they are rarely used and hence I am not going into the details of those in this article

Thanks for reading!

--

--