Normalisation is the process of using a wide table with many columns and few rows and reconstructing that table into several subset tables with fewer columns and more rows. The objective is to use storage space efficiently, eliminate redundant data, reduce or eliminate inconsistent data and ease future maintenance. The most important rule of normalisation is that the normalised data must be reconstructable into the original unnormalised or flat spreadsheet-like view of the data.
The forms of normalisation are called normal forms.
First Normal Form - removal of repeating groups.
Second Normal Form - no non-key attributes must depend on a portion of the primary key.
Third Normal Form - no attributes must depend on other non-key attributes.
A database must be in first normal form to be in second normal form and in second normal form to be in third normal form. There are additional normal forms but these additional normal forms are rarely applied. In fact it is sometimes best not to apply third and even second normal forms in order to preserve processing speed and efficiency. The concept of a data warehouse is effectively a partially denormalised database. In fact the process of denormalisation into data warehouses exhibits where relational database technology falls short of practicality. This is one possible area of implementation where object databases may become more widely used in the future.
By definition, First Normal Form eliminates repeating groups. What this means is that a separate table must be made for each set of related attributes and give each table its own primary key. The best way to explain this is by example. Let us say we have a customer database where customers make orders. In a spreadsheet we would have the customer name and address repeated for every order. Application of first normal form would divide the customers and orders into two separate entities or tables. The customer table would contain customer details without order details and the order table would contain order details with only a reference to the appropriate customer.
Second Normal Form eliminates redundant data. Thus if an attribute relies on only part of a multi-valued key, that attribute must be removed to a separate table. More specifically the Second normal form requires that no non-key attributes are dependant upon a portion of the primary key. A primary key uniquely identifies a instance within an entity. Second normal form applies to tables where constituents of the primary key effectively refer to repeating groups within the same table. These duplications should be removed to an additional table. For instance, if a student takes a course then the course that the student is taking is uniquely identifed by the student and the course as the primary key. Within this entitiy we also have descriptions of the course. Therefore the course decriptions will make up the contents of the additional entity and the descriptive details of the course would be removed from the table containing the student details. The result would be two tables, one with student details and the other with course details.
Third Normal Form eliminates columns not dependant on a table's primary key. This translates to the fact that if an attribute does not contribute to the description of a key then it must be moved to another table. Third normal form requires that all columns in a table contain data about the entity that is defined by the primary key. In other words each entity should have attributes applicable to itself only.
Fourth and Fifth Normal Forms are rarely implemented.
Fourth Normal Form eliminates multiple independant relationships. Thus no table can have more than a single many-to-one or many-to-many relationships which are not directly related.
Fifth Normal Form requires isolation of logically related many-to-many relationships.
Using normalisation extensively can seriously affect database performance. What is theoretically elegant and mathematically correct is not always acceptable or practical in a commercial environment.
Due to the fact that I have personally had many years experience designing and constructing relational databases I tend to ignore the seperate steps of first, second and third normal form. I usually simply jump to a combination of all three by using all three at once. Normalisation is usually best explained by example. Examples will ensue in following pages.