Access Database Normalisation levels 1, 2, 3; geting normal about it!
The process of Database Normalisation was developed by E.F.Codd who is widely considered the father of relational database theory.
There are several rules which provide theoretical structures and disciplines which are not always practical to follow but help provide the main goals which are:
- Eliminate redundant information
- Increase data integrity
- Make systems more efficient
Modern databases should be in BCNF Boyce-Codd Normal Form which is deemed to be at third normal form of which there are considered to be five in all. This article focuses on what I believe is considered a good balance to applying some of these rules and covers up to the third norm of database normalisation.
Before the First Normal Form refers to several tables managed into smaller units that have a potential relationship and have the following attributes:
- Each table is described as one entity so that an example of an ‘Order Processing’ database may have several tables which would be divided into logical units (tables) including Customers and Orders and not stored in one complete table (known as a flat-file).
- Each record in a table is not duplicated so that a field in a record can be set as unique – primary key.
- The fields (columns) are in no particular order.
- The records (rows) are in no particular order.
The key is the key and each table should have a primary key set though this is optional in Access it is highly recommended that you at least set this type of key for each table you intend to relate.
First Normal Form (1NF) says that all column values must be atomic. Atomic means indivisible and refers to row-by-column position where you should have multiple records (or repeating groups) and not all values stored in one record. For example, an order raised for a customer may contain one or more items. You shouldn’t store the order in one row for multiple items as you would have a difficult time retrieving information from the record.
By storing each item in a separate record with a common link to the ID field follows the principle of the First Normal (INF) of database normalisation. Even if there is no unique ID in the illustration above, you can either add a unique audit trail ID field and set the primary key or consider the composite of the two fields which will make it unique too i.e. combining the ‘Order ID’ and ‘Order Item Id’ fields. This will help with writing queries later on where you can filter by an item and more importantly calculate with the ‘Quantity’ field.
Second Normal Form (2NF) This follows on from the First Normal (1NF) level where by tables should store data relating to only one thing (entity) and that this entity should be fully described by it’s primary key.
For example, an Orders database system may contain the following four tables:
- Customers which has a primary key (Customer ID) and the data relates to the customer profile namely name, address and contact details.
- Orders which holds the order header information including order reference (as the primary key), date and shipping information but not the details for each item.
- Order Details which has a secondary key (Order ID as defined from the Orders table) and the data relates to the main order header showing the items of the order namely product, price and quantity.
- Products which has a primary key (Product ID) and the data relates to the product profile namely name, price and stock levels.
Focusing on the ‘Orders’, ‘Order Details’ and ‘Products’ tables, there is a relationship between them that observe this level of database normalisation where each record is distinct and the data relating to only one entity. As an order is placed (by the customer), it is assigned a unique Order ID being the primary key. The second table records this unique ID but is deemed the secondary key for the Order Details table as it can not be unique. This is because this order may have several items (products) and is stored in the Order Details table along with price and quantity. The product information comes from the Products table which has a unique value known as the Product ID.
Therefore, the moving table ‘Order Detail’ sits between the ‘Order’ and the ‘Product’ so that database normalisation allows you to re-use the same order for multiple items and re-use the same product across different orders and is only stored as a single record once in the outer tables. Having only two or even the single table would de-normalise the database and prove very difficult to run reports efficiently.
Note: The only real point to remember when achieving this level of database normalisation is how tables will relate to each other using the simple one-to-many relationship. One order, many items and one product used many times. Natural groups of repeating records are the key here that help split data sets into small and more manageable units.
Third Normal Form (3NF) This follows on from the Second Normal Form of database normalisation and deals with descriptive information of a primary key field.
Using the ‘Customers’ table which has a ‘Customer ID’ field (primary key) has a field called ‘Company Name’ which is the narrative/description identifying what the customer is known by. You wouldn’t store the company name each time in the ‘Orders’ table thus repeating data value as the ‘Customer ID’ field controls this. Therefore, if the company changes its name, then it is modified once in one place. There can however, be a problem should you need to retain historical records of previous names and an additional moving may need to intercede.
The ‘Products’ table also has the same relationship of the ‘Product Name’ field with the ‘Product ID’ being the unique key and is only stored in the ‘Products’ table. This rule also deals with dependency fields which are calculated fields. There is no benefit to store the ‘Order Total’ for a transaction if it can be determined by calculating the ‘Quantity’ by ‘Price’. Storing extra unnecessary fields i.e. Total Order adds extra overhead to a table and can affect performance. Using Queries, Forms and Reports to calculate a new field is the better approach and therefore only raw data should be stored in a table.
Note: Sometimes violating this level of database normalisation regarding calculated fields can be a good balance between performance and usability.
Having the ‘Order Total’ stored as a real field can help with the design process for queries, forms and reports as it can reduce the need to have multiple tables in a relationship making it easier for development. Don’t get too wrapped up with the theory. Try and achieve to get to a database normalisation of level three as it makes common sense when designing tables and fields. There are higher levels (four and five) which must have the previous levels enforced first as each level follows on in the hierarchy. If you feel there are inadequacies with the first three levels, then take to the next level (which is not covered here – search the web!) Finally, remember the main normalisation objective. I found an article that I came across many years ago which emphasised these rules which keeps all the theory in perspective…
“The key, the whole key, and nothing but the key, so help me Codd.”
About the author
Ben Beitler has been building, consulting and training with Microsoft Access databases for over 17 years and is an expert in this field. Being based in London, UK he continually spends his time helping companies design, build and impement Access databases teaching users along the way.
Article Source: http://EzineArticles.com/?expert=Ben_S_Beitler
This article is for the more ‘techno’ user but new users may have or will come across the term ‘Normalisation’ and how this applies to any database system.
The author is me! no need to explain if I agree and need to add anything further!