Pages

Tuesday 9 April 2013

Precautions while using Clustered and Non-Clustered Indexes on a Table

Precautions while using Clustered and Non-Clustered Indexes on a Table
 
Without an index SQL Server / Oracle has to scan entire tables to return requested data. It is like the index page in a book. You check for the keyword you want to read about in the index and you jump directly to the page where the content belongs, instead of scanning page by page for the material you want to read.
 
Similarly a table index allows you to locate data without the need to scan the entire table. You create indexes on one or more columns in a table to help SQL Server / Oracle find the data quickly in a query.
 
Types of Index
 
1. Clustered Index
 
A clustered index alters the way that the rows are stored. When you create a clustered index on a column (or a number of columns), SQL Server / Oracle sorts the table’s rows by that column(s). It is like a dictionary, where all words are sorted in alphabetical order in the entire book. Since it alters the physical storage of the table, only one clustered index can be created per table.
 
Oracle automatically creates clustered index on primary key column.
 
A clustered index should be used on a column that will be used for sorting. A clustered index is used to sort the rows on disk, so you can only have one per table.
 
2. Non Clustered Index
 
Non-clustered index, on the other hand, does not alter the way the rows are stored in the table. It creates a completely different object within the table that contains the column(s) selected for indexing and a pointer back to the table’s rows containing the data. It is like an index in the last pages of a book, where keywords are sorted and contain the page number to the material of the book for faster reference.
 
Indexes help in Performance Optimization
 
SQL server sorts the indexes efficiently by using a B-tree, which is a tree data structure that allows SQL Server to keep data sorted, to allow searches, sequential access, insertions and deletions in logarithmic amortized time. This methodology minimizes the number of pages accessed to locate the desired index key, therefore resulting an improved performance.
 
Precaution while using Indexes on Table
 
You will not notice performance issues until you have quite a bit of data in your tables. But as your data increases, you must take care of indexes.
 
1. Index should be used on a column that's going to be used (a lot) to search the table. Most important consideration is how much time your queries are taking. If a query doesn't take much time or isn't used very often, it may not be worth adding indexes. The best option is to set your clustered index on the most used unique column, usually the primary key.
 
You should always have a well selected clustered index in your tables, unless a very compelling reason.
 
2. Indexes should not be used for columns or tables that are often updated. Clustered indexes makes SQL Server / Oracle order the rows on disk according to the index order. This implies that if you access data in the order of a clustered index, then the data will be present on disk in the correct order. However if the column(s) that have a clustered index is frequently changed, then the row(s) will move around on disk, causing overhead - which generally is not a good idea.
 
This side effect of indexes is related to the cost of INSERT, UPDATE, MERGE and DELETE statements. Such statements can take longer to execute in the presence of indexes since they alter the data on the table resulting the update of the indexes too. Imagine the situation of an INSERT statement that has to add rows to a table with a clustered index. Table rows may need to be repositioned since clustered index needs to order the data pages themselves thus creating more overhead. So, it is crucial to take into account the overhead of INSERT, UPDATE and DELETE statements before designing your indexing strategy. Although there is an overhead in the above statements, you have to take into account that many times an UPDATE or DELETE statement will have to execute in a subset of data, defined by a WHERE clause, where indexing may outweigh the additional cost of index updates since SQL server has to find the data before updating them.
 
3. Having many indexes is not good either. They cost to maintain. So start out with the obvious ones, and then profile to see which ones you miss and would benefit from. You do not need them from start, they can be added later on.
 
4. Big column datatypes can be used when indexing, but it is better to have small columns indexed than large. Also it is common to create indexes on groups of columns.
 
5. There are also storage considerations. When inserting rows into a table with no clustered index, the rows are stored back to back on the page and updating a row may result in the row being moved to the end of table, leaving empty space and fragmenting the table and indexes.
 
Scan and Seek
 
A table without a clustered-index is called a “heap table”. A heap table has no sorted data thus SQL server has to scan the entire table in order to locate the data in a process called a “scan”.
 
In the case of a clustered index the data are sorted on the key values (columns) of the index. SQL server is now able to locate the data by navigating down from the root node, to the branch and finally to the leaf nodes of the B-tree structure of the index, in a process called a “seek”. The later approach is much faster when you want to filter or sort the data you want to retrieve.
 
Difference between Clustered and Non Clustered Index
 
1. A clustered index actually describes the order in which records are physically stored on the disk, hence the reason you can only have one. A Non-Clustered Index defines a logical order that does not match the physical order on disk.
 
2. There can be only one clustered index on a table while there can be up to 249 non clustered indexes on a table.
 
3. Faster to read than non clustered as data is physically stored in clustered index.

No comments:

Post a Comment

About the Author

I have more than 10 years of experience in IT industry. Linkedin Profile

I am currently messing up with neural networks in deep learning. I am learning Python, TensorFlow and Keras.

Author: I am an author of a book on deep learning.

Quiz: I run an online quiz on machine learning and deep learning.