Common mistakes made with indexes. A guide for the developer and DBA.

All too often a performance problem can be tracked down to a basic error with an index. So with this post I’ve decided to create a reference which will be of interest to DBA’s and SQL developers.

Full table scan (FTS)
Consider:

select surname from employees;

There is a missing where clause here so Oracle will choose a FTS and the explain plan will show TABLE ACCESS FULL EMPLOYEES. If it’s a large table then a FTS is usually bad, but not always. There are cases when a FTS is better, for example when Oracle can read the whole table in one or a few I/O’s, or when the clustering factor is bad (will explain later). Note that if your db_file_multiblock_read_count is set to 32, for example, and your block size is 8K, then one I/O will be 32×8=256K. So if your table fits into 256K then a FTS is fine as the entire table will be read in one I/O.
Also:

select surname from employees
where salary>3000;

will result in a FTS if the “salary” field is not indexed.

Full index scan (FIS)
Consider:

select emp_no
from employees
order by emp_no;

With the missing where clause, this will result in an INDEX FAST FULL SCAN if you check your explain plan. This is also usually bad but not always. If your index can be read in a few I/O’s then a FIS could be a good thing.

Cartesian (merge) joins
Consider:

select *
from invoices, transactions;

In this silly example, this will result in a MERGE JOIN CARTESIAN in the explain plan. The result will match each row of the first table with each row of the second, and this is almost always a bad idea and can result in a performance loss of orders of magnitude. One way of fixing that SQL could have been:

select *
from invoices i, transactions t
where i.inv_no=t.inv_no;

with indexes on inv_no in both those tables, therefore creating a table join.

Use the correct data types in the WHERE clause!
Consider:

select surname from employees
where date_joined>20130101;

In this example Oracle had to do an implicit type conversion between the number passed to the where clause, instead of the correct to_date conversion of a date formatted variable. It will either ignore the index on date_joined or choose a FIS on that index, instead of doing a range scan.

Using functions on the left side of the where clause
Consider:

select surname from employees
where to_date(date_terminated,'YYYYMMDD') > sysdate-7;

This will not make use of the index on that varchar2 “date” field (don’t ask! there are RAD apps that do this!) and will result in a FTS.

The way to fix this is to use the function on the “right” side or “value comparison” side:

select surname from employees
where date_terminated > to_char(sysdate-7, 'YYYYMMDD');

This is because the field or column is now being compared to a value, instead of the field itself.

Skip scans, range scans, unique scans
Oracle 9i introduced skip scans on concatenated indexes (indexes with more than one field) which worked like this.
Consider:

select name, surname from employees
where surname='SMITH';

with name+surname indexed in index1. The Cost Based Optimizer (CBO) may or may not decide to perform a SKIP SCAN on this table, depending on its anticipated cost of each alternative (which you can find in great detail if you do a 10053 trace). So the DBA should test this before giving advice on skip scans, perhaps with using invisible indexes. Another common mistake with the above example is for a developer to request two indexes, index1 on name+surname and index2 on name. The index on name is redundant because that is the first field in index1. B-tree indexes are used by default and that is natural to how they work. Looking at the tree below:
1
Oracle fetches and sorts the columns to be indexed and stores the rowid along with the index value for each row. The upper blocks (branch blocks) of a B-tree index contain index data that points to lower-level index blocks. They contain the minimum key prefix needed to make a branching decision between two keys. The lowest level index blocks (leaf blocks) contain every indexed data value and a corresponding rowid used to locate the actual row. So if a name “JOHN” is searched, oracle will traverse down the branches until it reaches the leaf blocks starting with the letter J. Depending on how much data there was, and how the index was created, the selectivity could have been further refined with “JO” or even “JOH”. If there are multiple JOHNs then oracle would do a range scan. If there was one JOHN then oracle would find it in the leaf block, and retrieve the table data by ROWID. This is called a unique scan and is shown in the explain plan as TABLE ACCESS (by INDEX ROWID). This is the most efficient way of using an index if there is uniqueness in your data (and a primary key or unique index). If you have a non-unique index, then watch out for range scans. This is a matter of knowing your data. If your data has grown appreciably over time then perhaps it’s time to archive, or analyze your tables/indexes, or consider further reducing your result sets by changing your where conditions.

Careful use of not-equal and like wildcards
If you searched for name<>’JOHN’ or name!=’JOHN’ in the above example, oracle has no way of traversing the tree to find any sort of range. So it will do a FTS. The same applies to name like ‘%OHN’… a FTS will occur. Note that name like ‘JOH%’ is fine because this is the same as saying name > ‘JOH’ and a range scan will be done.

Using IS NULL
If you had said where name is null, then all indexes will be suppressed and a FTS will be done. This is because NULLs are not stored in any indexes except for bitmap indexes.

Clustering Factor (CF)… for DBA eyes only!
The CF of an index is a measure of how ordered it is compared to its table. This can be determined from the CLUSTERING_FACTOR column in DBA_INDEXES. If this number is close to the number of blocks in the table then the index is well ordered; if it is close to the number of rows then it is randomly ordered. Let’s say the CF is 1000 and the number of blocks is 100. Then we say that the CF is off by an order of magnitude of 10, and for each row in the index, about 10 table blocks would have to be read (because there can be many blocks per row). So in the real world, you might see a session that is doing a huge number of reads and you notice a ‘db file sequential read’ wait event. Looking at the session’s SQL, you notice a range scan. You know this table’s data and cannot understand why a range scan could be doing that many reads… the savvy DBA would know that the only thing left to look at is the CF. This is a difficult problem to fix. If there is only one index on the table, then the table could be rebuilt in the same order as the index. Even if this is production, you can perform an online table re-org using DBMS_REDEFINITION. However, if there is more than one index, rebuilding the table to suit that index is needless because it will adversely affect the CF of the other indexes. So what to do? In some cases you can check if a FTS is actually faster by doing some real world timings. Make sure your index is in a locally managed tablespace. If the b-tree level (BLEVEL) is high (typically over 4) then it might be a candidate to be rebuilt. This might improve the depth of the leaf blocks immediately, but the index will degrade over time especially if there are frequent changes to the table. Your final option might be to rewrite the query if performance has become completely unacceptable due to bad CF. Since this section was for DBA’s only, I would imagine a whole new world of curious developers would now have become CF experts! DBA’s brace yourselves!

Parameters affecting indexes
Probably the most important parameter is optimizer_index_cost_adj. The default value is 100, and it must be set to a lower value for OLTP systems. Setting it lower makes oracle favour index scans over table scans. In other words, if the optimizer_index_cost_adj=50 then the CBO views indexes as 50% less expensive. Therefore the lower the value is, the less expensive the index… but only in the way that it is viewed by the CBO. If you have the luxury of testing this on production, a good value to start with is 60. You can reduce it by 10 at a time then measure the overall performance. Some OLTP systems have it set to between 10 and 30 and claim to have drastically improved performance; ours is set to 60 and we seem to have a good balance between index and table scans.

What else affects the CBO and indexes? Select from V$SYS_OPTIMIZER_ENV and you will find optimizer_index_caching. Setting optimizer_index_caching higher favours nested loop joins over hash or sort merge joins. Like optimizer_index_cost_adj, this will only influence the behaviour and decision making of the CBO, so be sure to test thoroughly before attempting to make any change. I left this at the default.

Conclusion
There is a lot more to be said about indexes but to keep this post succint, I’ve stayed on subject and highlighted common mistakes. Indexes are probably the most important objects to be considered by both DBA’s and developers when it comes to optimizing SQL and performance tuning. So if they are used correctly and kept in good health, you will be well on your way to an efficiently running database.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s