Weblogic JDBC Connection Pool misconfiguration and how it can cause serious performance problems

Part 1 of 2

Connection Pool (mis)configuration has long been a pet peeve of mine, and we have only recently changed our standards after a series of serious performance problems on our main production databases. In my opinion this is so important that I will post two blogs on this. This post will be about configuration parameters to minimize connections and disconnections. The second post will be about how high concurrency can lead to such severe degradation that the only way to recover is to shut down the database.

This post focuses on why a high frequency of connections and disconnections is bad for your Oracle database. The Oracle 9i documentation had an article called Top Ten Mistakes Found in Oracle Systems, and the first and foremost point was “Bad Connection Management”, which stated:
“The application connects and disconnects for each database interaction. This problem is common with stateless middleware in application servers. It has over two orders of magnitude impact on performance, and it is totally unscalable.”

So over ten years ago, Oracle had the vision to alert its DBAs to this, and people ignored that prophetic advice at their own peril. The middleware in this case is WebLogic, and setting up JDBC connection pools in a dynamic manner leads to a high frequency of connections and disconnections. Two orders of magnitude means that performance can degrade by more than a hundred-fold, which is obviously unscalable. In other words, if you expect your applications to scale with data growth over time without performance degradation, they will not. It usually does not matter how much hardware you throw at a legacy scalability problem: it will never be powerful enough, because you will simply reach your bottleneck more quickly. It is not only the data that changes; it is the number of users, the complexity and interaction of systems, and the competition for resources on a server with finite resources.

Why should repetitive connections and disconnections be such a bad thing for the database? This question has puzzled architects and developers for years, and some still don’t believe it is bad. Consider the amount of work the Oracle RDBMS and the server have to do for each connection. A new server process has to be created, which costs CPU and at least 20M of memory in 11g. The PGA may have to grow to accommodate future work areas. Oracle then has to check its dictionary and dependency tree, semantically and syntactically, for user privileges, which carries a significant CPU cost. This check alone can consist of scores of dictionary reads with their associated parses, executes and fetches. After all of this work there is the matter of executing logon triggers, if any, and then all the overhead of finding space in the SGA for the new session’s SQL and for all future DML of the session.
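
To see how much connection churn an instance is actually experiencing, a quick check against V$SYSSTAT is often enough. This is just a sketch; run it twice a few minutes apart and compare the deltas:

```sql
-- 'logons cumulative' counts every logon since instance startup;
-- 'logons current' is the number of sessions connected right now.
-- A cumulative figure climbing fast relative to the current figure
-- points at exactly the connect/disconnect churn described above.
select name, value
  from v$sysstat
 where name in ('logons cumulative', 'logons current');
```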

If you have the capacity to keep a slightly higher number of sessions connected, it is far better to do so. Good practice is to acquire a connection, do the maximum work needed for that connection, then return it to the pool as soon as possible. Having a connection pool create a new session, do some work, then disconnect (shrink the pool) is the worst practice.

So, for many years I had this argument with architects and developers, and it was eventually agreed to keep connection pools static. This link states that an easy way to boost performance is to keep INIT and MAX the same. That way they could never grow or shrink, right? Wrong. If you are in a situation where the application server thinks a connection is down but Oracle thinks it is up, setting the capacities to the same value is not going to prevent a flood of connection attempts when Oracle becomes available again. WebLogic tests connections periodically, and if the test does not return in time, WebLogic will create a new connection regardless and remove the non-functioning one from the pool. So we decided it was better to set the Shrink Frequency (Harvest Interval) to a high value like 30 minutes, so that the frequency of disconnections would not be a problem.

So these are the values we proposed in our new standards, after many discussions and meetings with architects:

  • Select the MAX (Maximum Capacity)
  • Set the MIN (Minimum Capacity) to be at least half (50%) of MAX so that the connection pool cannot shrink below this value
  • Set the INIT (Initial Capacity) to be at least quarter (25%) of MAX. This is the initial number of connections created when the application server is started
  • Set the Shrink Frequency (Harvest Interval) to 1800 (30 minutes). This setting is critical in managing the frequency of disconnections.
  • Set Connection Creation Retry Frequency to 120 (2 minutes)
  • Set Test Frequency to 120 (2 minutes). This polls the database to test each connection. The default test is “SELECT 1 FROM DUAL”, which bypasses the buffer cache, so it is cheap.
  • Set Seconds to trust an Idle Pool Connection to 10.
  • Cache for prepared statements must be set to 20 or higher; setting it higher should be discussed with your DBA. The total number of cached cursors is the cache setting * number of connections in the pool * number of servers in the cluster, so set it carefully. The cache size counts against the open cursors allowed per session, so if it is set too low then cursors are repeatedly closed and opened. This leads to excessive hard parsing of SQL statements, which can also degrade performance. In fact “Bad Use of Cursors” is point number 2 on the “Top Ten Mistakes” list, where it is said to have an order of magnitude impact on performance and to be totally unscalable.
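
As a concrete illustration, the standards above map onto a WebLogic JDBC data source descriptor roughly as follows. This is a hedged sketch: MAX is chosen as 40 purely for illustration, and you should verify the element names against the jdbc-data-source schema of your WebLogic version before use.

```xml
<!-- Illustrative values only: MAX=40, MIN=50% of MAX, INIT=25% of MAX -->
<jdbc-connection-pool-params>
  <initial-capacity>10</initial-capacity>
  <max-capacity>40</max-capacity>
  <min-capacity>20</min-capacity>
  <shrink-frequency-seconds>1800</shrink-frequency-seconds>
  <connection-creation-retry-frequency-seconds>120</connection-creation-retry-frequency-seconds>
  <test-frequency-seconds>120</test-frequency-seconds>
  <seconds-to-trust-an-idle-pool-connection>10</seconds-to-trust-an-idle-pool-connection>
  <statement-cache-size>20</statement-cache-size>
</jdbc-connection-pool-params>
```

The same values can of course be set through the Administration Console instead of editing the descriptor by hand.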

I am not saying that this is necessarily the optimal way of doing things. In your situation, setting your pools to static might be the easiest and best implementation method. What I am saying is: be very careful of repetitive Oracle session connections and disconnections, and make sure that you use your existing connection pool sessions as efficiently as possible.

In Part 2 I will discuss the more complicated topic of high concurrency where I will dive into hairy internals like latches and mutexes. Happy computing and thanks for reading!

Huge performance gains from using arrays and bulk collects.

Consider a simple select which I ran in SQL*Plus connected as SYSTEM:

select * from dba_users;

I ran a trace on the session and did a tkprof on the result.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.29       0.76        201        205          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     1353      0.99       1.79       1303       4536          0       20276

What do we notice about these stats? Parse count is 1, which is expected, and it will still be 1 if I run that select again. See my previous post, where I talked about bind variables and why repeated parsing, especially hard parsing, is bad for performance. The execute count is also 1, which is expected, and you want to keep it at 1 per select as far as possible. I have seen applications do exotic selects with rowids in cursors where the execute column shows the same number as the fetch column, which is the most inefficient way to get your data back.

If you had done ‘set arraysize 2’ in SQL*Plus, then the Fetch stats would have looked like this:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Fetch    10134      1.35       1.88        676      12896          0       20266

Funny enough, when I then used ‘set arraysize 1’, Oracle ignored the setting and used a value of 2. Somewhere in the recesses of the Oracle core there must be an AI intelligent enough to question and compensate for the obvious stupidity of getting back one row per fetch!

By running a simple select at the SQL prompt in SQL*Plus, Oracle created an implicit cursor, which you can see if you look at the raw trace file before it was run through tkprof.

PARSING IN CURSOR #18446744071486994888 len=23 dep=0 uid=5 oct=3 lid=5 tim=11879507487109 hv=665675061 ad='47520d368' sqlid='ckyh1gcmuut9p'
select * from dba_users
END OF STMT

Now, why were only 1353 fetches done for the 20276 rows returned? Implicit cursors use the default array size of 15, which is the SQL*Plus (client-side) arraysize setting. SQL*Plus fetches the first row on its own and then fetches in batches of arraysize, so 1 + ceil(20275/15) = 1353 fetch calls.

Let’s write an explicit cursor via an anonymous block to see what Oracle does.

SET SERVEROUTPUT ON
declare 
  cursor c1 is
    select * from v$session;
begin   
  for c in c1 loop 
    null;
  end loop;  
end;
/

Checking the tkprof result:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.01          0          0          0           0
Execute      1      0.00       0.00          0          0          0           1
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------

That’s funny… why no result in the Fetch row? Oracle’s documentation states that ‘Fetches are only performed for SELECT statements‘. Furthermore, as shown in a previous post, the anonymous block is declared and run in the session being traced, so it shows values for Parse and Execute. The “select * from v$session” is executed on the server as the cursor c1 that we defined.

So going back to our trusty V$SQL which shows all statements executed since startup:

select 
  SQL_TEXT, PARSE_CALLS, EXECUTIONS, FETCHES
from V$SQL
where upper(sql_text) like '%V$SESSION%';

SQL_TEXT                  PARSE_CALLS   EXECUTIONS   FETCHES
SELECT * FROM V$SESSION   1             1            7

That’s interesting… Fetches=7. The select returned 696 rows, so it seems the array size was effectively 100. Either that, or Oracle performed an implicit bulk collect, as stated by PL/SQL guru Steven Feuerstein here:
Oracle Database automatically optimizes cursor FOR loops to perform similarly to BULK COLLECT queries… and fetch 100 rows at a time.

The Oracle documentation for SET ARRAYSIZE states that “Values over approximately 100 provide little added performance.” So how can we use much larger “arrays” and get the intuitive performance benefit of collecting a large number of rows into a memory structure? This is where BULK COLLECT and FORALL come in.

Consider this code, which shows the use of BULK COLLECT. I am not going to exhaustively list all the BULK options, such as SQL%BULK_EXCEPTIONS, as that is the purpose of the documentation, and I don’t want to be sidetracked from the point I am making about huge performance gains. It is worth mentioning the LIMIT clause, however, which caps the number of rows returned per BULK COLLECT fetch.

declare 
  TYPE ARRAY IS TABLE OF dba_objects%ROWTYPE;
  obj_data ARRAY;
  CURSOR c1 IS 
    SELECT * FROM dba_objects;
begin
  OPEN c1;
    LOOP
      FETCH c1 BULK COLLECT INTO obj_data;     
      EXIT WHEN c1%NOTFOUND;
    END LOOP;
  CLOSE c1;
end;
/
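
Since the LIMIT clause deserves more than a mention, here is a sketch of the same block fetching in batches of 500, so memory stays bounded no matter how big the result set is. Note the exit condition: with LIMIT, testing c1%NOTFOUND straight after the fetch would discard the final partial batch, so test the collection’s COUNT instead.

```sql
declare
  TYPE ARRAY IS TABLE OF dba_objects%ROWTYPE;
  obj_data ARRAY;
  CURSOR c1 IS
    SELECT * FROM dba_objects;
begin
  OPEN c1;
  LOOP
    FETCH c1 BULK COLLECT INTO obj_data LIMIT 500;
    EXIT WHEN obj_data.COUNT = 0;  -- safe with LIMIT; %NOTFOUND is not
    -- process obj_data here
  END LOOP;
  CLOSE c1;
end;
/
```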

The trace shows the code but we have to check the stats from V$SQL:

select 
  SQL_TEXT, PARSE_CALLS, EXECUTIONS, FETCHES, BUFFER_GETS
from V$SQL
where upper(sql_text) like '%DBA_OBJECTS%';

SQL_TEXT                    PARSE_CALLS   EXECUTIONS   FETCHES   BUFFER_GETS 
SELECT * FROM DBA_OBJECTS   1             1            1         3100

Lo and behold, we get the awesome result of Fetches=1 for a select of over 100,000 rows! Why is this so significant compared with multiple fetches? If the application is client/server, then every fetch is an extra network round trip between the client (thin or fat) and the database server. If the SQL runs on the server, then every fetch is an extra context switch. In both cases, eliminating fetches can bring huge performance benefits on busy production servers that execute thousands or even millions of queries per hour. Have you ever wondered about those innocuous “idle” SQL*Net wait events? The majority of them could be the result of not using bulk collects and arrays efficiently! Every one of them means an extra network round trip, and that means wasted CPU and I/O resources that could have been used for the next process in the run queue.

Could the bulk mechanism also be used for DML (insert, update, delete)? Sure: this is where the FORALL statement is used. In the example above, I could have inserted this code after the FETCH:

FORALL i IN 1..obj_data.COUNT
  INSERT INTO some_table VALUES obj_data(i);

This would load the entire array in one write, and therefore one round trip or context switch, giving the same benefit that BULK COLLECT gave us on the read side.

Note that BULK COLLECT can also be used with dynamic SQL in the EXECUTE IMMEDIATE statement, but as stated in a previous post, it is usually better to limit the use of dynamic SQL in busy OLTP production environments.

Array usage in some other development tools

We’ve already seen the use of arraysize in SQL*Plus. Arrays can also be used in Oracle Forms.

Pro*C can use pre-fetching transparently with:

$ proc .... PREFETCH=200

This can also be set at compilation time with ANSI C and C++ and overrides the usage of CURSIZE within the C code.

Using the property weblogic.oci.cacheRows, Weblogic JDBC implements array fetching using its Row Caching feature.

“Magic” is the RAD tool we’ve been using, and arrays can be set at table level. Every development language worth its salt should be able to use arrays; if its Oracle drivers are properly written, the relevant OCI calls can be made for arrays and bulk collects.

Conclusion
If you want to experience a massive performance gain in both your DSS and OLTP production environments, make complete and liberal use of arrays and bulk collect mechanisms in your code.

Common mistakes made with indexes. A guide for the developer and DBA.

All too often a performance problem can be tracked down to a basic error with an index. So with this post I’ve decided to create a reference that should interest both DBAs and SQL developers.

Full table scan (FTS)
Consider:

select surname from employees;

There is no where clause here, so Oracle will choose a FTS and the explain plan will show TABLE ACCESS FULL EMPLOYEES. On a large table a FTS is usually bad, but not always. There are cases where a FTS is better, for example when Oracle can read the whole table in one or a few I/Os, or when the clustering factor is bad (explained later). Note that if db_file_multiblock_read_count is set to 32, for example, and your block size is 8K, then one I/O will be 32×8=256K. So if your table fits into 256K, a FTS is fine because the entire table can be read in one I/O.
Also:

select surname from employees
where salary>3000;

will result in a FTS if the “salary” field is not indexed.

Full index scan (FIS)
Consider:

select emp_no
from employees
order by emp_no;

With no where clause, this will result in an INDEX FAST FULL SCAN in your explain plan. This is also usually bad, but not always: if the index can be read in a few I/Os, then a FIS can be a good thing.

Cartesian (merge) joins
Consider:

select *
from invoices, transactions;

In this silly example, the plan will show MERGE JOIN CARTESIAN. The result matches each row of the first table with every row of the second; this is almost always a bad idea and can cost orders of magnitude in performance. One way to fix that SQL would have been:

select *
from invoices i, transactions t
where i.inv_no=t.inv_no;

with indexes on inv_no in both those tables, therefore creating a table join.

Use the correct data types in the WHERE clause!
Consider:

select surname from employees
where date_joined>20130101;

In this example Oracle has to do an implicit type conversion between the number passed in the where clause and the column, instead of using a proper TO_DATE conversion of a correctly formatted value. It will either ignore the index on date_joined or choose a FIS on that index, instead of doing a range scan.
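
Assuming date_joined really is a DATE column, the sketch below shows the predicate written with an explicit conversion on the value side, which leaves the index usable for a range scan:

```sql
select surname
  from employees
 where date_joined > to_date('20130101', 'YYYYMMDD');
```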

Using functions on the left side of the where clause
Consider:

select surname from employees
where to_date(date_terminated,'YYYYMMDD') > sysdate-7;

This will not make use of the index on that varchar2 “date” field (don’t ask! there are RAD apps that do this!) and will result in a FTS.

The way to fix this is to use the function on the “right” side or “value comparison” side:

select surname from employees
where date_terminated > to_char(sysdate-7, 'YYYYMMDD');

This works because the column is now being compared to a value, instead of having a function applied to the column itself.

Skip scans, range scans, unique scans
Oracle 9i introduced skip scans on concatenated indexes (indexes with more than one field) which worked like this.
Consider:

select name, surname from employees
where surname='SMITH';

with name+surname indexed in index1. The Cost Based Optimizer (CBO) may or may not decide to perform a SKIP SCAN on this table, depending on the anticipated cost of each alternative (which you can see in great detail with a 10053 trace). So the DBA should test this before giving advice on skip scans, perhaps using invisible indexes. Another common mistake in the above example is for a developer to request two indexes: index1 on name+surname and index2 on name. The index on name is redundant, because name is the leading column of index1 and B-tree indexes can service it naturally. Consider how a B-tree index is structured.
Oracle fetches and sorts the columns to be indexed and stores the rowid along with the index value for each row. The upper blocks (branch blocks) of a B-tree index contain index data that points to lower-level index blocks; they hold the minimum key prefix needed to make a branching decision between two keys. The lowest-level index blocks (leaf blocks) contain every indexed data value and a corresponding rowid used to locate the actual row.

So if the name “JOHN” is searched for, Oracle traverses down the branches until it reaches the leaf blocks starting with the letter J. Depending on how much data there is, and how the index was created, the selectivity could be further refined with “JO” or even “JOH”. If there are multiple JOHNs, Oracle does a range scan. If there is one JOHN, Oracle finds it in the leaf block and retrieves the table data by rowid. This is called a unique scan and is shown in the explain plan as TABLE ACCESS (BY INDEX ROWID). It is the most efficient way of using an index if there is uniqueness in your data (and a primary key or unique index).

If you have a non-unique index, then watch out for range scans. This is a matter of knowing your data. If your data has grown appreciably over time, then perhaps it is time to archive, to analyze your tables and indexes, or to reduce your result sets further by changing your where conditions.
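
If you want to test a candidate index without exposing it to every other session, invisible indexes (available from 11g) are a safe way to do it. A sketch, with a hypothetical index name:

```sql
-- The CBO ignores an invisible index for all sessions by default.
create index emp_name_ix on employees (name) invisible;

-- Allow only your session to consider it, run your timings, then revert.
alter session set optimizer_use_invisible_indexes = true;
-- ... run and time the candidate queries here ...
alter session set optimizer_use_invisible_indexes = false;

-- Promote it only once you are satisfied:
-- alter index emp_name_ix visible;
```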

Careful use of not-equal and like wildcards
If you search for name<>’JOHN’ or name!=’JOHN’ in the above example, Oracle has no way of traversing the tree to find any sort of range, so it will do a FTS. The same applies to name like ‘%OHN’… a FTS will occur. Note that name like ‘JOH%’ is fine, because it defines a range starting at ‘JOH’, and a range scan will be done.

Using IS NULL
If you say where name is null, then the index on name will be suppressed and a FTS will be done. This is because entirely-NULL keys are not stored in B-tree indexes (bitmap indexes, by contrast, do store NULLs).

Clustering Factor (CF)… for DBA eyes only!
The CF of an index is a measure of how ordered it is compared with its table, and can be read from the CLUSTERING_FACTOR column in DBA_INDEXES. If this number is close to the number of blocks in the table, the index is well ordered; if it is close to the number of rows, it is randomly ordered. Say the CF is 1000 and the number of blocks is 100: the CF is then off by a factor of 10, and a range scan over the index will visit roughly ten times more table blocks than it would with a perfectly ordered index, because consecutive index entries point to rows scattered across different blocks.

So in the real world, you might see a session doing a huge number of reads and notice a ‘db file sequential read’ wait event. Looking at the session’s SQL, you notice a range scan. You know this table’s data and cannot understand why a range scan could be doing that many reads… the savvy DBA knows that the only thing left to look at is the CF.

This is a difficult problem to fix. If there is only one index on the table, the table can be rebuilt in the same order as the index; even in production, you can perform an online table re-org using DBMS_REDEFINITION. If there is more than one index, however, rebuilding the table to suit one index is pointless, because it will worsen the CF of the other indexes. So what to do? In some cases you can check whether a FTS is actually faster by doing real-world timings. Make sure your index is in a locally managed tablespace. If the B-tree level (BLEVEL) is high (typically over 4), the index might be a candidate for a rebuild; this can immediately improve the depth to the leaf blocks, but the index will degrade again over time, especially if the table changes frequently. Your final option might be to rewrite the query if performance has become completely unacceptable due to a bad CF.

Since this section was for DBAs only, I imagine a whole new world of curious developers has now become CF experts! DBAs, brace yourselves!
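
To put numbers to this, the query below (a sketch against the EMPLOYEES table from the earlier examples) puts each index’s clustering factor next to its table’s block and row counts, so you can see which extreme it is closer to:

```sql
select i.index_name,
       i.clustering_factor,
       i.blevel,
       t.blocks,      -- CF near this figure: well ordered
       t.num_rows     -- CF near this figure: randomly ordered
  from dba_indexes i
  join dba_tables  t
    on t.owner      = i.table_owner
   and t.table_name = i.table_name
 where i.table_name = 'EMPLOYEES';
```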

Parameters affecting indexes
Probably the most important parameter is optimizer_index_cost_adj. The default value is 100, and it should usually be set lower for OLTP systems. Setting it lower makes Oracle favour index scans over table scans: with optimizer_index_cost_adj=50, the CBO views index access as 50% less expensive. The lower the value, the cheaper indexes look, but only in the way they are costed by the CBO. If you have the luxury of testing this on production, a good value to start with is 60; reduce it by 10 at a time and measure overall performance. Some OLTP systems set it between 10 and 30 and claim drastically improved performance; ours is set to 60 and we seem to have a good balance between index and table scans.
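
A cautious way to experiment, sketched below, is to change the parameter at session level first, so only your own test queries are affected:

```sql
-- Affects only the current session; measure representative queries
-- before and after the change.
alter session set optimizer_index_cost_adj = 60;

-- Only after thorough testing would you consider making it instance-wide:
-- alter system set optimizer_index_cost_adj = 60 scope=both;
```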

What else affects the CBO and indexes? Select from V$SYS_OPTIMIZER_ENV and you will find optimizer_index_caching. Setting optimizer_index_caching higher favours nested loop joins over hash or sort merge joins. Like optimizer_index_cost_adj, this will only influence the behaviour and decision making of the CBO, so be sure to test thoroughly before attempting to make any change. I left this at the default.

Conclusion
There is a lot more to be said about indexes, but to keep this post succinct I’ve stayed on subject and highlighted common mistakes. Indexes are probably the most important objects for both DBAs and developers to consider when optimizing SQL and tuning performance. Use them correctly and keep them in good health, and you will be well on your way to an efficiently running database.

Optimizing Performance with Cursor Sharing and Shared Pool behaviour in Oracle 11g R2 (11.2.0.3)

Hi

With this post I’ve decided to create an introductory guide to how Oracle 11g R2 handles literals and binds in the shared pool, how cursor sharing works, and how to write your cursors to maximize performance.

The first parameter to look at is cursor_sharing. Trying to set it to SIMILAR is now deprecated and you will get a message in your alert log saying “WARNING: The parameter cursor_sharing was found to be set to the value SIMILAR. This setting will be ignored and cursor sharing will operate as though the value was set to FORCE instead.”

So let us set it to EXACT and see the behaviour. This means that statements with identical text share the same cursor, and in the documentation Oracle recommends that this is used in DSS (decision support systems / reporting / warehouses).

I created a table called TEST (xnum number) with values from 1 to 10, with an index on xnum.

begin
  for x in 1..10 loop
    insert into test values (x);
  end loop;
  commit;
end;
/

Let’s analyze:

exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'TEST', tabname=>'TEST', cascade=> TRUE);

Let’s try this statement to see how the optimizer responds:

select * from test where xnum=1;

Now selecting from V$SQL which conveniently holds statistics since instance startup:

select
  SQL_TEXT, SQL_ID, OPEN_VERSIONS, FETCHES, EXECUTIONS, PARSE_CALLS
from
  v$sql
where
  upper(sql_text) like '%TEST WHERE%';

The statement appears exactly as written, with the literal value of 1, as expected.

I ran a few more literal selects and here are some selected rows:

SQL_TEXT                         SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS
select * from test where xnum=2  a4hrnbrz05uwy             0       2          1           1
select * from test where xnum=1  cgsb5m608anzk             0       4          2           2
select * from test where  xnum=2 a91a8av266tb9             1       2          1           1

Notice that the single extra space in the last statement created a new entry in the shared pool with a new SQL_ID. The point I am really making is that if you write your applications to generate literal SQL like this, and you use cursor_sharing=exact, then you are writing SQL in the least optimal way. Imagine these statements running thousands or even millions of times an hour in a large environment: in addition to “hard parsing” each new statement as it comes in, Oracle must find free chunks of shared pool in which to place them, while ageing older statements out. So the CPU works extra hard parsing and managing the shared pool. Over time this fragments the shared pool, which can lead to horrors such as library cache pin waits in a busy environment, and in the worst case to shutdowns, with no way out unless you rewrite the application or change cursor_sharing to FORCE. We will get to that later.

Let’s do the same via an anonymous block and run it twice:

SET SERVEROUTPUT ON
declare
  x number;
  v_xnum number;
begin
  x:=3;
  select xnum into v_xnum from test where xnum=x;
  DBMS_OUTPUT.PUT_LINE('Answer: ' || TO_CHAR (v_xnum));
end;
/

SQL_TEXT        SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS
declare x number; v_xnum number; begin x:=3; select xnum into v_xnum from test where xnum=x; DBMS_OUTPUT.PUT_LINE('Answer: ' || TO_CHAR (v_xnum)); end; 
                db6b351m50nkq             1       0          2           2
SELECT XNUM FROM TEST WHERE XNUM=:B1 
                7y76r2vywmrv5             1       2          2           1

This is a little better than literal SQL, but not much. Oracle gives the block its own SQL_ID, which uses far more shareable memory than a simple statement, but at least there is only one copy in the shared pool. The problem arises when these blocks become much larger, in an OLTP environment with complex systems running thousands of anonymous blocks, especially if they call packages. All the underlying code gets loaded into free shareable memory, and Oracle has to work to find chunks of space to load it into. If the objects are not pinned (using DBMS_SHARED_POOL.KEEP), you will find the loaded version count (in V$SQL) increasing, which fragments the shared pool in a busy OLTP environment. The second row, with the :B1, will be explained with the cursor example below.
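
For what it is worth, pinning a heavily used package looks like the sketch below. The package name is hypothetical, and DBMS_SHARED_POOL may first need to be installed by the DBA (it ships with the database but is not always created by default):

```sql
-- 'P' pins a package/procedure/function; pinned objects are not aged
-- out of the shared pool until unkept or the instance restarts.
exec DBMS_SHARED_POOL.KEEP('SCOTT.MY_BUSY_PKG', 'P');
```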

Note that the same applies to dynamic SQL (typically generated in packages and executed with EXECUTE IMMEDIATE). Badly written dynamic SQL can cause all kinds of fragmentation problems in your shared pool. EXECUTE IMMEDIATE with the USING clause does use bind variables and is acceptable in some cases, but in my opinion it is a lazy developer’s alternative to making proper use of cursors. Cursors are natural to Oracle. Oracle loves well-written cursors, and your users will love you and Oracle in return. When performance is bad, they always come to you, the omniscient DBA, right? And when you fix it, it was your fault in the first place. The manager or helpdesk person or end user never knew the pain you went through to troubleshoot and find that BAD statement, or the application server with the memory leak, or that storage-layer bottleneck…

Aah. Breathe.
Let’s write simple PL/SQL using a cursor in a procedure to see what happens.

create or replace
PROCEDURE test_num (p_num IN NUMBER, p_num_out IN OUT NUMBER) IS
  CURSOR c_num IS
    select xnum from test where xnum=p_num;
BEGIN
  for c in c_num loop
    p_num_out := c.xnum;
  end loop;
END;
/

SET SERVEROUTPUT ON
DECLARE v_pnum number;
BEGIN
  test_num(5,v_pnum);
  DBMS_OUTPUT.PUT_LINE('Answer: '||TO_CHAR(v_pnum));
END;
/

I ran the procedure 5 times with values of 5 and 6.

SQL_TEXT                             SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS
SELECT XNUM FROM TEST WHERE XNUM=:B1 7y76r2vywmrv5             1       5          5           2

Notice firstly that Oracle transforms the statement (the cursor in the procedure) to upper case, and assigns a bind variable “:B1” to it. When I ran the procedure with the value 5, parse calls was 1. When I changed the value to 6, parse calls became 2. This is what is known as a “soft parse”: Oracle found the statement already in the shared pool and peeked at the new value, instead of doing a hard parse with its overhead of syntax and permissions checking. Soft parses are better for performance than hard parses. The advantage of using bind variables is that there is only one version of the statement in the shared pool, which leads to huge performance gains through reduced parsing, fragmentation and ageing.

HOWEVER, and this is a very important caveat: using cursor_sharing=exact in busy environments will ONLY work if you have perfectly written applications using cursors as in the above example. Now, which of us can say that we support perfect applications? Invariably, applications throw in anonymous blocks and dynamic SQL. They throw the odd literal statement in. The untrained business analyst connects and runs something naughty. The DBA adds to the fun with exotic scripts. Oracle itself adds dictionary statements that are far from optimal. Monitoring tools can throw in some horrors like segment checking. Woe betide you if the CIO connects. So you get the picture… you will never work in a perfect environment.

The corollary to this is that cursor_sharing=exact is fine for DSS environments, as Oracle states in their 11g R2 documentation:
“If you set CURSOR_SHARING, then Oracle recommends the FORCE setting unless you are in a DSS environment”.
The reason for this is that you rarely get shared pool fragmentation in a reporting or warehouse environment. Users there typically run long-running reports and statements that RELY on an expected explain plan that behaved the same way in the dev and test environments (there are exceptions, but I don’t want to go too far off topic; in IT, for everything that is said there is usually an exception, and this is not an exact science). Note that the default value is EXACT, so it is up to you, the conscientious DBA, to use the correct setting for OLTP production.

Just a final note about cursors before we move on. We’ve shown how to reduce parsing, which gives huge CPU benefits. Every statement has to go through a parse/execute cycle, plus fetches if it returns a result set. The ideal statement, in a perfect world, is parsed once, executed many times, and fetches many rows per execute. If you find that the number of executes is too close to the number of parses, it usually means you’re opening and closing your cursor too often. If you are returning one row per execute, then you need to investigate Oracle’s array mechanism via BULK COLLECT and FORALL, which has huge I/O benefits that I discuss in another post.
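
Statements with that parse-to-execute or execute-to-fetch pathology can be hunted down in V$SQL. A sketch, with arbitrary thresholds you would tune for your own system:

```sql
-- Parse calls close to executions: the cursor is opened/closed too often.
-- Rows close to executions: probably fetching one row at a time.
select sql_id, parse_calls, executions, fetches, rows_processed
  from v$sql
 where executions > 100
   and (parse_calls    > executions * 0.9
     or rows_processed < executions * 2)
 order by parse_calls desc;
```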

Before I go on to talk about cursor_sharing=force, it’s worthwhile to talk a little about how Oracle treated binds in the past compared to how it does now. This will be the subject of another long post when I have the energy to do so. I’ve been using Oracle since Oracle 7, and even in those days statements were sent to the shared pool, parsed, then executed based on an explain plan derived from the rule-based or cost-based optimizer, depending on whether the objects were analyzed or not. As rule-based moved towards obsolescence, cost-based became the norm, and in Oracle 9i the CBO (cost-based optimizer) generated explain plans based on its histogram knowledge of the data distribution of a table.

All was well in DBA world until 10g came along and the CBO defaulted to bind peeking, which was the bane of many a DBA; some even chose to turn it off with _optim_peek_user_binds. Bind peeking worked like this: if you had a table with skewed data (for example a table of car sales) and someone queried a common car (e.g. Beetle), Oracle would peek at this bind at the first hard parse, then generate an explain plan with a full table scan based on this peek. Worse, it would lock that explain plan in, so if someone later queried an uncommon car (e.g. Rolls Royce), the same full table scan would be used. This led people to either remove statistics for certain tables, turn off bind peeking, or be really clever and write queries with literals in them.

So then came 11g with adaptive cursor sharing, and this is Oracle’s summary from metalink 1169017.1:
“Our recommendation is to set cursor_sharing = FORCE. This setting maximizes cursor sharing while leveraging the Adaptive Cursor Sharing framework to generate multiple execution plans based on different literal value ranges if necessary.”
What this means is that instead of peeking at the first value at hard parse time, Oracle now observes the ranges of values passed to the bind variables. Based on this “bind sensitivity”, Oracle decides whether to recalculate the plan and add it to the profile of the SQL ID in question. So a statement can now have multiple explain plans, which can itself be problematic; Baselines and SQL Plan Management are a topic for another discussion.

OK so let’s change cursor_sharing:
alter system set cursor_sharing=force scope=both;
and bounce the database to clear the shared pool.
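As an aside: if a full bounce is not practical, the shared pool can also be flushed. Be warned that on a busy system this still invalidates cursors and triggers a burst of hard parsing, so use it with care.

```sql
alter system flush shared_pool;
```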
Let’s do this again to see what happens:

select * from test where xnum=1;

select
  SQL_TEXT, SQL_ID, OPEN_VERSIONS, FETCHES, EXECUTIONS, PARSE_CALLS, IS_BIND_SENSITIVE, IS_BIND_AWARE, IS_SHAREABLE, CHILD_NUMBER
from v$sql 
where 
  upper(sql_text) like '%TEST WHERE XNUM%';

SQL_TEXT                                 SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS IS_BIND_SENSITIVE IS_BIND_AWARE CHILD_NUMBER
select * from test where xnum=:"SYS_B_0" 50vnp9xux8us2             1       2          1           1                 N             N            0

So instead of a literal in the shared pool, Oracle now FORCEs a bind variable in its place. These binds are system generated and always have the SYS_ prefix, so you can find them in your shared pool.

Now let’s run that select again with a value of 8 to see the result:

SQL_TEXT                                 SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS IS_BIND_SENSITIVE IS_BIND_AWARE CHILD_NUMBER
select * from test where xnum=:"SYS_B_0" 50vnp9xux8us2             1       4          2           2                 N             N            0

As expected, it was executed and parsed again. Note that the statement still incurs a parse call for every execute (the first occurrence is a hard parse; repeats with the same converted text are soft parses at best). This is the problem with system generated binds which are converted from statements with literals or from dynamic SQL. In the long run, one parse per execute is very heavy on CPU, especially if the statements run millions of times in a busy production environment.

Let’s see what it does with our anonymous block with xnum=3:

SQL_TEXT SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS IS_BIND_SENSITIVE IS_BIND_AWARE IS_SHAREABLE CHILD_NUMBER
declare x number; v_xnum number; begin x:=3; select xnum into v_xnum from test where xnum=x; DBMS_OUTPUT.PUT_LINE('Answer: ' || TO_CHAR (v_xnum)); end;                       
         db6b351m50nkq             1       0          1           1                 N             N            Y            0
SELECT XNUM FROM TEST WHERE XNUM=:B1  
         7y76r2vywmrv5             1       1          1           1                 Y             N            Y            0

It created the block with its own ID and then converted the select to use a bind variable. This is not a system bind because the select was done with a variable in a code block (no literal value was used in the code block). Note that IS_BIND_SENSITIVE is marked Y (yes), which means that the select can potentially get a new plan in future. If IS_BIND_AWARE=Y, it means Oracle has started sharing cursors based on bind selectivity and may generate different plans for different values. At this point the table has uniformly distributed data: it has 10 rows, with numbers from 1 to 10 in an index, so Oracle has no need to change the plan based on the values it has seen.

Running the select TWICE with x=5 results in TWO parses of the anonymous block but just one parse of the select, so Oracle was happy to continue with the original explain plan.

SQL_TEXT SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS IS_BIND_SENSITIVE IS_BIND_AWARE IS_SHAREABLE CHILD_NUMBER
declare x number; v_xnum number; begin x:=3; select xnum into v_xnum from test where xnum=x; DBMS_OUTPUT.PUT_LINE('Answer: ' || TO_CHAR (v_xnum)); end; 
         db6b351m50nkq             0       0          1           1                 N             N            Y            0
declare x number; v_xnum number; begin x:=5; select xnum into v_xnum from test where xnum=x; DBMS_OUTPUT.PUT_LINE('Answer: ' || TO_CHAR (v_xnum)); end; 
         c5khy7t3xkpnk             1       0          2           2                 N             N            Y            0
SELECT XNUM FROM TEST WHERE XNUM=:B1 
         7y76r2vywmrv5             1       3          3           1                 Y             N            Y            0

Now let’s skew the data a bit…

begin
  for x in 1..10000
  loop
    insert into test values (11);
  end loop;
  commit;
end;
/

At this stage we will not analyze, just to see what Oracle does with peeking and explain plans.

Run the same select in a cursor, fetching those 10000 rows we inserted.

declare
  type t_nums is varray(10000) of number;
  a_nums t_nums;
  x number;
cursor c1 is
  select xnum from test where xnum=x;
begin
  x:=11;
  open c1;
    fetch c1 bulk collect into a_nums;
  close c1;
end;
/

SQL_TEXT SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS IS_BIND_SENSITIVE IS_BIND_AWARE IS_SHAREABLE CHILD_NUMBER
declare type t_nums is varray(10000) of number; a_nums t_nums; x number; cursor c1 is select xnum from test where xnum=x; begin x:=11; open c1; fetch c1 bulk collect into a_nums; close c1; end; 
         fcrk1v7mrcnkr             1       0          1           1                 N             N            Y            0
SELECT XNUM FROM TEST WHERE XNUM=:B1 
         7y76r2vywmrv5             1       3          3           1                 Y             N            Y            0
SELECT XNUM FROM TEST WHERE XNUM=:B1 
         7y76r2vywmrv5             1       1          1           1                 Y             N            Y            1

Interesting… it created a new version of the statement with the same SQL_ID.  Notice that CHILD_NUMBER is now 1… which means it created a child of the parent cursor. There is a lot to be said about multiple versions and child cursors, but that is the subject of another post. This is however VERY relevant to shared pool management and behaviour.

More info in this view:

select
  child_number, bucket_id, count from v$sql_cs_histogram
where 
  sql_id = '7y76r2vywmrv5';

CHILD_NUMBER BUCKET_ID COUNT
           1         0     0
           1         1     1
           1         2     0
           0         0     3
           0         1     0
           0         2     0

and this:

select 
  child_number, bind_set_hash_value, peeked, executions, rows_processed, buffer_gets, cpu_time
from 
  v$sql_cs_statistics
where 
  sql_id = '7y76r2vywmrv5';

CHILD_NUMBER BIND_SET_HASH_VALUE PEEKED EXECUTIONS ROWS_PROCESSED BUFFER_GETS CPU_TIME
           1          2671099528      Y          1 10000                   21        0
           0          2706503459      Y          1     1                    1        0

The PEEKED column shows that Oracle did peek at the bind values passed into the SQL.

Now how to get the adaptive cursor sharing framework to change plan? I ran the xnum=11 select three times and it still does a range scan of the index:

INDEX (RANGE SCAN) of TEST.TEST_PK (INDEX)

Let’s analyze to see if it makes any difference. No, it continues with the range scan. Grr!
Let’s drop the index! Then run the select where xnum=11, three times.

SQL_TEXT SQL_ID        OPEN_VERSIONS FETCHES EXECUTIONS PARSE_CALLS IS_BIND_SENSITIVE IS_BIND_AWARE IS_SHAREABLE CHILD_NUMBER PLAN_HASH_VALUE
declare type t_nums is varray(10000) of number; a_nums t_nums; x number; cursor c1 is select xnum from test where xnum=x; begin x:=11; open c1; fetch c1 bulk collect into a_nums; close c1; end; 
         fcrk1v7mrcnkr             1       0          3           3                 N             N            Y            0               0
SELECT XNUM FROM TEST WHERE XNUM=:B1 
         7y76r2vywmrv5             1       3          3           1                 Y             N            Y            0      1873712855
SELECT XNUM FROM TEST WHERE XNUM=:B1 
         7y76r2vywmrv5             1       3          3           1                 Y             N            Y            2       217508114

That’s expected… it picks up the dropped index and the change in statistics, and creates child 2. Now that the index is gone, child 1 has magically aged out of the shared pool. Child 2 has a new PLAN_HASH_VALUE=217508114 which shows TABLE ACCESS FULL. Child 0, the parent cursor, still remains in the shared pool with PLAN_HASH_VALUE=1873712855, which is the INDEX RANGE SCAN. These can also be seen in OEM under the SQL Details screen, where Plan Hash Value can be selected. So has adaptive cursor sharing worked? It’s hard to say, as this behaviour would have happened in 10g as well.

I would still like to create an example which changes the plan based on statistics and data, and will update this post.

Mysteries with current testing:

  • IS_BIND_AWARE is supposed to change to Y with the new plan (is this the definitive way to check whether adaptive cursor sharing has worked?).
  • v$sql_cs_selectivity shows zero rows for that SQL ID.
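One avenue I have not tried yet, so treat it strictly as an assumption: the BIND_AWARE hint (available from late 11.1 onwards) is documented to mark a statement bind-aware from the first hard parse, skipping the monitoring phase, which might make the plan change easier to observe:

```sql
select /*+ BIND_AWARE */ xnum from test where xnum = :b1;
```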

Conclusion
So, what have we learned? Literals are generally bad with cursor_sharing=exact, unless you have full control of your application and the explain plan for each statement is better for it. Literals are bad for CPU with cursor_sharing=force. Anonymous blocks and dynamic SQL are better, but not by much: shared memory usage is a lot higher and there is more parsing to be done for the block as well. Efficiently written cursors are best under any cursor_sharing setting, as the code variables are always treated as bind variables and the statements are soft-parsed at worst.

The main point of this post is to show you how to write better SQL so that it can run as efficiently as possible on Oracle. So whether you are a DBA or a developer, I hope you can take something positive away from this.

Thanks for reading!
Shanil

OraX (DBA Management and Monitoring Tool)

OraX is a customizable Oracle Database Management and Monitoring Tool written by an Oracle Certified Professional (OCP) DBA for DBAs.  There is a Helpdesk version available for management or helpdesk people who want an overview of performance.  It runs in Windows but is fully server aware whether you run Windows, Solaris or Linux.

Screens cannot be shown currently due to confidentiality.

This tool is written by a DBA and is in constant use by many DBAs.  It is always being updated with the latest scripts and is compatible with the latest Oracle enhancements.

Shanil

Upgrade to 11g R2 (11.2.0.3)

Hi

This post is, I suppose, the reason I’ve decided to create a blog.  We’ve just been through yet another hectic Oracle upgrade and I wanted to share my findings with everyone.  I would love to hear of your upgrade experiences and what you’ve done to minimize impact to the client and maintain or improve performance.

I’ve always gone the manual rather than GUI route, so I checked metalink 837570.1, took the relevant steps from there and created a custom document.  Since we use Oracle 64-bit on Solaris, we made sure the Oracle software had the correct dependencies, ODM was enabled, the ‘oracle’ binary had the sticky bit set, etc.

Before starting, I created a pfile in 10g with:

SQL> create pfile='$ORACLE_HOME/dbs/initSID.ora' from spfile;

then copied this file to 11g and edited it there during the upgrade process.

After running utlu112i_x.sql, it is important to check each recommendation there.  I made sure that SYSTEM and SYSAUX (and XDB if you use it)  had more than enough space before the main upgrade script was run.  For those who have TZ (timezone) errors, the steps to upgrade this to 14 are later in this post.
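A quick way to eyeball that free space (a simple sketch; what counts as “enough” is your call):

```sql
select tablespace_name, round(sum(bytes)/1024/1024) free_mb
from dba_free_space
where tablespace_name in ('SYSTEM','SYSAUX')
group by tablespace_name;
```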

This is a handy script to check for duplicates.  Make sure that you remove all SYSTEM duplicates, typically created by phantom DBAs running SYS scripts as SYSTEM at some point in the past…

SQL> select OBJECT_NAME, OBJECT_TYPE from DBA_OBJECTS where OBJECT_NAME||OBJECT_TYPE in (select OBJECT_NAME||OBJECT_TYPE from
DBA_OBJECTS where OWNER='SYS') and OWNER='SYSTEM' and OBJECT_NAME not in
('AQ$_SCHEDULES_PRIMARY', 'AQ$_SCHEDULES', 'DBMS_REPCAT_AUTH');

I found that setting job_queue_processes=0 was sufficient to stop any snapshot or materialized view refreshes from firing during the upgrade.

For the step to check outstanding distributed transactions:

SQL> SELECT local_tran_id FROM dba_2pc_pending;

it is normally sufficient to:

SQL> EXECUTE dbms_transaction.purge_lost_db_entry('id');

SQL> commit;

but if you get an ora-6510 then you need to run this step before the purge:

SQL> alter system set "_smu_debug_mode" = 4;

Make sure sys, system and aud$ are in SYSTEM tablespace!

Then when it’s time to edit the pfile (init file), comment out all hidden parameters and events.  Use this unix script to weed out all obsolete parameters:

for text in `cat params.txt`
do
  grep -i "$text" init<SID>.ora
done

where params.txt contains a list of all obsolete parameters.
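A self-contained version of the same idea (the file names here are illustrative; anchoring the match with ^ avoids false hits on substrings):

```shell
# Build tiny demo inputs so the loop can be tried anywhere.
cat > params.txt <<'EOF'
mts_servers
log_archive_start
EOF
cat > initDEMO.ora <<'EOF'
log_archive_start=true
processes=300
EOF
# Flag any obsolete parameter that still appears in the pfile.
while read -r p; do
  grep -i "^${p}" initDEMO.ora
done < params.txt
```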

For diagnostic dest, this is what I did.

Set diagnostic_dest='/ORACLE'

where /ORACLE is the ORACLE_BASE, and my ORACLE_HOME is set to /ORACLE/products/11203.

Then under /ORACLE/diag/rdbms/<db> create a link that points to /<SID>ADMIN1/admin/<SID>/diag e.g.

ln -s /<SID>ADMIN1/admin/<SID>/diag <SID>

The advantage of doing it this way is that diag sits cleanly under the admin folders and all trace files etc. can be found in a structure under diag.  If you leave Oracle to do its own thing and set diag to an absolute path, it creates diag/rdbms/<db> under that, and then it’s a mission to get to your alert log, for example.  Where will the alert log be now?  Under diag/trace, or in an XML structure which can be read with the adrci utility.

I found 11g R2 very hungry for shared pool so we typically doubled SHARED_POOL_SIZE especially for production.  I was also generous with JAVA_POOL_SIZE=400M as I didn’t want that to be a reason the upgrade script crashed.

At this point make sure you have a full backup!

So after running catupgrd.sql, the main upgrade script, search the output log for ‘ERROR at’.  If you get TZ errors, the steps are coming up.  If you got other errors, address them one by one or if the script crashed, it is perfectly safe to run catupgrd again.

Steps to upgrade timezone to 14:

SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME;
Must return
DST_PRIMARY_TT_VERSION 4
DST_SECONDARY_TT_VERSION 0
DST_UPGRADE_STATE NONE

SQL> purge dba_recyclebin;
SQL> alter session set "_with_subquery"=materialize;

SQL> alter session set "_simple_view_merging"=TRUE;

SQL> exec DBMS_DST.BEGIN_PREPARE(14);
Must return “PL/SQL procedure successfully completed” with no errors.

SQL> TRUNCATE TABLE SYS.DST$TRIGGER_TABLE;

SQL> TRUNCATE TABLE sys.dst$affected_tables;
SQL> TRUNCATE TABLE sys.dst$error_table;

SQL> set serveroutput on
BEGIN
DBMS_DST.FIND_AFFECTED_TABLES
(affected_tables => 'sys.dst$affected_tables',
log_errors => TRUE,
log_errors_table => 'sys.dst$error_table');
END;
/

These must return no rows:
SQL> SELECT * FROM sys.dst$affected_tables;
SQL> SELECT * FROM sys.dst$error_table;
SQL> SELECT * FROM sys.dst$error_table where ERROR_NUMBER= '1883';
SQL> SELECT * FROM sys.dst$error_table where ERROR_NUMBER= '1878';
SQL> SELECT * FROM sys.dst$error_table where ERROR_NUMBER not in ('1878','1883');

SQL> EXEC DBMS_DST.END_PREPARE;

sqlplus / as sysdba

SQL> shutdown immediate;
SQL> startup upgrade;
SQL> set serveroutput on
SQL> purge dba_recyclebin;
SQL> TRUNCATE TABLE SYS.DST$TRIGGER_TABLE;
SQL> TRUNCATE TABLE sys.dst$affected_tables;
SQL> TRUNCATE TABLE sys.dst$error_table;
SQL> alter session set "_with_subquery"=materialize;
SQL> alter session set "_simple_view_merging"=TRUE;
SQL> EXEC DBMS_DST.BEGIN_UPGRADE(14);
SQL> shutdown immediate

SQL> startup

SQL> alter session set "_with_subquery"=materialize;
SQL> alter session set "_simple_view_merging"=TRUE;

SQL> set serveroutput on
VAR numfail number
BEGIN
DBMS_DST.UPGRADE_DATABASE(:numfail,
parallel => TRUE,
log_errors => TRUE,
log_errors_table => 'SYS.DST$ERROR_TABLE',
log_triggers_table => 'SYS.DST$TRIGGER_TABLE',
error_on_overlap_time => FALSE,
error_on_nonexisting_time => FALSE);
DBMS_OUTPUT.PUT_LINE('Failures:'|| :numfail);
END;
/
Must return: Failures:0

If 0 failures then end upgrade:
SQL> VAR fail number
BEGIN
DBMS_DST.END_UPGRADE(:fail);
DBMS_OUTPUT.PUT_LINE('Failures:'|| :fail);
END;
/

SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE 'DST_%'
ORDER BY PROPERTY_NAME;
Must return
DST_PRIMARY_TT_VERSION 14
DST_SECONDARY_TT_VERSION 0
DST_UPGRADE_STATE NONE

SQL> SELECT * FROM v$timezone_file;

timezlrg_14.dat 14

After the upgrade, make sure JOB_QUEUE_PROCESSES is set to at least 10 or 20 otherwise automatic statistics gathering jobs etc. will not run.

Unset shared_pool_reserved_size or make it 10% of shared_pool_size.  If you make this too small, Oracle grows it, but sometimes by not enough.  Making it too large can also cause problems.
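To judge whether the reserved area is sized sensibly, v$shared_pool_reserved is worth a look; non-zero REQUEST_FAILURES after the instance has warmed up suggests it is too small (a sketch):

```sql
select free_space, requests, request_misses, request_failures, last_failure_size
from v$shared_pool_reserved;
```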

If your database is OLTP, then adaptive cursor sharing is enabled by default.  In that case set CURSOR_SHARING=FORCE.  There is a long story behind this, but in a nutshell it means that Oracle will convert literals in application-generated SQL to system bind variables, which prevents the hard parsing that fragments your shared pool.  Most applications are not Oracle-aware and don’t make use of variables (which Oracle translates into bind variables). Also, ‘parse once, execute many’ is ideal, but that is another long story.

However if you want to turn off adaptive cursor sharing, for example in DSS systems, then these are the steps:

– Set CURSOR_SHARING=EXACT
– alter system set "_optimizer_extended_cursor_sharing_rel"=none scope=both;
– alter system set "_optimizer_extended_cursor_sharing"=none scope=both;
– alter system set "_optimizer_adaptive_cursor_sharing"=false scope=both;

We found that AMM (automatic memory management) was badly behaved in our development databases. It made strange decisions, such as taking away most of the space from default buffer cache and adding it to shared pool.  Obviously, we did not want unpredictable behaviour in production.  Instead, we monitored the shared pool usage and adjusted sizes dynamically as needed.  These are the parameter settings to turn AMM off:

sga_max_size=0
sga_target=0
_memory_imm_mode_without_autosga=FALSE
memory_max_target=0
memory_target=0

We also noticed 11g starting the resource manager and limiting CPU on certain sessions (!!), so we turned it off with:

SQL> ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = '' scope=both;

We also found that network access which was automatic in 10g, for example when running HTTP reports, was taken away in 11g.  Now you have to add these users to the ACL (access control list) to prevent ORA-24247 errors.

Run this to get a list of users to add to ACL:
SQL> select distinct owner from dba_dependencies
where referenced_name in ('UTL_MAIL','UTL_TCP','UTL_SMTP','UTL_HTTP')
and owner not in ('SYS','PUBLIC')
order by owner;

Create the ACL and add the first user with connect privilege:
begin
DBMS_NETWORK_ACL_ADMIN.CREATE_ACL(
acl => 'ACL.xml',
description => 'Users that have ACL privs',
principal => 'USER1',
is_grant => true,
privilege => 'connect');
commit;
end;
/

Add resolve privilege to the first user:
begin
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE(
acl => 'ACL.xml',
principal => 'USER1',
is_grant => true,
privilege => 'resolve');
commit;
end;
/

Assign the * domain to the ACL to make hosts accessible from anywhere.
begin
DBMS_NETWORK_ACL_ADMIN.assign_acl (
acl => 'ACL.xml',
host => '*',
lower_port => 1,
upper_port => 9999);
commit;
end;
/

To add more users with connect and resolve, repeat these steps:
begin
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE(
acl => 'ACL.xml',
principal => 'NEXT_USER',
is_grant => true,
privilege => 'connect');
commit;
end;
/

begin
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE(
acl => 'ACL.xml',
principal => 'NEXT_USER',
is_grant => true,
privilege => 'resolve');
commit;
end;
/

Don’t forget some of these typical post upgrade steps:

– Startup and shutdown scripts

– Check permissions of all utlfile directories and dba_directories – make sure they are now writable by oracle11
– Move each crontab entry for this database from oracle10 crontab to oracle11 crontab.

Some wait events we found in 11g:

‘read by other session’ is now an extra granularity of the ‘buffer busy waits’ event. This is simply one process waiting for another process to complete an I/O to read a hot block from disk into memory.

‘direct path read’ – a lot said on the internet about this!

‘library cache: mutex X’ – I am still investigating how to minimize these. It is obvious that hard parsing and SQL version duplicates have to be reduced, but I wonder about this mutex behaviour in 11g as compared to 10g.  I am investigating these parameters:

_cursor_obsolete_threshold=x to force obsolete parent cursors (e.g. x=20)
_mutex_wait_scheme = 0 (revert to 10g behavior??)

A final note on shared pool management. Make sure you use dbms_shared_pool.keep to ‘pin’ all commonly used objects at startup of the database.  This gives a huge benefit in preventing shared pool fragmentation.  We found that if you have enough memory on your server, it’s better to allocate a generous shared pool, unlike the pre-10g days when you had to fine-tune it and could not have it too large.  Our shared pool sits at 10G on our main production database and hovers around 15% free.
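For completeness, this is the shape of the pinning call (the object name here is a placeholder; the dbms_shared_pool package may first need installing via $ORACLE_HOME/rdbms/admin/dbmspool.sql):

```sql
-- 'P' pins a package/procedure/function; run once per hot object at startup
exec dbms_shared_pool.keep('APPOWNER.HOT_PACKAGE', 'P');
```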

I use a custom written Windows tool called OraX to monitor and manage our databases.  If you would like to be a trialist/tester, please let me know.

I hope your upgrade has gone as smoothly as ours (previous upgrades have been nightmares)! Please let me know your thoughts and share your experiences!

Shanil