Upgrade to 11g R2 (11.2.0.3)

Hi

This post is, I suppose, the reason I’ve decided to create a blog.  We’ve just been through yet another hectic oracle upgrade and I wanted to share my findings with everyone.  I would love to hear of your upgrade experiences and what you’ve done to minimize impact to the client and maintain or improve performance.

I’ve always gone the manual rather than GUI route, so I checked metalink 837570.1, took the relevant steps from there and created a custom document.  Since we use oracle 64-bit on solaris, we made sure the oracle software had the correct dependancies, ODM was enabled, ‘oracle’ binary had the sticky bit set, etc.

Before starting, I created a pfile in 10g with:

SQL> create pfile=’$ORACLE_HOME/dbs/initSID.ora’ from spfile;

then copied this file to 11g and edited it there during the upgrade process.

After running utlu112i_x.sql, it is important to check each recommendation there.  I made sure that SYSTEM and SYSAUX (and XDB if you use it)  had more than enough space before the main upgrade script was run.  For those who have TZ (timezone) errors, the steps to upgrade this to 14 are later in this post.

This is a handy script to check for duplicates.  Make sure that you remove all SYSTEM duplicates, typically created by phantom DBA’s running sys scripts as system at some point in the past…

SQL> select OBJECT_NAME, OBJECT_TYPE from DBA_OBJECTS where OBJECT_NAME||OBJECT_TYPE in (select OBJECT_NAME||OBJECT_TYPE from
DBA_OBJECTS where OWNER=’SYS’) and OWNER=’SYSTEM’ and OBJECT_NAME not in
(‘AQ$_SCHEDULES_PRIMARY’, ‘AQ$_SCHEDULES’,’DBMS_REPCAT_AUTH’);

I found that setting job_queue_processes=0 was sufficient if you had any snapshots or materialized views that had to be refreshed.

For the step to check outstanding distributed transactions:

SQL> SELECT local_tran_id FROM dba_2pc_pending;

it is normally sufficient to:

SQL> EXECUTE dbms_transaction.purge_lost_db_entry(‘id’);

SQL> commit;

but if you get an ora-6510 then you need to run this step before the purge:

SQL> alter system set “_smu_debug_mode” = 4;

Make sure sys, system and aud$ are in SYSTEM tablespace!

Then when it’s time to edit the pfile (ini file), comment out all hidden parameters and events.  Use this unix script to weed out all obsolete parameters:

for text in `cat params.txt`
do
grep -i $text init<SID>.ora
done

where params.txt contains a list of all obsolete parameters.

For diagnostic dest, this is what I did.

Set diagnostic_dest=’/ORACLE’

where /ORACLE is the ORACLE_BASE, and my ORACLE_HOME is set to /ORACLE/products/11203.

Then under /ORACLE/diag/rdbms/<db> create a link that points to /<SID>ADMIN1/admin/<SID>/diag e.g.

ln –s /<SID>ADMIN1/admin/<SID>/diag <SID>

The advantage of doing it this way is that diag sits cleanly under the admin folders and all trace files etc. can be found in a structure under diag.  If you leave oracle to do its own thing and set diag to an absolute path, it creates diag/rdbms/<db> under that and then it’s a mission to get to your alert log for example.  Where will alert log be now?  Under diag/trace or in an xml structure which can be read with the adrci utility.

I found 11g R2 very hungry for shared pool so we typically doubled SHARED_POOL_SIZE especially for production.  I was also generous with JAVA_POOL_SIZE=400M as I didn’t want that to be a reason the upgrade script crashed.

At this point make sure you have a full backup!

So after running catupgrd.sql, the main upgrade script, search the output log for ‘ERROR at’.  If you get TZ errors, the steps are coming up.  If you got other errors, address them one by one or if the script crashed, it is perfectly safe to run catupgrd again.

Steps to upgrade timezone to 14:

SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE ‘DST_%’
ORDER BY PROPERTY_NAME;
Must return
DST_PRIMARY_TT_VERSION 4
DST_SECONDARY_TT_VERSION 0
DST_UPGRADE_STATE NONE

SQL> purge dba_recyclebin;
SQL> alter session set “_with_subquery”=materialize;

SQL> alter session set “_simple_view_merging”=TRUE;

SQL> exec DBMS_DST.BEGIN_PREPARE(14);
Must return “PL/SQL procedure successfully completed” with no errors.

SQL> TRUNCATE TABLE SYS.DST$TRIGGER_TABLE;

SQL> TRUNCATE TABLE sys.dst$affected_tables;
SQL> TRUNCATE TABLE sys.dst$error_table;

SQL> set serveroutput on
BEGIN
DBMS_DST.FIND_AFFECTED_TABLES
(affected_tables => ‘sys.dst$affected_tables’,
log_errors => TRUE,
log_errors_table => ‘sys.dst$error_table’);
END;
/

These must return no rows:
SQL> SELECT * FROM sys.dst$affected_tables;
SQL> SELECT * FROM sys.dst$error_table;
SQL> SELECT * FROM sys.dst$error_table where ERROR_NUMBER= ‘1883’;
SQL> SELECT * FROM sys.dst$error_table where ERROR_NUMBER= ‘1878’;
SQL> SELECT * FROM sys.dst$error_table where ERROR_NUMBER not in (‘1878′,’1883’);

SQL> EXEC DBMS_DST.END_PREPARE;

sqlplus / as sysdba

SQL> shutdown immediate;
SQL> startup upgrade;
SQL> set serveroutput on
SQL> purge dba_recyclebin;
SQL> TRUNCATE TABLE SYS.DST$TRIGGER_TABLE;
SQL> TRUNCATE TABLE sys.dst$affected_tables;
SQL> TRUNCATE TABLE sys.dst$error_table;
SQL> alter session set “_with_subquery”=materialize;
SQL> alter session set “_simple_view_merging”=TRUE;
SQL> EXEC DBMS_DST.BEGIN_UPGRADE(14);
SQL> shutdown immediate

SQL> startup

SQL> alter session set “_with_subquery”=materialize;
alter session set “_simple_view_merging”=TRUE;

SQL> set serveroutput on
VAR numfail number
BEGIN
DBMS_DST.UPGRADE_DATABASE(:numfail,
parallel => TRUE,
log_errors => TRUE,
log_errors_table => ‘SYS.DST$ERROR_TABLE’,
log_triggers_table => ‘SYS.DST$TRIGGER_TABLE’,
error_on_overlap_time => FALSE,
error_on_nonexisting_time => FALSE);
DBMS_OUTPUT.PUT_LINE(‘Failures:’|| :numfail);
END;
/
Must return: Failures:0

If 0 failures then end upgrade:
SQL> VAR fail number
BEGIN
DBMS_DST.END_UPGRADE(:fail);
DBMS_OUTPUT.PUT_LINE(‘Failures:’|| :fail);
END;
/

SQL> SELECT PROPERTY_NAME, SUBSTR(property_value, 1, 30) value
FROM DATABASE_PROPERTIES
WHERE PROPERTY_NAME LIKE ‘DST_%’
ORDER BY PROPERTY_NAME;
Must return
DST_PRIMARY_TT_VERSION 14
DST_SECONDARY_TT_VERSION 0
DST_UPGRADE_STATE NONE

SQL> SELECT * FROM v$timezone_file;

timezlrg_14.dat 14

After the upgrade, make sure JOB_QUEUE_PROCESSES is set to at least 10 or 20 otherwise automatic statistics gathering jobs etc. will not run.

Unset shared_pool_reserved_size or make it 10% of shared_pool_size.  If you make this too small, oracle grows it but sometimes by not enough.  Making it too large can also cause problems.

If your database is OLTP, then by default adaptive cursor sharing is enabled.  Then set CURSOR_SHARING=FORCE.  There is a long story behind this, but in a nutshell this means that oracle will convert literals in application-generated SQL to system bind variables, and this is to prevent hard parsing that will fragment your shared pool.  Most applications are not oracle aware and don’t make use of variables (which oracle translates into bind variables). Also, ‘parse once, execute many’ is ideal, but that is another long story.

However if you want to turn off adaptive cursor sharing, for example in DSS systems, then these are the steps:

– Set CURSOR_SHARING=EXACT
– alter system set “_optimizer_extended_cursor_sharing_rel”=none scope=both;
– alter system set “_optimizer_extended_cursor_sharing”=none scope= both;
– alter system set “_optimizer_adaptive_cursor_sharing”=false scope= both;

We found that AMM (automatic memory management) was badly behaved in our development databases. It made strange decisions, such as taking away most of the space from default buffer cache and adding it to shared pool.  Obviously, we did not want unpredictable behaviour in production.  Instead, we monitored the shared pool usage and adjusted sizes dynamically as needed.  These are the parameter settings to turn AMM off:

sga_max_size=0
sga_target=0
_memory_imm_mode_without_autosga=FALSE
memory_max_target=0
memory_target=0

We also noticed 11g starting the resource manager and limited CPU on certain sessions (!!) so we turned it off with:

SQL> ALTER SYSTEM SET RESOURCE_MANAGER_PLAN = ” scope=both

We also found automatic network access in 10g, for example with running http reports, were taken away in 11g.  Now you have to add these users to the ACL (access control list) to prevent ORA-24247 errors.

Run this to get a list of users to add to ACL:
SQL> select distinct owner from dba_dependencies
where referenced_name in (‘UTL_MAIL’,’UTL_TCP’,’UTL_SMTP’,’UTL_HTTP’)
and owner not in (‘SYS’,’PUBLIC’)
order by owner;

Create the ACL and add the first user with connect privilege:
begin
DBMS_NETWORK_ACL_ADMIN.CREATE_ACL(
acl => ‘ACL.xml’,
description => ‘Users that have ACL privs’,
principal => ‘USER1’,
is_grant => true,
privilege => ‘connect’);
commit;
end;
/

Add resolve privilege to the first user:
begin
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE(
acl => ‘ACL.xml’,
principal => ‘USER1’,
is_grant => true,
privilege => ‘resolve’);
commit;
end;
/

Assign * domain to ACL to make hosts accessible from anywhere.
begin
DBMS_NETWORK_ACL_ADMIN.assign_acl (
acl => ‘ACL.xml’,
host => ‘*’,
lower_port => 1,
upper_port => 9999);
commit;
end;
/

To add more users with connect and resolve, repeat these steps:
begin
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE(
acl => ‘ACL.xml’,
principal => ‘NEXT_USER’,
is_grant => true,
privilege => ‘connect’);
commit;
end;
/

begin
DBMS_NETWORK_ACL_ADMIN.ADD_PRIVILEGE(
acl => ‘ACL.xml’,
principal => ‘NEXT_USER’,
is_grant => true,
privilege => ‘resolve’);
commit;
end;
/

Don’t forget some of these typical post upgrade steps:

– Startup and shutdown scripts

– Check permissions of all utlfile directories and dba_directories – make sure they are now writable by oracle11
– Move each crontab entry for this database from oracle10 crontab to oracle11 crontab.

Some wait events we found in 11g:

‘read by other session’ is now an extra granularity of the ‘buffer busy waits’ event. This is simply one process waiting for another process to complete an I/O to read a hot block from disk into memory.

‘direct path read’ – a lot said on the internet about this!

‘library cache: mutex X’ – I am still investigating how to minimize these. It is obvious that hard parsing and SQL version duplicates have to be reduced, but I wonder about this mutex behavious in 11g as compared to 10g.  I am investigating these parameters:

_cursor_obsolete_threshold=x to force obsolete parent cursors (e.g. x=20)
_mutex_wait_scheme = 0 (revert to 10g behavior??)

A final note on shared pool management. Make sure you use dbms.shared_pool.keep to ‘pin’ all commonly used objects on startup of the database.  This gives a huge benefit to prevent shared pool fragmentation.  We found that if you have enough memory on your server, that it’s better to allocated a high enough shared pool rather than the pre-10g days where you had to fine-tune and could not have it too large.  Our shared pool sits at 10G on our main production database and hovers around 15% free.

I use a custom written Windows tool called OraX to monitor and manage our databases.  If you would like to be a trialist/tester, please let me know.

I hope your upgrade has gone as smoothly as ours (previous upgrades have been nightmares)! Please let me know your thoughts and share your experiences!

Shanil