Neil Chandler's DBA Blog

A resource for Database Professionals

Posts Tagged ‘oracle

RMAN Incarnations revisited (11G)

with one comment

Time for an update to a older post. I have previously talked about the annoyance of connecting to RMAN with a duplicated database where the DBID has not been changed. RMAN happily breaks the catalog by assuming the “new” database is a new incarnation, and prevents the previous owner of the catalog from using the backups.

I wrote a blog post a while ago about hacking your way past this problem, but was recently informed by Martin Bach that there was actually an RMAN command to fix the Incarnation problem I had encountered, so I though I had better take a look and see if it worked!

Well, the first thing I noticed was that Oracle 11G does not break when connecting from a different database with the same DBID the way it did in Oracle 10G:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:26:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1  2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43

RMAN> list backup summary;

List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- ----------------- ------- ------- ---------- ---
331 B A A DISK 26/01/13 08:55:32 1 1 NO BACKUP1
375 B A A DISK 26/01/13 09:06:44 1 1 NO BACKUP2
400 B A A DISK 26/01/13 09:07:02 1 1 NO BACKUP3
587 B F A DISK 26/01/13 11:20:09 1 1 YES FULL BACKUP
609 B F A DISK 26/01/13 11:20:11 1 1 NO TAG20130126T112010

And on the alternate database:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:15:07 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL2 1229390655 PARENT 1 13/08/09 23:00:48
1  2 ORCL2 1229390655 CURRENT 754488 30/10/09 11:38:43

RMAN> list backup summary;

specification does not match any backup in the repository

RMAN>

Whilst the incarnations look a little incorrect (referring to ORCL2), the system does not break. So, no more need to hack around with incarnations if the system breaks accidentally. However, what if you register the other database…

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 05:44:06 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database
RMAN> register database;
starting full resync of recovery catalog
full resync complete
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of register command on default channel at 01/27/2013 05:44:12
RMAN-20002: target database already registered in recovery catalog

So, after a little effort it would appear I can’t easily break the incarnations in Oracle 11G. So let’s try. I recovered the ORCL1 database to create a new incarnation to see how ORCL2 would behave when connected:

on ORCL1:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:32:09 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
 connected to recovery catalog database


RMAN> list incarnation;
 List of Database Incarnations
 DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
 ------- ------- -------- ---------------- --- ---------- ----------
 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43
 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12 <- new incarnation

<BR>

And now ORCL2 behaves a little differently, recognising the ORCL1 incarnations correctly, and throwing an error:

[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:19:27 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
 connected to recovery catalog database

RMAN> list incarnation;
 List of Database Incarnations
 DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
 ------- ------- -------- ---------------- --- ---------- ----------
 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43
 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12

RMAN> list backup summary;
RMAN-00571: ===========================================================
 RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
 RMAN-00571: ===========================================================
 RMAN-03002: failure of list command at 01/27/2013 12:19:38
 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20004: target database name does not match name in recovery catalog

So, what if I change the name of ORCL2 back to ORCL1. Can I reproduce my error then?

[oracle@localhost dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:23:29 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to:
 Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> shutdown immediate;
 Database closed.
 Database dismounted.
 ORACLE instance shut down.

SQL> startup mount;
 ORACLE instance started.
Total System Global Area 456146944 bytes
 Fixed Size 1344840 bytes
 Variable Size 381684408 bytes
 Database Buffers 67108864 bytes
 Redo Buffers 6008832 bytes
 Database mounted.

 SQL> exit
 Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
 With the Partitioning, OLAP, Data Mining and Real Application Testing options

[oracle@localhost dbs]$ nid target=system/oracle dbname=orcl1 setname=yes
DBNEWID: Release 11.2.0.2.0 - Production on Sun Jan 27 12:24:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Connected to database ORCL2 (DBID=1229390655)
Connected to server version 11.2.0
Control Files in database:
 /home/oracle/app/oracle/oradata/orcl/control01.ctl
 /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl
Change database name of database ORCL2 to ORCL1? (Y/[N]) => Y
Proceeding with operation
 Changing database name from ORCL2 to ORCL1
 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - modified
 Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - modified
 Datafile /home/oracle/app/oracle/oradata/orcl/system01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/sysaux01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/undotbs01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/users01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/example01.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1246426611663638.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1265209995679366.db - wrote new name
 Datafile /home/oracle/app/oracle/oradata/orcl/temp01.db - wrote new name
 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - wrote new name
 Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - wrote new name
 Instance shut down
Database name changed to ORCL1.
 Modify parameter file and generate a new password file before restarting.
 Succesfully changed database name.
 DBNEWID - Completed succesfully.
[note: I have already got the relevant init.ora and oratab setup]
[oracle@localhost dbs]$ . oraenv
 ORACLE_SID = [orcl2] ? orcl1
 The Oracle base has been set to /home/oracle/app/oracle

 [oracle@localhost dbs]$ sqlplus / as sysdba
SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:24:31 2013
Copyright (c) 1982, 2008, Oracle. All rights reserved.
Connected to an idle instance.
SQL> startup mount;
 ORACLE instance started.
Total System Global Area 456146944 bytes
 Fixed Size 1344840 bytes
 Variable Size 381684408 bytes
 Database Buffers 67108864 bytes
 Redo Buffers 6008832 bytes
 Database mounted.
SQL> alter database open;
Database altered.

SQL >exit

[oracle@localhost dbs]$ rman target system/oracle catalog rman/rman@orcl1

Recovery Manager: Release 11.2.0.2.0 – Production on Sun Jan 27 12:52:45 2013

Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database

RMAN> list incarnation;

database reset to incarnation 2
starting full resync of recovery catalog

List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1 2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43
1 921 ORCL1 1229390655 ORPHAN 10215936 27/01/13 12:27:12

So, the newly rename-to ORCL1 thinks we are at incarnation 2. However, log back into the original ORCL1 and it resets the incarnation back to 921. Still no corruption, still no problem!

So, I still can’t prove whether the ALTER DATABASE SET INCARNATION command will work as mentioned to me, or whether it’s just something that allows me to recover across a resetlogs command. Looks like I’ll have to reinstall Oracle 10G… tomorrow.

Written by Neil Chandler

13th February 2013 at 22:50

Oracle Data Files

with 2 comments

I was looking at the contents of a tablespace recently – just the extents, not a block dump – with the intention of shrinking a couple of the associated datafiles (don’t ask why – the reason is both stupid and irrelevant). I needed to know just how much I could shrink immediately, and if there were any quick wins in terms of moving objects from near the end of files so even more space could be reclaimed. I was a little surprised to discover that something which I have been doing for over 20 years was not as common knowledge as I thought it was, so I though I would try to let some more people know how to do this.

So, what’s in a tablespace? It’s made up of a number of datafiles (whether in ASM, RAW or on a journalled file-system is irrelevant for this conversation). Within the datafiles are extents. These are logical groupings of blocks – a multiple of the block size – which show where your objects are stored within the datafile.

I wrote this piece of SQL to show where the extents live in 1992. It still works today, regardless of whether your tablespaces are dictionary or locally managed:

 
SELECT tablespace_name, file_id, owner, segment_name, block_id begin,
blocks, block_id+blocks-1 end, bytes/1024 KB, '' free
FROM sys.dba_extents
where tablespace_name = 'USERS'
UNION
SELECT tablespace_name, file_id, '' owner, '' segment_name, block_id begin,
blocks, block_id+blocks-1 end, bytes/1024 KB, 'FREE' free 
FROM sys.dba_free_space 
where tablespace_name = 'USERS' 
ORDER BY 1,2,5

TSPC ID OWNER  SEGMENT_NAME    BEGIN  BLOCKS   END     KB  FREE
USERS 4 SCOTT  DEPT              128       8   135     64
USERS 4 SCOTT  PK_DEPT           136       8   143     64
USERS 4 SCOTT  EMP               144       8   151     64
USERS 4 SCOTT  PK_EMP            152       8   159     64
USERS 4 SCOTT  SALGRADE          160       8   167     64
USERS 4                          168     128   295   1024 FREE
USERS 4 SYS    TABLE_1           296       8   303     64
USERS 4 SYS    TABLE_1           304       8   311     64
USERS 4 SYS    TABLE_1           312       8   319     64
USERS 4 SYS    TABLE_1           320       8   327     64
USERS 4 SYS    TABLE_1           328       8   335     64
USERS 4 SYS    IND_1             336       8   343     64
USERS 4                          344      40   383    320 FREE
USERS 4                          384     128   511   1024 FREE
USERS 4                          512     128   639   1024 FREE
USERS 6 SYS    TABLE_2           128     128   255   1024
USERS 6                          256   12544 12799 100352 FREE
USERS 6 SYS    IND_2           12800     256 13055   2048

So, as we can see from the output above, the tablespace USERS has 2 datafiles; “4″ and “6″ (you can identify them fully using view DBA_DATA_FILES). We can shrink file 4 immediately by 1024K+1024K+320K = a bit over 2M.

File 6 is a bit more of a problem. There’s an index IND_2 in the way, unhelpfully located at the end of the datafile, which stops us from shrinking the datafile. If you try to resize the file using ALTER DATABASE DATAFIL E ‘+DATA/orcl/…..’ 10M, it will fail with the error ORA-03297: file contains used data beyond requested RESIZE value. In this case, the remedy may be as simple as performing an ALTER INDEX IND_2 REBUILD TABLESPACE USERS ONLINE; This will (probably) relocate the index somewhere lower in the file and allow us to shrink the datafile. Moving tables with an ALTER TABLE … MOVE command may prove more difficult depending upon your throughput due to the extra locks needed at the start and end of the transaction, and extra care needs to be take with any LOB objects involved.

Written by Neil Chandler

25th January 2013 at 16:39

Exposing the Oracle Alert Log to SQL

leave a comment »

I’ve been spending some time working in Apex recently, building a small app to draw together the monitoring of application and infrastructure components into a single easy-to-visualise tool. As part of that, I wanted to be able to read and report on the alert log. Traditionally, that would have meant creating an external table to point to the alert log and reading it that way, with lots of string manipulation and regular expressions to try to pull out useful bits of information. However, Oracle 11G had made that a lot easier. Step forward X$DBGALERTEXT. This is the decoded version of the XML version of the Alert log, and as such provides lots of lovely columns to filter by, rather than a single line of text to decode. Particularly useful (for me) is the MESSAGE_LEVEL. Is this line of text informational (16), or critical (1), or something in between? Of course, each “normal” line of text is still available in the MESSAGE_TEXT column.

SQL> desc x$dbgalertext;
 Name                           Type
 ------------------------------ --------------------------------------------------------
 ADDR                           RAW(4)
 INDX                           NUMBER
 INST_ID                        NUMBER
 ORIGINATING_TIMESTAMP          TIMESTAMP(3) WITH TIME ZONE
 NORMALIZED_TIMESTAMP           TIMESTAMP(3) WITH TIME ZONE
 ORGANIZATION_ID                VARCHAR2(64)
 COMPONENT_ID                   VARCHAR2(64)
 HOST_ID                        VARCHAR2(64)
 HOST_ADDRESS                   VARCHAR2(46)
 MESSAGE_TYPE                   NUMBER
 MESSAGE_LEVEL                  NUMBER
 MESSAGE_ID                     VARCHAR2(64)
 MESSAGE_GROUP                  VARCHAR2(64)
 CLIENT_ID                      VARCHAR2(64)
 MODULE_ID                      VARCHAR2(64)
 PROCESS_ID                     VARCHAR2(32)
 THREAD_ID                      VARCHAR2(64)
 USER_ID                        VARCHAR2(64)
 INSTANCE_ID                    VARCHAR2(64)
 DETAILED_LOCATION              VARCHAR2(160)
 PROBLEM_KEY                    VARCHAR2(64)
 UPSTREAM_COMP_ID               VARCHAR2(100)
 DOWNSTREAM_COMP_ID             VARCHAR2(100)
 EXECUTION_CONTEXT_ID           VARCHAR2(100)
 EXECUTION_CONTEXT_SEQUENCE     NUMBER
 ERROR_INSTANCE_ID              NUMBER
 ERROR_INSTANCE_SEQUENCE        NUMBER
 VERSION                        NUMBER
 MESSAGE_TEXT                   VARCHAR2(2048)
 MESSAGE_ARGUMENTS              VARCHAR2(128)
 SUPPLEMENTAL_ATTRIBUTES        VARCHAR2(128)
 SUPPLEMENTAL_DETAILS           VARCHAR2(128)
 PARTITION                      NUMBER
 RECORD_ID                      NUMBER

Very handy. Just add your own view, synonym and permissions to read the view, and you’re away…

create view v_$alert_log as select * from x$dbgalertext;
create public synonym v$alert_log for sys.v_$alert_log;
grant select on v$alert_log to whomever...

  1* select message_text from v$alert_log where ...;

MESSAGE_TEXT
-----------------------------------------------------------------
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Shared memory segment for instance monitoring created
Picked latch-free SCN scheme 2
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options.

Using parameter settings in server-side pfile /home/oracle/app/oracle/product/11.2.0/dbhome_2/dbs/initorcl.ora
System parameters with non-default values:
  processes                = 200
  sessions                 = 322
  sga_max_size             = 2G
  pre_page_sga             = TRUE
  nls_language             = "ENGLISH"
  nls_territory            = "UNITED KINGDOM"
  filesystemio_options     = "SetAll"
  sga_target               = 2G
  control_files            = "/u02/oradata/orcl/control01.ctl"
.
[snip]
.
  aq_tm_processes          = 1
  diagnostic_dest          = "/u20/apps/oracle"
PMON started with pid=2, OS id=2492
PSP0 started with pid=3, OS id=2494
VKTM started with pid=4, OS id=2512 at elevated priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
GEN0 started with pid=5, OS id=2520
DIAG started with pid=6, OS id=2522
...etc...

Written by Neil Chandler

26th March 2012 at 21:58

The 10046 trace. Largely useless, isn’t it?

with 13 comments

The other night I was sat in the pub with some like-minded individuals discussing the relative merits of the 10046 trace (we Rock! in the pub, dudes!) and somebody asked me how often I has actually used it in anger? A well-respected DBA / Architect maintained it was a pretty useless and difficult option to use, given the topology of modern applications (e.g. How do you find the correct session with all that connection pooling going on from multiple web servers.)

My answer surprised me – I thought back to one client where I spent 90% of my time performance tuning a large (TiB-ish) OLTP/Batch hybrid system and concluded that I had ran a 10046 against production about once a year. Once. So if the 10046 is the holy grail of plan information, why wasn’t I using it that much. And why did I never use a 10053 against Production there?

The answer for me is a little more complex than that given in the pub:

1. as stated above, it’s hard to catch the in-flight session unless the application is instrumented to inject the trace statement when needed (and how many applications are instrumented to help you discover problems? Screen ST03 in SAP is very helpful. Any others in major ERP’s? Thought not.)

2. In many places that I have worked, getting authorisation to make any a change to a 24×7 mission-critical system is highly bureaucratic, involving cast-iron justification for the change and it’s positive benefits, requirement that there will be no adverse effects because of the change, very senior sign-off, more red-tape, etc. This causes a significant amount of work simply to put a trace on, even if you can catch the SQL. This can end up being more work than actually fixing the problem.

3. An awful lot of SQL tuning is a fairly blunt affair, as the developer (who is frequently database-blind) has usually missed something obvious. It is frequently to do with incorrectly using or not using an index (or using a poor index), or lack of filtering data at the right point to minimise the I/O.

4. Most importantly, if you have AWR and ASH, it’s not really needed. For each plan created by the optimizer the database stores the bind variables along with it, so we can usually understand why the optimizer makes the decisions it makes. ASH contains the main event waits. Why bother trying to capture all of the detail in a trace when you really don’t need that much detail, and it’s all already there; ready to be extracted from the relevant tables (e.g. dba_hist_active_sess_history, dba_hist_sql_plan and dba_hist_sql_bind.)

I have never used a 10053 trace on a Production system. I have simply never needed to know the decisions taken by the optimizer in that much detail. Like most DBA’s and Oracle consultants, I don’t go from site-to-site on a weekly basis resolving edge-case problems that the incumbent DBA’s haven’t had the time, or possibly don’t have the skills, to resolve themselves. I usually don’t need that level of confirmation that I’m right about why the plan is wrong, and I don’t have the time to conclusively prove it over and over again – I just need to get the fix into place and move onto the next problem.

That said, perhaps I should get fully to the bottom of these problems to ensure that they never occur again – which is the fundamental problem with Adaptive Cursor Sharing.

Written by Neil Chandler

22nd November 2011 at 20:06

Problems with RMAN and incarnations.

with 5 comments

One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX  HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       PROD     1099918981       CURRENT 446075     04-APR-10

RMAN> list backup summary;

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> exit

Recovery Manager complete.

[oracle]$ export ORACLE_SID=TEST
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: TEST (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

new incarnation of database registered in recovery catalog
starting full resync of recovery catalog
full resync complete

List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        FULL BACKUP
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

RMAN> list incarnation;

List of Database Incarnations
DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1       8       PROD     1099918981       PARENT  1          30-JUN-05
1       2       TEST     1099918981       PARENT  446075     04-APR-10
1       1789    TEST     1099918981       CURRENT 3023938    27-FEB-11

RMAN> exit

Recovery Manager complete.

And lets see what happens when we go into RMAN for Production

[oracle]$ export ORACLE_SID=PROD
[oracle]$ rman target / catalog rman/rman@rman_db

Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011

Copyright (c) 1982, 2005, Oracle.  All rights reserved.

connected to target database: PROD (DBID=1099918981)
connected to recovery catalog database

RMAN> list backup summary;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of list command at 02/27/2011 21:03:04
RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog

RMAN> list incarnation;
RMAN> exit

Recovery Manager complete.

So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.

EDIT! Before trying the catalog mod below you should look at the My Oracle Support document 412113.1, and check out the rman commands:

RMAN> list incarnation;
RMAN> reset database to incarnation <dbinc_key>;    
RMAN> resync catalog;
RMAN> list incarnation;

OK. Proceed at your own risk!

To remove the bad incarnation record from the recovery catalog:

[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB >  select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         NO          8                                              1 30-JUN-05 PARENT
1 1099918981       1789 TEST                3023938          27-FEB-11         YES         2                                         446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY      DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY
---------- ---------- --------------- ------------- ------------- --------------
1 1099918981                     744219186             2           1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY             DBID DBINC_KEY NAME     RESETLOGS_CHANGE# RESETLOGS         CUR       PARENT_DB INC_KEY          PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS
----------   ---------- ----------         --------          ----------------- --------- ---       ---------------- ----------------------- --------- --------
1 1099918981          8 PROD                      1          30-JUN-05         NO                                                                     PARENT
1 1099918981          2 PROD                 446075          04-APR-10         YES         8                                              1 30-JUN-05 PARENT

RMAN @ RMAN_DB > commit;

Commit complete.
And let's see if we can use RMAN again...
[oracle]$ rman target / catalog rman/rman@rman_db
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011
Copyright (c) 1982, 2005, Oracle.  All rights reserved.
connected to target database: PROD  (DBID=1099918981)
connected to recovery catalog database
RMAN> list backup summary;
starting full resync of recovery catalog
full resync complete
List of Backups
===============
Key     TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- --------------- ------- ------- ---------- ---
1737    B  F  A DISK        27-FEB-11       1       1       YES        TAG20110227T144639
1752    B  F  A DISK        27-FEB-11       1       1       NO         TAG20110227T144855

And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.

The last thing to do was to ensure that the recovery worked too.

NOTE: 11G Update to this blog entry

Written by Neil Chandler

23rd March 2011 at 10:46

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: