Posts Tagged ‘oracle’
RMAN Incarnations revisited (11G)
Time for an update to a older post. I have previously talked about the annoyance of connecting to RMAN with a duplicated database where the DBID has not been changed. RMAN happily breaks the catalog by assuming the “new” database is a new incarnation, and prevents the previous owner of the catalog from using the backups.
I wrote a blog post a while ago about hacking your way past this problem, but was recently informed by Martin Bach that there was actually an RMAN command to fix the Incarnation problem I had encountered, so I though I had better take a look and see if it worked!
Well, the first thing I noticed was that Oracle 11G does not break when connecting from a different database with the same DBID the way it did in Oracle 10G:
[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:26:10 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database
RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1 2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43
RMAN> list backup summary;
List of Backups
===============
Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag
------- -- -- - ----------- ----------------- ------- ------- ---------- ---
331 B A A DISK 26/01/13 08:55:32 1 1 NO BACKUP1
375 B A A DISK 26/01/13 09:06:44 1 1 NO BACKUP2
400 B A A DISK 26/01/13 09:07:02 1 1 NO BACKUP3
587 B F A DISK 26/01/13 11:20:09 1 1 YES FULL BACKUP
609 B F A DISK 26/01/13 11:20:11 1 1 NO TAG20130126T112010
And on the alternate database:
[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 - Production on Sat Jan 26 12:15:07 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL2 (DBID=1229390655)
connected to recovery catalog database
RMAN> list incarnation;
starting full resync of recovery catalog
full resync complete
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
------- ------- -------- ---------------- --- ---------- ----------
1 18 ORCL2 1229390655 PARENT 1 13/08/09 23:00:48
1 2 ORCL2 1229390655 CURRENT 754488 30/10/09 11:38:43
RMAN> list backup summary;
specification does not match any backup in the repository
RMAN>
Whilst the incarnations look a little incorrect (referring to ORCL2), the system does not break. So, no more need to hack around with incarnations if the system breaks accidentally. However, what if you register the other database…
[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1 Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 05:44:06 2013 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. connected to target database: ORCL2 (DBID=1229390655) connected to recovery catalog database RMAN> register database; starting full resync of recovery catalog full resync complete RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03009: failure of register command on default channel at 01/27/2013 05:44:12 RMAN-20002: target database already registered in recovery catalog
So, after a little effort it would appear I can’t easily break the incarnations in Oracle 11G. So let’s try. I recovered the ORCL1 database to create a new incarnation to see how ORCL2 would behave when connected:
on ORCL1:
[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1 Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:32:09 2013 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. connected to target database: ORCL1 (DBID=1229390655) connected to recovery catalog database RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12 <- new incarnation
<BR>
And now ORCL2 behaves a little differently, recognising the ORCL1 incarnations correctly, and throwing an error:
[oracle@localhost ~]$ rman target system/oracle catalog rman/rman@orcl1 Recovery Manager: Release 11.2.0.2.0 - Production on Sun Jan 27 12:19:27 2013 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. connected to target database: ORCL2 (DBID=1229390655) connected to recovery catalog database RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48 1 2 ORCL1 1229390655 PARENT 754488 30/10/09 11:38:43 1 921 ORCL1 1229390655 CURRENT 10215936 27/01/13 12:27:12 RMAN> list backup summary; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of list command at 01/27/2013 12:19:38 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20004: target database name does not match name in recovery catalog
So, what if I change the name of ORCL2 back to ORCL1. Can I reproduce my error then?
[oracle@localhost dbs]$ sqlplus / as sysdba SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:23:29 2013 Copyright (c) 1982, 2008, Oracle. All rights reserved. Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options SQL> shutdown immediate; Database closed. Database dismounted. ORACLE instance shut down. SQL> startup mount; ORACLE instance started. Total System Global Area 456146944 bytes Fixed Size 1344840 bytes Variable Size 381684408 bytes Database Buffers 67108864 bytes Redo Buffers 6008832 bytes Database mounted. SQL> exit Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options [oracle@localhost dbs]$ nid target=system/oracle dbname=orcl1 setname=yes DBNEWID: Release 11.2.0.2.0 - Production on Sun Jan 27 12:24:10 2013 Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved. Connected to database ORCL2 (DBID=1229390655) Connected to server version 11.2.0 Control Files in database: /home/oracle/app/oracle/oradata/orcl/control01.ctl /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl Change database name of database ORCL2 to ORCL1? (Y/[N]) => Y Proceeding with operation Changing database name from ORCL2 to ORCL1 Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - modified Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - modified Datafile /home/oracle/app/oracle/oradata/orcl/system01.db - wrote new name Datafile /home/oracle/app/oracle/oradata/orcl/sysaux01.db - wrote new name Datafile /home/oracle/app/oracle/oradata/orcl/undotbs01.db - wrote new name Datafile /home/oracle/app/oracle/oradata/orcl/users01.db - wrote new name Datafile /home/oracle/app/oracle/oradata/orcl/example01.db - wrote new name Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1246426611663638.db - wrote new name Datafile /home/oracle/app/oracle/oradata/orcl/APEX_1265209995679366.db - wrote new name Datafile /home/oracle/app/oracle/oradata/orcl/temp01.db - wrote new name Control File /home/oracle/app/oracle/oradata/orcl/control01.ctl - wrote new name Control File /home/oracle/app/oracle/flash_recovery_area/orcl/control02.ctl - wrote new name Instance shut down Database name changed to ORCL1. Modify parameter file and generate a new password file before restarting. Succesfully changed database name. DBNEWID - Completed succesfully. [note: I have already got the relevant init.ora and oratab setup] [oracle@localhost dbs]$ . oraenv ORACLE_SID = [orcl2] ? orcl1 The Oracle base has been set to /home/oracle/app/oracle [oracle@localhost dbs]$ sqlplus / as sysdba SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jan 27 12:24:31 2013 Copyright (c) 1982, 2008, Oracle. All rights reserved. Connected to an idle instance. SQL> startup mount; ORACLE instance started. Total System Global Area 456146944 bytes Fixed Size 1344840 bytes Variable Size 381684408 bytes Database Buffers 67108864 bytes Redo Buffers 6008832 bytes Database mounted. SQL> alter database open; Database altered. SQL >exit [oracle@localhost dbs]$ rman target system/oracle catalog rman/rman@orcl1
Recovery Manager: Release 11.2.0.2.0 – Production on Sun Jan 27 12:52:45 2013
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: ORCL1 (DBID=1229390655)
connected to recovery catalog database
RMAN> list incarnation;
database reset to incarnation 2
starting full resync of recovery catalog
List of Database Incarnations
DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time
——- ——- ——– —————- — ———- ———-
1 18 ORCL1 1229390655 PARENT 1 13/08/09 23:00:48
1 2 ORCL1 1229390655 CURRENT 754488 30/10/09 11:38:43
1 921 ORCL1 1229390655 ORPHAN 10215936 27/01/13 12:27:12
So, the newly rename-to ORCL1 thinks we are at incarnation 2. However, log back into the original ORCL1 and it resets the incarnation back to 921. Still no corruption, still no problem!
So, I still can’t prove whether the ALTER DATABASE SET INCARNATION command will work as mentioned to me, or whether it’s just something that allows me to recover across a resetlogs command. Looks like I’ll have to reinstall Oracle 10G… tomorrow.
Oracle Data Files
I was looking at the contents of a tablespace recently – just the extents, not a block dump – with the intention of shrinking a couple of the associated datafiles (don’t ask why – the reason is both stupid and irrelevant). I needed to know just how much I could shrink immediately, and if there were any quick wins in terms of moving objects from near the end of files so even more space could be reclaimed. I was a little surprised to discover that something which I have been doing for over 20 years was not as common knowledge as I thought it was, so I though I would try to let some more people know how to do this.
So, what’s in a tablespace? It’s made up of a number of datafiles (whether in ASM, RAW or on a journalled file-system is irrelevant for this conversation). Within the datafiles are extents. These are logical groupings of blocks – a multiple of the block size – which show where your objects are stored within the datafile.
I wrote this piece of SQL to show where the extents live in 1992. It still works today, regardless of whether your tablespaces are dictionary or locally managed:
SELECT tablespace_name, file_id, owner, segment_name, block_id begin, blocks, block_id+blocks-1 end, bytes/1024 KB, '' free FROM sys.dba_extents where tablespace_name = 'USERS' UNION SELECT tablespace_name, file_id, '' owner, '' segment_name, block_id begin, blocks, block_id+blocks-1 end, bytes/1024 KB, 'FREE' free FROM sys.dba_free_space where tablespace_name = 'USERS' ORDER BY 1,2,5 TSPC ID OWNER SEGMENT_NAME BEGIN BLOCKS END KB FREE USERS 4 SCOTT DEPT 128 8 135 64 USERS 4 SCOTT PK_DEPT 136 8 143 64 USERS 4 SCOTT EMP 144 8 151 64 USERS 4 SCOTT PK_EMP 152 8 159 64 USERS 4 SCOTT SALGRADE 160 8 167 64 USERS 4 168 128 295 1024 FREE USERS 4 SYS TABLE_1 296 8 303 64 USERS 4 SYS TABLE_1 304 8 311 64 USERS 4 SYS TABLE_1 312 8 319 64 USERS 4 SYS TABLE_1 320 8 327 64 USERS 4 SYS TABLE_1 328 8 335 64 USERS 4 SYS IND_1 336 8 343 64 USERS 4 344 40 383 320 FREE USERS 4 384 128 511 1024 FREE USERS 4 512 128 639 1024 FREE USERS 6 SYS TABLE_2 128 128 255 1024 USERS 6 256 12544 12799 100352 FREE USERS 6 SYS IND_2 12800 256 13055 2048
So, as we can see from the output above, the tablespace USERS has 2 datafiles; “4″ and “6″ (you can identify them fully using view DBA_DATA_FILES). We can shrink file 4 immediately by 1024K+1024K+320K = a bit over 2M.
File 6 is a bit more of a problem. There’s an index IND_2 in the way, unhelpfully located at the end of the datafile, which stops us from shrinking the datafile. If you try to resize the file using ALTER DATABASE DATAFIL E ‘+DATA/orcl/…..’ 10M, it will fail with the error ORA-03297: file contains used data beyond requested RESIZE value. In this case, the remedy may be as simple as performing an ALTER INDEX IND_2 REBUILD TABLESPACE USERS ONLINE; This will (probably) relocate the index somewhere lower in the file and allow us to shrink the datafile. Moving tables with an ALTER TABLE … MOVE command may prove more difficult depending upon your throughput due to the extra locks needed at the start and end of the transaction, and extra care needs to be take with any LOB objects involved.
Exposing the Oracle Alert Log to SQL
I’ve been spending some time working in Apex recently, building a small app to draw together the monitoring of application and infrastructure components into a single easy-to-visualise tool. As part of that, I wanted to be able to read and report on the alert log. Traditionally, that would have meant creating an external table to point to the alert log and reading it that way, with lots of string manipulation and regular expressions to try to pull out useful bits of information. However, Oracle 11G had made that a lot easier. Step forward X$DBGALERTEXT. This is the decoded version of the XML version of the Alert log, and as such provides lots of lovely columns to filter by, rather than a single line of text to decode. Particularly useful (for me) is the MESSAGE_LEVEL. Is this line of text informational (16), or critical (1), or something in between? Of course, each “normal” line of text is still available in the MESSAGE_TEXT column.
SQL> desc x$dbgalertext; Name Type ------------------------------ -------------------------------------------------------- ADDR RAW(4) INDX NUMBER INST_ID NUMBER ORIGINATING_TIMESTAMP TIMESTAMP(3) WITH TIME ZONE NORMALIZED_TIMESTAMP TIMESTAMP(3) WITH TIME ZONE ORGANIZATION_ID VARCHAR2(64) COMPONENT_ID VARCHAR2(64) HOST_ID VARCHAR2(64) HOST_ADDRESS VARCHAR2(46) MESSAGE_TYPE NUMBER MESSAGE_LEVEL NUMBER MESSAGE_ID VARCHAR2(64) MESSAGE_GROUP VARCHAR2(64) CLIENT_ID VARCHAR2(64) MODULE_ID VARCHAR2(64) PROCESS_ID VARCHAR2(32) THREAD_ID VARCHAR2(64) USER_ID VARCHAR2(64) INSTANCE_ID VARCHAR2(64) DETAILED_LOCATION VARCHAR2(160) PROBLEM_KEY VARCHAR2(64) UPSTREAM_COMP_ID VARCHAR2(100) DOWNSTREAM_COMP_ID VARCHAR2(100) EXECUTION_CONTEXT_ID VARCHAR2(100) EXECUTION_CONTEXT_SEQUENCE NUMBER ERROR_INSTANCE_ID NUMBER ERROR_INSTANCE_SEQUENCE NUMBER VERSION NUMBER MESSAGE_TEXT VARCHAR2(2048) MESSAGE_ARGUMENTS VARCHAR2(128) SUPPLEMENTAL_ATTRIBUTES VARCHAR2(128) SUPPLEMENTAL_DETAILS VARCHAR2(128) PARTITION NUMBER RECORD_ID NUMBER
Very handy. Just add your own view, synonym and permissions to read the view, and you’re away…
create view v_$alert_log as select * from x$dbgalertext; create public synonym v$alert_log for sys.v_$alert_log; grant select on v$alert_log to whomever... 1* select message_text from v$alert_log where ...; MESSAGE_TEXT ----------------------------------------------------------------- LICENSE_MAX_SESSION = 0 LICENSE_SESSIONS_WARNING = 0 Shared memory segment for instance monitoring created Picked latch-free SCN scheme 2 Autotune of undo retention is turned on. LICENSE_MAX_USERS = 0 SYS auditing is disabled Starting up: Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production With the Partitioning, OLAP, Data Mining and Real Application Testing options. Using parameter settings in server-side pfile /home/oracle/app/oracle/product/11.2.0/dbhome_2/dbs/initorcl.ora System parameters with non-default values: processes = 200 sessions = 322 sga_max_size = 2G pre_page_sga = TRUE nls_language = "ENGLISH" nls_territory = "UNITED KINGDOM" filesystemio_options = "SetAll" sga_target = 2G control_files = "/u02/oradata/orcl/control01.ctl" . [snip] . aq_tm_processes = 1 diagnostic_dest = "/u20/apps/oracle" PMON started with pid=2, OS id=2492 PSP0 started with pid=3, OS id=2494 VKTM started with pid=4, OS id=2512 at elevated priority VKTM running at (1)millisec precision with DBRM quantum (100)ms GEN0 started with pid=5, OS id=2520 DIAG started with pid=6, OS id=2522 ...etc...
The 10046 trace. Largely useless, isn’t it?
The other night I was sat in the pub with some like-minded individuals discussing the relative merits of the 10046 trace (we Rock! in the pub, dudes!) and somebody asked me how often I has actually used it in anger? A well-respected DBA / Architect maintained it was a pretty useless and difficult option to use, given the topology of modern applications (e.g. How do you find the correct session with all that connection pooling going on from multiple web servers.)
My answer surprised me – I thought back to one client where I spent 90% of my time performance tuning a large (TiB-ish) OLTP/Batch hybrid system and concluded that I had ran a 10046 against production about once a year. Once. So if the 10046 is the holy grail of plan information, why wasn’t I using it that much. And why did I never use a 10053 against Production there?
The answer for me is a little more complex than that given in the pub:
1. as stated above, it’s hard to catch the in-flight session unless the application is instrumented to inject the trace statement when needed (and how many applications are instrumented to help you discover problems? Screen ST03 in SAP is very helpful. Any others in major ERP’s? Thought not.)
2. In many places that I have worked, getting authorisation to make any a change to a 24×7 mission-critical system is highly bureaucratic, involving cast-iron justification for the change and it’s positive benefits, requirement that there will be no adverse effects because of the change, very senior sign-off, more red-tape, etc. This causes a significant amount of work simply to put a trace on, even if you can catch the SQL. This can end up being more work than actually fixing the problem.
3. An awful lot of SQL tuning is a fairly blunt affair, as the developer (who is frequently database-blind) has usually missed something obvious. It is frequently to do with incorrectly using or not using an index (or using a poor index), or lack of filtering data at the right point to minimise the I/O.
4. Most importantly, if you have AWR and ASH, it’s not really needed. For each plan created by the optimizer the database stores the bind variables along with it, so we can usually understand why the optimizer makes the decisions it makes. ASH contains the main event waits. Why bother trying to capture all of the detail in a trace when you really don’t need that much detail, and it’s all already there; ready to be extracted from the relevant tables (e.g. dba_hist_active_sess_history, dba_hist_sql_plan and dba_hist_sql_bind.)
I have never used a 10053 trace on a Production system. I have simply never needed to know the decisions taken by the optimizer in that much detail. Like most DBA’s and Oracle consultants, I don’t go from site-to-site on a weekly basis resolving edge-case problems that the incumbent DBA’s haven’t had the time, or possibly don’t have the skills, to resolve themselves. I usually don’t need that level of confirmation that I’m right about why the plan is wrong, and I don’t have the time to conclusively prove it over and over again – I just need to get the fix into place and move onto the next problem.
That said, perhaps I should get fully to the bottom of these problems to ensure that they never occur again – which is the fundamental problem with Adaptive Cursor Sharing.
Problems with RMAN and incarnations.
One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:
[oracle]$ export ORACLE_SID=PROD [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: PROD (DBID=1099918981) connected to recovery catalog database RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 8 PROD 1099918981 PARENT 1 30-JUN-05 1 2 PROD 1099918981 CURRENT 446075 04-APR-10 RMAN> list backup summary; List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES FULL BACKUP 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855 RMAN> exit Recovery Manager complete. [oracle]$ export ORACLE_SID=TEST [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: TEST (DBID=1099918981) connected to recovery catalog database RMAN> list backup summary; new incarnation of database registered in recovery catalog starting full resync of recovery catalog full resync complete List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES FULL BACKUP 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855 RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 8 PROD 1099918981 PARENT 1 30-JUN-05 1 2 TEST 1099918981 PARENT 446075 04-APR-10 1 1789 TEST 1099918981 CURRENT 3023938 27-FEB-11 RMAN> exit Recovery Manager complete. And lets see what happens when we go into RMAN for Production [oracle]$ export ORACLE_SID=PROD [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: PROD (DBID=1099918981) connected to recovery catalog database RMAN> list backup summary; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of list command at 02/27/2011 21:03:04 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog RMAN> list incarnation; RMAN> exit Recovery Manager complete.
So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.
EDIT! Before trying the catalog mod below you should look at the My Oracle Support document 412113.1, and check out the rman commands:
RMAN> list incarnation; RMAN> reset database to incarnation <dbinc_key>; RMAN> resync catalog; RMAN> list incarnation;
OK. Proceed at your own risk!
To remove the bad incarnation record from the recovery catalog:
[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DB INC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS ---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- -------- 1 1099918981 8 PROD 1 30-JUN-05 NO PARENT 1 1099918981 2 PROD 446075 04-APR-10 NO 8 1 30-JUN-05 PARENT 1 1099918981 1789 TEST 3023938 27-FEB-11 YES 2 446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY ---------- ---------- --------------- ------------- ------------- -------------- 1 1099918981 744219186 2 1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DB INC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS ---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- -------- 1 1099918981 8 PROD 1 30-JUN-05 NO PARENT 1 1099918981 2 PROD 446075 04-APR-10 YES 8 1 30-JUN-05 PARENT
RMAN @ RMAN_DB > commit;
Commit complete.
And let's see if we can use RMAN again...
[oracle]$ rman target / catalog rman/rman@rman_db
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: PROD (DBID=1099918981) connected to recovery catalog database
RMAN> list backup summary;
starting full resync of recovery catalog full resync complete
List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES TAG20110227T144639 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855
And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.
The last thing to do was to ensure that the recovery worked too.