Archive for March 2011
Datapump Failure
Did you know that Datapump creates a table on the fly? I had forgotten, but I suspect I won’t forget again after resolving a datapump failure.
We do a datapump export every night. Tonight it failed. Last night we had a kernel panic which caused server reboot at the same time as the usual datapump job was running, and so it didn’t clean itself up properly. So tonights job was failing as follows:
expdp userid='/ as sysdba' estimate_only=n status=0
parallel=3 directory=export_dir job_name=JOBNAME
dumpfile=ORASID.dmp logfile=ORASID.log FULL=y
Export: Release 10.2.0.3.0 - 64bit Production on Thursday, 24 March, 2011 5:15:36
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORA-31626: job does not exist
ORA-31633: unable to create master table "SYS.JOBNAME"
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT", line 863
ORA-00955: name is already used by an existing object
The key line is highlighted in red - an ORA-00955. The expdp creates a “temporary” table into which it writes lots of information about the export, which it removes at the end of the job.
If you look in the schema which you are using to perform the datapump, you will find a table with the same name as the job. This was left behind from the rather sudden system shutdown.
The solution?
drop table sys.jobname
and everything returns to normal.
It is worth considering job names for datapump – make sure there isn’t already an object within the database with that name or you’ll hit the same problem.
Problems with RMAN and incarnations.
One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:
[oracle]$ export ORACLE_SID=PROD [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: PROD (DBID=1099918981) connected to recovery catalog database RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 8 PROD 1099918981 PARENT 1 30-JUN-05 1 2 PROD 1099918981 CURRENT 446075 04-APR-10 RMAN> list backup summary; List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES FULL BACKUP 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855 RMAN> exit Recovery Manager complete. [oracle]$ export ORACLE_SID=TEST [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: TEST (DBID=1099918981) connected to recovery catalog database RMAN> list backup summary; new incarnation of database registered in recovery catalog starting full resync of recovery catalog full resync complete List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES FULL BACKUP 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855 RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 8 PROD 1099918981 PARENT 1 30-JUN-05 1 2 TEST 1099918981 PARENT 446075 04-APR-10 1 1789 TEST 1099918981 CURRENT 3023938 27-FEB-11 RMAN> exit Recovery Manager complete. And lets see what happens when we go into RMAN for Production [oracle]$ export ORACLE_SID=PROD [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: PROD (DBID=1099918981) connected to recovery catalog database RMAN> list backup summary; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of list command at 02/27/2011 21:03:04 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog RMAN> list incarnation; RMAN> exit Recovery Manager complete.
So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.
EDIT! Before trying the catalog mod below you should look at the My Oracle Support document 412113.1, and check out the rman commands:
RMAN> list incarnation; RMAN> reset database to incarnation <dbinc_key>; RMAN> resync catalog; RMAN> list incarnation;
OK. Proceed at your own risk!
To remove the bad incarnation record from the recovery catalog:
[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DB INC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS ---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- -------- 1 1099918981 8 PROD 1 30-JUN-05 NO PARENT 1 1099918981 2 PROD 446075 04-APR-10 NO 8 1 30-JUN-05 PARENT 1 1099918981 1789 TEST 3023938 27-FEB-11 YES 2 446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY ---------- ---------- --------------- ------------- ------------- -------------- 1 1099918981 744219186 2 1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DB INC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS ---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- -------- 1 1099918981 8 PROD 1 30-JUN-05 NO PARENT 1 1099918981 2 PROD 446075 04-APR-10 YES 8 1 30-JUN-05 PARENT
RMAN @ RMAN_DB > commit;
Commit complete.
And let's see if we can use RMAN again...
[oracle]$ rman target / catalog rman/rman@rman_db
Recovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011
Copyright (c) 1982, 2005, Oracle. All rights reserved.
connected to target database: PROD (DBID=1099918981) connected to recovery catalog database
RMAN> list backup summary;
starting full resync of recovery catalog full resync complete
List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES TAG20110227T144639 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855
And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.
The last thing to do was to ensure that the recovery worked too.