Archive for the ‘Backups’ Category
Problem solving with DBMS_DATAPUMP
Yesterday I had a problem… I noticed that one of our overnight jobs was failing. The job in question is an ad-hoc tidy-up job, ran as needed when a performance metric for a specific query indicates that we’re going to be in trouble soon.
So, what was the problem? I have the stack trace captured in the job output:
Finished shrinks at 02-APR-2012 00:12 Creating job data pump export job LOG_EXPORT User:....... NEW_SCHEMA Tables:..... 'table-1','table-2' DECLARE * ERROR at line 1: ORA-31626: job does not exist ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79 ORA-06512: at "SYS.DBMS_DATAPUMP", line 911 ORA-06512: at "SYS.DBMS_DATAPUMP", line 4356 ORA-06512: at line 22
So, “job does not exist”! What does that mean, and what could be causing this to happen? What has changed? Let’s look at the code at line 22 and see what we’re running:
.
.
v_handle := DBMS_DATAPUMP.OPEN ( operation => 'EXPORT',
job_mode => 'SCHEMA',
job_name => 'LOG_EXPORT');
.
.
And that’s it. We’re initiating a DBMS_DATAPUMP job (to extract the data to an external file before manipulation.)
So, what has changed? Since the last invokation of this job, we have had a new software release, and for this product that means a whole new schema. We’ve missed something. We must have a permissions problem within the database. Fortunately, we keep the old schema in place, locked, following an upgrade so all we need to do is compare the permissions for the old and new schemas and we’re done.
select granted_role from dba_role_privs where grantee = 'OLD_SCHEMA' minus select granted_role from dba_role_privs where grantee = 'NEW_SCHEMA' / no rows selected select privilege from dba_sys_privs where grantee = 'OLD_SCHEMA' minus select privilege from dba_sys_privs where grantee = 'NEW_SCHEMA' / no rows selected select table_name,privilege from dba_tab_privs where grantee = 'OLD_SCHEMA' minus select table_name,privilege from dba_tab_privs where grantee = 'NEW_SCHEMA' / TABLE_NAME PRIVILEGE ------------------------------ ---------------------------------------- DATA_PUMP_DIR READ DATA_PUMP_DIR WRITE
Ah ha! There’s the difference! And it is definitely data pump related, so let’s run the grants into the database and run the job again. This will fix it.
grant READ on DIRECTORY data_pump_dir to NEW_SCHEMA; grant WRITE on DIRECTORY data_pump_dir to NEW_SCHEMA; ... and run the job again... DECLARE * ERROR at line 1: ORA-31626: job does not exist ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79 ORA-06512: at "SYS.DBMS_DATAPUMP", line 911 ORA-06512: at "SYS.DBMS_DATAPUMP", line 4356 ORA-06512: at line 22
What! Oh nuts. Why didn’t that work? [because you have not assigned a directory to DBMS_DATAPUMP yet! It'll be useful later though... Now stop jumping to conclusions! ed.] What are we going to do now? We know the permissions match. I’ve checked the O/S and there’s space in the filesystem. DBMS_DATAPUMP is not giving me enough information; what can I do to diagnose this problem further? I know! Why don’t I use datapump externally and see if that makes a difference…
expdp NEW_SCHEMA/..pwd..@..DB.. directory=data_pump_dir dumpfile=neilchandler.dmp job_name=neilchandler Export: Release 10.2.0.3.0 - 64bit Production on Tuesday, 03 April, 2012 12:54:27 Copyright (c) 2003, 2005, Oracle. All rights reserved. Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options ORA-31626: job does not exist ORA-31633: unable to create master table "NEW_SCHEMA.NEILCHANDLER" ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95 ORA-06512: at "SYS.KUPV$FT", line 863 ORA-01536: space quota exceeded for tablespace 'USERS'
Oh heck, we can’t write the parameter table as we have been denied a space quota on the default tablespace. D’oh! How painful was that to locate?
alter user NEW_SCHEMA quota unlimited on USERS;
And it all worked.
My question is, why couldn’t DBMS_DATAPUMP tell me that?
Datapump Failure
Did you know that Datapump creates a table on the fly? I had forgotten, but I suspect I won’t forget again after resolving a datapump failure.
We do a datapump export every night. Tonight it failed. Last night we had a kernel panic which caused server reboot at the same time as the usual datapump job was running, and so it didn’t clean itself up properly. So tonights job was failing as follows:
expdp userid='/ as sysdba' estimate_only=n status=0
parallel=3 directory=export_dir job_name=JOBNAME
dumpfile=ORASID.dmp logfile=ORASID.log FULL=y
Export: Release 10.2.0.3.0 - 64bit Production on Thursday, 24 March, 2011 5:15:36
Copyright (c) 2003, 2005, Oracle. All rights reserved.
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
ORA-31626: job does not exist
ORA-31633: unable to create master table "SYS.JOBNAME"
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPV$FT", line 863
ORA-00955: name is already used by an existing object
The key line is highlighted in red - an ORA-00955. The expdp creates a “temporary” table into which it writes lots of information about the export, which it removes at the end of the job.
If you look in the schema which you are using to perform the datapump, you will find a table with the same name as the job. This was left behind from the rather sudden system shutdown.
The solution?
drop table sys.jobname
and everything returns to normal.
It is worth considering job names for datapump – make sure there isn’t already an object within the database with that name or you’ll hit the same problem.
Time for a Mistake? Problems with RMAN and incarnations.
The title of this blog refers, I hope, to the following anecdote and not to my initial foray into the world of blogging. I hope that I will find the time, inclination and subject matter to make an interesting and useful resource for the Oracle, SQL Server, Unix, Linux and other technical communities. So here goes…
One day, not so very long ago, I was at a client site looking through the “passive” half of an AIX HACMP clustered server to tidy it up a little as we were experiencing pressure on space. There was a test database on there with a very large amount of historic archive logs. I thought it would be a good idea to check the database backups in RMAN and maybe do some tidying up through that route. This, it turned out, was not the most sensible thing I have done. The test database was a straight binary copy of the Production database. It had received no subsequent changes, especially the most important one from an RMAN perspective: the Database ID. Without a warning, RMAN immediately assumed that this database, with its more recent resetlogs and matching ID, was a new Incarnation of Production and promptly amended the catalog to that effect. Let’s just see that in action:
[oracle]$ export ORACLE_SID=PROD [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:01:41 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: PROD (DBID=1099918981) connected to recovery catalog database RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 8 PROD 1099918981 PARENT 1 30-JUN-05 1 2 PROD 1099918981 CURRENT 446075 04-APR-10 RMAN> list backup summary; List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES FULL BACKUP 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855 RMAN> exit Recovery Manager complete. [oracle]$ export ORACLE_SID=TEST [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:23 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: TEST (DBID=1099918981) connected to recovery catalog database RMAN> list backup summary; new incarnation of database registered in recovery catalog starting full resync of recovery catalog full resync complete List of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES FULL BACKUP 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855 RMAN> list incarnation; List of Database Incarnations DB Key Inc Key DB Name DB ID STATUS Reset SCN Reset Time ------- ------- -------- ---------------- --- ---------- ---------- 1 8 PROD 1099918981 PARENT 1 30-JUN-05 1 2 TEST 1099918981 PARENT 446075 04-APR-10 1 1789 TEST 1099918981 CURRENT 3023938 27-FEB-11 RMAN> exit Recovery Manager complete. And lets see what happens when we go into RMAN for Production [oracle]$ export ORACLE_SID=PROD [oracle]$ rman target / catalog rman/rman@rman_db Recovery Manager: Release 10.2.0.1.0 - Production on Sun Feb 27 21:02:59 2011 Copyright (c) 1982, 2005, Oracle. All rights reserved. connected to target database: PROD (DBID=1099918981) connected to recovery catalog database RMAN> list backup summary; RMAN-00571: =========================================================== RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== RMAN-00571: =========================================================== RMAN-03002: failure of list command at 02/27/2011 21:03:04 RMAN-06004: ORACLE error from recovery catalog database: RMAN-20011: target database incarnation is not current in recovery catalog RMAN> list incarnation; RMAN> exit Recovery Manager complete.
So where does that leave me? With the current production datawarehouse unable to access RMAN as it’s not the right Incarnation. One quick look at the clock, and you know what time it is. 30 minutes to the start of a tight backup window, which will fail. It’s inevitable that this sort of thing never happens with 8 hours of free time to work out the best way to resolve the problem, but with scant time to sort it out with no impact on the Production system. After some thought, and some Google, it became apparent that the only solution was to hack manually edit the RMAN catalog to remove the new incarnation.
To remove the bad incarnation record from the recovery catalog:
[oracle]$ sqlplus rman/rman@rman_db
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DB INC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS ---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- -------- 1 1099918981 8 PROD 1 30-JUN-05 NO PARENT 1 1099918981 2 PROD 446075 04-APR-10 NO 8 1 30-JUN-05 PARENT 1 1099918981 1789 TEST 3023938 27-FEB-11 YES 2 446075 04-APR-10 CURRENT
RMAN @ RMAN_DB > select * from db;
DB_KEY DB_ID HIGH_CONF_RECID LAST_KCCDIVTS HIGH_IC_RECID CURR_DBINC_KEY ---------- ---------- --------------- ------------- ------------- -------------- 1 1099918981 744219186 2 1789
RMAN @ RMAN_DB > update db set curr_dbinc_key = 2;
1 row updated.
RMAN @ RMAN_DB > delete from dbinc where dbinc_key = 1789;
1 row deleted.
RMAN @ RMAN_DB > select * from rc_database_incarnation order by resetlogs_time;
DB_KEY DBID DBINC_KEY NAME RESETLOGS_CHANGE# RESETLOGS CUR PARENT_DB INC_KEY PRIOR_RESETLOGS_CHANGE# PRIOR_RES STATUS---------- ---------- ---------- -------- ----------------- --------- --- ---------------- ----------------------- --------- -------- 1 1099918981 8 PROD 1 30-JUN-05 NO PARENT 1 1099918981 2 PROD 446075 04-APR-10 YES 8 1 30-JUN-05 PARENTRMAN @ RMAN_DB > commit;
Commit complete.And let's see if we can use RMAN again...[oracle]$ rman target / catalog rman/rman@rman_dbRecovery Manager: Release 10.2.0.1.0 - Production on Mon Mar 21 22:08:45 2011Copyright (c) 1982, 2005, Oracle. All rights reserved.connected to target database: PROD (DBID=1099918981) connected to recovery catalog databaseRMAN> list backup summary;starting full resync of recovery catalog full resync completeList of Backups =============== Key TY LV S Device Type Completion Time #Pieces #Copies Compressed Tag ------- -- -- - ----------- --------------- ------- ------- ---------- --- 1737 B F A DISK 27-FEB-11 1 1 YES TAG20110227T144639 1752 B F A DISK 27-FEB-11 1 1 NO TAG20110227T144855And so, we are just about back where we started before some idiot messed up the RMAN catalog, and the backups work just fine. Now we need to change the dbid on the TEST database, using the nid command before another DBA does the same thing.
The last thing to do was to ensure that the recovery worked too, but that's for another blog.