Archive for the ‘Administration’ Category
The Grid Infrastructure filesystem was reporting that it was a bit full today (release 188.8.131.52). This was tracked down to the “crfclust.bdb” file, which records information about the cluster health for monitoring purposes. It was 26GB. It’s not supposed to get bigger than 1GB so this is probably a bug, but let’s explicitly resolve the size issue right now and search Oracle support later. Worst case, bdb (Berkerley Database) files get regenerated when CHM (ora.crf) resource is restarted. You only lose the (OS) statistics that CHM has gathered. Deleting bdb files does not have other impact. CHM will start collecting the OS statistics again.
df –h /u01
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 48G 36G 9.0G 81% /u01
-rw-r–r– 1 root root 2.1M Jul 22 12:12 22-JUL-2014-12:12:03.txt
-rw-r–r– 1 root root 1.3M Apr 23 14:28 23-APR-2014-14:28:04.txt
-rw-r–r– 1 root root 1.2M Apr 23 14:33 23-APR-2014-14:33:34.txt
-rw-r–r– 1 root root 1.3M Jul 23 12:53 23-JUL-2014-12:53:02.txt
-rw-r–r– 1 root root 946K Apr 26 03:57 26-APR-2014-03:57:21.txt
-rw-r—– 1 root root 492M Aug 26 10:33 crfalert.bdb
-rw-r—– 1 root root 26G Aug 26 10:33 crfclust.bdb <-26G!
-rw-r—– 1 root root 8.0K Jul 23 12:52 crfconn.bdb
-rw-r—– 1 root root 521M Aug 26 10:33 crfcpu.bdb
-rw-r—– 1 root root 513M Aug 26 10:33 crfhosts.bdb
-rw-r—– 1 root root 645M Aug 26 10:33 crfloclts.bdb
-rw-r—– 1 root root 418M Aug 26 10:33 crfts.bdb
-rw-r—– 1 root root 24K Aug 1 16:07 __db.001
-rw-r—– 1 root root 392K Aug 26 10:33 __db.002
-rw-r—– 1 root root 2.6M Aug 26 10:33 __db.003
-rw-r—– 1 root root 2.1M Aug 26 10:34 __db.004
-rw-r—– 1 root root 1.2M Aug 26 10:33 __db.005
-rw-r—– 1 root root 56K Aug 26 10:34 __db.006
-rw-r—– 1 root root 16M Aug 26 10:17 log.0000008759
-rw-r—– 1 root root 16M Aug 26 10:33 log.0000008760
-rw-r—– 1 root root 8.0K Aug 26 10:33 repdhosts.bdb
-rw-r–r– 1 root root 115M Jul 22 12:12 node01.ldb
Lets see how big the repository is…
oclumon manage -get repsize
CHM Repository Size = 1073736016
Wow. Seems a bit oversized. Change the repository size to the desired number of seconds, between 3600 (1 hour) and 259200 (3 days)
oclumon manage -repos resize 259200
node01 –> retention check successful
node02 –> retention check successful
New retention is 259200 and will use 4524595200 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.
If we now check the size, we get an error as the repository is bigger than the max allowed size.
oclumon manage -get resize
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service
So we need to stop and start the ora.crf service to get everything working again. It should be OK to do this on a running system with no impact, but I’d start with your sandpit to test it. Don’t take my word for it!
Check for process:
node01:/u01/app/11g/grid/bin>ps -ef |grep crf
root 26983 1 0 10:44 ? 00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01
node01:/u01/app/11g/grid/bin>crsctl stop res ora.crf -init
CRS-2673: Attempting to stop ‘ora.crf’ on ‘node01′
CRS-2677: Stop of ‘ora.crf’ on ‘node01′ succeeded
node01:/u01/app/11g/grid/bin>crsctl start res ora.crf -init
CRS-2672: Attempting to start ‘ora.crf’ on ‘node01′
CRS-2676: Start of ‘ora.crf’ on ‘node01′ succeeded
Check for Process:
node01:/u01/app/11g/grid/bin>ps -ef |grep crf
root 28000 1 5 10:49 ? 00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01
Check the size – as specified:
node01:/u01/app/11g/grid/bin>oclumon manage -get repsize
CHM Repository Size = 259200
And the space is released and reclaimed.
node01:/u01/app/11g/grid/bin>df –h /u01
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 48G 7.7G 38G 18% /u01
The space has been returned. Marvellous.
Now repeat the stop/start on each node.
UPDATE: From Oracle Support: Having very large bdb files (greater than 2GB) is likely due to a bug since the default size limits the bdb to 1GB unless the CHM data retention time is increased. One such bug is 10165314.
Sometimes we occasionally just miss the obvious, for years. Just noticed that an easy way to release code to a particular schema is to login as your normal DBA user (USER1) [as preferred by audit], use the alter session command to switch to point to the release schema (USER2) and run your DDL. Oracle behaves, from an object-owner perspective, as if you are logged-in as the schema owner without all the negative aspects of actually having to login as the schema owner or prefix all object with the schema owner.
This is particularly useful when you have multiple schemas within the database and you are releasing to more than one schema at once.
sqlplus USER1@ORCL SQL*Plus: Release 184.108.40.206.0 Production on Wed Sep 17 12:17:36 2014 Copyright (c) 1982, 2010, Oracle. All rights reserved. Enter password: Connected to: Oracle Database 11g Enterprise Edition Release 220.127.116.11.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options 12:17:38 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1'; no rows selected 12:17:46 USER1@ORCL>create table n1 (c1 date); Table created. 12:17:56 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1'; OWNER TABLE_NAME ------------------------------ ------------------------------ USER1 N1 12:17:57 USER1@ORCL>alter session set current_schema=USER2; Session altered. 12:18:08 USER1@ORCL>create table n1 (c1 date); Table created. 12:18:11 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1'; OWNER TABLE_NAME ------------------------------ ------------------------------ USER2 N1 USER1 N1
Now that you have got your audit table somewhere a little more sensible (i.e. not in the SYSTEM tablespace), there’s probably a policy about how many audit records should be kept.
Thoughtfully, the DBMS_AUDIT_MGMT package provides some of what you need to keep the audit records in check. However, a little more thought by Oracle would have helped. Lets see what I mean.
First we need to initialise for audit control. You can check to see if this has already been done as follows:
SET SERVEROUTPUT ON BEGIN IF DBMS_AUDIT_MGMT.IS_CLEANUP_INITIALIZED(DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD) THEN DBMS_OUTPUT.PUT_LINE('AUD$ is initialized for cleanup'); ELSE DBMS_OUTPUT.PUT_LINE('AUD$ is not initialized for cleanup.'); END IF; END; /
NOTE: To do this for Fine-Grained auditing, you need to use the constant DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD instead, and check on table FGA_LOG$.
If cleanup is not initialised, you need to set it up as follows:
BEGIN DBMS_AUDIT_MGMT.INIT_CLEANUP (AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, DEFAULT_CLEANUP_INTERVAL => 999 ); END; /
Some VERY important things to note here:
- The DEFAULT_CLEANUP_INTERVAL doesn’t do anything (up to and including Rel 12.1). It’s for “future use”, apparently. However, if it is not specified, it has been associated with bugs in relation to cleanup to last timestamp not working and not cleaning anything up.
- If you have not already moved the audit tables AUD$ / FGA_LOG$ out of the SYSTEM tablespace, to any other tablespace this will move them for you, right now, whether desired or not, into SYSAUX.
- If you DEINIT_CLEANUP, it does not move the tables back to SYSTEM.
OK, we are initialised. We could call the creation of a purge job, which will wipe out all of our audit records (every 24 hours in this example), but that would be an unlikely requirement.
BEGIN DBMS_AUDIT_MGMT.CREATE_PURGE_JOB ( AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, AUDIT_TRAIL_PURGE_INTERVAL => 24, AUDIT_TRAIL_PURGE_NAME => 'Purge_AUD$', USE_LAST_ARCH_TIMESTAMP => FALSE ); END; /
It’s MORE likely we want to wipe out the last N days worth of records. To do this we need to set the point (LAST_ARCHIVE_TIMESTAMP) from which we want to retain records and wipe out everything before that. So lets set for a 30 day retention.
BEGIN DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP( AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, LAST_ARCHIVE_TIME => systimestamp-30, RAC_INSTANCE_NUMBER => 1 ); END; /
And check it
select * from DBA_AUDIT_MGMT_LAST_ARCH_TS; AUDIT_TRAIL RAC_INSTANCE LAST_ARCHIVE_TS ---------------------- ------------ ------------------------------------ STANDARD AUDIT TRAIL 0 17-MAY-14 11.00:01.000000 PM +00:00
Excellent. Now we create a job as before with “USE_LAST_ARCH_TIMESTAMP => TRUE” and all is good, EXCEPT that nothing is moving the timestamp forward.
The job will be called, purge the old records and that’s it. When it is next invoked, the timestamp will not have moved on. We therefore need another job to move the timestamp on… SO why bother setting up a job with these automatic routines if it doesn’t automate all of the requirement? Bit annoying that. I just create my own scheduled job with 2 calls, and forget the built-in (half a) job aspect of the management system:
BEGIN DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP( AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, LAST_ARCHIVE_TIME => systimestamp-30, RAC_INSTANCE_NUMBER => 1 ); DBMS_AUDIT_MGMT.clean_audit_trail( audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, use_last_arch_timestamp => TRUE); END; /
OK, the audit management system is pretty good; it deletes in batches, it works well, and it doesn’t need much from Oracle to make it much better. 7/10. Good but could do better.
You need to check out the associated views which show you the basic system config and what’s going on:
DBA_AUDIT_MGMT_CLEAN_EVENTS Displays the cleanup event history DBA_AUDIT_MGMT_CLEANUP_JOBS Displays the currently configured audit trail purge jobs DBA_AUDIT_MGMT_CONFIG_PARAMS Displays the currently configured audit trail properties DBA_AUDIT_MGMT_LAST_ARCH_TS Displays the last archive timestamps set for the audit trails
OK – that should keep things nice and tidy in the database. What about the audit files on the OS? Find out about that in Part 3.
One of the oldest problems with the Auditing capabilities within Oracle is that the SYS.AUD$ table resides in the SYSTEM tablespace. Unless you are rigorous in ensuring that your audit records are routinely pruned to keep the table manageable, it can single-handedly make the SYSTEM tablespace enormous.
Historically, we used to move the table and its associated objects to a new tablespace ourselves. In Oracle 7 is was a drop and re-create. Later we performed an alter table … move; command, coupled with an alter index rebuild. However, some bits frequently got left behind doing this…
In Oracle 10, a new package appeared: DBMS_AUDIT_MGMT. The procedure SET_AUDIT_TRAIL_LOCATION allowed you to move the table to a new tablespace. It didn’t work properly. It didn’t move indexes or LOB segments, and shouldn’t be used. However, roll on Oracle 11 and the (obvious) bugs have been ironed-out.
First of all, moving the table (NOTE: If the table is big, this may take quite a while. Only do this at a period of low system activity to avoid potential locking issues at the start and end of the move):
BEGIN DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION( AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, AUDIT_TRAIL_LOCATION_VALUE => 'SYSAUX'); END; /
This works a treat in Oracle 11 and 12 for the standard audit trail, and for fine-grained auditing. It successfully moved every object associated with SYS.AUD$.
select owner,table_name,tablespace_name from dba_tables where table_name = 'AUD$' OWNER TABLE_NAME TABLESPACE_NAME ------------------------------ ------------------------------ -------------------------- SYS AUD$ SYSAUX
select owner,table_name,tablespace_name from dba_lobs where table_name = 'AUD$' OWNER TABLE_NAME TABLESPACE_NAME ------------------------------ ------------------------------ -------------------------- SYS AUD$ SYSAUX SYS AUD$ SYSAUX select owner,table_name,tablespace_name from dba_indexes where table_name = 'AUD$' OWNER TABLE_NAME TABLESPACE_NAME ------------------------------ ------------------------------ -------------------------- SYS AUD$ SYSAUX SYS AUD$ SYSAUX
WARNING! Oracle still say that AUD$ should be in the system tablespace for upgrades. I can’t find anything that supercedes that, despite moving the table now being supported by an official package that works.
These are a bit of a pain as their location isn’t intuitive. So to remind me where they are:
Login to the server(s) as the grid owner and check the scan listener status. This will show you the location of the listener log. cd to just below the diag directory and you’re off!:
server-name:/u01/grid>ps -ef | grep SCAN grid 8542 8282 0 10:20 pts/0 00:00:00 grep SCAN grid 9349 1 0 Mar07 ? 00:07:33 /u01/app/11g/grid/bin/tnslsnr LISTENER_SCAN1 -inherit
server-name:/u01/grid>lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 18.104.22.168.0 - Production on 28-MAY-2014 10:20:12
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1))) STATUS of the LISTENER ------------------------ Alias LISTENER_SCAN1 Version TNSLSNR for Linux: Version 22.214.171.124.0 - Production Start Date 07-MAR-2014 17:27:50 Uptime 81 days 15 hr. 52 min. 21 sec Trace Level off Security ON: Local OS Authentication SNMP OFF Listener Parameter File /u01/app/11g/grid/network/admin/listener.ora Listener Log File /u01/app/11g/grid/log/diag/tnslsnr/server-name/listener_scan1/alert/log.xml Listening Endpoints Summary... (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1))) (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.6.148.141)(PORT=1521))) Services Summary... Service "FRONT_APP_DB_SVC.WORLD" has 2 instance(s). Instance "ORCL1", status READY, has 1 handler(s) for this service... Instance "ORCL2", status READY, has 1 handler(s) for this service... Service "ORCL.WORLD" has 2 instance(s). Instance "ORCL1", status READY, has 1 handler(s) for this service... Instance "ORCL2", status READY, has 1 handler(s) for this service... Service "ORCLXDB.WORLD" has 2 instance(s). Instance "ORCL1", status READY, has 1 handler(s) for this service... Instance "ORCL2", status READY, has 1 handler(s) for this service... The command completed successfully
ADRCI: Release 126.96.36.199.0 - Production on Wed May 28 10:22:47 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
ADR base = "/u01/app/11g/grid/log" adrci> show homes ADR Homes: diag/asmcmd/user_grid/server-name diag/tnslsnr/server-name/listener_scan3 diag/tnslsnr/server-name/listener_scan2 diag/tnslsnr/server-name/listener_scan1 diag/asmtool/user_root/host_3797755080_80
adrci> show alert
Choose the alert log from the following homes to view:
1: diag/asmcmd/user_grid/server-name 2: diag/tnslsnr/server-name/listener_scan3 3: diag/tnslsnr/server-name/listener_scan2 4: diag/tnslsnr/server-name/listener_scan1 5: diag/asmtool/user_root/host_3797755080_80 Q: to quit
Please select option: 4
and there we are…
remember to be on the correct host for each scan,
otherwise the alert (listener) log file will be out of date.
When you end up spending a far great a percentage of your day than seems sensible killing off Java connections that Developers have carelessly left lying around, locking objects all over the place, you need a solution to get them to go away. The solution is to let them do it themselves!
I’m not advocating granting ALTER SYSTEM to Developers! That way madness lies, or certainly some unintended consequences. I’m all for Dev’s having a lot of freedom in the database, just not freedom with the database.
So, creating a stored procedure (in this example as sys, but as any user with an explicit ALTER SYSTEM privilege granted will do) to kill sessions without allowing too much latitude to do anything else seems appropriate. Here’s one I built earlier:
create or replace procedure sys.kill_session ( p_sid IN number, p_serial IN number, p_instance IN number) as -- Neil Chandler. Grant the ability to kill session on a restricted basis. 21.07.2010 l_username varchar2(30) := null; l_priv number := 1; begin -- Who owns the session? select username into l_username from gv$session where sid = p_sid and serial#=p_serial and inst_id = p_instance; -- Check for DBA role select count(*) into l_priv from dba_role_privs where grantee = l_username and granted_role = 'DBA'; -- If the user has the DBA priv, deny the kill request if l_priv > 0 or l_username is null then dbms_output.put_line ('User request to kill session '||p_sid||','||p_serial||',@'||p_instance|| ' denied. Session is for privileged user '||l_username||'.'); else dbms_output.put_line ('Killing user '||l_username||' - '||p_sid||','||p_serial||',@'||p_instance); execute immediate 'alter system disconnect session '''|| p_sid||','||p_serial||',@'||p_instance|| ''' immediate'; end if; end; / -- and let the proc be seen and used create or replace public synonym kill_session for sys.kill_session; grant execute on kill_session to (whomever); Then a nifty bit of sql to generate the kill commands for the Developers. Please include your favourite columns from gv$session: select username,status,blocking_session, 'exec kill_session ('|| sid||','||serial#||','||inst_id||')' Kill_Command from gv$session where username is not null and type <> 'BACKGROUND' / USERNAME STATUS BLOCKING_SESSION KILL_COMMAND --------- ------- ---------------- ------------------------------------ SYS ACTIVE exec kill_session (31,65,1) SYSTEM INACTIVE exec kill_session (58,207,1) USER_1 INACTIVE exec kill_session (59,404,1) USER_2 INACTIVE exec kill_session (72,138,1) USER_2 INACTIVE exec kill_session (46,99,2)
May the odds be forever in your favour. Let the killing commence…
When installing Oracle Grid Infrastructure 11.2 (and all other releases), you need to make sure that you have all of the server setting correct and to standard before you do the install. One that bit me recently was the timezone setting. The Red Hat 6.4 server(s) in question has the correct file in /etc/localtime (copied from /usr/share/zoneinfo/whatever). If I type in date, I get the reply in the correct timezone (GMT/BST as I’m in London), so all seems correct.
However, the slack Unix Sysadmin (which might or might not have been me) had not put the correct setting in /etc/sysconfig/clock. Unfortunately, when you install Grid Infrastructure, the setting is read from /etc/sysconfig/clock and embedded into a Grid Inforastructure config file. $GRID_HOME/crs/install/s_crsconfig_hostname_env.txt
### This file can be used to modify the NLS_LANG environment variable, which determines the charset to be used for messages.
### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JAPAN.UTF8
### Do not modify this file except to change NLS_LANG, or under the direction of Oracle Support Services
If you change this entry, and you should check with Oracle Support if this is OK for your site, and you will need to restart Grid Infrastructure. The one thing about this that I really don’t like is that Oracle is storing a runtime configuration file in a an install directory. Does it do that anywhere else?