Neil Chandler's DBA Blog

A resource for Database Professionals

Archive for the ‘Administration’ Category

Grid Infrastructure Disk Space Problem – CHM DB file: crfclust.bdb

leave a comment »

The Grid Infrastructure filesystem was reporting that it was a bit full today (release 11.2.0.4). This was tracked down to the “crfclust.bdb” file, which records information about the cluster health for monitoring purposes. It was 26GB. It’s not supposed to get bigger than 1GB so this is probably a bug, but let’s explicitly resolve the size issue right now and search Oracle support later. Worst case, bdb (Berkerley Database) files get regenerated when CHM (ora.crf) resource is restarted.  You only lose the (OS) statistics that CHM has gathered. Deleting bdb files does not have other impact.  CHM will start collecting the OS statistics again.

 

df –h /u01

Filesystem                Size  Used Avail Use% Mounted on
/dev/sdc1                  48G   36G  9.0G  81% /u01

pwd
/u01/app/11g/grid/crf/db/node01

ls -lh
total 29G

-rw-r–r– 1 root root 2.1M Jul 22 12:12 22-JUL-2014-12:12:03.txt
-rw-r–r– 1 root root 1.3M Apr 23 14:28 23-APR-2014-14:28:04.txt
-rw-r–r– 1 root root 1.2M Apr 23 14:33 23-APR-2014-14:33:34.txt
-rw-r–r– 1 root root 1.3M Jul 23 12:53 23-JUL-2014-12:53:02.txt
-rw-r–r– 1 root root 946K Apr 26 03:57 26-APR-2014-03:57:21.txt
-rw-r—– 1 root root 492M Aug 26 10:33 crfalert.bdb
-rw-r—– 1 root root  26G Aug 26 10:33 crfclust.bdb   <-26G!
-rw-r—– 1 root root 8.0K Jul 23 12:52 crfconn.bdb
-rw-r—– 1 root root 521M Aug 26 10:33 crfcpu.bdb
-rw-r—– 1 root root 513M Aug 26 10:33 crfhosts.bdb
-rw-r—– 1 root root 645M Aug 26 10:33 crfloclts.bdb
-rw-r—– 1 root root 418M Aug 26 10:33 crfts.bdb
-rw-r—– 1 root root  24K Aug  1 16:07 __db.001
-rw-r—– 1 root root 392K Aug 26 10:33 __db.002
-rw-r—– 1 root root 2.6M Aug 26 10:33 __db.003
-rw-r—– 1 root root 2.1M Aug 26 10:34 __db.004
-rw-r—– 1 root root 1.2M Aug 26 10:33 __db.005
-rw-r—– 1 root root  56K Aug 26 10:34 __db.006
-rw-r—– 1 root root  16M Aug 26 10:17 log.0000008759
-rw-r—– 1 root root  16M Aug 26 10:33 log.0000008760
-rw-r—– 1 root root 8.0K Aug 26 10:33 repdhosts.bdb
-rw-r–r– 1 root root 115M Jul 22 12:12 node01.ldb

Lets see how big the repository is…

oclumon manage -get repsize
CHM Repository Size = 1073736016

Wow.  Seems a bit oversized. Change the repository size to the desired number of seconds, between 3600 (1 hour) and 259200 (3 days)

oclumon manage -repos resize 259200

node01 –> retention check successful
node02 –> retention check successful

New retention is 259200 and will use 4524595200 bytes of disk space
CRS-9115-Cluster Health Monitor repository size change completed on all nodes.

If we now check the size, we get an error as the repository is bigger than the max allowed size.

oclumon manage -get resize
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service

So we need to stop and start the ora.crf service to get everything working again. It should be OK to do this on a running system with no impact, but I’d start with your sandpit to test it. Don’t take my word for it!

Check for process:

node01:/u01/app/11g/grid/bin>ps -ef |grep crf
root     26983     1  0 10:44 ?        00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01

Stop service:
node01:/u01/app/11g/grid/bin>crsctl stop res ora.crf -init

CRS-2673: Attempting to stop ‘ora.crf’ on ‘node01′
CRS-2677: Stop of ‘ora.crf’ on ‘node01′ succeeded

Start Service:
node01:/u01/app/11g/grid/bin>crsctl start res ora.crf -init
CRS-2672: Attempting to start ‘ora.crf’ on ‘node01′
CRS-2676: Start of ‘ora.crf’ on ‘node01′ succeeded

Check for Process:
node01:/u01/app/11g/grid/bin>ps -ef  |grep crf
root     28000     1  5 10:49 ?        00:00:00 /u01/app/11g/grid/bin/ologgerd -m node02 -r -d /u01/app/11g/grid/crf/db/node01

Check the size – as specified:
node01:/u01/app/11g/grid/bin>oclumon manage -get repsize

CHM Repository Size = 259200

Done

And the space is released and reclaimed.

node01:/u01/app/11g/grid/bin>df –h /u01

Filesystem                Size  Used Avail Use% Mounted on
/dev/sdc1                  48G  7.7G   38G  18% /u01

The space has been returned. Marvellous.
Now repeat the stop/start on each node.

 

UPDATE: From Oracle Support: Having very large bdb files (greater than 2GB) is likely due to a bug since the default size limits the bdb to 1GB unless the CHM data retention time is increased.  One such bug is 10165314.

Written by Neil Chandler

29th September 2014 at 06:43

Releasing to schemas the easy way

leave a comment »

Sometimes we occasionally just miss the obvious, for years. Just noticed that an easy way to release code to a particular schema is to login as your normal DBA user (USER1) [as preferred by audit], use the alter session command to switch to point to the release schema (USER2) and run your DDL. Oracle behaves, from an object-owner perspective, as if you are logged-in as the schema owner without all the negative aspects of actually having to login as the schema owner or prefix all object with the schema owner.

This is particularly useful when you have multiple schemas within the database and you are releasing to more than one schema at once.

 

e.g.

sqlplus USER1@ORCL

SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 17 12:17:36 2014

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Enter password:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

12:17:38 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1';

no rows selected

12:17:46 USER1@ORCL>create table n1 (c1 date);

Table created.

12:17:56 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1';

OWNER                          TABLE_NAME
------------------------------ ------------------------------
USER1                          N1

12:17:57 USER1@ORCL>alter session set current_schema=USER2;

Session altered.

12:18:08 USER1@ORCL>create table n1 (c1 date);

Table created.

12:18:11 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1';

OWNER                          TABLE_NAME
------------------------------ ------------------------------
USER2                          N1
USER1                          N1

 

 

Written by Neil Chandler

28th September 2014 at 22:46

Posted in Administration, audit, SCHEMA

Tagged with , , ,

Oracle Audit Control part 2 – purging records

leave a comment »

Now that you have got your audit table somewhere a little more sensible (i.e. not in the SYSTEM tablespace), there’s probably a policy about how many audit records should be kept.

Thoughtfully, the DBMS_AUDIT_MGMT package provides some of what you need to keep the audit records in check. However, a little more thought by Oracle would have helped. Lets see what I mean.

 

First we need to initialise for audit control. You can check to see if this has already been done as follows:

SET SERVEROUTPUT ON
 BEGIN
 IF
  DBMS_AUDIT_MGMT.IS_CLEANUP_INITIALIZED(DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD)
 THEN
  DBMS_OUTPUT.PUT_LINE('AUD$ is initialized for cleanup');
 ELSE
  DBMS_OUTPUT.PUT_LINE('AUD$ is not initialized for cleanup.');
 END IF;
END;
/

NOTE: To do this for Fine-Grained auditing, you need to use the constant DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD instead, and check on table FGA_LOG$.

If cleanup is not initialised, you need to set it up as follows:

BEGIN
DBMS_AUDIT_MGMT.INIT_CLEANUP
 (AUDIT_TRAIL_TYPE         => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
  DEFAULT_CLEANUP_INTERVAL => 999 );
END;
/

Some VERY important things to note here:

  1. The DEFAULT_CLEANUP_INTERVAL doesn’t do anything (up to and including Rel 12.1). It’s for “future use”, apparently. However, if it is not specified, it has been associated with bugs in relation to cleanup to last timestamp not working and not cleaning anything up.
  2. If you have not already moved the audit tables AUD$ / FGA_LOG$ out of the SYSTEM tablespace, to any other tablespace this will move them for you, right now, whether desired or not, into SYSAUX.
  3. If you DEINIT_CLEANUP, it does not move the tables back to SYSTEM.

 

OK, we are initialised. We could call the creation of a purge job, which will wipe out all of our audit records (every 24 hours in this example), but that would be an unlikely requirement.

BEGIN
 DBMS_AUDIT_MGMT.CREATE_PURGE_JOB (
 AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
 AUDIT_TRAIL_PURGE_INTERVAL => 24,
 AUDIT_TRAIL_PURGE_NAME => 'Purge_AUD$',
 USE_LAST_ARCH_TIMESTAMP => FALSE );
END;
/

 

It’s MORE likely we want to wipe out the last N days worth of records. To do this we need to set the point (LAST_ARCHIVE_TIMESTAMP) from which we want to retain records and wipe out everything before that. So lets set for a 30 day retention.

BEGIN
 DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(
 AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
 LAST_ARCHIVE_TIME => systimestamp-30,
 RAC_INSTANCE_NUMBER => 1 );
END;
/

And check it

select * from DBA_AUDIT_MGMT_LAST_ARCH_TS;

AUDIT_TRAIL            RAC_INSTANCE LAST_ARCHIVE_TS
---------------------- ------------ ------------------------------------
STANDARD AUDIT TRAIL              0 17-MAY-14 11.00:01.000000 PM +00:00

Excellent. Now we create a job as before with “USE_LAST_ARCH_TIMESTAMP => TRUE” and all is good, EXCEPT that nothing is moving the timestamp forward.
The job will be called, purge the old records and that’s it. When it is next invoked, the timestamp will not have moved on. We therefore need another job to move the timestamp on… SO why bother setting up a job with these automatic routines if it doesn’t automate all of the requirement? Bit annoying that. I just create my own scheduled job with 2 calls, and forget the built-in (half a) job aspect of the management system:

BEGIN
 DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP(
 AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
 LAST_ARCHIVE_TIME => systimestamp-30,
 RAC_INSTANCE_NUMBER => 1 );

 DBMS_AUDIT_MGMT.clean_audit_trail(
   audit_trail_type        => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
   use_last_arch_timestamp => TRUE);
END;
/

OK, the audit management system is pretty good; it deletes in batches, it works well, and it doesn’t need much from Oracle to make it much better. 7/10. Good but could do better.

You need to check out the associated views which show you the basic system config and what’s going on:

DBA_AUDIT_MGMT_CLEAN_EVENTS  Displays the cleanup event history
DBA_AUDIT_MGMT_CLEANUP_JOBS  Displays the currently configured audit trail purge jobs
DBA_AUDIT_MGMT_CONFIG_PARAMS Displays the currently configured audit trail properties
DBA_AUDIT_MGMT_LAST_ARCH_TS  Displays the last archive timestamps set for the audit trails

OK – that should keep things nice and tidy in the database. What about the audit files on the OS? Find out about that in Part 3.

Written by Neil Chandler

20th June 2014 at 06:03

Posted in audit

Tagged with , ,

Oracle Audit Control part 1 – location of the audit table

with one comment

One of the oldest problems with the Auditing capabilities within Oracle is that the SYS.AUD$ table resides in the SYSTEM tablespace. Unless you are rigorous in ensuring that your audit records are routinely pruned to keep the table manageable, it can single-handedly make the SYSTEM tablespace enormous.

Historically, we used to move the table and its associated objects to a new tablespace ourselves. In Oracle 7 is was a drop and re-create. Later we performed an alter table … move; command, coupled with an alter index rebuild. However, some bits frequently got left behind doing this…

In Oracle 10, a new package appeared: DBMS_AUDIT_MGMT. The procedure SET_AUDIT_TRAIL_LOCATION allowed you to move the table to a new tablespace. It didn’t work properly. It didn’t move indexes or LOB segments, and shouldn’t be used. However, roll on Oracle 11 and the (obvious) bugs have been ironed-out.

First of all, moving the table (NOTE: If the table is big, this may take quite a while. Only do this at a period of low system activity to avoid potential locking issues at the start and end of the move):

BEGIN
 DBMS_AUDIT_MGMT.SET_AUDIT_TRAIL_LOCATION(
 AUDIT_TRAIL_TYPE => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
 AUDIT_TRAIL_LOCATION_VALUE => 'SYSAUX');
END;
/

This works a treat in Oracle 11 and 12 for the standard audit trail, and for fine-grained auditing. It successfully moved every object associated with SYS.AUD$.

select owner,table_name,tablespace_name from dba_tables where table_name = 'AUD$'
OWNER                          TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------ --------------------------
SYS                            AUD$                           SYSAUX

select owner,table_name,tablespace_name from dba_lobs where table_name = 'AUD$'
OWNER                          TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------ --------------------------
SYS                            AUD$                           SYSAUX
SYS                            AUD$                           SYSAUX

select owner,table_name,tablespace_name from dba_indexes where table_name = 'AUD$'
OWNER                          TABLE_NAME                     TABLESPACE_NAME
------------------------------ ------------------------------ --------------------------
SYS                            AUD$                           SYSAUX
SYS                            AUD$                           SYSAUX

WARNING! Oracle still say that AUD$ should be in the system tablespace for upgrades. I can’t find anything that supercedes that, despite moving the table now being supported by an official package that works.

 

 

Written by Neil Chandler

17th June 2014 at 09:14

Posted in Administration

Tagged with , , , ,

SCAN listener logs – where are they?

leave a comment »

These are a bit of a pain as their location isn’t intuitive. So to remind me where they are:

Login to the server(s) as the grid owner and check the scan listener status. This will show you the location of the listener log. cd to just below the diag directory and you’re off!:

server-name:/u01/grid>ps -ef | grep SCAN
 grid 8542 8282 0 10:20 pts/0 00:00:00 grep SCAN
 grid 9349 1 0 Mar07 ? 00:07:33 /u01/app/11g/grid/bin/tnslsnr LISTENER_SCAN1 -inherit

server-name:/u01/grid>lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 28-MAY-2014 10:20:12
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_SCAN1
 Version TNSLSNR for Linux: Version 11.2.0.4.0 - Production
 Start Date 07-MAR-2014 17:27:50
 Uptime 81 days 15 hr. 52 min. 21 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /u01/app/11g/grid/network/admin/listener.ora
 Listener Log File /u01/app/11g/grid/log/diag/tnslsnr/server-name/listener_scan1/alert/log.xml
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.6.148.141)(PORT=1521)))
 Services Summary...
 Service "FRONT_APP_DB_SVC.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 Service "ORCL.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 Service "ORCLXDB.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 The command completed successfully

server-name:/u01/app/11g/grid/log>cd /u01/app/11g/grid/log
server-name:/u01/app/11g/grid/log>adrci
ADRCI: Release 11.2.0.4.0 - Production on Wed May 28 10:22:47 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
ADR base = "/u01/app/11g/grid/log"
 adrci> show homes
 ADR Homes:
 diag/asmcmd/user_grid/server-name
 diag/tnslsnr/server-name/listener_scan3
 diag/tnslsnr/server-name/listener_scan2
 diag/tnslsnr/server-name/listener_scan1
 diag/asmtool/user_root/host_3797755080_80
adrci> show alert
Choose the alert log from the following homes to view:
1: diag/asmcmd/user_grid/server-name
 2: diag/tnslsnr/server-name/listener_scan3
 3: diag/tnslsnr/server-name/listener_scan2
 4: diag/tnslsnr/server-name/listener_scan1
 5: diag/asmtool/user_root/host_3797755080_80
 Q: to quit
Please select option: 4

and there we are…
remember to be on the correct host for each scan,
otherwise the alert (listener) log file will be out of date.

 

Written by Neil Chandler

6th June 2014 at 06:28

Posted in Problem Solving, RAC

Tagged with , , ,

Developers Killing Sessions

with 3 comments

When you end up spending a far great a percentage of your day than seems sensible killing off Java connections that Developers have carelessly left lying around, locking objects all over the place, you need a solution to get them to go away. The solution is to let them do it themselves!

I’m not advocating granting ALTER SYSTEM to Developers! That way madness lies, or certainly some unintended consequences. I’m all for Dev’s having a lot of freedom in the database, just not freedom with the database.

So, creating a stored procedure (in this example as sys, but as any user with an explicit ALTER SYSTEM privilege granted will do) to kill sessions without allowing too much latitude to do anything else seems appropriate. Here’s one I built earlier:

 

create or replace procedure sys.kill_session 
 ( p_sid IN number, p_serial IN number, p_instance IN number) as
 -- Neil Chandler. Grant the ability to kill session on a restricted basis. 21.07.2010
 l_username varchar2(30) := null;
 l_priv number := 1;
begin
 -- Who owns the session?
 select username into l_username
 from gv$session
 where sid = p_sid and serial#=p_serial and inst_id = p_instance;
 -- Check for DBA role
 select count(*) into l_priv
 from dba_role_privs
 where grantee = l_username and granted_role = 'DBA';
 -- If the user has the DBA priv, deny the kill request
 if l_priv > 0 or l_username is null
 then
 dbms_output.put_line 
  ('User request to kill session '||p_sid||','||p_serial||',@'||p_instance||
   ' denied. Session is for privileged user '||l_username||'.');
 else
 dbms_output.put_line 
   ('Killing user '||l_username||' - '||p_sid||','||p_serial||',@'||p_instance);
 execute immediate 'alter system disconnect session '''||
                    p_sid||','||p_serial||',@'||p_instance||
                   ''' immediate';
 end if;
end;
/

-- and let the proc be seen and used
create or replace public synonym kill_session for sys.kill_session; 
grant execute on kill_session to (whomever);

Then a nifty bit of sql to generate the kill commands for the Developers. 
Please include your favourite columns from gv$session:

select username,status,blocking_session,
 'exec kill_session ('|| sid||','||serial#||','||inst_id||')' Kill_Command 
 from gv$session
where username is not null
  and type <> 'BACKGROUND'
/

USERNAME  STATUS  BLOCKING_SESSION KILL_COMMAND
--------- ------- ---------------- ------------------------------------ 
SYS       ACTIVE                   exec kill_session (31,65,1)
SYSTEM    INACTIVE                 exec kill_session (58,207,1)
USER_1    INACTIVE                 exec kill_session (59,404,1)
USER_2    INACTIVE                 exec kill_session (72,138,1)
USER_2    INACTIVE                 exec kill_session (46,99,2)


May the odds be forever in your favour. Let the killing commence…

Written by Neil Chandler

29th May 2014 at 11:13

The trouble with Timezones & Grid Infrastructure

leave a comment »

When installing Oracle Grid Infrastructure 11.2 (and all other releases), you need to make sure that you have all of the server setting correct and to standard before you do the install. One that bit me recently was the timezone setting. The Red Hat 6.4 server(s) in question has the correct file in /etc/localtime (copied from /usr/share/zoneinfo/whatever). If I type in date, I get the reply in the correct timezone (GMT/BST as I’m in London), so all seems correct.

However, the slack Unix Sysadmin (which might or might not have been me) had not put the correct setting in /etc/sysconfig/clock. Unfortunately, when you install Grid Infrastructure, the setting is read from /etc/sysconfig/clock and embedded into a Grid Inforastructure config file. $GRID_HOME/crs/install/s_crsconfig_hostname_env.txt


### This file can be used to modify the NLS_LANG environment variable, which determines the charset to be used for messages.

### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JAPAN.UTF8

### Do not modify this file except to change NLS_LANG, or under the direction of Oracle Support Services

TZ=Europe/London

NLS_LANG=AMERICAN_AMERICA.AL32UTF8

TNS_ADMIN=

ORACLE_BASE=


If you change this entry, and you should check with Oracle Support if this is OK for your site, and you will need to restart Grid Infrastructure. The one thing about this that I really don’t like is that Oracle is storing a runtime configuration file in a an install directory. Does it do that anywhere else?

Written by Neil Chandler

11th April 2014 at 10:51

Follow

Get every new post delivered to your Inbox.

Join 27 other followers

%d bloggers like this: