Goldengate: Problems with character sets

One complication that you may face with replicating data using Goldengate (or other tools) is when your source character set is different to your destination character set. This is particularly true when the source character set is UTF-8 and the destination is not.

If the application does not sanitise (or you do not want to sanitise) inputs to restrict them to the lowest common denominator within your systems, you will need to ensure that you take action to ensure the source data is fed appropriately to the destination systems.

I recently experienced this at a client, but the special measures taken by me to allow Oracle UTF-8 data into a SQL Server database using a standard Windows-1252 character set also hit a Goldengate codepath bug, recreated here:

The source table, TAB1, has 3 columns for these purposes:
 

 ID     number
 NAME   varchar2(50)
 COL_TS timestamp

 
The source table is allowed to contain NULLS, but the destination table must not, so a null value test is specified in the COLMAP in the REPLICAT.

To cope with the character set conversion, the parameter REPLACEBADCHAR SPACE is specified in the replication. This states that is there are ANY characters in the trail file which the destination database cannot store, then that character should be converted to (in this instance) a space.

REPLICAT snippet:

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS,
            NAME =@IF(@COLTEST(NAME, NULL), ' ' ,NAME));

 
All processing progressed nicely, with the NULLs entered into TAB1.NAME being converted into a single space, until an unexpected character was pasted into the screen on the source system and the REPLICAT abended:

2015-02-25 22:32:00 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:00 WARNING OGG-01503 Aborting BATCHSQL transaction. Mapping error.
2015-02-25 22:32:01 WARNING OGG-01137 BATCHSQL suspended, continuing in normal mode.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:01 WARNING OGG-01431 Aborted grouped transaction on 'dbo.TAB1', Mapping error.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-01151 Error mapping from SCHEMA_OWNER.TAB1 to dbo.SUP_TAB1.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.

Source Context :
SourceModule : [er.errors]
SourceID : [er/errors.cpp]
SourceFunction : [take_rep_err_action]
SourceLine : [682]
ThreadBacktrace : [12] elements
: [Z:\gg12\gglog.dll(?CreateMessage@CMessageFactory@@QEAAPEAVCMessage@@PEAVCSourceContext@@IZZ+0x886) [0x000007FEF00809D6]]
: [Z:\gg12\gglog.dll(?_MSG_ERR_MAP_TO_TANDEM_FAILED@@YAPEAVCMessage@@PEAVCSourceContext@@AEBV?$CQualDBObjName@$00@ggapp@gglib@ggs@@1W4MessageDisposition@CMessageFactory@@@Z+0x81) [0x000007FEF0043631]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x733c) [0x000000013F6E96BC]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x2fe7a) [0x000000013F7121FA]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a575) [0x000000013F74C8F5]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xea23) [0x000000013F7F2323]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe000) [0x000000013F7F1900]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe8cd) [0x000000013F7F21CD]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a5f9) [0x000000013F74C979]]
: [Z:\gg12\replicat.exe(CommonLexerNewSSD+0xc0d2) [0x000000013F8862D2]]
: [C:\Windows\system32\kernel32.dll(BaseThreadInitThunk+0xd) [0x00000000773B652D]]
: [C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+0x21) [0x00000000774EC541]]

2015-02-25 22:32:01 ERROR OGG-01296 Error mapping from SCHEMA_OWNER.TAB1 to dbo.TAB1.

Looking in the Goldengate discard file (always a good place to start when you have a GG problem), you can see the problem character “ef bf bd”:

Oracle GoldenGate Delivery for SQL Server process started, group REP_SQL discard file opened: 2015-02-11 22:02:42.389000
Mapping error to target column: NAME
Mapping error to target column: NAME
Mapping error to target column: NAME
Current time: 2015-02-25 22:32:01
Discarded record from action ABEND on error 0

Aborting transaction on ./dirdat/NC beginning at seqno 2 rba 123
error at seqno 2 rba 123
Problem replicating SCHEMA_OWNER.TAB1 to dbo.TAB1
Mapping problem with insert record (source format)...
*
ID = 123
000000: 31 32 33  |123     |

NAME = NEIL \uFFFD CHA
000000: 4E 45 49 4C 20 ef bf bd 20 43 48 41 | NEIL ... CHA|

COL_TS = 2015-02-25:22:31:56.303000000
000000: 32 30 31 35 2d 30 32 2d 32 35 3a 32 32 3a 33 31 |2015-02-25:22:31|
000010: 3a 35 36 2e 33 30 33 30 30 30 30 30 30          |:56.303000000   |
*

Process Abending : 2015-02-25 22:32:01

So, why didn’t REPLACEBADCHAR catch this and turn the offending character into a space? There’s a clue in the ABEND report information

WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 

The column is referred-to as @IF(), not as NAME. A quick scan of MOS show that this appears to be BUG 19818362 “Column function execution was happening internally under NOCHARSETCONVERSION cases” – it’s going through the wrong codepath for REPLCEBADCHAR to work. And this bug fix was released 10 days before this problem was encountered. Result!

The short-term fix? Remove the data manipulation from the REPLICAT

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS);

 

Start and run the REPLICAT until past the problem, then revert the REPLICAT back to data manipulation until either Goldengate is patched and tested or we have to repeat this exercise due to another Unicode character problem occurs.

Checking the alert log – the easy way

Do you check the alert log of your databases every day? In the morning when you get in? But what about the alerts which happen during the day? How do you spot them – especially if you don’t have Grid Control or Cloud Control configured. Even if you do have a full monitoring solution, this can be useful for a belt-and-braces approach.

Here’s a short bash shell script to use adrci to read through each ORACLE_HOME (for a DIAG location) and check every alert log contained therein, using adrci pattern matching functionality to search for problems. I usually schedule it within each host (using cron) to minimise the moving parts, and therefore minimise the opportunity for it to stop working. Any problems, and I get an email. I hope you find it useful. I usually keep it in /opt/oracle/bin, but you stick it in your script home of choice.

This should work for 11G and 12C database (tested to 12.1.0.2), unless I’ve made a cut/paste error :-)

#!/bin/bash
#########################################################################################
# Description: Read each Oracle Home directory. Run adrci matching for problems
# Author : N Chandler.2014-03-28
#
# crontab : # Check Alert Log 30.03.2014
# 00,30 * * * * /opt/oracle/bin/adrci_alert.sh > /opt/oracle/bin/log/adrci.cron.log 2>&1
#
#########################################################################################
# Which HOME?
 export ORACLE_HOME=/opt/app/oracle/product/11g
 export DIAG_LOC=/opt/app/oracle/diag/rdbms
# Who gets the alert?
 export RECIPIENT='neil@chandler.uk.com'
# Other Variables
 export LD_LIBRARY_PATH=$ORACLE_HOME/lib
 export HOST=`hostname`
 export PATH=$ORACLE_HOME/bin:$PATH
 export NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'
 export SUBJECT="Oracle ALERTS on ${HOST} OK"
 export LOG=/tmp
 export ALERT=$LOG/error.txt

# Write the alert log message header for the email
 echo "${HOST} `date +%Y-%m-%d.%H:%M:%S-%Z`" > ${ALERT}
 echo "All alerts in ADRCI Alert log for the last 30 minutes" >> ${ALERT}
 echo "THIS ALERT WILL NOT BE REPEATED!!! TAKE ACTION NOW!!!" >> ${ALERT}
 echo "Follow-up on this email and check the alert log on ${HOST}" >> ${ALERT}

# find out the homes
 adrci_homes=( $(adrci exec="show homes" | grep -e rdbms -e asm))

# run through Each home found and examine the alert log
# Here we are looking for ORA- messges, Deadlock, anything which raises an incident or anything which is instance-level
# IN THE LAST 30 MINUTES (1/48), so we need to run this code every 30 minutes or we may miss something. 
 for adrci_home in ${adrci_homes[@]}
 do
   echo "Checking: ${adrci_home}" >> ${ALERT}
   echo $adrci_home' Alert Log' >> ${ALERT}
   adrci exec="set home ${adrci_home} ; show alert -p \\\"(message_text like '%ORA-%' or message_text like '%Deadlock%' or message_text like '%instance%' or message_text like '%incident%') and originating_timestamp>=systimestamp-(1/48) \\\"" -term >>${ALERT}
 done
# count the errors. This is a good place to exclude specific errors you wish to ignore with a -v match.
# note - your grep must be aligned with the pattern match above for this to work
num_errors=`grep -c -e 'TNS' -e 'ORA' -e 'Deadlock' -e 'instance' -e 'incident' ${ALERT} | grep -v 'ORA-28'`

# If there are any errors, lets email the alert information to someone
if [ $num_errors -gt 0 ]
then
  SUBJECT="ERROR in Oracle ALERT log on ${HOST}"
  mail -s "${SUBJECT}" ${RECIPIENT} < ${ALERT}
fi

Extending an ACFS filesystem dynamically.

To extend an ACFS cluster filesystem dynamically, we need to use the acfsutil command:

node01:/u01/grid>/sbin/acfsutil size +10G /u02
acfsutil size: ACFS-03008: The volume could not be resized.  The volume expansion limit has been reached.
acfsutil size: ACFS-03216: The ADVM compatibility attribute for the diskgroup was below the required
                           version (11.2.0.4.0) for unlimited volume expansions.

Oh dear, not 11.0.2.4, so you can only extend volumes dynamically a few times (5) before the global bitmap becomes full. So, now it’s an offline change. :-(

Check what is accessing /u02 and stop it:

node01:/opt/oracle>sudo -s
[root@node01 oracle]# lsof /u02

COMMAND   PID   USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
bash     5566 oracle  cwd    DIR 252,50177    12288   78 /u02/goldengate/bin11
su      29509   root  cwd    DIR 252,50177    12288   78 /u02/goldengate/bin11

erm. kill -9 5566 29509 :-)

DO THIS ON EVERY RAC NODE!

[root@node01 oracle]# umount –t acfs /u02
[root@node02 oracle]# umount –t acfs /u02
[root@node-n oracle]# umount –t acfs /u02

Once unmounted, we can “repair” the global bitmap:

[root@node02 oracle]# fsck -y -t acfs  /dev/asm/acfsdisk_u02-98
fsck from util-linux-ng 2.17.2
version                   = 11.2.0.4.0
*****************************
********** Pass: 1 **********
*****************************
Oracle ASM Cluster File System (ACFS) On-Disk Structure Version: 39.0
 ACFS file system created at: Thu Jan  2 17:08:02 2014
 checking primary file system
 Files checked in primary file system: 25%
 Files checked in primary file system: 100%

 fsck.acfs: ACFS-07728: The Global_BitMap file has reached the maximum number of extents (5).
 The file system can no longer be expanded. 

 Running fsck.acfs in fixer mode will attempt to consolidate the storage bitmap into 
 fewer extents which would allow for file system expansion

 Checking if any files are orphaned...
 0 orphans found
 Checker completed with no errors.

So lets fix it – output seriously trimmed, but with the important bit

[root@node02 oracle]# /sbin/fsck.acfs -a -v /dev/asm/acfsdisk_u02-98
fsck from util-linux-ng 2.17.2
version                   = 11.2.0.4.0
 *****************************
 ********** Pass: 1 **********
 *****************************
 Oracle ASM Cluster File System (ACFS) On-Disk Structure Version: 39.0
 ACFS file system created at: Thu Jan  2 17:08:02 2014

 checking primary file system
 Files checked in primary file system: 25%
 Files checked in primary file system: 100%

 fsck.acfs: ACFS-07729: The Global_Bitmap file has been
 consolidated into 2 extents.
 This may allow for file system expansion via the 'acfsutil size' command.
  
 Checking if any files are orphaned...
 0 orphans found
 Checker completed with no errors.

So, we’re done and can re-mount ON EVERY NODE. Given it’s now 2 extents, and the max we can have is 5, we have 3 more dynamic extensions before we need to do this again.

mount –t acfs /dev/asm/acfsdisk_u02-98 /u02

And re-attempt to expand the filesystem

node01:/u01/grid>df –h /u02
Filesystem              Size  Used Avail Use% Mounted on
/dev/asm/acfsdisk_u02-98  325G   36G  290G  12% /u02

node01:/u01/grid>/sbin/acfsutil size +10G /u02
acfsutil size: new file system size: 359703511040 (343040MB)
node01:/u01/grid>df -h

node01:/u01/grid>df -h /u02
Filesystem              Size  Used Avail Use% Mounted on
/dev/asm/acfsdisk_u02-98  335G   36G  300G  11% /u02
node01:/u01/grid>

Yey! Bigger filesystem! Lets minimise the amount of times it needs to be extended in the future but doing it in big lumps. Might just save a planned outage.

 

Alternatively, upgrade Grid Infra to at least 11.2.0.4 and set advm compatibility to 11.2.0.4 and the restriction will be gone for good:

ALTER DISKGROUP acfsdisk SET ATTRIBUTE 'compatible.asm' = '11.2.0.4', 'compatible.rdbms' = '11.2.0.4', 'compatible.advm' = '11.2.0.4';
(or ASMCMD [+] > setattr -G acfsdisk compatible.advm 11.2.0.4)
(or right-click on the disk group in asmca and click "edit attributes")

Releasing to schemas the easy way

Sometimes we occasionally just miss the obvious, for years. Just noticed that an easy way to release code to a particular schema is to login as your normal DBA user (USER1) [as preferred by audit], use the alter session command to switch to point to the release schema (USER2) and run your DDL. Oracle behaves, from an object-owner perspective, as if you are logged-in as the schema owner without all the negative aspects of actually having to login as the schema owner or prefix all object with the schema owner.

This is particularly useful when you have multiple schemas within the database and you are releasing to more than one schema at once.

 

e.g.

sqlplus USER1@ORCL

SQL*Plus: Release 11.2.0.1.0 Production on Wed Sep 17 12:17:36 2014

Copyright (c) 1982, 2010, Oracle. All rights reserved.

Enter password:

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options

12:17:38 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1';

no rows selected

12:17:46 USER1@ORCL>create table n1 (c1 date);

Table created.

12:17:56 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1';

OWNER                          TABLE_NAME
------------------------------ ------------------------------
USER1                          N1

12:17:57 USER1@ORCL>alter session set current_schema=USER2;

Session altered.

12:18:08 USER1@ORCL>create table n1 (c1 date);

Table created.

12:18:11 USER1@ORCL>select owner,table_name from dba_tables where table_name = 'N1';

OWNER                          TABLE_NAME
------------------------------ ------------------------------
USER2                          N1
USER1                          N1

 

 

SCAN listener logs – where are they?

These are a bit of a pain as their location isn’t intuitive. So to remind me where they are:

Login to the server(s) as the grid owner and check the scan listener status. This will show you the location of the listener log. cd to just below the diag directory and you’re off!:

server-name:/u01/grid>ps -ef | grep SCAN
 grid 8542 8282 0 10:20 pts/0 00:00:00 grep SCAN
 grid 9349 1 0 Mar07 ? 00:07:33 /u01/app/11g/grid/bin/tnslsnr LISTENER_SCAN1 -inherit

server-name:/u01/grid>lsnrctl status LISTENER_SCAN1
LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 28-MAY-2014 10:20:12
Copyright (c) 1991, 2013, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER_SCAN1)))
 STATUS of the LISTENER
 ------------------------
 Alias LISTENER_SCAN1
 Version TNSLSNR for Linux: Version 11.2.0.4.0 - Production
 Start Date 07-MAR-2014 17:27:50
 Uptime 81 days 15 hr. 52 min. 21 sec
 Trace Level off
 Security ON: Local OS Authentication
 SNMP OFF
 Listener Parameter File /u01/app/11g/grid/network/admin/listener.ora
 Listener Log File /u01/app/11g/grid/log/diag/tnslsnr/server-name/listener_scan1/alert/log.xml
 Listening Endpoints Summary...
 (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER_SCAN1)))
 (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.6.148.141)(PORT=1521)))
 Services Summary...
 Service "FRONT_APP_DB_SVC.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 Service "ORCL.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 Service "ORCLXDB.WORLD" has 2 instance(s).
 Instance "ORCL1", status READY, has 1 handler(s) for this service...
 Instance "ORCL2", status READY, has 1 handler(s) for this service...
 The command completed successfully

server-name:/u01/app/11g/grid/log>cd /u01/app/11g/grid/log
server-name:/u01/app/11g/grid/log>adrci
ADRCI: Release 11.2.0.4.0 - Production on Wed May 28 10:22:47 2014
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
ADR base = "/u01/app/11g/grid/log"
 adrci> show homes
 ADR Homes:
 diag/asmcmd/user_grid/server-name
 diag/tnslsnr/server-name/listener_scan3
 diag/tnslsnr/server-name/listener_scan2
 diag/tnslsnr/server-name/listener_scan1
 diag/asmtool/user_root/host_3797755080_80
adrci> show alert
Choose the alert log from the following homes to view:
1: diag/asmcmd/user_grid/server-name
 2: diag/tnslsnr/server-name/listener_scan3
 3: diag/tnslsnr/server-name/listener_scan2
 4: diag/tnslsnr/server-name/listener_scan1
 5: diag/asmtool/user_root/host_3797755080_80
 Q: to quit
Please select option: 4

and there we are…
remember to be on the correct host for each scan,
otherwise the alert (listener) log file will be out of date.

 

Developers Killing Sessions

When you end up spending a far great a percentage of your day than seems sensible killing off Java connections that Developers have carelessly left lying around, locking objects all over the place, you need a solution to get them to go away. The solution is to let them do it themselves!

I’m not advocating granting ALTER SYSTEM to Developers! That way madness lies, or certainly some unintended consequences. I’m all for Dev’s having a lot of freedom in the database, just not freedom with the database.

So, creating a stored procedure (in this example as sys, but as any user with an explicit ALTER SYSTEM privilege granted will do) to kill sessions without allowing too much latitude to do anything else seems appropriate. Here’s one I built earlier:

 

create or replace procedure sys.kill_session 
 ( p_sid IN number, p_serial IN number, p_instance IN number) as
 -- Neil Chandler. Grant the ability to kill session on a restricted basis. 21.07.2010
 l_username varchar2(30) := null;
 l_priv number := 1;
begin
 -- Who owns the session?
 select username into l_username
 from gv$session
 where sid = p_sid and serial#=p_serial and inst_id = p_instance;
 -- Check for DBA role
 select count(*) into l_priv
 from dba_role_privs
 where grantee = l_username and granted_role = 'DBA';
 -- If the user has the DBA priv, deny the kill request
 if l_priv > 0 or l_username is null
 then
 dbms_output.put_line 
  ('User request to kill session '||p_sid||','||p_serial||',@'||p_instance||
   ' denied. Session is for privileged user '||l_username||'.');
 else
 dbms_output.put_line 
   ('Killing user '||l_username||' - '||p_sid||','||p_serial||',@'||p_instance);
 execute immediate 'alter system disconnect session '''||
                    p_sid||','||p_serial||',@'||p_instance||
                   ''' immediate';
 end if;
end;
/

-- and let the proc be seen and used
create or replace public synonym kill_session for sys.kill_session; 
grant execute on kill_session to (whomever);

Then a nifty bit of sql to generate the kill commands for the Developers. 
Please include your favourite columns from gv$session:

select username,status,blocking_session,
 'exec kill_session ('|| sid||','||serial#||','||inst_id||')' Kill_Command 
 from gv$session
where username is not null
  and type <> 'BACKGROUND'
/

USERNAME  STATUS  BLOCKING_SESSION KILL_COMMAND
--------- ------- ---------------- ------------------------------------ 
SYS       ACTIVE                   exec kill_session (31,65,1)
SYSTEM    INACTIVE                 exec kill_session (58,207,1)
USER_1    INACTIVE                 exec kill_session (59,404,1)
USER_2    INACTIVE                 exec kill_session (72,138,1)
USER_2    INACTIVE                 exec kill_session (46,99,2)


May the odds be forever in your favour. Let the killing commence…

Follow

Get every new post delivered to your Inbox.

Join 27 other followers

%d bloggers like this: