Goldengate: Problems with character sets

One complication that you may face with replicating data using Goldengate (or other tools) is when your source character set is different to your destination character set. This is particularly true when the source character set is UTF-8 and the destination is not.

If the application does not sanitise (or you do not want to sanitise) inputs to restrict them to the lowest common denominator within your systems, you will need to take action so that the source data is fed appropriately to the destination systems.

I recently experienced this at a client, but the special measures I took to allow Oracle UTF-8 data into a SQL Server database using a standard Windows-1252 character set also hit a Goldengate codepath bug, recreated here:

The source table, TAB1, has 3 columns for these purposes:
 

 ID     number
 NAME   varchar2(50)
 COL_TS timestamp

 
The source table is allowed to contain NULLS, but the destination table must not, so a null value test is specified in the COLMAP in the REPLICAT.

To cope with the character set conversion, the parameter REPLACEBADCHAR SPACE is specified in the replication. This states that if there are ANY characters in the trail file which the destination database cannot store, then each such character should be converted to (in this instance) a space.
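To make the intended behaviour concrete, here is a minimal Python sketch of "replace anything the target character set cannot store with a space". It only illustrates the idea; it is not how Goldengate implements REPLACEBADCHAR, and the function name to_windows_1252 is made up for this example.

```python
def to_windows_1252(text, replacement=" "):
    """Return text with every character that cp1252 cannot store replaced."""
    out = []
    for ch in text:
        try:
            ch.encode("cp1252")      # raises if the character has no cp1252 mapping
            out.append(ch)
        except UnicodeEncodeError:
            out.append(replacement)  # the "bad" character becomes a space
    return "".join(out)


# The euro sign exists in Windows-1252, so it is kept as-is.
print(repr(to_windows_1252("NEIL \u20ac CHA")))
# U+FFFD has no Windows-1252 equivalent, so it is replaced by a space.
print(repr(to_windows_1252("NEIL \ufffd CHA")))
```

The point of the sketch is that the replacement is per character: the rest of the string survives untouched, which is what you want when the alternative is an abended process.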

REPLICAT snippet:

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS,
            NAME =@IF(@COLTEST(NAME, NULL), ' ' ,NAME));

 
All processing progressed nicely, with the NULLs entered into TAB1.NAME being converted into a single space, until an unexpected character was pasted into the screen on the source system and the REPLICAT abended:

2015-02-25 22:32:00 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:00 WARNING OGG-01503 Aborting BATCHSQL transaction. Mapping error.
2015-02-25 22:32:01 WARNING OGG-01137 BATCHSQL suspended, continuing in normal mode.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set.
2015-02-25 22:32:01 WARNING OGG-01431 Aborted grouped transaction on 'dbo.TAB1', Mapping error.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.
2015-02-25 22:32:01 WARNING OGG-01151 Error mapping from SCHEMA_OWNER.TAB1 to dbo.SUP_TAB1.
2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2.

Source Context :
SourceModule : [er.errors]
SourceID : [er/errors.cpp]
SourceFunction : [take_rep_err_action]
SourceLine : [682]
ThreadBacktrace : [12] elements
: [Z:\gg12\gglog.dll(?CreateMessage@CMessageFactory@@QEAAPEAVCMessage@@PEAVCSourceContext@@IZZ+0x886) [0x000007FEF00809D6]]
: [Z:\gg12\gglog.dll(?_MSG_ERR_MAP_TO_TANDEM_FAILED@@YAPEAVCMessage@@PEAVCSourceContext@@AEBV?$CQualDBObjName@$00@ggapp@gglib@ggs@@1W4MessageDisposition@CMessageFactory@@@Z+0x81) [0x000007FEF0043631]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x733c) [0x000000013F6E96BC]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x2fe7a) [0x000000013F7121FA]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a575) [0x000000013F74C8F5]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xea23) [0x000000013F7F2323]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe000) [0x000000013F7F1900]]
: [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe8cd) [0x000000013F7F21CD]]
: [Z:\gg12\replicat.exe(ERCALLBACK+0x6a5f9) [0x000000013F74C979]]
: [Z:\gg12\replicat.exe(CommonLexerNewSSD+0xc0d2) [0x000000013F8862D2]]
: [C:\Windows\system32\kernel32.dll(BaseThreadInitThunk+0xd) [0x00000000773B652D]]
: [C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+0x21) [0x00000000774EC541]]

2015-02-25 22:32:01 ERROR OGG-01296 Error mapping from SCHEMA_OWNER.TAB1 to dbo.TAB1.

Looking in the Goldengate discard file (always a good place to start when you have a GG problem), you can see the problem character “ef bf bd”:

Oracle GoldenGate Delivery for SQL Server process started, group REP_SQL discard file opened: 2015-02-11 22:02:42.389000
Mapping error to target column: NAME
Mapping error to target column: NAME
Mapping error to target column: NAME
Current time: 2015-02-25 22:32:01
Discarded record from action ABEND on error 0

Aborting transaction on ./dirdat/NC beginning at seqno 2 rba 123
error at seqno 2 rba 123
Problem replicating SCHEMA_OWNER.TAB1 to dbo.TAB1
Mapping problem with insert record (source format)...
*
ID = 123
000000: 31 32 33  |123     |

NAME = NEIL \uFFFD CHA
000000: 4E 45 49 4C 20 ef bf bd 20 43 48 41 | NEIL ... CHA|

COL_TS = 2015-02-25:22:31:56.303000000
000000: 32 30 31 35 2d 30 32 2d 32 35 3a 32 32 3a 33 31 |2015-02-25:22:31|
000010: 3a 35 36 2e 33 30 33 30 30 30 30 30 30          |:56.303000000   |
*

Process Abending : 2015-02-25 22:32:01
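If you want to know exactly what the discard file is showing you, the hex dump of NAME can be decoded by hand. The short Python sketch below takes the bytes from the dump above, decodes them as UTF-8 and checks whether the result fits in Windows-1252 (Python calls that codec cp1252). It is a diagnostic aid only and has nothing to do with Goldengate itself.

```python
import unicodedata

# Bytes copied from the NAME column in the discard file hex dump
raw = bytes.fromhex("4E 45 49 4C 20 ef bf bd 20 43 48 41")
text = raw.decode("utf-8")

# Report every non-ASCII character with its code point and Unicode name
for ch in text:
    if ord(ch) > 127:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# A strict conversion to Windows-1252 fails if any character has no mapping
try:
    text.encode("cp1252")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print("fits in windows-1252:", encodable)
```

The three bytes ef bf bd are the UTF-8 encoding of U+FFFD, the Unicode replacement character. That suggests the character pasted into the screen had already been substituted somewhere upstream before it reached the trail file, although that is an inference from the bytes rather than something I verified.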

So, why didn’t REPLACEBADCHAR catch this and turn the offending character into a space? There’s a clue in the ABEND report information:

WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 

The column is referred to as @IF(), not as NAME. A quick scan of MOS shows that this appears to be BUG 19818362 “Column function execution was happening internally under NOCHARSETCONVERSION cases” – it’s going through the wrong codepath for REPLACEBADCHAR to work. And this bug fix was released 10 days before this problem was encountered. Result!

The short-term fix? Remove the data manipulation from the REPLICAT:

REPLACEBADCHAR SPACE
MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1,
    COLMAP (USEDEFAULTS);

 

Start and run the REPLICAT until it is past the problem, then revert the REPLICAT back to data manipulation until either Goldengate is patched and tested, or we have to repeat this exercise because another Unicode character problem occurs.

Checking the alert log – the easy way

Do you check the alert log of your databases every day? In the morning when you get in? But what about the alerts which happen during the day? How do you spot them – especially if you don’t have Grid Control or Cloud Control configured? Even if you do have a full monitoring solution, this can be useful for a belt-and-braces approach.

Here’s a short bash shell script which uses adrci to read through each home it finds (for a DIAG location) and check every alert log contained therein, using the adrci pattern matching functionality to search for problems. I usually schedule it within each host (using cron) to minimise the moving parts, and therefore minimise the opportunity for it to stop working. Any problems, and I get an email. I hope you find it useful. I usually keep it in /opt/oracle/bin, but you can stick it in your script home of choice.

This should work for 11G and 12C databases (tested to 12.1.0.2), unless I’ve made a cut/paste error :-)

#!/bin/bash
#########################################################################################
# Description: Read each Oracle Home directory. Run adrci matching for problems
# Author : N Chandler.2014-03-28
#
# crontab : # Check Alert Log 30.03.2014
# 00,30 * * * * /opt/oracle/bin/adrci_alert.sh > /opt/oracle/bin/log/adrci.cron.log 2>&1
#
#########################################################################################
# Which HOME?
 export ORACLE_HOME=/opt/app/oracle/product/11g
 export DIAG_LOC=/opt/app/oracle/diag/rdbms
# Who gets the alert?
 export RECIPIENT='neil@chandler.uk.com'
# Other Variables
 export LD_LIBRARY_PATH=$ORACLE_HOME/lib
 export HOST=`hostname`
 export PATH=$ORACLE_HOME/bin:$PATH
 export NLS_DATE_FORMAT='yyyy-mm-dd hh24:mi:ss'
 export SUBJECT="Oracle ALERTS on ${HOST} OK"
 export LOG=/tmp
 export ALERT=$LOG/error.txt

# Write the alert log message header for the email
 echo "${HOST} `date +%Y-%m-%d.%H:%M:%S-%Z`" > ${ALERT}
 echo "All alerts in ADRCI Alert log for the last 30 minutes" >> ${ALERT}
 echo "THIS ALERT WILL NOT BE REPEATED!!! TAKE ACTION NOW!!!" >> ${ALERT}
 echo "Follow-up on this email and check the alert log on ${HOST}" >> ${ALERT}

# find out the homes
 adrci_homes=( $(adrci exec="show homes" | grep -e rdbms -e asm))

# run through Each home found and examine the alert log
# Here we are looking for ORA- messages, Deadlock, anything which raises an incident or anything which is instance-level
# IN THE LAST 30 MINUTES (1/48 of a day), so we need to run this code every 30 minutes or we may miss something.
 for adrci_home in ${adrci_homes[@]}
 do
   echo "Checking: ${adrci_home}" >> ${ALERT}
   echo $adrci_home' Alert Log' >> ${ALERT}
   adrci exec="set home ${adrci_home} ; show alert -p \\\"(message_text like '%ORA-%' or message_text like '%Deadlock%' or message_text like '%instance%' or message_text like '%incident%') and originating_timestamp>=systimestamp-(1/48) \\\"" -term >>${ALERT}
 done
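# count the errors. This is a good place to exclude specific errors you wish to ignore with a -v match.
# note - your grep must be aligned with the pattern match above for this to work
# The -v exclusion must run on the matching lines BEFORE they are counted;
# piping the output of "grep -c" into "grep -v" would only filter the number itself.
num_errors=`grep -e 'TNS' -e 'ORA' -e 'Deadlock' -e 'instance' -e 'incident' ${ALERT} | grep -v -c 'ORA-28'`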

# If there are any errors, lets email the alert information to someone
if [ $num_errors -gt 0 ]
then
  SUBJECT="ERROR in Oracle ALERT log on ${HOST}"
  mail -s "${SUBJECT}" ${RECIPIENT} < ${ALERT}
fi
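The counting step is the part most likely to go subtly wrong, because the exclusion only means anything if it is applied to the matching lines before they are counted. Here is the same include-then-exclude logic as a small Python sketch. The sample lines are invented for illustration and are not real alert log output; the patterns are the ones used in the script.

```python
import re

# Same patterns as the grep in the script above
INCLUDE = re.compile(r"TNS|ORA|Deadlock|instance|incident")
EXCLUDE = re.compile(r"ORA-28")


def count_alert_lines(lines):
    """Count lines that match an include pattern and no exclude pattern."""
    return sum(
        1
        for line in lines
        if INCLUDE.search(line) and not EXCLUDE.search(line)
    )


# Hypothetical sample lines, made up for this example
sample = [
    "ORA-00600: internal error code",
    "ORA-28001: the password has expired",
    "Completed checkpoint",
    "Deadlock detected",
]
print(count_alert_lines(sample))
```

With this sample, two lines are counted: the ORA-28001 line is excluded and the checkpoint line never matched in the first place. Note that 'ORA-28' is a plain substring match, so it excludes everything starting ORA-28, which may be more than you intend.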

UKOUG Tech 14

On Sunday I will be heading North from London to Liverpool for 4 days, to attend another UK Oracle User Group conference – #UKOUG_Tech14

I’m sure it will be as wonderful and informative a 4 days as you can get in the Oracle technical area. The hard part of attending is working out what and who to see.

I will be presenting there again – this time a talk on Goldengate late on the final day. I just need to get my slides a little more polished…

Hopefully I’ll see you there. Please say Hi! I am fairly social and mostly house trained. But no stalking, OK.

Epicaricacy

Wednesday’s word is Epicaricacy, meaning to take joy in the misfortune of others. Yes, this is the (admittedly rare) English word that doesn’t exist, causing an outbreak of German in an otherwise sane sentence.

It’s interesting that the words which treat taking pleasure in others’ misfortune as a single ‘concept’ (Schadenfreude, skadeglädje, leedvermaak, skadefryd, skadefryd and vahingonilo) are all Northern European words (being German, Swedish, Dutch, Danish, Norwegian and Finnish respectively). Us North Europeans seem to like little more than a good laugh at someone in pain.

Perhaps we should all be a little more Buddhist, and enjoy Muditā, the joy at another person’s well-being, unadulterated by self-interest.

More on Epicaricacy in another post later this week, about JOINs. :-)

Adding a DEFAULT column in 12C

I was at a talk recently, and there was an update by Jason Arneil about adding columns to tables with DEFAULT values in Oracle 12C. The NOT NULL restriction has been lifted and now Oracle cleverly intercepts the null value and replaces it with the DEFAULT meta-data without storing it in the table. To repeat the 11G experiment I ran recently:

 

SQL> alter table ncha.tab1 add (filler_default char(1000) default 'EXPAND' not null);
Table altered.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len 
      from user_tables where table_name = 'TAB1';
TABLE_NAME NUM_ROWS       BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000       1504          0        2017


In both releases we then issue:
SQL> alter table ncha.tab1 modify (filler_default null);
Table altered.


IN 11G
SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
      from user_tables where table_name = 'TAB1';

TABLE_NAME NUM_ROWS       BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000       3394          0        2017

BUT IN 12C
SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
      from user_tables where table_name = 'TAB1';
TABLE_NAME NUM_ROWS       BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000       1504          0        2017

So, as we can see, making the column NULLABLE in 12C didn’t cause it to go through and update every row in the way it must in 11G. It’s still a chained-row update accident waiting to happen, but it’s a more flexible accident :-)

However, I think it’s worth pointing out that you only get “free data storage” when you add the column. When inserting a record, simply having a column with a DEFAULT value means that the DEFAULT gets physically stored with the record if it is not specified. The meta-data effect is ONLY for subsequently added columns with DEFAULT values.
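The distinction between "default supplied at read time from the dictionary" and "default physically stored at insert time" can be shown with a toy model. This Python sketch is purely conceptual: the class ToyTable and its methods are invented for illustration and say nothing about how Oracle actually lays out rows or blocks.

```python
class ToyTable:
    """Toy model only: NOT how Oracle stores rows internally."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.defaults = {}   # column name -> DEFAULT value held as metadata
        self.rows = []       # each row holds only the values physically stored

    def add_column(self, name, default):
        # Metadata-only change: existing rows are not touched.
        self.columns.append(name)
        self.defaults[name] = default

    def insert(self, **values):
        # At insert time an unspecified column with a DEFAULT is stored with the row.
        row = dict(values)
        for col, default in self.defaults.items():
            row.setdefault(col, default)
        self.rows.append(row)

    def select(self):
        # At read time a value missing from the stored row is filled in from metadata.
        return [
            {c: r.get(c, self.defaults.get(c)) for c in self.columns}
            for r in self.rows
        ]

    def stored_values(self):
        # How many values are physically held across all rows
        return sum(len(r) for r in self.rows)


t = ToyTable(["pk"])
t.insert(pk=1)                    # row written before the column exists
t.add_column("filler", "EXPAND")  # the existing row stays one value wide
t.insert(pk=2)                    # the new row stores the default physically
print(t.select())
print(t.stored_values())
```

Both rows read back with filler = 'EXPAND', but only the second one is actually carrying it. That is the whole effect in miniature: the saving applies to rows that existed before the column was added, and to nothing inserted afterwards.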

SQL> create table ncha.tab1 (pk number, c2 timestamp, filler char(1000), filler2 char(1000) DEFAULT 'FILLER2' NOT NULL) pctfree 1;
Table created.

SQL> alter table ncha.tab1 add constraint tab1_pk primary key (pk);
Table altered.

Insert 10,000 rows into the table, but not into FILLER2 with the DEFAULT
SQL> insert into ncha.tab1 (pk, c2, filler) select rownum id, sysdate, 'A' from dual connect by level <= 10000;
commit;
Commit complete.

Gather some stats and have a look after loading the table. Check for chained rows at the same time.
SQL> exec dbms_stats.gather_table_stats('NCHA','TAB1',null,100);
PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
     from user_tables where table_name = 'TAB1';

TABLE_NAME   NUM_ROWS     BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000       3394          0        2017

For a bit of fun, I thought I would see just how weird the stats might look if I played around with adding defaults

SQL> drop table ncha.tab1;
Table dropped.

SQL> create table ncha.tab1 (pk number) pctfree 1;
Table created.

SQL> alter table ncha.tab1 add constraint tab1_pk primary key (pk);
Table altered.

Insert 10,000 rows into the table

SQL> insert into ncha.tab1 (pk) select rownum id from dual connect by level <= 10000;
commit;
Commit complete.

Gather some stats and have a look after loading the table. Check for chained rows at the same time.
SQL> exec dbms_stats.gather_table_stats('NCHA','TAB1',null,100);

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
  2    from user_tables
  3   where table_name = 'TAB1';

TABLE_NAME   NUM_ROWS     BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000         20          0           4

Now let’s add a lot of defaults
SQL> alter table ncha.tab1 add (filler_1 char(2000) default 'F1' not null, filler_2 char(2000) default 'F2' null, filler_3 char(2000) default 'F3', filler_4 char(2000) default 'how big?' null );
Table altered.

Gather some stats and have a look after adding the column. Check for chained rows at the same time.
SQL> exec dbms_stats.gather_table_stats('NCHA','TAB1',null,100);

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
  2    from user_tables
  3   where table_name = 'TAB1';

TABLE_NAME   NUM_ROWS     BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000         20          0        8008

10,000 rows with an AVG_ROW_LEN of 8008, all in 20 blocks. Magic!
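To put a number on the magic, compare what the statistics claim the rows contain with what 20 blocks could possibly hold. The block size is not shown anywhere in this post, so the 8 KB used below is an assumption; adjust BLOCK_SIZE to match your database.

```python
BLOCK_SIZE = 8192  # assumption: 8 KB blocks, not stated in the post

num_rows = 10_000
avg_row_len = 8008   # AVG_ROW_LEN from user_tables after adding the defaults
blocks = 20          # BLOCKS from user_tables

logical_bytes = num_rows * avg_row_len   # what the stats say the rows hold
physical_bytes = blocks * BLOCK_SIZE     # what the blocks can physically hold

print(f"logical row data : {logical_bytes:,} bytes")
print(f"allocated blocks : {physical_bytes:,} bytes")
print(f"ratio            : {logical_bytes / physical_bytes:.0f}x")
```

Under that assumption the statistics describe roughly 80 MB of row data sitting in about 160 KB of blocks, which is only possible because the default values are not in the blocks at all.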

Just to finish off, let’s update each DEFAULT column so the table expands…

SQL> select filler_1, filler_2, filler_3, filler_4,count(*) from ncha.tab1 group by filler_1,filler_2,filler_3,filler_4;

FILLER_1   FILLER_2   FILLER_3   FILLER_4     COUNT(*)
---------- ---------- ---------- ---------- ----------
F1         F2         F3         how big?        10000

So it's all there. The metadata is intercepting the nulls and converting them to the default on the fly, rather than storing them in the blocks.
So what happens if we actually UPDATE the table?

SQL> update ncha.tab1 set filler_1 = 'EXPAND', filler_2 = 'EXPAND', filler_3='EXPAND', filler_4='THIS BIG!';
10000 rows updated.

SQL> select filler_1, filler_2, filler_3, filler_4,count(*) from ncha.tab1 group by filler_1,filler_2,filler_3,filler_4;

FILLER_1   FILLER_2   FILLER_3   FILLER_4     COUNT(*)
---------- ---------- ---------- ---------- ----------
EXPAND     EXPAND     EXPAND     THIS BIG!       10000

Gather some stats and have a look after the update, checking for chained rows at the same time.
SQL> exec dbms_stats.gather_table_stats('NCHA','TAB1',null,100);

PL/SQL procedure successfully completed.

SQL> select table_name,num_rows,blocks,avg_space,avg_row_len
     from user_tables where table_name = 'TAB1';

TABLE_NAME   NUM_ROWS     BLOCKS  AVG_SPACE AVG_ROW_LEN
---------- ---------- ---------- ---------- -----------
TAB1            10000      19277          0        8010

SQL> 
SQL> analyze table tab1 list chained rows into chained_rows;

Table analyzed.

SQL> select count(*) CHAINED_ROWS from chained_rows;

CHAINED_ROWS
------------
       10000

Yep. That’s bigger.

Gambrinous

Wednesday’s word is Gambrinous, meaning full of beer, allegedly named after a Flemish King who is said to have invented beer.
 
Use: It was difficult to determine the most gambrinous group. Oakies, ACED, or Oakies ∩ ACED. However, it was not Oakies ∩ ACED ∩ BARBIGEROUS individuals, who tend to stick to cider.
 
 
