26th February 2015 Leave a comment
One complication that you may face with replicating data using Goldengate (or other tools) is when your source character set is different to your destination character set. This is particularly true when the source character set is UTF-8 and the destination is not.
If the application does not sanitise (or you do not want to sanitise) inputs to restrict them to the lowest common denominator within your systems, you will need to ensure that you take action to ensure the source data is fed appropriately to the destination systems.
I recently experienced this at a client, but the special measures taken by me to allow Oracle UTF-8 data into a SQL Server database using a standard Windows-1252 character set also hit a Goldengate codepath bug, recreated here:
The source table, TAB1, has 3 columns for these purposes:
ID number NAME varchar2(50) COL_TS timestamp
The source table is allowed to contain NULLS, but the destination table must not, so a null value test is specified in the COLMAP in the REPLICAT.
To cope with the character set conversion, the parameter REPLACEBADCHAR SPACE is specified in the replication. This states that is there are ANY characters in the trail file which the destination database cannot store, then that character should be converted to (in this instance) a space.
REPLACEBADCHAR SPACE MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1, COLMAP (USEDEFAULTS, NAME =@IF(@COLTEST(NAME, NULL), ' ' ,NAME));
All processing progressed nicely, with the NULLs entered into TAB1.NAME being converted into a single space, until an unexpected character was pasted into the screen on the source system and the REPLICAT abended:
2015-02-25 22:32:00 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set. 2015-02-25 22:32:00 WARNING OGG-01503 Aborting BATCHSQL transaction. Mapping error. 2015-02-25 22:32:01 WARNING OGG-01137 BATCHSQL suspended, continuing in normal mode. 2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2. 2015-02-25 22:32:01 WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252 of target column NAME failed because the source column contains a character that is not available in the target character set. 2015-02-25 22:32:01 WARNING OGG-01431 Aborted grouped transaction on 'dbo.TAB1', Mapping error. 2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2. 2015-02-25 22:32:01 WARNING OGG-01151 Error mapping from SCHEMA_OWNER.TAB1 to dbo.SUP_TAB1. 2015-02-25 22:32:01 WARNING OGG-01003 Repositioning to rba 123 in seqno 2. Source Context : SourceModule : [er.errors] SourceID : [er/errors.cpp] SourceFunction : [take_rep_err_action] SourceLine :  ThreadBacktrace :  elements : [Z:\gg12\gglog.dll(?CreateMessage@CMessageFactory@@QEAAPEAVCMessage@@PEAVCSourceContext@@IZZ+0x886) [0x000007FEF00809D6]] : [Z:\gg12\gglog.dll(?_MSG_ERR_MAP_TO_TANDEM_FAILED@@YAPEAVCMessage@@PEAVCSourceContext@@AEBV?$CQualDBObjName@$00@ggapp@gglib@ggs@@1W4MessageDisposition@CMessageFactory@@@Z+0x81) [0x000007FEF0043631]] : [Z:\gg12\replicat.exe(ERCALLBACK+0x733c) [0x000000013F6E96BC]] : [Z:\gg12\replicat.exe(ERCALLBACK+0x2fe7a) [0x000000013F7121FA]] : [Z:\gg12\replicat.exe(ERCALLBACK+0x6a575) [0x000000013F74C8F5]] : [Z:\gg12\replicat.exe(_ggTryDebugHook+0xea23) [0x000000013F7F2323]] : [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe000) [0x000000013F7F1900]] : [Z:\gg12\replicat.exe(_ggTryDebugHook+0xe8cd) [0x000000013F7F21CD]] : [Z:\gg12\replicat.exe(ERCALLBACK+0x6a5f9) [0x000000013F74C979]] : [Z:\gg12\replicat.exe(CommonLexerNewSSD+0xc0d2) [0x000000013F8862D2]] : [C:\Windows\system32\kernel32.dll(BaseThreadInitThunk+0xd) [0x00000000773B652D]] : [C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+0x21) [0x00000000774EC541]] 2015-02-25 22:32:01 ERROR OGG-01296 Error mapping from SCHEMA_OWNER.TAB1 to dbo.TAB1.
Looking in the Goldengate discard file (always a good place to start when you have a GG problem), you can see the problem character “ef bf bd”:
Oracle GoldenGate Delivery for SQL Server process started, group REP_SQL discard file opened: 2015-02-11 22:02:42.389000 Mapping error to target column: NAME Mapping error to target column: NAME Mapping error to target column: NAME Current time: 2015-02-25 22:32:01 Discarded record from action ABEND on error 0 Aborting transaction on ./dirdat/NC beginning at seqno 2 rba 123 error at seqno 2 rba 123 Problem replicating SCHEMA_OWNER.TAB1 to dbo.TAB1 Mapping problem with insert record (source format)... * ID = 123 000000: 31 32 33 |123 | NAME = NEIL \uFFFD CHA 000000: 4E 45 49 4C 20 ef bf bd 20 43 48 41 | NEIL ... CHA| COL_TS = 2015-02-25:22:31:56.303000000 000000: 32 30 31 35 2d 30 32 2d 32 35 3a 32 32 3a 33 31 |2015-02-25:22:31| 000010: 3a 35 36 2e 33 30 33 30 30 30 30 30 30 |:56.303000000 | * Process Abending : 2015-02-25 22:32:01
So, why didn’t REPLACEBADCHAR catch this and turn the offending character into a space? There’s a clue in the ABEND report information
WARNING OGG-00869 Conversion from character set UTF-8 of source column @IF() to character set windows-1252
The column is referred-to as @IF(), not as NAME. A quick scan of MOS show that this appears to be BUG 19818362 “Column function execution was happening internally under NOCHARSETCONVERSION cases” – it’s going through the wrong codepath for REPLCEBADCHAR to work. And this bug fix was released 10 days before this problem was encountered. Result!
The short-term fix? Remove the data manipulation from the REPLICAT
REPLACEBADCHAR SPACE MAP SCHEMA_OWNER.TAB1, TARGET DBO.TAB1, COLMAP (USEDEFAULTS);
Start and run the REPLICAT until past the problem, then revert the REPLICAT back to data manipulation until either Goldengate is patched and tested or we have to repeat this exercise due to another Unicode character problem occurs.