GoldenGate Data Manipulation – When Inserts & Updates differ

One very useful aspect of GoldenGate is that it allows you to manipulate data between the source and the destination.

One recent problem that I encountered was needing to alter the data differently for inserts than for updates. The receiving system needed default values in certain columns, which may or may not be supplied by the insert or update statements. This is slightly more complex than it first appears:

  • If we have an INSERT and the column value IS NULL, or the column value IS NOT SUPPLIED by the insert statement, we should set the default for that column.
  • If we have an UPDATE and the column value IS NULL, then we should set the default for that column.
    We must NOT set the default if the UPDATE does not supply the column, otherwise we may incorrectly overwrite data in the target system.

The first problem is that, by default, you are not allowed to have more than one table mapping per table. To get around this, you need to use the “ALLOWDUPTARGETMAP” parameter. You can then add multiple mappings.

You need to be aware that each mapping will fire for each transaction action. If you have 2 active table mappings for the same table, you will end up with 2 inserts/updates/deletes. Get this mapping wrong and your data integrity will be destroyed, and you will get a lot of constraint errors. In this case we have 2 mappings: one for inserts, and one for updates and deletes. We need the GETINSERTS/IGNOREINSERTS-style parameters to indicate which operations each mapping should process.

The following example was written for a data pump, but the same approach is valid for extracts and replicats.


-- ggsci: add extract p_neil, exttrailsource /u02/gg/bin12/dirdat/NE, BEGIN NOW
-- ggsci: add rmttrail ./dirdat/NP extract p_neil megabytes 100

EXTRACT p_neil
USERID owner_goldengate@DB_LOCAL PASSWORD password
-- NOPASSTHRU: a pump in PASSTHRU mode cannot perform the COLMAP transformations below
NOPASSTHRU
TARGETDEFS ./dirdef/defgen.neil.def
RMTHOST remote.server.world mgrport 7809
RMTTRAIL ./dirdat/NP

-- So we can have multiple mappings for a single table. This is a dangerous parameter!
ALLOWDUPTARGETMAP

-- FOR INSERTS - REPLACE MISSING COLUMNS
getinserts
ignoreupdates
ignoredeletes

TABLE NCHA.NEIL, TARGET NCHA.NEIL
COLMAP (usedefaults, &
C2 = @IF (@COLTEST (C2 , NULL, MISSING) , '1900-01-01:00:00:00.000000' , C2 ), &
C3 = @IF (@COLTEST (C3 , NULL, MISSING) , '1900-01-01:00:00:00.000000' , C3 ), &
);

-- FOR UPDATES - IGNORE MISSING COLUMNS
-- We will do the deletes here too. If they are supplied as NULL they should be modified
-- You may need to do a separate section for deletes depending upon your rules.
ignoreinserts
getupdates
getdeletes

TABLE NCHA.NEIL, TARGET NCHA.NEIL
COLMAP (usedefaults, &
C2 = @IF (@COLTEST (C2 , NULL) , '1900-01-01:00:00:00.000000' , C2 ), &
C3 = @IF (@COLTEST (C3 , NULL) , '1900-01-01:00:00:00.000000' , C3 ) &
);

-- And back to normal for subsequent table mappings
getinserts
getupdates
getdeletes
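
Once this is in place, it’s worth sanity-checking that each mapping fires only for the operations you expect. The GGSCI STATS command breaks its counts down by operation type; a quick check against the pump above might be:

-- ggsci: stats extract p_neil, totalsonly NCHA.NEIL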

UKOUG Tech15 Call for Papers

The #ukoug_tech15 call for papers has gone out and will be open until midnight on 10th May 2015.


Last year was a fabulous event in Liverpool and we hope and expect it to be even better in Birmingham this year.


GoldenGate Log Rotation

GoldenGate 12 has some excellent commands to keep your log files in check, plus one glaring omission (scheduled for a future enhancement).

Each extract, data pump and replicat will be writing to report (.rpt) and discard (.dsc) files in the dirrpt directory. (If you aren’t specifying a discard file, you should; they are very useful for troubleshooting.)

If your system is up for a long time, these files are going to get large. Oracle has realised this and provides some lovely in-built log rotation commands. To keep my parameter (.prm) files nice and neat and consistent, I use include files, and this is a perfect case for a standard include.

report.prm:

-- Standard include commands for ALL extracts and replicats to ensure they are aligned
-- Write the day's stats out to the report file at the end of every day.
-- Roll the report file over every week.
-- Report just how much throughput we have every 15 minutes.
-- History: 14.04.2015 N Chandler
-- place the command INCLUDE ./dirprm/report.prm in your parameter files

STATOPTIONS REPORTDETAIL, RESETREPORTSTATS
REPORT AT 23:59
REPORTROLLOVER AT 00:01 ON MONDAY
REPORTCOUNT EVERY 15 MINUTES, RATE
-- or if you would rather by volume...
--REPORTCOUNT EVERY 10000000 RECORDS, RATE
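
A parameter file that picks this up might then start like this (a sketch; the replicat name, connection and discard file are invented for illustration):

REPLICAT r_neil
USERID owner_goldengate@DB_REMOTE PASSWORD password
-- pull in the standard reporting settings from above
INCLUDE ./dirprm/report.prm
DISCARDFILE ./dirrpt/r_neil.dsc, APPEND, MEGABYTES 100
MAP NCHA.NEIL, TARGET NCHA.NEIL;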

I think it’s worth pointing out here what the 2 bracketed throughput numbers output by the REPORTCOUNT command mean, as you’ll struggle to find this in the documentation:

Rate  = total number of records processed since startup, divided by the total time since startup of the extract/replicat
Delta = number of records processed since the last report, divided by the time since the last report (in this case, 15 minutes)
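
So, with invented numbers for illustration: an extract that has processed 36,000,000 records in the 10 hours (36,000 seconds) since startup shows a Rate of 1,000 records/second; if 1,800,000 of those records arrived in the 15 minutes (900 seconds) since the last report, the Delta is 2,000 records/second.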


There is one notable growing file which you cannot rotate using GoldenGate commands: ggserr.log

This is a significant oversight by Oracle and will be rectified in a future release, but as of 12.1 you have to sort this out manually. You have two main options:

1. Stop the manager, rename the file, restart the manager.
2. Copy the file to a new file and then empty the in-place file by catting /dev/null into it. (I’m sure there’s a Windows equivalent of this, but I mainly work on Unix.)

DO NOT simply delete the file while the manager is running: all future error output will drop into a “black hole” until the manager is restarted.

Option 2 tends to be preferable, so here’s part of a bash script I use to perform this action:

#!/bin/bash
# rotate_ggserr_log.sh - copies the logfile to one side with a date suffix and blows away the current file
# but leaves it in place so we can continue to write to it with the manager.
# Neil Chandler 14.04.2015 created
#
today=`date +%Y%m%d`

# Check to see if we have already rolled over today
if [ -e /u99/gg/bin/ggserr.log.${today} ]
then
 echo "File /u99/gg/bin/ggserr.log.${today} exists already. Stopping."
else
 # copy the log file preserving attributes
 /bin/cp -pnv /u99/gg/bin/ggserr.log /u99/gg/bin/ggserr.log.${today}

 # See if there is a difference - did you copy it successfully?
 diff /u99/gg/bin/ggserr.log /u99/gg/bin/ggserr.log.${today}
 RC=$?

 # If there is no difference, wipe the ggserr.log file out
 # otherwise stop!
 if [ ${RC} -eq 0 ]
 then
  echo "clear the file /u99/gg/bin/ggserr.log"
  cat /dev/null > /u99/gg/bin/ggserr.log
  exit ${RC}
 else
  echo "Error - cannot clear file /u99/gg/bin/ggserr.log as it's not the same as the copied version. Stopping."
  exit ${RC}
 fi
fi
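
To run this automatically, a cron entry along these lines will do (the script and log paths here are invented for illustration; adjust to your environment):

# crontab: put ggserr.log aside at 23:55 every Sunday
55 23 * * 0 /u99/gg/scripts/rotate_ggserr_log.sh >> /u99/gg/logs/rotate_ggserr_log.out 2>&1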

Developers

Just a small Sunday night anecdote with a wider point. I, or maybe a colleague, recently received an update statement from a developer. Now, this developer is long in the tooth and well versed in the ways of Oracle data manipulation.

The aforementioned update statement contained an interesting hint: BYPASS_UJVC. You may not have heard of this hint. It’s not commonly used, mainly because it is both undocumented and unsupported by Oracle, although it has been around since Oracle 8.1.5. In the right hands, it’s a very neat way around the problem of doing an update through a join where you would otherwise be restricted by the potential of having transient keys (i.e. multiple updates via the join giving random results). There are a bunch of other blogs around about how/why/not to use it, so I won’t waffle on here.

However, the Dev was disappointed when we [the DBAs] told him to, erm, rewrite his code (probably as a MERGE, which tends to let you get round the same problem), given he has been using this hint for as long as it’s been around (a long time!). But as it’s NOT supported, we wouldn’t allow it. I’m not about to update millions of rows in a multi-billion row database with an unsupported hint unless I have a MAJOR problem.

The point of the story is that, that evening, we met up for drinks with another Developer (Dev2), whom we have both known for a couple of decades. Dev says to Dev2: “DBAs – they are never happy, going around being all DBA-ish and No-you-can’t today”, and (unprompted) Dev2 says “BYPASS_UJVC?”

I like Developers – I used to be one – but the role of Developer and DBA should be symbiotic, not adversarial as it can become on occasion. We should work together; all of the best systems I have delivered have had a great relationship between Dev and DBA. But don’t ask me to bend my database out of shape just so you can take a short cut. Talk about physicalities, data access, data life-cycle, volume and performance before a line of code has been written and we will all work much better together. If all parties feel they have some ownership of the system, there is a greater chance of success and joy.

Auditing Read-Only Standbys

If your company has a passing interest in database security, you are probably running some sort of session auditing at the very least [audit session] (although this can also be useful for troubleshooting connectivity issues). There’s a reasonable chance you’re also running some level of object auditing, or even DML access auditing if your employer so dictates:

Check what access/change auditing is enabled in the DB:

select audit_option, success, failure
from dba_stmt_audit_opts
union
select privilege, success, failure
from dba_priv_audit_opts
order by 2,1;
AUDIT_OPTION                                       SUCCESS                                  FAILURE
-------------------------------------------------- ---------------------------------------- -----------------
ALTER ANY PROCEDURE                                BY ACCESS                                BY ACCESS
ALTER ANY TABLE                                    BY ACCESS                                BY ACCESS
ALTER DATABASE                                     BY ACCESS                                BY ACCESS
ALTER PROFILE                                      BY ACCESS                                BY ACCESS
ALTER SYSTEM                                       BY ACCESS                                BY ACCESS
ALTER TABLE                                        BY ACCESS                                BY ACCESS
ALTER USER                                         BY ACCESS                                BY ACCESS
AUDIT SYSTEM                                       BY ACCESS                                BY ACCESS
CREATE ANY JOB                                     BY ACCESS                                BY ACCESS
CREATE ANY LIBRARY                                 BY ACCESS                                BY ACCESS
CREATE ANY PROCEDURE                               BY ACCESS                                BY ACCESS
CREATE ANY TABLE                                   BY ACCESS                                BY ACCESS
CREATE EXTERNAL JOB                                BY ACCESS                                BY ACCESS
CREATE LIBRARY                                     BY ACCESS                                BY ACCESS
CREATE PROCEDURE                                   BY ACCESS                                BY ACCESS
CREATE PUBLIC DATABASE LINK                        BY ACCESS                                BY ACCESS
CREATE SESSION                                     BY ACCESS                                BY ACCESS
CREATE TABLE                                       BY ACCESS                                BY ACCESS
CREATE USER                                        BY ACCESS                                BY ACCESS
DATABASE LINK                                      BY ACCESS                                BY ACCESS
DIRECTORY                                          BY ACCESS                                BY ACCESS
DROP ANY PROCEDURE                                 BY ACCESS                                BY ACCESS
DROP ANY TABLE                                     BY ACCESS                                BY ACCESS
DROP PROFILE                                       BY ACCESS                                BY ACCESS
DROP USER                                          BY ACCESS                                BY ACCESS
EXEMPT ACCESS POLICY                               BY ACCESS                                BY ACCESS
GRANT ANY OBJECT PRIVILEGE                         BY ACCESS                                BY ACCESS
GRANT ANY PRIVILEGE                                BY ACCESS                                BY ACCESS
GRANT ANY ROLE                                     BY ACCESS                                BY ACCESS
PROFILE                                            BY ACCESS                                BY ACCESS
PUBLIC DATABASE LINK                               BY ACCESS                                BY ACCESS
PUBLIC SYNONYM                                     BY ACCESS                                BY ACCESS
ROLE                                               BY ACCESS                                BY ACCESS
SYSTEM AUDIT                                       BY ACCESS                                BY ACCESS
SYSTEM GRANT                                       BY ACCESS                                BY ACCESS
USER                                               BY ACCESS                                BY ACCESS
CREATE JOB                                         BY SESSION                               BY SESSION
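
For reference, options like those above are switched on with AUDIT statements of this form (illustrative examples only, not the full set from the listing):

AUDIT SESSION;                      -- connect/disconnect auditing
AUDIT ALTER ANY TABLE BY ACCESS;    -- one audit record per use of the privilege
AUDIT CREATE JOB BY SESSION;        -- one audit record per session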

And you’re probably writing it into a database table [AUDIT_TRAIL=’DB’].

So how does that work if you open a Data Guard database read-only? You are writing into sys.aud$ on the Primary, and that table is replicated to the Standby. So what happens?

From Oracle 11g, if you are running Data Guard and opening the standby up for read access, you may not notice the line in the alert log which reads:

AUDIT_TRAIL initialization parameter is changed to OS, as DB is NOT compatible for database opened with read-only access

So it starts writing O/S audit trail files for all of your auditing options (well, the session connect and DML audit options – you can’t run DDL in a read-only DB). You might want to go and see just how many files it has written to [audit_file_dest], as you may be surprised at just how many are in there.

You may, one day, either run out of space or (more worryingly) have so many millions of files that it causes a performance problem when Oracle accesses the O/S directory. You might want to think about some sort of periodic clean-up job.
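
A minimal sketch of such a job (the directory is an invented example; substitute your own audit_file_dest, and check your retention requirements before deleting anything):

#!/bin/bash
# purge_standby_audit.sh - count, then purge, O/S audit files older than 30 days
ADUMP=/u01/app/oracle/admin/MYDB/adump    # <- your audit_file_dest
echo "audit files present: $(ls -1 ${ADUMP} | wc -l)"
# GNU find supports -delete; on other platforms use -exec rm {} + instead
find ${ADUMP} -name '*.aud' -type f -mtime +30 -delete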

Sequences – where did the view go?

I was pottering around with the sequence cache recently, investigating a few things, like exactly when it refreshes (and what happens if it is flushed or aged out of the shared pool – pin it!), and I was monitoring the next value using V$_SEQUENCES, like I have been since, erm, Oracle 8i in 1999 or summit.

I happened to log into an Oracle 12.1.0.2 DB for a check-test and it was empty! My first reaction was: how come there are no sequences cached in the shared pool? Shortly followed by: No! The view doesn’t work any more!

Right, it’s a v$ view. (Actually, it’s a V$_ view, of which the only other one I know is V$_LOCK.) It is based upon x$ “tables”, which are helper views over C memory structures in the SGA (warning: these may require the tuning/diagnostics licensing packs to query!), so let’s query the underlying object view: x$kglob.

What used to happen pre-Oracle 12c:

SYS @ orcl > select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production

select test_seq.nextval from dual;
NEXTVAL
----------
40067

select sequence_owner,sequence_name,nextvalue,cache_size from v$_sequences where sequence_name = 'TEST_SEQ'
/
SEQUENCE_OWNER                 SEQUENCE_NAME                  NEXTVALUE  CACHE_SIZE
------------------------------ ------------------------------ ---------- ----------
NEIL                           TEST_SEQ                            40068         20

select KGLNAOBJ,
KGLOBTN0 next_val,
KGLOBTN4 cache_size,
kglhdkmk kept
from x$kglob
where KGLOBTYP=6
and KGLNAOBJ = 'TEST_SEQ'
/
KGLNAOBJ             NEXT_VAL   CACHE_SIZE KEPT
-------------------- ---------- ---------- ----------
TEST_SEQ                  40068         20          0

And now what happens in Oracle 12:

SQL> select * from v$version;
BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production               0

select test_seq.nextval from dual;
NEXTVAL
----------
3

select sequence_owner,sequence_name,nextvalue,cache_size from v$_sequences where sequence_name = 'TEST_SEQ';

no rows selected

select KGLNAOBJ,
KGLOBTN0 next_val,
KGLOBTN4 cache_size,
kglhdkmk kept
from x$kglob
where KGLOBTYP=6
and KGLNAOBJ = 'TEST_SEQ';

KGLNAOBJ             NEXT_VAL   CACHE_SIZE KEPT
-------------------- ---------- ---------- ----------
TEST_SEQ                                             0

So the values are no longer exposed in Oracle 12c. NEXT_VAL and CACHE_SIZE are empty.

Does anybody know where to get this value? It's no longer in any column in X$KGLOB. Have Oracle hidden it in some weird container mode I don't understand yet? Can anybody help, or am I going to have to attach to the SGA with a C program and step through a bunch of linked lists?
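
For what it’s worth, the nearest documented alternative appears to be DBA_SEQUENCES.LAST_NUMBER, which shows the last number written to disk, i.e. the top of the current cache range rather than the live next value:

select sequence_owner, sequence_name, last_number, cache_size
from   dba_sequences
where  sequence_name = 'TEST_SEQ';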

Sequences – a quick guide and an unexpected quirk

Sequences on Oracle databases are simple, but there are some basic truths about sequences which need to be understood if you are going to use them, especially on RAC:

  1. Sequences will have “holes” in them, even if you specify NOCACHE (e.g. if you rollback your transaction).
  2. Don’t specify NOCACHE. It doesn’t scale. (See the sketch after this list.)
  3. Sequences will NOT necessarily be, erm, sequential for your transactions. That will depend upon the COMMIT order and whether you are using RAC.
  4. The higher the CACHE value, the bigger the holes but the more scalable the sequence will become. You will lose the entire set of unused cache values across an instance restart.
  5. Oracle stores a separate cache on EACH RAC node, so for a cache of 100,000, NODE 1 gets 1-100,000, NODE 2 gets 100,001-200,000 to play with, and so on.
  6. If you are using RAC, and you have a heavy workload on one node and a low number of transactions on another node, there may be a significant disparity between the NEXTVAL on each node. I have seen this disparity manifest in the tens of millions.
  7. There is a global cache lock which must be passed from node to node when getting a new cache of sequence numbers on a node. It is lightweight, but it is a serialisation point. If your CACHE value is too low for your throughput, you will get contention on this lock.
  8. The same serialisation point applies if you specify ORDER, so that the sequence advances in step across multiple RAC nodes (rather than each node having its own cache). Don’t do this either.
  9. The point of sequences is to give you a unique value for a (surrogate) Primary Key. You can rely upon sequences to be unique. Well. Mostly. As long as they do not CYCLE.
  10. If you use sequences for a PK, you will probably end up with a right-hand index with hot-block contention during inserts, causing you to implement something suboptimal, like hash partitioning or reverse-key indexes, to get around your design flaw.
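
As promised in point 2, a minimal sketch of a sequence declared for scalability rather than for gap-free friendliness (the schema and cache size are invented; size the cache to your insert rate):

-- Built for concurrency: big cache, no ordering, no cycling.
-- Expect holes; do not expect transaction-ordered values.
create sequence neil.test_seq
  start with   1
  increment by 1
  cache        1000    -- large enough that nodes rarely queue for the refresh lock
  noorder              -- each RAC node works from its own cache range
  nocycle;             -- uniqueness only holds while the sequence never cycles

-- If the shared pool is under pressure, pin it so the cache is not aged out
-- ('Q' is the DBMS_SHARED_POOL flag for sequences):
exec sys.dbms_shared_pool.keep('NEIL.TEST_SEQ','Q')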

Sequences on RAC can behave in unexpected ways. The example below is a little contrived, but it shows a potentially unexpected behaviour caused by CYCLEing your sequences:

-- created on NODE 1:
create sequence seq_1
start     with 1
increment by   1
maxvalue       10
cache          2
cycle;

Sequence created.

NODE1 : select seq_1.nextval from dual;
NEXTVAL: 1
NODE2 : select seq_1.nextval from dual;
NEXTVAL: 3  <- gets the next cache range, above the one on node 1
NODE1 : /
NEXTVAL: 2
NODE1 : / (executed repeatedly)
NEXTVAL: 5 (skipping 3 and 4, which are cached on Node 2)
6
7
8
9
10
1
2
3  <- erm. Hang on. Node 2 has this!
4  <- and this!!!
5
6
7
8
9
10
1
NODE2 : /
NEXTVAL: 4 <- reading from the local cache
NODE2 : /
NEXTVAL: 3 <- and gets a new cache range

So, if you are using sequences, and you can’t guarantee sequential order, and you can’t guarantee no gaps and, as this demo shows, you can’t guarantee uniqueness, and they cause right-hand index hot-block problems, tell me again why you are using them?

Because they are fast and you don’t CYCLE them.

However, sequences are becoming the new evil in the database, right behind triggers.
