Archive for November 2011
Implicit Conversion Errors
A while ago, I failed over a database (as planned) to it’s Dataguard copy, and of course everything worked as expected. Everything, that is, except a couple of reports which get sent directly from the database server early in the morning. The report generation had failed. After some investigation, we discovered that the newly active Dataguard server did not have NLS_DATE_FORMAT set in the environment, and the 2 reports in question were coded something like:
select col1, col2, col3 from user_data where user_date >= '25-Nov-2011 00:00:00';The select was failing with the error. ORA-01821: date format not recognized
If the developer had coded an explicit conversion, then we would not have experienced a problem.
select col1, col2, col3 from user_data
where user_date >= to_date('25-Nov-2011 00:00:00','DD-MON-YYYY HH24:MI:SS');
Coding with an implicit date mask is great and works successfully every time, as long as the NLS_DATE_FORMAT in your session matches the date mask you have supplied, which is course it always does. Until something changes and it doesn’t. In my experience, implicit conversion is probably the single greatest source of failure in systems and also one of the hardest to track down. It frequently occurs in a badly designed schema which doesn’t use the correct datatypes. I have seen schemas where everything is being stored a VARCHAR2, including numeric fields. This works fine as Oracle will happily insert implicit to_number functions into your code and return answers in ways which seem correct, until you get some rogue data into the database and everything falls apart.
USER1 @ orcl > -- Create a table but allow generic data, rather than specifying numeric data USER1 @ orcl > -- The client will take care of validation. Of course it will. USER1 @ orcl > create table implicit_problem (col1 varchar2(10), col2 varchar2(10)); Table created. USER1 @ orcl > USER1 @ orcl > -- Lets fill the table with reasonable data USER1 @ orcl > insert into implicit_problem values (1,1); 1 row created. USER1 @ orcl > insert into implicit_problem values (2,10); 1 row created. USER1 @ orcl > insert into implicit_problem values (3,66); 1 row created. USER1 @ orcl > USER1 @ orcl > -- Oracle is putting an implicit to_number around col1*col2 to allow the calculation USER1 @ orcl > select col1,col2,col1*col2 from implicit_problem; COL1 COL2 COL1*COL2 ---------- ---------- ---------- 1 1 1 2 10 20 3 66 198 USER1 @ orcl > USER1 @ orcl > -- And now lets have some incorrectly validated data USER1 @ orcl > insert into implicit_problem values (4,'A'); 1 row created. USER1 @ orcl > USER1 @ orcl > USER1 @ orcl > -- And now the implicit conversion fails USER1 @ orcl > select col1,col2,col1*col2 from implicit_problem; ERROR: ORA-01722: invalid number no rows selected USER1 @ orcl > USER1 @ orcl > -- Cleanup USER1 @ orcl > drop table implicit_problem; Table dropped.
It’s much easier (and quicker) to catch bad data going into a system than it is to perform problem resolution. Always code explicitly for your data types. Implicit conversion in yuor coding invariably leads to hard-to-find bugs.
The 10046 trace. Largely useless, isn’t it?
The other night I was sat in the pub with some like-minded individuals discussing the relative merits of the 10046 trace (we Rock! in the pub, dudes!) and somebody asked me how often I has actually used it in anger? A well-respected DBA / Architect maintained it was a pretty useless and difficult option to use, given the topology of modern applications (e.g. How do you find the correct session with all that connection pooling going on from multiple web servers.)
My answer surprised me – I thought back to one client where I spent 90% of my time performance tuning a large (TiB-ish) OLTP/Batch hybrid system and concluded that I had ran a 10046 against production about once a year. Once. So if the 10046 is the holy grail of plan information, why wasn’t I using it that much. And why did I never use a 10053 against Production there?
The answer for me is a little more complex than that given in the pub:
1. as stated above, it’s hard to catch the in-flight session unless the application is instrumented to inject the trace statement when needed (and how many applications are instrumented to help you discover problems? Screen ST03 in SAP is very helpful. Any others in major ERP’s? Thought not.)
2. In many places that I have worked, getting authorisation to make any a change to a 24×7 mission-critical system is highly bureaucratic, involving cast-iron justification for the change and it’s positive benefits, requirement that there will be no adverse effects because of the change, very senior sign-off, more red-tape, etc. This causes a significant amount of work simply to put a trace on, even if you can catch the SQL. This can end up being more work than actually fixing the problem.
3. An awful lot of SQL tuning is a fairly blunt affair, as the developer (who is frequently database-blind) has usually missed something obvious. It is frequently to do with incorrectly using or not using an index (or using a poor index), or lack of filtering data at the right point to minimise the I/O.
4. Most importantly, if you have AWR and ASH, it’s not really needed. For each plan created by the optimizer the database stores the bind variables along with it, so we can usually understand why the optimizer makes the decisions it makes. ASH contains the main event waits. Why bother trying to capture all of the detail in a trace when you really don’t need that much detail, and it’s all already there; ready to be extracted from the relevant tables (e.g. dba_hist_active_sess_history, dba_hist_sql_plan and dba_hist_sql_bind.)
I have never used a 10053 trace on a Production system. I have simply never needed to know the decisions taken by the optimizer in that much detail. Like most DBA’s and Oracle consultants, I don’t go from site-to-site on a weekly basis resolving edge-case problems that the incumbent DBA’s haven’t had the time, or possibly don’t have the skills, to resolve themselves. I usually don’t need that level of confirmation that I’m right about why the plan is wrong, and I don’t have the time to conclusively prove it over and over again – I just need to get the fix into place and move onto the next problem.
That said, perhaps I should get fully to the bottom of these problems to ensure that they never occur again – which is the fundamental problem with Adaptive Cursor Sharing.
Oracle Timestamp Processing – mildly annoying
I was writing a small piece of SQL this morning which I needed to account for daylight savings time correctly. All of my databases run in UTC, so a quick foray into using TIMESTAMP AS TIME ZONE seemed the easiest way to accomplish this. So, I code it up and want to test my code to ensure that the timestamp operates correctly on both sides of UK Daylight savings. I figured that the easiest way to do this would be to used the old Oracle initialisation parameter FIXED_DATE. You can set this in the database on the fly and observe the results immediately. So to for testing (in a Dev database which only I was using). Guess what? FIXED_DATE works perfectly for SYSDATE. However, it is completely ignored for SYSTIMESTAMP! WHAT??? Who in Oracle missed this one? Let me show you how this (doesn’t) work, with a workaround for my testing included in the example to show how neatly TIMESTAMP AS TIME ZONE does work:
> alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS' Session altered. > select * from v$timezone_names where tzname like '%London%' TZNAME TZABBREV ---------------------------------------------------------------- ---------------------------------------------------------------- Europe/London LMT Europe/London GMT Europe/London BST Europe/London BDST Before we start, check the current date > select sysdate from dual SYSDATE -------------------- 01-NOV-2011 11:50:15 Before Daylight Savings Time changes (on 29th October 02:00:00) > alter system set fixed_date = '28-OCT-2011 08:00:00' scope=memory System altered. > select systimestamp at time zone 'Europe/London' from dual SYSTIMESTAMPATTIMEZONE'EUROPE/LONDON' --------------------------------------------------------------------------- 01-NOV-11 11.50.15.666000 EUROPE/LONDON The timestamp is unaffected by setting FIXED_DATE!!! Oracle! Grrrr! > select sysdate from dual SYSDATE -------------------- 28-OCT-2011 08:00:00 The UTC time is correct. So I need to take the sysdate and transform it into a timestamp at timezone London should be 1 hour ahead of UTC at this point > select to_timestamp(sysdate) at time zone 'Europe/London' from dual TO_TIMESTAMP(SYSDATE)ATTIMEZONE'EUROPE/LONDON' --------------------------------------------------------------------------- 28-OCT-11 09.00.00 EUROPE/LONDON ########################################################################### Now to roll the time on and re-test ########################################################################### After Daylight Savings Time changes (on 29th October 02:00:00) > alter system set fixed_date = '30-OCT-2011 08:00:00' scope=memory System altered. > select sysdate from dual SYSDATE -------------------- 30-OCT-2011 08:00:00 Now Daylight savings should by the same as UTC, not 1 hour ahead > select to_timestamp(sysdate) at time zone 'Europe/London' from dual TO_TIMESTAMP(SYSDATE)ATTIMEZONE'EUROPE/LONDON' --------------------------------------------------------------------------- 30-OCT-11 08.00.00 EUROPE/LONDON Yey! Daylight davings is correct for London. Remove the fixed_date setting > alter system set fixed_date=NONE scope=memory System altered. And check the date > select sysdate from dual SYSDATE -------------------- 01-NOV-2011 11:50:16 ------------------------------------------------------------
There you have it. How very mildly annoying. Can’t use TIME ZONE with SYSDATE, can’t use FIXED_DATE with SYSTIMESTAMP.