Neil Chandler's DBA Blog

A resource for Database Professionals

Archive for November 2011

Implicit Conversion Errors


A while ago, I failed over a database (as planned) to its Data Guard copy, and of course everything worked as expected. Everything, that is, except a couple of reports which get sent directly from the database server early in the morning. The report generation had failed. After some investigation, we discovered that the newly active Data Guard server did not have NLS_DATE_FORMAT set in the environment, and the two reports in question were coded something like:

select col1, col2, col3 from user_data where user_date >= '25-Nov-2011 00:00:00';

The select was failing with the error:
 
ORA-01821: date format not recognized
 

If the developer had coded an explicit conversion, we would not have had a problem:

select col1, col2, col3 from user_data
where user_date >= to_date('25-Nov-2011 00:00:00','DD-MON-YYYY HH24:MI:SS');
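The same trap exists outside Oracle wherever a date string is parsed against a format supplied by the environment rather than by the code. Below is a minimal Python analogue. It assumes `datetime.strptime` as a stand-in for `TO_DATE`. The function names are hypothetical, and `session_format` plays the role of the session's NLS_DATE_FORMAT:

```python
from datetime import datetime

# Explicit mask equivalent to 'DD-MON-YYYY HH24:MI:SS' in the TO_DATE call.
# %b matches English abbreviations such as 'Nov' under the default C locale.
REPORT_MASK = "%d-%b-%Y %H:%M:%S"


def parse_with_session_default(text, session_format):
    # Hypothetical analogue of implicit conversion: the format comes from the
    # environment, so the same literal parses or fails depending on setup.
    return datetime.strptime(text, session_format)


def parse_explicit(text):
    # Analogue of TO_DATE(text, mask): the mask travels with the code.
    return datetime.strptime(text, REPORT_MASK)


literal = "25-Nov-2011 00:00:00"
print(parse_explicit(literal))  # 2011-11-25 00:00:00

try:
    # A server whose default format differs, as after the failover.
    parse_with_session_default(literal, "%Y-%m-%d %H:%M:%S")
except ValueError as exc:
    print("implicit conversion failed:", exc)
```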
 

Coding with an implicit date mask is great and works every time, as long as the NLS_DATE_FORMAT in your session matches the date mask you have supplied, which of course it always does. Until something changes and it doesn’t.

In my experience, implicit conversion is probably the single greatest source of failure in systems, and also one of the hardest to track down. It frequently occurs in a badly designed schema which doesn’t use the correct datatypes. I have seen schemas where everything is stored as VARCHAR2, including numeric fields. This works fine, as Oracle will happily insert implicit TO_NUMBER functions into your code and return answers which seem correct, until some rogue data gets into the database and everything falls apart.

USER1 @ orcl > -- Create a table but allow generic data, rather than specifying numeric data
USER1 @ orcl > -- The client will take care of validation. Of course it will.
USER1 @ orcl > create table implicit_problem (col1 varchar2(10), col2 varchar2(10));

Table created.

USER1 @ orcl >
USER1 @ orcl > -- Let's fill the table with reasonable data
USER1 @ orcl > insert into implicit_problem values (1,1);

1 row created.

USER1 @ orcl > insert into implicit_problem values (2,10);

1 row created.

USER1 @ orcl > insert into implicit_problem values (3,66);

1 row created.

USER1 @ orcl >
USER1 @ orcl > -- Oracle is putting an implicit to_number around col1*col2 to allow the calculation
USER1 @ orcl > select col1,col2,col1*col2 from implicit_problem;

COL1       COL2        COL1*COL2
---------- ---------- ----------
1          1                   1
2          10                 20
3          66                198

USER1 @ orcl >
USER1 @ orcl > -- And now let's have some incorrectly validated data
USER1 @ orcl > insert into implicit_problem values (4,'A');

1 row created.

USER1 @ orcl >
USER1 @ orcl >
USER1 @ orcl > -- And now the implicit conversion fails
USER1 @ orcl > select col1,col2,col1*col2 from implicit_problem;
ERROR:
ORA-01722: invalid number

no rows selected

USER1 @ orcl >
USER1 @ orcl > -- Cleanup
USER1 @ orcl > drop table implicit_problem;

Table dropped.
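The same failure mode can be reproduced outside the database. Below is a minimal Python sketch that uses `int()` as the stand-in for Oracle's implicit TO_NUMBER. `rows` and `products` are hypothetical names mirroring the table above:

```python
# Every value is held as text, like the VARCHAR2(10) columns above.
rows = [("1", "1"), ("2", "10"), ("3", "66")]


def products(table):
    # int() stands in for the implicit TO_NUMBER applied to col1*col2;
    # one unconvertible value fails the whole "query".
    return [int(col1) * int(col2) for col1, col2 in table]


print(products(rows))  # [1, 20, 198]

rows.append(("4", "A"))  # the incorrectly validated row
try:
    products(rows)
except ValueError as exc:
    # As with ORA-01722, the error appears when the data is read,
    # long after 'A' was accepted.
    print("query failed:", exc)
```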
 



It’s much easier (and quicker) to catch bad data going into a system than it is to resolve the problems it causes later. Always code explicitly for your data types. Implicit conversion in your coding invariably leads to hard-to-find bugs.
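Catching bad data on the way in means converting explicitly at the boundary. Below is a minimal Python sketch of that idea. It assumes an in-memory stand-in for a table whose columns are declared numeric, the equivalent of NUMBER rather than VARCHAR2(10). `to_integer` and `NumericTable` are hypothetical names:

```python
def to_integer(value):
    # Explicit conversion: accept ints or integer strings only, so 'A'
    # (or 1.5) is rejected rather than coerced or silently truncated.
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise ValueError(f"not an integer: {value!r}")
    return int(value)


class NumericTable:
    """Hypothetical two-column table that validates every insert."""

    def __init__(self):
        self.rows = []

    def insert(self, col1, col2):
        # Reject bad data here, while the offending value is still in hand.
        try:
            row = (to_integer(col1), to_integer(col2))
        except ValueError as exc:
            raise ValueError(f"rejected row ({col1!r}, {col2!r}): {exc}") from exc
        self.rows.append(row)

    def products(self):
        # No conversion needed at query time: the data is already numeric.
        return [c1 * c2 for c1, c2 in self.rows]


table = NumericTable()
for col1, col2 in [(1, 1), (2, 10), (3, 66)]:
    table.insert(col1, col2)

try:
    table.insert(4, "A")
except ValueError as exc:
    print(exc)

print(table.products())  # [1, 20, 198]
```

The rogue row never reaches the table, so the later calculation has nothing to trip over. This is the point of choosing the correct datatype in the first place.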

Written by Neil Chandler

28th November 2011 at 09:31

The 10046 trace. Largely useless, isn’t it?


The other night I was sat in the pub with some like-minded individuals discussing the relative merits of the 10046 trace (we Rock! in the pub, dudes!), and somebody asked me how often I had actually used it in anger. A well-respected DBA / Architect maintained it was a pretty useless and difficult option to use, given the topology of modern applications (e.g. how do you find the correct session with all that connection pooling going on from multiple web servers?).

My answer surprised me. I thought back to one client where I spent 90% of my time performance tuning a large (TiB-ish) OLTP/Batch hybrid system, and concluded that I had run a 10046 against production about once a year. Once. So if the 10046 is the holy grail of plan information, why wasn’t I using it more? And why did I never use a 10053 against Production there?

The answer for me is a little more complex than that given in the pub:

1. As stated above, it’s hard to catch the in-flight session unless the application is instrumented to inject the trace statement when needed (and how many applications are instrumented to help you discover problems? Screen ST03 in SAP is very helpful. Any others in major ERPs? Thought not.)

2. In many places that I have worked, getting authorisation to make any change to a 24×7 mission-critical system is highly bureaucratic. It involves cast-iron justification for the change and its positive benefits, a requirement that there will be no adverse effects because of the change, very senior sign-off, more red tape, etc. This means a significant amount of work simply to put a trace on, even if you can catch the SQL, and it can end up being more work than actually fixing the problem.

3. An awful lot of SQL tuning is a fairly blunt affair, as the developer (who is frequently database-blind) has usually missed something obvious. It is frequently to do with using an index incorrectly, not using one at all (or using a poor one), or failing to filter data at the right point to minimise the I/O.

4. Most importantly, if you have AWR and ASH, it’s not really needed. For each plan created by the optimizer, the database stores the bind values along with it, so we can usually understand why the optimizer made the decisions it made. ASH records the main wait events. Why bother trying to capture all of the detail in a trace when you really don’t need that much detail, and it’s all already there, ready to be extracted from the relevant views (e.g. dba_hist_active_sess_history, dba_hist_sql_plan and dba_hist_sqlbind)?

I have never used a 10053 trace on a Production system. I have simply never needed to know the decisions taken by the optimizer in that much detail. Like most DBAs and Oracle consultants, I don’t go from site to site on a weekly basis resolving edge-case problems that the incumbent DBAs haven’t had the time, or possibly don’t have the skills, to resolve themselves. I usually don’t need that level of confirmation that I’m right about why the plan is wrong, and I don’t have the time to conclusively prove it over and over again. I just need to get the fix into place and move on to the next problem.

That said, perhaps I should get fully to the bottom of these problems to ensure that they never occur again – which is the fundamental problem with Adaptive Cursor Sharing.

Written by Neil Chandler

22nd November 2011 at 20:06

Oracle Timestamp Processing – mildly annoying


I was writing a small piece of SQL this morning which needed to account for daylight saving time correctly. All of my databases run in UTC, so a quick foray into timestamps with AT TIME ZONE seemed the easiest way to accomplish this. So, I coded it up and wanted to test it to ensure that the timestamp behaved correctly on both sides of the UK daylight saving change. I figured that the easiest way to do this would be to use the old Oracle initialisation parameter FIXED_DATE. You can set this in the database on the fly and observe the results immediately. So, on to testing (in a Dev database which only I was using). Guess what? FIXED_DATE works perfectly for SYSDATE. However, it is completely ignored for SYSTIMESTAMP! WHAT??? Who in Oracle missed this one? Let me show you how this (doesn’t) work, with a workaround for my testing included in the example to show how neatly AT TIME ZONE does work:

> alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS'

Session altered.

> select * from v$timezone_names where tzname like '%London%'

TZNAME                                                           TZABBREV
---------------------------------------------------------------- ----------------------------------------------------------------
Europe/London                                                    LMT
Europe/London                                                    GMT
Europe/London                                                    BST
Europe/London                                                    BDST

Before we start, check the current date
> select sysdate from dual

SYSDATE
--------------------
01-NOV-2011 11:50:15

Before the clocks go back (at 02:00 BST on 30th October, which is 01:00 UTC)
> alter system set fixed_date = '28-OCT-2011 08:00:00' scope=memory

System altered.

> select systimestamp at time zone 'Europe/London' from dual

SYSTIMESTAMPATTIMEZONE'EUROPE/LONDON'
---------------------------------------------------------------------------
01-NOV-11 11.50.15.666000 EUROPE/LONDON

The timestamp is unaffected by setting FIXED_DATE!!! Oracle! Grrrr!

> select sysdate from dual

SYSDATE
--------------------
28-OCT-2011 08:00:00

The UTC time is correct.

So I need to take SYSDATE and transform it into a timestamp at time zone Europe/London.
London should be 1 hour ahead of UTC at this point.
> select to_timestamp(sysdate) at time zone 'Europe/London' from dual

TO_TIMESTAMP(SYSDATE)ATTIMEZONE'EUROPE/LONDON'
---------------------------------------------------------------------------
28-OCT-11 09.00.00 EUROPE/LONDON

###########################################################################
Now to roll the time on and re-test
###########################################################################
After the clocks go back (at 02:00 BST on 30th October, which is 01:00 UTC)
> alter system set fixed_date = '30-OCT-2011 08:00:00' scope=memory

System altered.

> select sysdate from dual

SYSDATE
--------------------
30-OCT-2011 08:00:00

Now London time should be the same as UTC, not 1 hour ahead
> select to_timestamp(sysdate) at time zone 'Europe/London' from dual

TO_TIMESTAMP(SYSDATE)ATTIMEZONE'EUROPE/LONDON'
---------------------------------------------------------------------------
30-OCT-11 08.00.00 EUROPE/LONDON

Yay! Daylight saving is correct for London.

Remove the fixed_date setting
> alter system set fixed_date=NONE scope=memory

System altered.

And check the date
> select sysdate from dual

SYSDATE
--------------------
01-NOV-2011 11:50:16
------------------------------------------------------------

There you have it. How very mildly annoying. Can’t use TIME ZONE with SYSDATE, can’t use FIXED_DATE with SYSTIMESTAMP.
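The workaround generalises. If code obtains "now" from a clock it is handed, rather than reading the system clock directly, both sides of the change can be tested without FIXED_DATE. Below is a minimal Python sketch that assumes Python 3.9+ `zoneinfo` with the IANA time zone database installed. `london_time` and `fixed` are hypothetical helpers, and the `clock` argument plays the role that FIXED_DATE played for SYSDATE:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

LONDON = ZoneInfo("Europe/London")


def london_time(clock=None):
    # `clock` is a zero-argument callable returning an aware UTC datetime;
    # without one, the real system clock is used.
    now_utc = clock() if clock is not None else datetime.now(timezone.utc)
    return now_utc.astimezone(LONDON)


def fixed(*args):
    # A clock frozen at the given UTC wall time, in the spirit of FIXED_DATE.
    frozen = datetime(*args, tzinfo=timezone.utc)
    return lambda: frozen


# 08:00 UTC either side of the 2011 change (01:00 UTC on 30th October).
print(london_time(fixed(2011, 10, 28, 8)))  # 2011-10-28 09:00:00+01:00
print(london_time(fixed(2011, 10, 30, 8)))  # 2011-10-30 08:00:00+00:00
```

FIXED_DATE is set system-wide and affects every session. Passing the clock in keeps the frozen time local to the test.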

Written by Neil Chandler

1st November 2011 at 12:18

Posted in Programming
