Neil Chandler's DBA Blog

A resource for Database Professionals

Implicit Conversion Errors

with 6 comments

A while ago, I failed over a database (as planned) to it’s Dataguard copy, and of course everything worked as expected. Everything, that is, except a couple of reports which get sent directly from the database server early in the morning. The report generation had failed. After some investigation, we discovered that the newly active Dataguard server did not have NLS_DATE_FORMAT set in the environment, and the 2 reports in question were coded something like:

select col1, col2, col3 from user_data where user_date >= '25-Nov-2011 00:00:00';

The select was failing with the error.
 
ORA-01821: date format not recognized
 

If the developer had coded an explicit conversion, then we would not have experienced a problem.

select col1, col2, col3 from user_data
where user_date >= to_date('25-Nov-2011 00:00:00','DD-MON-YYYY HH24:MI:SS');
 

Coding with an implicit date mask is great and works successfully every time, as long as the NLS_DATE_FORMAT in your session matches the date mask you have supplied, which is course it always does. Until something changes and it doesn’t. In my experience, implicit conversion is probably the single greatest source of failure in systems and also one of the hardest to track down. It frequently occurs in a badly designed schema which doesn’t use the correct datatypes. I have seen schemas where everything is being stored a VARCHAR2, including numeric fields. This works fine as Oracle will happily insert implicit to_number functions into your code and return answers in ways which seem correct, until you get some rogue data into the database and everything falls apart.

USER1 @ orcl > -- Create a table but allow generic data, rather than specifying numeric data
USER1 @ orcl > -- The client will take care of validation. Of course it will.
USER1 @ orcl > create table implicit_problem (col1 varchar2(10), col2 varchar2(10));

Table created.

USER1 @ orcl >
USER1 @ orcl > -- Lets fill the table with reasonable data
USER1 @ orcl > insert into implicit_problem values (1,1);

1 row created.

USER1 @ orcl > insert into implicit_problem values (2,10);

1 row created.

USER1 @ orcl > insert into implicit_problem values (3,66);

1 row created.

USER1 @ orcl >
USER1 @ orcl > -- Oracle is putting an implicit to_number around col1*col2 to allow the calculation
USER1 @ orcl > select col1,col2,col1*col2 from implicit_problem;

COL1       COL2        COL1*COL2
---------- ---------- ----------
1          1                   1
2          10                 20
3          66                198

USER1 @ orcl >
USER1 @ orcl > -- And now lets have some incorrectly validated data
USER1 @ orcl > insert into implicit_problem values (4,'A');

1 row created.

USER1 @ orcl >
USER1 @ orcl >
USER1 @ orcl > -- And now the implicit conversion fails
USER1 @ orcl > select col1,col2,col1*col2 from implicit_problem;
ERROR:
ORA-01722: invalid number

no rows selected

USER1 @ orcl >
USER1 @ orcl > -- Cleanup
USER1 @ orcl > drop table implicit_problem;

Table dropped.
 



It’s much easier (and quicker) to catch bad data going into a system than it is to perform problem resolution. Always code explicitly for your data types. Implicit conversion in yuor coding invariably leads to hard-to-find bugs.

Written by Neil Chandler

28th November 2011 at 09:31

The 10046 trace. Largely useless, isn’t it?

with 13 comments

The other night I was sat in the pub with some like-minded individuals discussing the relative merits of the 10046 trace (we Rock! in the pub, dudes!) and somebody asked me how often I has actually used it in anger? A well-respected DBA / Architect maintained it was a pretty useless and difficult option to use, given the topology of modern applications (e.g. How do you find the correct session with all that connection pooling going on from multiple web servers.)

My answer surprised me – I thought back to one client where I spent 90% of my time performance tuning a large (TiB-ish) OLTP/Batch hybrid system and concluded that I had ran a 10046 against production about once a year. Once. So if the 10046 is the holy grail of plan information, why wasn’t I using it that much. And why did I never use a 10053 against Production there?

The answer for me is a little more complex than that given in the pub:

1. as stated above, it’s hard to catch the in-flight session unless the application is instrumented to inject the trace statement when needed (and how many applications are instrumented to help you discover problems? Screen ST03 in SAP is very helpful. Any others in major ERP’s? Thought not.)

2. In many places that I have worked, getting authorisation to make any a change to a 24×7 mission-critical system is highly bureaucratic, involving cast-iron justification for the change and it’s positive benefits, requirement that there will be no adverse effects because of the change, very senior sign-off, more red-tape, etc. This causes a significant amount of work simply to put a trace on, even if you can catch the SQL. This can end up being more work than actually fixing the problem.

3. An awful lot of SQL tuning is a fairly blunt affair, as the developer (who is frequently database-blind) has usually missed something obvious. It is frequently to do with incorrectly using or not using an index (or using a poor index), or lack of filtering data at the right point to minimise the I/O.

4. Most importantly, if you have AWR and ASH, it’s not really needed. For each plan created by the optimizer the database stores the bind variables along with it, so we can usually understand why the optimizer makes the decisions it makes. ASH contains the main event waits. Why bother trying to capture all of the detail in a trace when you really don’t need that much detail, and it’s all already there; ready to be extracted from the relevant tables (e.g. dba_hist_active_sess_history, dba_hist_sql_plan and dba_hist_sql_bind.)

I have never used a 10053 trace on a Production system. I have simply never needed to know the decisions taken by the optimizer in that much detail. Like most DBA’s and Oracle consultants, I don’t go from site-to-site on a weekly basis resolving edge-case problems that the incumbent DBA’s haven’t had the time, or possibly don’t have the skills, to resolve themselves. I usually don’t need that level of confirmation that I’m right about why the plan is wrong, and I don’t have the time to conclusively prove it over and over again – I just need to get the fix into place and move onto the next problem.

That said, perhaps I should get fully to the bottom of these problems to ensure that they never occur again – which is the fundamental problem with Adaptive Cursor Sharing.

Written by Neil Chandler

22nd November 2011 at 20:06

Oracle Timestamp Processing – mildly annoying

with 3 comments

I was writing a small piece of SQL this morning which I needed to account for daylight savings time correctly. All of my databases run in UTC, so a quick foray into using TIMESTAMP AS TIME ZONE seemed the easiest way to accomplish this. So, I code it up and want to test my code to ensure that the timestamp operates correctly on both sides of UK Daylight savings. I figured that the easiest way to do this would be to used the old Oracle initialisation parameter FIXED_DATE. You can set this in the database on the fly and observe the results immediately. So to for testing (in a Dev database which only I was using). Guess what? FIXED_DATE works perfectly for SYSDATE. However, it is completely ignored for SYSTIMESTAMP! WHAT??? Who in Oracle missed this one? Let me show you how this (doesn’t) work, with a workaround for my testing included in the example to show how neatly TIMESTAMP AS TIME ZONE does work:

> alter session set nls_date_format='DD-MON-YYYY HH24:MI:SS'

Session altered.

> select * from v$timezone_names where tzname like '%London%'

TZNAME                                                           TZABBREV
---------------------------------------------------------------- ----------------------------------------------------------------
Europe/London                                                    LMT
Europe/London                                                    GMT
Europe/London                                                    BST
Europe/London                                                    BDST

Before we start, check the current date
> select sysdate from dual

SYSDATE
--------------------
01-NOV-2011 11:50:15

Before Daylight Savings Time changes (on 29th October 02:00:00)
> alter system set fixed_date = '28-OCT-2011 08:00:00' scope=memory

System altered.

> select systimestamp at time zone 'Europe/London' from dual

SYSTIMESTAMPATTIMEZONE'EUROPE/LONDON'
---------------------------------------------------------------------------
01-NOV-11 11.50.15.666000 EUROPE/LONDON

The timestamp is unaffected by setting FIXED_DATE!!! Oracle! Grrrr!

> select sysdate from dual

SYSDATE
--------------------
28-OCT-2011 08:00:00

The UTC time is correct.

So I need to take the sysdate and transform it into a timestamp at timezone
London should be 1 hour ahead of UTC at this point
> select to_timestamp(sysdate) at time zone 'Europe/London' from dual

TO_TIMESTAMP(SYSDATE)ATTIMEZONE'EUROPE/LONDON'
---------------------------------------------------------------------------
28-OCT-11 09.00.00 EUROPE/LONDON

###########################################################################
Now to roll the time on and re-test
###########################################################################
After Daylight Savings Time changes (on 29th October 02:00:00)
> alter system set fixed_date = '30-OCT-2011 08:00:00' scope=memory

System altered.

> select sysdate from dual

SYSDATE
--------------------
30-OCT-2011 08:00:00

Now Daylight savings should by the same as UTC, not 1 hour ahead
> select to_timestamp(sysdate) at time zone 'Europe/London' from dual

TO_TIMESTAMP(SYSDATE)ATTIMEZONE'EUROPE/LONDON'
---------------------------------------------------------------------------
30-OCT-11 08.00.00 EUROPE/LONDON

Yey! Daylight davings is correct for London. 

Remove the fixed_date setting
> alter system set fixed_date=NONE scope=memory

System altered.

And check the date
> select sysdate from dual

SYSDATE
--------------------
01-NOV-2011 11:50:16
------------------------------------------------------------

There you have it. How very mildly annoying. Can’t use TIME ZONE with SYSDATE, can’t use FIXED_DATE with SYSTIMESTAMP.

Written by Neil Chandler

1st November 2011 at 12:18

Posted in Programming

Dennis Ritchie RIP

leave a comment »

There can be few scientists who have contributed so much to the world as Dennis Ritchie. Completely anonymous to the world at large, and to far too much of the computing fraternity too, his involvement in the development of C – the first portable programming language, and Unix cannot be understated. Who uses Unix? Well, everybody. What is the O/S on embedded devices? What did Steve Jobs base his Mac O/S and Mobile Operating systems on? Upon what did Bill Gates base the operating system for the new IBM PC in 1980? What do corporations across the world use to power their servers? What is it all built on? Unless you’re running some proprietary mainframe software, the building blocks of what you are using were set, to some degree, by Dennis Ritchie.

Whilst I never met him myself I did work with his sister, Lynn, for several years at a software house in the North East of England, where she still resides. She told me some of the stories about him; how the invention of C and Unix were really aims in making computing more portable, more standard, easier to use. How all that Ritchie, Ken Thompson, Brian Kernighan and the others got for this was their salary [although they did make a few quid on top out of their books :-) ], and the ability to distribute Unix free, with C, to all and sundry as Bell labs, a subsidiary of AT&T, had no financial interest in computing as they were a regulated telephone monopoly at the time.

Thanks Dennis.

Written by Neil Chandler

16th October 2011 at 17:19

Posted in Programming, Unix

Tagged with , , ,

Management and Infrastructure SIG – Thank You

leave a comment »

Well, we had the final Management and Infrastructure SIG last week, and whilst the attendance wasn’t huge, the content was simply excellent. Two great presentations about Enterprise Manager from Niall Litchfield of Maxima, and from Mark Westwood and Carl Holmes of Morrissons.

We also had an insightful and revealing presentation on outsourcing from Piet De Visser from Logica, and an eye-opening technical feast showing how to maximise the performance from your Intel hardware from Steve Shaw of Intel.

I would also like to thank Oracle for their support, and especially Mike Edgington for his talk about the Oracle support organization and how it is evolving.

Ye Gods, this feels like an Oscars acceptance speech!

Many thanks to the UKOUG Staff in general, and specifically Marisa Harris and Jake Potter for their organisation and assistance in making this SIG work.

Finally, I would like to thank Martin Widlake for his Chairmanship, and his opposition during our presentation about what DBA’s really need to know.

And so ends the UKOUG Management and Infrastructure SIG, now to be merged with the RAC SIG and be reborn as the UKOUG Availability, Infrastructure and Management (AIM) SIG, where we will still talk about everything that we have previously, but extend the remit around RAC, Exadata, Goldengate, Partitioning, VLDB’s, and all those other lovely cost-extra options that Oracle dearly would like to sell to us all.

Thank you to everyone who attended the M&I SIG – we hope you got as much out of it as we did.

Written by Neil Chandler

5th October 2011 at 14:22

Posted in UKOUG

UKOUG Management & Infrastructure SIG – New Date

leave a comment »

To blatantly steal this post from Martin Widlake, as I’m Deputy Chairman of the SIG, and I’m also presenting:

I ought to just mention that the UKOUG Management and Infrastructure SIG has moved from Tuesday September 20th to Tuesday September 27th (so two weeks from today). It had to be moved as we had a bit of a problem with the room booking. It will be in the usual venue of the Oracle City Office in London and is, of course, free to members of the UK Oracle User Group. {If you are not a member, you can come along for a fee – but if you are interested in coming along to see what a UKOUG Special Interest Group meeting is all about, send me a mail}.

So, if you fancy some free information about:

  • Getting the best out of your intel hardware (and BIOS in general) {Steve Shaw from Intel}
  • The latest on Oracle GRID and OEM {both presentations by customers not Oracle, one by Niall Litchfield and one by ‘Morrisons’,though Oracle supported us very well by finding one of the customers!)}
  • A presentation and discussion on Outsourcing by Piet de Visser
  •  A consideration of how deep into the technology real-world DBAs need to go to solve issues (Martin Widlake and myself)
  • An Oracle support update

Well, register for the event and I’ll see you in two weeks!

Written by Neil Chandler

13th September 2011 at 12:34

Broken Kindle

with 2 comments

I just thought I would try to get restarted on this blog by telling a quick story about my broken Kindle. I’ll crowbar a bit about Oracle in at the end…

Last night I left my friends in the pub at 9pm and got onto the Tube for my 40 minute journey home to North London. I got out my (2 month old) Kindle and switched it on to continue with the book I’m reading… and the screen only half-restored. The top-half was static and unchanging from the screen saver. Obviously broken, but how? There has been no impact, no twisting or anything. Apparently (checking the internet), this just happens to Kindles sometimes. Manufacturing fault or some such.

Now, I bought this Kindle at PC World (OK, I know, but I did – same price and no couriers involved) so this morning I packaged it up in the original packaging (but without receipt) and drove down to PC World ready for a good argument with “The Tech Guys” about getting a replacement. It all started so promisingly:

Me: “My kindle is broken. The screen is broken.”
TechGuy: “Did you buy extended cover”
Me: “No”
TechGuy: “Well it’s not replaceable as that would be classed as accidental damage.”
Me: “It hasn’t been in an accident. It’s not damaged, it’s broken”
TechGuy: “OK, you’ll have to call Amazon about it”
Me – getting all warmed up, smiling: “No. I bought it here**”
TechGuy (immediately): “OK, I’ll get a manager”

What? No argument? Eh? Anyway, the manager comes over, checks the kindle for damage (there is none), asks for receipt but I don’t have it. However, I have proof of purchase (credit card statement) so he checks the computer, goes into the back shop and hands me a new 3G Kindle. Thank you PC World. Unexpectedly good service (despite the initial fob-off attempt.)

Right now, I’m re-loading all of the Oracle documentation back onto it. Oracle very thoughtfully provide all of their doco in .mobi format. Given that I have been to sites which have no internet link available and the servers are in very closed DMZ’s, and you’re not allowed to use USB sticks or DVD’s (the ports are blocked), having all of the Oracle doco in your pocket on a searchable Kindle can be pretty handy.

For the record, I think the Kindle is an truely excellent device. I’m hooked – it saves me carrying around books, manuals, PDF’s, white paper and (at a pinch) it can browse the web in most countries in the world for free over 3G. Not the best browser in the world, but great to check your emails and the news on a beach in Spain, the USA or New Zealand, whilst listening to the (limited but adequate) MP3 player.

Neil.

**UK Law dictates that your contract is with the seller (i.e. the shop) and not the manufacturer. Don’t be fobbed off!

Written by Neil Chandler

27th July 2011 at 13:07

My alert.log is broken

leave a comment »

Well, not mine, but a recent post on the oracle-l mailing list which I thought was worth linking to and repeating. Firstly, because it’s interesting, and secondly, because it shows some good problem solving skills by both the poster and the wider Oracle community.

The poster in question was experiencing a problem whereby his alert log was not updating…

We have an alert.log that was last updated by the database on May 6th.
Strangely enough, the log.xml in the alert directory of the diag destination
is being updated normally, it is just the plain text alert.log in the trace
directory that is not updated. We have bounced the database, changed the
diag_destination parameter and I have even grepped all the file descriptors
in /proc/*/fd for traces of a possibly opened alert.log - nothing, the alert.log
is still not being updated. I tried dbms_system.ksdwrt to force a write to
the alert.log - again, the log.xml is updated, the plain text is not.

Read the rest of this entry »

Written by Neil Chandler

24th May 2011 at 07:30

So how big is that disk?

with one comment

I’m doing a fair bit of SAN-based work at the moment, migrating a bunch of Oracle databases from EMC to HP (posts to follow regarding Orion testing and other related topics). One thing that annoys me is the way SAN Manufacturers have changed the meaning of technical terms over time to suit themselves. Firstly, they changed the meaning of RAID from “Redundant Array of Inexpensive Disk” to “Redundant Array of Independent Disk”. Apparently it’s hard to put a $1,000,000 price tag on “inexpensive” disk.

Also, the industry changed what a Megabyte is. This really annoys me as it can lead to unexpected shortfalls in the LUN allocations if your traditional Oracle Megabyte is different to the SAN Admin’s Megabyte.

In our industry, the terms “kilo”, “mega”, “giga“, “tera”, “peta”, and “exa” are commonly used prefixes for computing performance and capacity. SAN manufacturers use the terms defined in “powers of ten” measurement units:

  • A kilobyte (KB) is equal to 1,000 (103) bytes.
  • A megabyte (MB) is equal to 1,000,000 (106) bytes.
  • A gigabyte (GB) is equal to 1,000,000,000 (109) bytes.
  • A terabyte (TB) is equal to 1,000,000,000,000 (1012) bytes.
  • A petabyte (PB) is equal to 1,000,000,000,000,000 (1015) bytes
  • An exabyte (EB) is equal to 1,000,000,000,000,000,000 (1018) bytes

Most operating systems components and (importantly for us) Oracle use “powers of two” measurement units rather than “power of ten” units. They are defined as:

  • A kibibyte (KiB) is equal to 1,024 (210) bytes.
  • A mebibyte (MiB) is equal to 1,048,576 (220) bytes.
  • A gibibyte (GiB) is equal to 1,073,741,824 (230) bytes.
  • A tebibyte (TiB) is equal to 1,099,511,627,776 (240) bytes.
  • A pebibyte (PiB) is equal to 1,125,899,906,842,624 (250) bytes.
  • An exbibyte (EiB) is equal to 1,152,921,504,606,846,967 (260) bytes.

That means that 1 terabyte is about 9% smaller than a tebibyte. Thanks, SAN manufacturers, for making our lives just a tiny bit more difficult than it needed to be so you can market drives as being seemingly larger than they really are (in my head anyway.)

Written by Neil Chandler

10th May 2011 at 10:47

A little rant about DBA’s

with 17 comments

Well, a while ago I was doing some interviews for a client for a Production support DBA. This was for a short term contract to look after a few systems while the incumbent was off doing more interesting project work. The thing I discovered was the absolutely dire level of knowledge displayed by the interviewees about Oracle. Things that I regard as fundamental to the understanding of how Oracle works were simply unanswered.

Q: What’s the difference between and instance and a database?
Q: What does “nested loop” mean in an execution plan?
Q: Name the memory areas within an Oracle instance.

Several candidates answered these either very badly, or not at all. Not at all? Any you say you’ve been working with Oracle since Oracle 7 and you can’t answer these questions? You have been working with Oracle for 10 years and you can’t name the PGA? or ANY components within the SGA?

If you are reading this blog, I suspect that you know the answers to the above questions. You’re the sort of person who spends a little of their own time doing research. I don’t think the questions are demanding (or are they? please tell me they are not.) Who actually hires these people? Are these chancers and charletans deceiving everyone, or just deceiving themselves.

And they were all OCP certified. How? And how little does that certification mean in reality. A tick on a CV / resume to get it past the box tickers in HR and agents.

You know, I’m not angry, I’m just disappointed in the lack of professional standards that you sometimes come across in our industry.

(Note: this post has been delayed to protect the guilty)

Written by Neil Chandler

5th May 2011 at 10:56

Posted in Management

Follow

Get every new post delivered to your Inbox.