Neil Chandler's DBA Blog

A resource for Database Professionals

The 10046 trace. Largely useless, isn’t it?

with 13 comments

The other night I was sat in the pub with some like-minded individuals discussing the relative merits of the 10046 trace (we Rock! in the pub, dudes!) and somebody asked me how often I has actually used it in anger? A well-respected DBA / Architect maintained it was a pretty useless and difficult option to use, given the topology of modern applications (e.g. How do you find the correct session with all that connection pooling going on from multiple web servers.)

My answer surprised me – I thought back to one client where I spent 90% of my time performance tuning a large (TiB-ish) OLTP/Batch hybrid system and concluded that I had ran a 10046 against production about once a year. Once. So if the 10046 is the holy grail of plan information, why wasn’t I using it that much. And why did I never use a 10053 against Production there?

The answer for me is a little more complex than that given in the pub:

1. as stated above, it’s hard to catch the in-flight session unless the application is instrumented to inject the trace statement when needed (and how many applications are instrumented to help you discover problems? Screen ST03 in SAP is very helpful. Any others in major ERP’s? Thought not.)

2. In many places that I have worked, getting authorisation to make any a change to a 24×7 mission-critical system is highly bureaucratic, involving cast-iron justification for the change and it’s positive benefits, requirement that there will be no adverse effects because of the change, very senior sign-off, more red-tape, etc. This causes a significant amount of work simply to put a trace on, even if you can catch the SQL. This can end up being more work than actually fixing the problem.

3. An awful lot of SQL tuning is a fairly blunt affair, as the developer (who is frequently database-blind) has usually missed something obvious. It is frequently to do with incorrectly using or not using an index (or using a poor index), or lack of filtering data at the right point to minimise the I/O.

4. Most importantly, if you have AWR and ASH, it’s not really needed. For each plan created by the optimizer the database stores the bind variables along with it, so we can usually understand why the optimizer makes the decisions it makes. ASH contains the main event waits. Why bother trying to capture all of the detail in a trace when you really don’t need that much detail, and it’s all already there; ready to be extracted from the relevant tables (e.g. dba_hist_active_sess_history, dba_hist_sql_plan and dba_hist_sql_bind.)

I have never used a 10053 trace on a Production system. I have simply never needed to know the decisions taken by the optimizer in that much detail. Like most DBA’s and Oracle consultants, I don’t go from site-to-site on a weekly basis resolving edge-case problems that the incumbent DBA’s haven’t had the time, or possibly don’t have the skills, to resolve themselves. I usually don’t need that level of confirmation that I’m right about why the plan is wrong, and I don’t have the time to conclusively prove it over and over again – I just need to get the fix into place and move onto the next problem.

That said, perhaps I should get fully to the bottom of these problems to ensure that they never occur again – which is the fundamental problem with Adaptive Cursor Sharing.

About these ads

Written by Neil Chandler

22nd November 2011 at 20:06

13 Responses

Subscribe to comments with RSS.

  1. [...] a friend of mine, Neil Chandler, has just posted on this topic, giving his reasons why you don’t run a 10046 trace on production. Neil raises some good points about how difficult it can be to get permission to do something as [...]

  2. Hi.

    Getting the trace “in flight” is easy using DBMS_MONITOR if the developers have used DBMS_APPLICATION_INFO to write module and action information to the V$SESSION view. Trouble is they never do. :)

    In many cases, it’s ASH or nothing. If you don’t have the D&T pack, that’ll be nothing then… :)

    Agree with the differentiation between DBA and performance consultant to the stars. Getting things working again is the number one priority for most DBAs, even if that means never getting to the bottom of the problem.

    Cheers

    Tim…

    Tim Hall

    22nd November 2011 at 21:23

    • Tim,

      You can simulate bits of ASH/ASM with a bunch of scripts querying v$ tables, but it’s painful compared to the built in stuff, and SQL Monitoring too.

      Note to license purchasers – get the D&T pack, it pays for itself.

      regs

      Neil.

      Neil Chandler

      22nd November 2011 at 22:06

    • All good points.

      As you say, most sql performance issues are diagnosable via the execution plan (including the predicate section of course) and there are very few other serious performance issues which don’t leave big clues lying aroung AWR & ASH.
      And the excellent real-time sql monitoring is a mixture of both features above – and actually so much easier for parallel executions – meaning that there’s even less need for a 10046.

      The main usage that I’ve had for 10053s is for confirming the odd optimizer bug and even then it’s usually easier to rewrite the query to avoid than to go raising bugs and going through the internal red tape for getting patches approved/applied.

      But it all depends on license and version. And I have had to ask for a number of 10046 production traces on 9.2.0.8 recently after being asked the standard challenge of saying why a particular process in the batch overran using only statspack.

      Cheers,
      Dominic

      Dom Brooks

      22nd November 2011 at 22:21

    • even, if you know the SQL_ID, you can trace it. please see
      http://blog.tanelpoder.com/2010/06/23/the-full-power-of-oracles-diagnostic-events-part-2-oradebug-doc-and-11g-improvements/ – if you need to flush the statement from shared pool for any reasons, it’s not that complicated: http://blogs.oracle.com/mandalika/entry/oracle_rdbms_flushing_a_single
      this should solve 1) and reduce the problems with 2)
      for 4) it’s often funny to see how customers react if you ask them for valid licenses – where only 10046 is for free – and helps in negotiating 2)
      I just agree, 10053 is REALLY rare – probably that’s the reason why there are more tools to visualise 10046 than 10053 :)

      Martin Berger (@martinberx)

      22nd November 2011 at 22:23

      • Martin,

        Thanks for highlighting that. I was unaware that you could enable trace for a single SQL-ID in 11G. I do know sites that wouldn’t allow you to do that as it’s undocumented and therefore technically “unsupported” but that’s so useful (thanks Tanel, brilliant as always).

        regards

        Neil

        Neil Chandler

        22nd November 2011 at 22:37

  3. After SQL Monitoring and session snapper I can say I use tracing just for additional opinion. I maybe usually gather trace just because I think I like the output of OraSRP :)

    coskan

    22nd November 2011 at 21:46

    • Time is the enemy of too many opinions though. Do you want to be thorough or get through your workload?

      regs

      Neil

      Neil Chandler

      22nd November 2011 at 22:08

  4. Transaction codes ST03, ST03N or ST03G isn’t that useful in SAP. TCode STAD is a lot more useful as an aggregate to pinpoint a bad application execution in either foreground or background.

    STAD doesn’t quite account for DB locks, enqueues (SAP’s application-based row locking) accurately and misses some of the stuff at OS level. But as you tangentially point out ST03 does this based on the instrumentation of the SAP Kernel. There’s other separate pieces to the instrumentation to trace the buffering, gateways, roll-in/out of programs and the like. STAD basically points to the aggregate of the time spent which only tells me where I might want to start looking. Once I locate a likely culprit in SAP, the next step is either an SQL, system, RFC (communications) trace or most importantly the ABAP or Java traces. The latter two are important as they tell you how many times a routine has executed and how long it took as well as how much time was spent processing in the DB or in the internal code operations of the program(s). It also, in later versions allows a direct look at the offending source code without having to step through their (reasonably good and always improving) debugger.

    Your base point is correct, I don’t do very many 10046 traces on SAP and it’s instrumented quite well. There’s also an “end to end” trace facility which has taken forever to get working and is part of a bloated creature called “Solution Manager” but that’s another story.

    Most of the reason I don’t need a 10046 trace in SAP is that the custom code is so bad that it’s readily identifiable and easy to spot via the other tools. Of course getting it corrected is another matter. I have one particularly clueless mess that’s coming up on it’s second anniversary with no fix in sight from the folks who are supposed to correct it despite explicit instructions.

    Kevin Fries

    23rd November 2011 at 00:08

    • Kevin,

      I’m not a SAP person and was just using ST03 as an illustration (it only proves in which layer the performance problem lies), and have only a little experience of SAP. All big Enterprise systems (Apps, Peoplesoft, JDE, SAP, etc) have serious faults due to their generic nature and significant size, but SAP has implemented some very good instrumentation which really does help. ST03 was just one that I had used and could remember. Thank you for your input – some really good learning for me too :-)

      I sympathise with the fix delay issue. Sometimes you just have to fix it yourself and send them the code with an “implement this please” post-it attached. :-|

      regards

      Neil.

      Neil Chandler

      23rd November 2011 at 10:05

  5. [...] a friend of mine, Neil Chandler, has just posted on this topic, giving his reasons why you don’t run a 10046 trace on production. Neil raises some good points about how difficult it can be to get permission to do something as [...]

  6. Your comments are noted, I have to deal with folks who stop at ST03 and blame the system, network, SAP and never their code. So just in case one of them is reading this, it couldn’t hurt to clarify.

    On to the second part, I have noted corrections and suggested sample code as well as detailed explanations of why it is bad, wrong or just not needed[1]. It doesn’t seem to matter. One of the more frequent responses is that they didn’t understand point “X” and thus disregarded the entire set of points rather than ask for an explanation. As such, I simply maintain a little niche where all the instances I’ve pointed out things to correct can be viewed, complete with all communications and dates. This serves to defend not just myself but also my co-workers. We are constrained here by silos of job duties. In fact, I’m not allowed to change code.

    [1] Q. What’s the quickest way to do something? A. Don’t do it. <- Hats off to Cary for this!

    Kevin Fries

    23rd November 2011 at 11:46

  7. All very good points. Personally I use 10046 a little more often, but still only a handful of times a year. The incorporation of capturing 10046 directly in sql developer using Cary’s plug-in (MR-Trace) should certainly see the number of times it’s used in development increase – I was blown away at how simple this tool makes it. Having said that I don’t believe it will help in production situations when you’re trying to capture ‘in flight’.

    Graham

    25th November 2011 at 10:55


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 27 other followers

%d bloggers like this: