latch row cache objects等待事件的诊断

在awr报告中的top wait中显示了’latch: row cache objects’竞争
Top 5 Timed Foreground Events

Event	                        Waits	        Time(s)	        Avg wait (ms)	% DB time	Wait Class
latch: row cache objects	11,509,891	291,065	        25	        71.65	        Concurrency
DB CPU	 	                                37,417	 	                9.21	 
db file sequential read	        2,221,943	19,023	        9	        4.68	        User I/O
read by other session	        1,464,324	9,340	        6	        2.30	        User I/O
enq: TX - row lock contention	19,750	        4,928	        250	        1.21	        Application

	        Per Second	Per Transaction	 Per Exec	Per Call
DB Time(s):	     112.9	            2.4	     0.02	    0.05
DB CPU(s):	      10.4	            0.2	     0.00	    0.00
Redo size:	 580,567.8	       12,152.7	 	 
Logical reads:	 417,007.5	        8,729.0	 	 
Block changes:	   2,931.4	           61.4	 	 
Physical reads:	  34,146.3	           714.8	 	 
Physical writes      288.6	             6.0	 	 
User calls:	   2,507.1	            52.5	 	 
Parses:	           1,023.5	            21.4	 	 
Hard parses:	     528.4	            11.1	 	 
W/A MB processed:      7.6	             0.4	 	 
Logons:	               0.3	             0.0	 	 
Executes:	   5,231.8	            109.5	 	 
Rollbacks:	       0.7	            0.0	 	 
Transactions:	      47.8

SQL ordered by Elapsed Time
Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
% Total DB Time is the Elapsed Time of the SQL statement divided into the Total Database Time multiplied by 100
%Total – Elapsed Time as a percentage of Total DB time
%CPU – CPU Time as a percentage of Elapsed Time
%IO – User I/O Time as a percentage of Elapsed Time
Captured SQL account for 5.1% of Total DB Time (s): 406,212
Captured PL/SQL account for 0.3% of Total DB Time (s): 406,212

Elapsed Time (s)	Executions	Elapsed Time per Exec (s)	%Total	%CPU	   %IO	     SQL Id	                                   SQL Module	SQL Text
       7,316.50	           3	                2,438.83    1.80	0.74	97.96	4ux9hyv2xrxqp	 JDBC Thin Client	                            select * from (select page$tem...
       3,240.46	           0	 	                  0.80	  0.00	0.00	      3jkhph1czbx52	 JDBC Thin Client	                            update (select c.paf045 c045, ...
       3,092.00	      41,263	                    0.07	  0.76	1.08	98.03	44njhv5k4r2bx	 JDBC Thin Client	                            select a.akb020 hospital_id, a...
       1,405.07	           4	                  351.27	  0.35	2.50	93.19	92st5dkx3b59d	 JDBC Thin Client	                            select count(1) from ( SELECT ...
       1,399.77	           1	                1,399.77	  0.34	0.00	0.00	0q7cfvguf04k9	 JDBC Thin Client	                            update (select c.paf045 c045, ...
       1,037.16	      11,147	                    0.09	  0.26	29.24	35.89	1w68qv0t71aak	 siserv@el01cn04.exalogic.gzsi.cn (TNS V1-V3)	begin PKG_SERVICES . PRC_BIZ_S...
         707.83	         169	                    4.19	  0.17	38.56	23.96	b29wgyggbpj29	 JDBC Thin Client	                            select * from ( select b.akb02...
         615.84	  10,590,959	                    0.00	  0.15	40.13	0.00	grwydz59pu6mc	                                              select text from view$ where r...
         408.78	           4	                  102.20	  0.10	13.63	66.25	5xtsfsftqxj7v	 JDBC Thin Client	                            select v.aac001, (select c.aac...
         359.40	      78,750	                    0.00	  0.09	2.95	0.00	7dsyr136296ty	 JDBC Thin Client	                            select cur_num_no from SYS_SER...

SQL ordered by CPU Time
Resources reported for PL/SQL code includes the resources used by all SQL statements called by the code.
%Total – CPU Time as a percentage of Total DB CPU
%CPU – CPU Time as a percentage of Elapsed Time
%IO – User I/O Time as a percentage of Elapsed Time
Captured SQL account for 3.3% of Total CPU Time (s): 37,417
Captured PL/SQL account for 0.8% of Total CPU Time (s): 37,417

CPU Time (s)	Executions	CPU per Exec (s)	%Total	Elapsed Time (s)	%CPU	  %IO	       SQL Id	                                  SQL Module	SQL Text
     303.26	      11,147	           0.03	    0.81	       1,037.16	 29.24	35.89	1w68qv0t71aak	 siserv@el01cn04.exalogic.gzsi.cn (TNS V1-V3)	begin PKG_SERVICES . PRC_BIZ_S...
     272.93	    169	1.61	           0.73	  707.83	          38.56	 23.96	      b29wgyggbpj29	                            JDBC Thin Client	select * from ( select b.akb02...
     247.13	  10,590,959	           0.00	    0.66	         615.84	 40.13	 0.00	grwydz59pu6mc	                            	                select text from view$ where r...
      68.86	       4,972	           0.01	    0.18	         131.43	 52.40	 1.14	df3fssxryx4g5	                            JDBC Thin Client	select nvl(ac01.aac012, ac01.B...
      55.86	          18	           3.10	    0.15	         120.11	 46.51	 0.01	admybmxruwxkz	                            PL/SQL Developer	select * from v$open_cursor wh...
      55.70	           4	          13.93	    0.15	         408.78	 13.63	66.25	5xtsfsftqxj7v	                            JDBC Thin Client	select v.aac001, (select c.aac...
      54.33	           3	          18.11	    0.15	       7,316.50	  0.74	97.96	4ux9hyv2xrxqp	                            JDBC Thin Client	select * from (select page$tem...
      37.54	           4	           9.38	    0.10	         113.28	 33.14	30.75	453gakrhkhmnw	                            JDBC Thin Client	with rs as(select m.aac001, m....
      37.35	           4	           9.34	    0.10	         210.48	 17.74	61.40	5w5gchz2ymag5	                            JDBC Thin Client	with rs as(select m.aac001, m....
      35.17	           4	           8.79	    0.09	       1,405.07	 2.50	  93.19	92st5dkx3b59d	                            JDBC Thin Client	select count(1) from ( SELECT ...

在oracle11.2.0.3中,会动态地增加’latch:row cache objects’的等待可以查看awr报告的’Latch Miss Sources’部分,top location calling部分是’row cache objects kqrpre: find obj’:

Latch Miss Sources
only latches with sleeps are shown
ordered by name, sleeps desc
Latch Name	        Where	          NoWait Misses	      Sleeps	Waiter Sleeps
..........
row cache objects	kqrpre: find obj	      0	   6,049,715	    4,177,088
row cache objects	kqreqd: reget	              0	   3,227,630	    3,311,481
row cache objects	kqreqd	                      0	   2,286,734	    4,076,892
row cache objects	kqrso	                      0	       8,571	        7,388
row cache objects	kqrbgl	                      0	         428	          408
row cache objects	kqrpre: init complete	      0	         310	          323
row cache objects	kqrigt	                      0	         232	          210
row cache objects	kqrigt2	                      0	         227	          237
row cache objects	kqrbip	                      0	         220	          114
row cache objects	kqrbtm: pop parent	      0	         209	          222
row cache objects	kqrbpr: KQRRSNRL	      0	         103	          198
row cache objects	kqrbfr	                      0	          15	           47
row cache objects	kqrigt3	                      0	          12	            3
row cache objects	ksucallcbksafely: kqrhngc     0	           8	            0
row cache objects	kqrsrd	                      0	           3	           19
row cache objects	kqrcmt: while loop	      0	           2	            3

‘row cache objects’闩锁被调用的最频繁的是’kqrpre: find obj’.这个模式尝试在行缓存中找到正被解析对象的详细信息.在解析时行缓存会被搜索.进程搜索是在缓存中通过一个由’row cache objects’闩锁所保护的链表来找到它所需要的信息.当另外的进程也在解析并通过行缓存进行查找时就可能会发生竞争,特别是解析活过于频繁或执行一些不必要的解析时产生竞争的机会就会增大.

从系统状态跟踪文件中通常可以看到以下相关进程:
()+492< -kslges()+2152<-kqreqd()+256<-kkoiqb()+9436<-kkooqb()+684<-kkoqbc()+2256<-apakkoqb()+140
<-apaqbdDescendents()+460<-apadrv()+2020<-opitca()+2112 <-kksFullTypeCheck()+72<-rpiswu2()

在这个跟踪堆中大部分时间花在KKOIQB函数.这个函数使用行缓存以顺序调用来找索引名字因此这个表的索引会以索引名字字母的顺序的来排序.在这种情况下在不同执行计划之间优化器需要一个一致的方法来选择一个索引.通过按索引名字字母来排序来完成.

造成这个问题的原因是因为在oracle11.2中优化器的结构的改变.在oracle11.2.0.3中对于使用视图或子查询的查询会有更多的查询转换.这就会产生更多的优化操作也就意味着会有更多的row cache latch调用.

IOStat by Function summary
'Data' columns suffixed with M,G,T,P are in multiples of 1024 other columns suffixed with K,M,G,T,P are in multiples of 1000
ordered by (Data Read + Write) desc
Function Name	    Reads:Data Reqs per sec Data per sec Writes:Data Reqs per sec Data per sec Waits:Count Avg Tm(ms)
Direct Reads	        223.3G	     119.22	 63.6844	  0M	     0.00	    0M	         0
RMAN	                 62.2G	      64.06	 17.7467	216M	     2.10      .060147	    113.5K	 0.27
Buffer Cache Reads       37.3G	    1270.31	 10.6290	  0M	     0.00	    0M	   2609.5K	 6.30
Others	                  3.8G	      17.59	 1.08626	3.1G	     7.79      .884942	     65.7K	 0.59
DBWR	                   0M	       0.00	      0M	5.3G	    73.69      1.50228	         0
LGWR	                   2M	       0.04	 .000556	2.4G	    71.72      .683616	    234.4K	 0.11
Direct Writes	           0M	       0.00	      0M	687M	     0.65      .191301	         0
Streams AQ	           0M	       0.00	      0M	  0M	     0.00	    0M	         2	 8.00
TOTAL:	               326.7G	    1471.23	 93.1470       11.7G	   155.95      3.32229	   3023.2K	 5.47

解决方法:
1.检查程序使用绑定变量来降低每秒528的硬解析次数
2.在业务高峰期间停止rman备份
3.优化消耗大量cpu时间和执行时间长的sql语句(执行时间长和消耗cpu时间大的sql有些是重复的)

在使用以上解决方法后在第二天业务高峰期间恢复正常了

发表评论

电子邮件地址不会被公开。