oracle rac 单个实例不能生成awr报告的问题

同事对rac集群生成性能报告时发现rac集群有一个实例没有生成awr快照,另一个实例快照正常。下面是具体处理步骤。
1号实例没有生成awr快照

SQL> select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=1;

no rows selected

2号实快照正常

SQL> set long 200   
SQL> set linesize 200
SQL> select * from ( select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=2 order by SNAP_ID desc) where rownum < =10;

   SNAP_ID END_INTERVAL_TIME                                                           INSTANCE_NUMBER
---------- --------------------------------------------------------------------------- ---------------
     24405 17-AUG-19 07.00.47.595 PM                                                                 2
     24404 17-AUG-19 06.00.42.150 PM                                                                 2
     24403 17-AUG-19 05.00.37.041 PM                                                                 2
     24402 17-AUG-19 04.00.31.774 PM                                                                 2
     24401 17-AUG-19 03.00.26.414 PM                                                                 2
     24400 17-AUG-19 02.00.21.176 PM                                                                 2
     24399 17-AUG-19 01.00.16.316 PM                                                                 2
     24398 17-AUG-19 12.00.10.997 PM                                                                 2
     24397 17-AUG-19 11.00.05.446 AM                                                                 2
     24396 17-AUG-19 10.00.59.801 AM                                                                 2

10 rows selected.

mmon进程与awr快照相关,mmnl与ash相关,如是查看两个实例的mmon与mmnl进程
2号实例

[root@db2 ~]# ps -ef | grep mmon
root     128329 127956  0 18:11 pts/2    00:00:00 grep mmon
oracle   201527      1  0  2018 ?        17:17:11 ora_mmon_RLZY2
[root@db2 ~]# ps -ef | grep mmnl
root     131772 127956  0 18:17 pts/2    00:00:00 grep mmnl
oracle   201531      1  0  2018 ?        1-06:06:24 ora_mmnl_RLZY2

1号实例

[root@db1 ~]# ps -ef | grep mmon
root     239020 238963  0 18:52 pts/2    00:00:00 grep mmon
[root@db1 ~]# ps -ef | grep mmnl
root     239052 238963  0 18:52 pts/2    00:00:00 grep mmnl

可以看到1号实例没有mmon与mmnl进程了。

如是查看1号实例的mmon进程的跟踪文件

[root@db1 trace]# ls -lrt *mmon*.trc
-rw-r----- 1 oracle asmadmin 1351052 Jan 19  2018 RLZY1_mmon_20073.trc
-rw-r----- 1 oracle asmadmin  173031 Jan 22  2018 RLZY1_mmon_49119.trc
[root@db1 trace]# more RLZY1_mmon_49119.trc
Trace file /u01/app/oracle/diag/rdbms/rlzy/RLZY1/trace/RLZY1_mmon_49119.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/db_1
System name:    Linux
Node name:      db1
Release:        3.8.13-68.3.4.el6uek.x86_64
Version:        #2 SMP Tue Jul 14 15:03:36 PDT 2015
Machine:        x86_64
Instance name: RLZY1
Redo thread mounted by this instance: 1
Oracle process number: 36
Unix process pid: 49119, image: oracle@db1 (MMON)


*** 2018-01-19 13:55:20.030
*** SESSION ID:(1369.1) 2018-01-19 13:55:20.030
*** CLIENT ID:() 2018-01-19 13:55:20.030
*** SERVICE NAME:() 2018-01-19 13:55:20.030
*** MODULE NAME:() 2018-01-19 13:55:20.030
*** ACTION NAME:() 2018-01-19 13:55:20.030
 
minact-scn slave-status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000

*** 2018-01-19 14:00:20.643
minact-scn master-status: grec-scn:0x0e0e.55ad96ef gmin-scn:0x0e0e.55abf256 gcalc-scn:0x0e0e.55ac2a0a

..........
KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751
minact-scn master-status: grec-scn:0x0e0e.5f0ebf8c gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91
DDE rules only execution for: ORA 12751

*** 2018-01-22 07:06:04.060
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
Runtime exceeded 300 seconds
Time limit violation detected at:
ksedsts()+465< -kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-dbkePostKGE_kgsf()+71<-kge
selv()+276<-ksesecl0()+162
<-ksucin()+147<-kcbzwb()+2727<-kcbgtcr()+31325<-ktucloUsMinScn()+539<-ktucloUsegScan()+992<-ksb_run_managed_action()+384<-ksbcti()+2490<-ksbabs()+1735<-kebm_mmon_main()+209<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai
_real()+250<-ssthrdmain()+265
<-main()+201<-__libc_start_main()+253Current Wait Stack:
 0: waiting for 'gc buffer busy acquire'
    file#=0x5, block#=0x278, class#=0x49
    wait_id=255378 seq_num=59358 snap_id=1
    wait times: snap=5 min 5 sec, exc=5 min 5 sec, total=5 min 5 sec
    wait times: max=infinite, heur=5 min 5 sec
    wait counts: calls=358 os=358
    in_wait=1 iflags=0x15a2
There is at least one session blocking this session.
  Dumping 1 direct blocker(s):
    inst: 1, sid: 990, ser: 1
  Dumping final blocker:
    inst: 1, sid: 990, ser: 1
Wait State:
  fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
    elapsed time of 0.000061 sec since current wait
 0: waited for 'gc cr block 2-way'
    =0x5, =0x258, =0x47
    wait_id=255377 seq_num=59357 snap_id=1
    wait times: snap=0.000478 sec, exc=0.000478 sec, total=0.000478 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000122 sec of elapsed time
 1: waited for 'gc cr block 2-way'
    =0x5, =0x228, =0x45
    wait_id=255376 seq_num=59356 snap_id=1
    wait times: snap=0.000741 sec, exc=0.000741 sec, total=0.000741 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000120 sec of elapsed time
 2: waited for 'gc cr block 2-way'
    =0x5, =0x138, =0x43
    wait_id=255375 seq_num=59355 snap_id=1
    wait times: snap=0.000528 sec, exc=0.000528 sec, total=0.000528 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000111 sec of elapsed time
 3: waited for 'gc cr block 2-way'
    =0x5, =0xb8, =0x41
    wait_id=255374 seq_num=59354 snap_id=1
    wait times: snap=0.000583 sec, exc=0.000583 sec, total=0.000583 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000139 sec of elapsed time
 4: waited for 'gc cr block 2-way'
    =0x5, =0x110, =0x37
    wait_id=255373 seq_num=59353 snap_id=1
    wait times: snap=0.000541 sec, exc=0.000541 sec, total=0.000541 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000110 sec of elapsed time
 5: waited for 'gc cr block 2-way'
    =0x5, =0x100, =0x35
    wait_id=255372 seq_num=59352 snap_id=1
    wait times: snap=0.000629 sec, exc=0.000629 sec, total=0.000629 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000158 sec of elapsed time
 6: waited for 'gc cr block 2-way'
    =0x5, =0xf0, =0x33
    wait_id=255371 seq_num=59351 snap_id=1
    wait times: snap=0.000617 sec, exc=0.000617 sec, total=0.000617 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000128 sec of elapsed time
 7: waited for 'gc cr block 2-way'
    =0x5, =0xe0, =0x31
    wait_id=255370 seq_num=59350 snap_id=1
    wait times: snap=0.000561 sec, exc=0.000561 sec, total=0.000561 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000124 sec of elapsed time
 8: waited for 'gc cr block 2-way'
    =0x5, =0xd0, =0x2f
    wait_id=255369 seq_num=59349 snap_id=1
    wait times: snap=0.000565 sec, exc=0.000565 sec, total=0.000565 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000128 sec of elapsed time
 9: waited for 'gc cr block 2-way'
    =0x5, =0xc0, =0x2d
    wait_id=255368 seq_num=59348 snap_id=1
    wait times: snap=0.000555 sec, exc=0.000555 sec, total=0.000555 sec
    wait times: max=infinite
    wait counts: calls=1 os=1
    occurred after 0.000125 sec of elapsed time
Sampled Session History of session 1369 serial 1
---------------------------------------------------
The sampled session history is constructed by sampling
the target session every 1 second. The sampling process
captures at each sample if the session is in a non-idle wait,
an idle wait, or not in a wait. If the session is in a
non-idle wait then one interval is shown for all the samples
the session was in the same non-idle wait. If the
session is in an idle wait or not in a wait for
consecutive samples then one interval is shown for all
the consecutive samples. Though we display these consecutive
samples  in a single interval the session may NOT be continuously
idle or not in a wait (the sampling process does not know).
 
The history is displayed in reverse chronological order.
 
sample interval: 1 sec, max history 120 sec
---------------------------------------------------
  [121 samples,                                            07:04:03 - 07:06:03]
    waited for 'gc buffer busy acquire', seq_num: 59358
      p1: 'file#'=0x5
      p2: 'block#'=0x278
      p3: 'class#'=0x49
      time_waited: >= 120 sec (still in wait)
---------------------------------------------------
Sampled Session History Summary:
  longest_non_idle_wait: 'gc buffer busy acquire'
  [121 samples, 07:04:03 - 07:06:03]
      time_waited: >= 120 sec (still in wait)
---------------------------------------------------
----- END DDE Action: 'ORA_12751_DUMP' (SUCCESS, 1 csec) -----
----- END DDE Actions Dump (total 1 csec) -----
KEBM: MMON action policy violation. 'Block Cleanout Optim, Undo Segment Scan' viol=1; err=12751
minact-scn master-status: grec-scn:0x0e0e.5f0ec4ce gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91

*** 2018-01-22 07:11:11.071
DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----

ORA12751的错误原因是陈旧的SYS对象统计数据会导致生成次优执行计划,从而使AWR自动刷新从操作的语句运行更长时间和超时。

解决方法就是收集新的SYS对象统计信息,为优化器提供更好的统计信息,并生成更高效的执行计划

SQL> EXEC DBMS_STATS.GATHER_FIXED_OBJECTS_STATS;

PL/SQL procedure successfully completed.

SQL> EXEC DBMS_STATS.GATHER_SCHEMA_STATS ('SYS');

PL/SQL procedure successfully completed.

下面就是重启mmon和mmnl进程

SQL> alter system enable restricted session;

System altered.

SQL> alter system disable restricted session;

System altered.

查看alert日志可以看到mmon和mmnl进程已经重启了

Sat Aug 17 19:18:22 2019
Starting background process MMON
Sat Aug 17 19:18:22 2019
Starting background process MMNL
MMON started with pid=399, OS id=10373 
Sat Aug 17 19:18:22 2019
MMNL started with pid=405, OS id=10375 
ALTER SYSTEM enable restricted session;
Sat Aug 17 19:18:25 2019
Some DDE async actions failed or were cancelled
Sat Aug 17 19:18:25 2019
Sweep [inc][48021]: completed
Sweep [inc][48011]: completed
Sweep [inc][48002]: completed
Sweep [inc][35010]: completed
Sweep [inc][34706]: completed
Sweep [inc][34242]: completed
Sweep [inc][33546]: completed
Sweep [inc][33394]: completed
Sweep [inc2][48021]: completed
Sweep [inc2][48011]: completed
Sweep [inc2][48002]: completed
Sweep [inc2][35010]: completed
Sweep [inc2][34706]: completed
Sweep [inc2][34242]: completed
Sweep [inc2][33546]: completed
Sweep [inc2][33394]: completed
minact-scn: Inst 1 is a slave inc#:30 mmon proc-id:10373 status:0x2
minact-scn status: grec-scn:0x0e0e.61cb2e9c gmin-scn:0x0e0e.5f0eac2e gcalc-scn:0x0e0e.5f0ead91
Sat Aug 17 19:18:29 2019
db_recovery_file_dest_size of 10240 MB is 20.87% used. This is a
user-specified limit on the amount of space that will be used by this
database for recovery-related files, and does not reflect the amount of
space available in the underlying filesystem or ASM diskgroup.
Sat Aug 17 19:18:42 2019
ALTER SYSTEM disable restricted session;

再查看1号实例的mmon与mmnl进程状态

[root@db1 ~]# ps -ef | grep mmnl
oracle    10375      1  0 19:18 ?        00:00:00 ora_mmnl_RLZY1
root      10611 238963  0 19:18 pts/2    00:00:00 grep mmnl

[root@db1 ~]# ps -ef | grep mmon
oracle    10373      1  7 19:18 ?        00:00:02 ora_mmon_RLZY1
root      10630 238963  0 19:18 pts/2    00:00:00 grep mmon
过了两个小时去查看1号实例已经生成了两条快照信息
SQL> set long 200
SQL> set linesize 200
SQL> select * from ( select SNAP_ID,END_INTERVAL_TIME,instance_number from dba_hist_snapshot where instance_number=1 order by SNAP_ID desc) where rownum < =10;

   SNAP_ID END_INTERVAL_TIME                                                           INSTANCE_NUMBER
---------- --------------------------------------------------------------------------- ---------------
     24407 17-AUG-19 09.00.58.595 PM                                                                 1
     24406 17-AUG-19 08.00.40.244 PM                                                                 1

到此问题解决了。

发表评论

电子邮件地址不会被公开。