某客户的操作人员反应很慢不能操作,管理人员登录小机系统后发现CPU使用到了95%以上。而且这种情况持续了几个月。小机是IBM的P520,配置是2颗4核的CPU,内存是32G,Oracle是10.2.0.5。topas与生成的AWR报告如下:
从上面的load profile部分可以看到每秒执行的sql与事务数并不高,因为是周末并没有太多人使用系统。
从上面的top 等待事件来看主要是CPU time。如是查看这个时间段生成的ADDM报告:
DETAILED ADDM REPORT FOR TASK 'TASK_72988' WITH ID 72988 -------------------------------------------------------- Analysis Period: 20-DEC-2015 from 09:00:33 to 10:00:34 Database ID/Instance: 1329630138/1 Database/Instance Names: ORCL/orcl Host Name: dbserv Database Version: 10.2.0.5.0 Snapshot Range: from 26245 to 26246 Database Time: 16644 seconds Average Database Load: 4.6 active sessions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ FINDING 1: 93% impact (15444 seconds) ------------------------------------- SQL statements consuming significant database time were found. RECOMMENDATION 1: SQL Tuning, 100% benefit (38004 seconds) ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID "fb44z8kbnu8wg". RELEVANT OBJECT: SQL statement with SQL_ID fb44z8kbnu8wg and PLAN_HASH 1055903002 SELECT COUNT(EmailDetail.F_EMAIL_DETAIL_ID) AS UnReadCount FROM T_LK_EMAIL_EMAILGROUP EmailGroup LEFT OUTER JOIN T_LK_EMAIL_DETAIL EmailDetail ON EmailGroup.F_EMAIL_GROUP_ID=EmailDetail.F_EMAIL_GROUP_ ID AND (EmailDetail.F_RECEIVER_ID=:param0 OR (EmailDetail.F_SENDER_ID=:param0 AND EmailDetail.F_RECEIVER_ID IS NULL)) AND EmailDetail.F_EMAIL_STATE='0' AND EmailDetail.F_RECORD_STATE='0' WHERE EmailGroup.F_PARENT_ID=0 AND (EmailGroup.F_EMAIL_GROUP_ID!=4 OR (EmailGroup.F_GROUP_TYPE='USER' AND EmailGroup.F_USER_ID=:param0)) ACTION: Investigate the SQL statement with SQL_ID "fb44z8kbnu8wg" for possible performance improvements. RELEVANT OBJECT: SQL statement with SQL_ID fb44z8kbnu8wg and PLAN_HASH 1055903002 SELECT COUNT(EmailDetail.F_EMAIL_DETAIL_ID) AS UnReadCount FROM T_LK_EMAIL_EMAILGROUP EmailGroup LEFT OUTER JOIN T_LK_EMAIL_DETAIL EmailDetail ON EmailGroup.F_EMAIL_GROUP_ID=EmailDetail.F_EMAIL_GROUP_ ID AND (EmailDetail.F_RECEIVER_ID=:param0 OR (EmailDetail.F_SENDER_ID=:param0 AND EmailDetail.F_RECEIVER_ID IS NULL)) AND EmailDetail.F_EMAIL_STATE='0' AND EmailDetail.F_RECORD_STATE='0' WHERE EmailGroup.F_PARENT_ID=0 AND (EmailGroup.F_EMAIL_GROUP_ID!=4 OR (EmailGroup.F_GROUP_TYPE='USER' AND EmailGroup.F_USER_ID=:param0)) RATIONALE: SQL statement with SQL_ID "fb44z8kbnu8wg" was executed 16029 times and had an average elapsed time of 0.93 seconds. FINDING 2: 76% impact (12602 seconds) ------------------------------------- Time spent on the CPU by the instance was responsible for a substantial part of database time. RECOMMENDATION 1: SQL Tuning, 100% benefit (38004 seconds) ACTION: Run SQL Tuning Advisor on the SQL statement with SQL_ID "fb44z8kbnu8wg". RELEVANT OBJECT: SQL statement with SQL_ID fb44z8kbnu8wg and PLAN_HASH 1055903002 SELECT COUNT(EmailDetail.F_EMAIL_DETAIL_ID) AS UnReadCount FROM T_LK_EMAIL_EMAILGROUP EmailGroup LEFT OUTER JOIN T_LK_EMAIL_DETAIL EmailDetail ON EmailGroup.F_EMAIL_GROUP_ID=EmailDetail.F_EMAIL_GROUP_ ID AND (EmailDetail.F_RECEIVER_ID=:param0 OR (EmailDetail.F_SENDER_ID=:param0 AND EmailDetail.F_RECEIVER_ID IS NULL)) AND EmailDetail.F_EMAIL_STATE='0' AND EmailDetail.F_RECORD_STATE='0' WHERE EmailGroup.F_PARENT_ID=0 AND (EmailGroup.F_EMAIL_GROUP_ID!=4 OR (EmailGroup.F_GROUP_TYPE='USER' AND EmailGroup.F_USER_ID=:param0)) ACTION: Investigate the SQL statement with SQL_ID "fb44z8kbnu8wg" for possible performance improvements. RELEVANT OBJECT: SQL statement with SQL_ID fb44z8kbnu8wg and PLAN_HASH 1055903002 SELECT COUNT(EmailDetail.F_EMAIL_DETAIL_ID) AS UnReadCount FROM T_LK_EMAIL_EMAILGROUP EmailGroup LEFT OUTER JOIN T_LK_EMAIL_DETAIL EmailDetail ON EmailGroup.F_EMAIL_GROUP_ID=EmailDetail.F_EMAIL_GROUP_ ID AND (EmailDetail.F_RECEIVER_ID=:param0 OR (EmailDetail.F_SENDER_ID=:param0 AND EmailDetail.F_RECEIVER_ID IS NULL)) AND EmailDetail.F_EMAIL_STATE='0' AND EmailDetail.F_RECORD_STATE='0' WHERE EmailGroup.F_PARENT_ID=0 AND (EmailGroup.F_EMAIL_GROUP_ID!=4 OR (EmailGroup.F_GROUP_TYPE='USER' AND EmailGroup.F_USER_ID=:param0)) RATIONALE: SQL statement with SQL_ID "fb44z8kbnu8wg" was executed 16029 times and had an average elapsed time of 0.93 seconds. RATIONALE: Average CPU used per execution was 0.76 seconds. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ADDITIONAL INFORMATION ---------------------- Wait class "Application" was not consuming significant database time. Wait class "Commit" was not consuming significant database time. Wait class "Concurrency" was not consuming significant database time. Wait class "Configuration" was not consuming significant database time. Wait class "Network" was not consuming significant database time. Wait class "User I/O" was not consuming significant database time. Session connect and disconnect calls were not consuming significant database time. Hard parsing of SQL statements was not consuming significant database time. The database's maintenance windows were active during 99% of the analysis period. The analysis of I/O performance is based on the default assumption that the average read time for one database block is 10000 micro-seconds. An explanation of the terminology used in this report is available when you run the report with the 'ALL' level of detail. task_name --------- TASK_72988
从上面的信息ADDM报告与top sql部分可以看到在快照26245到26246之间Database Time为16644秒。而找到的一条SQL消耗了76%的CPU时间。如果对这两条SQL执行优化应该可以将CPU消耗显著降低。而该SQL虽然每执行一次的时间是0.93秒,消耗的CPU时间只有0.76秒,但在周末的时间内一个小时都执行了16,029次,消耗的CPU时间是12249秒而且小机的CPU数量只有2颗(6核),那么每秒该SQL的执行次数就是=16029/3600=4.5次,所以大部分的CPU被该SQL所消耗了。这还是周末,如果上班时间该SQL执行的次数会以倍数增加,那么CPU的消耗就会更高。
SQL语句如下:
SELECT COUNT(EmailDetail.F_EMAIL_DETAIL_ID) AS UnReadCount FROM T_LK_EMAIL_EMAILGROUP EmailGroup LEFT OUTER JOIN T_LK_EMAIL_DETAIL EmailDetail ON EmailGroup.F_EMAIL_GROUP_ID = EmailDetail.F_EMAIL_GROUP_ID AND (EmailDetail.F_RECEIVER_ID =3440 OR (EmailDetail.F_SENDER_ID =3418 AND EmailDetail.F_RECEIVER_ID IS NULL)) AND EmailDetail.F_EMAIL_STATE = '0' AND EmailDetail.F_RECORD_STATE = '0' WHERE EmailGroup.F_PARENT_ID = 0 AND (EmailGroup.F_EMAIL_GROUP_ID != 4 OR (EmailGroup.F_GROUP_TYPE = 'USER' AND EmailGroup.F_USER_ID = 23402));
其执行计划如下:
从执行计划来看该SQL的cost也不高(执行时间是0.93秒,cpu时间是0.76秒),从SQL的执行计划来看见两个表是使用的嵌套循环,而驱动表T_LK_EMAIL_EMAILGROUP的数据量是1w多行,T_LK_EMAIL_DETAIL表的数量是20w行左右。而表T_LK_EMAIL_EMAILGROUP执行全表扫描后满足查询条件的记录有4条,所以就得对表T_LK_EMAIL_DETAIL中的记录遍历4次来找到与驱动表相匹配的记录,虽然每次执行时间不长,但是在并发执行次数高,而物理CPU数量不足的情况下还是会引发性能问题。而这两个表有等值连接条件F_EMAIL_GROUP_ID,而且在驱动表中F_EMAIL_GROUP_ID列创建了索引,所以这里选择在表T_LK_EMAIL_DETAIL表的F_EMAIL_GROUP_ID列上创建索引之后执行计划如下所示:
SQL执行计划的COST显示增加了,但SQL执行时间只有0.1s提高了10倍。在对该SQL优化之后,业务系统恢复正常,CPU使用率也维持在20%左右。
所以在优化时,不能简单的根据SQL执行时间来判断该SQL是否会引发性能问题,要具体问题具体分析。