oracle 聚簇因子(Cluster Factor)
文章来源:http://www.itpub.net/thread-1317424-1-1.html#select table_name,index_name from user_indexes where table_name='TORDER'//查看索引select * from user_ind_columns where index_name=upper('&index_name');select index_name,monitoring,used,start_monitoring,end_monitoring from v$object_usage
一次简单的性能优化诊断,聚簇因子过高导致全表扫描。业务人员反映一个查询非常慢:--------------------------------------------------------------------------------select * from ab44 where aae002=201006;--------------------------------------------------------------------------------查看索引的句子查看执行计划,是全表扫描SQL> explain plan for select * from ab44 where aae002=201006;已解释。SQL> select * from table(dbms_xplan.display);PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------Plan hash value: 781340439 -------------------------------------------------------------------------- | Id| Operation | Name | Rows| Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT| | 10554 | 865K|8777 (3)| 00:01:46 | |*1 |TABLE ACCESS FULL| AB44 | 10554 | 865K|8777 (3)| 00:01:46 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- 1 - filter("AAE002"=201006) 已选择13行。看看查询应该返回多少数据量,还有这个表有多少记录。SQL> select count(*) from ab44 where aae002='201006';COUNT(*)---------- 576SQL> select count(*) from ab44;COUNT(*)---------- 3310023SQL> select 576/3310023 from dual;576/3310023-----------.000174017查询所需返回的行数仅占表的很小比例,如果有索引的话,应该索引扫描才对。查看表的索引,发现在aae002字段上有一个复合索引,四个字段组成AAE002, AAE003, AAB001, AAE140。既然有索引,为什么没有使用呢?莫非是缺失统计信息。查看表、索引、直方图的信息都有。而且统计信息相对还是比较新的。SQL> select num_rows,blocks,avg_row_len from user_tables where table_name='AB44';NUM_ROWS BLOCKS AVG_ROW_LEN ---------- ---------- ----------- 3310017 44538 84 SQL> select distinct_keys,clustering_factor,num_rows fromUSER_IND_STATISTICS WHERE table_name='AB44' and index_name='PK_AB44';DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS ------------- ----------------- ---------- 3309447 3299907 3309447SQL> SELECT * FROM USER_HISTOGRAMS WHERE table_name='AB44';略。。。。。。。。。。。。。。。。。。。。。。。。查询到索引的统计信息的时候,发现索引的聚簇因子非常高,非常接近表的行数。重新分析表,依然如此。修改聚簇因子后,查看执行计划,已经是索引扫描了。begindbms_stats.set_index_stats(ownname => 'NCSI',indname => 'PK_AB44',clstfct => '7800');end;SQL> select * from table(dbms_xplan.display);PLAN_TABLE_OUTPUT------------------------------------------------------------------------------------------------------------Plan hash value: 1618544176---------------------------------------------------------------------------------------| Id| Operation | Name | Rows| Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 10554 | 865K| 239 (1)| 00:00:03 || 1 |TABLE ACCESS BY INDEX ROWID| AB44 | 10554 | 865K| 239 (1)| 00:00:03 ||*2 | INDEX RANGE SCAN | PK_AB44 | 10554 | | 45 (0)| 00:00:01 |---------------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - access("AAE002"=201006)已选择14行。但是到这里并不能说一定是聚簇因子导致的,因为很可能是还有直方图的因素。查询列AAE002上的唯一值个数为420,而表的记录总数是330万,如果没有直方图的话,ORACLE评估返回的行数应该是3300000/420=7857条记录,按照这个记录量来看,返回的行数占表记录总数的0.2%.根据经验,应该也能使用到索引才对。于是重新收集统计信息,取消直方图。查看执行计划,还是全表扫描。看来直方图在本例中所占影响因素较小,还是聚簇因子过大惹的祸。暂时通过修改聚簇因子暂时改善了性能问题,晚上的时候,按照索引字段的顺序重新创建了表。SQL>create table AB44_TEMP as select * from ab44 where 1=0;SQL>INSERT /*+ append */INTO AB44_TEMPSELECT * FROM AB44ORDER BY AAE002, AAE003, AAB001, AAE140;SQL>commit;SQL>drop table ab44;SQL>alter table ab44_temp rename to ab44;重新创建索引,分析表。重建后的聚簇因子只有60197,远远小于之前的 3299907。查看执行计划,也对了。SQL> explain plan for select * from ab44 where aae002=201006;已解释。SQL> select * from table(dbms_xplan.display);PLAN_TABLE_OUTPUT-----------------------------------------------------------------------------------------------------------------Plan hash value: 2627288474---------------------------------------------------------------------------------------------| Id| Operation | Name | Rows| Bytes | Cost (%CPU)| Time |---------------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 10799 | 885K| 249 (1)| 00:00:03 || 1 |TABLE ACCESS BY INDEX ROWID| AB44 | 10799 | 885K| 249 (1)| 00:00:03 ||*2 | INDEX RANGE SCAN | AB44_TEMP_IND | 10799 | | 50 (0)| 00:00:01 |---------------------------------------------------------------------------------------------Predicate Information (identified by operation id):--------------------------------------------------- 2 - access("AAE002"=201006)已选择14行。而且为了验证本例确实是由于聚簇因子过大占了决定因素。我把重建后的表直方图取消掉,重新查询,每一个AAE002的值都是索引扫描了。而之前聚簇因子较大的无直方图的实验,还是全表扫描。进一步证明了本例聚簇因子的影响占了很大比例。BEGINDBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'NCSI', TABNAME => 'AB44', CASCADE => TRUE, METHOD_OPT => 'for ALL columns SIZE 1');END;SQL>EXPLAIN PLAN FOR SELECT * FROM ab44 WHERE aae002=201002;SQL>EXPLAIN PLAN FOR SELECT * FROM ab44 WHERE aae002=201006;SQL>EXPLAIN PLAN FOR SELECT * FROM ab44 WHERE aae002=198701;SQL>EXPLAIN PLAN FOR SELECT * FROM ab44 WHERE aae002=199101;SQL>EXPLAIN PLAN FOR SELECT * FROM ab44 WHERE aae002=199804;SQL> select object_name,operation,options from plan_table where id=2;OBJECT_NAME OPERATION OPTIONS-------------------- ------------------------------ --------------------AB44_TEMP_IND INDEX RANGE SCANAB44_TEMP_IND INDEX RANGE SCANAB44_TEMP_IND INDEX RANGE SCANAB44_TEMP_IND INDEX RANGE SCANAB44_TEMP_IND INDEX RANGE SCAN
查看表有多大
select sum(bytes)/1024/1024 from user_segments where segment_name='X1';
页:
[1]