RBO和CBO的基本概念
RBO与CBO基本概念介绍
Oracle数据库中的优化器又叫查询优化器(Query Optimizer)。它是SQL分析和执行的优化工具,它负责生成、制定SQL的执行计划。Oracle的优化器有两种,基于规则的优化器(RBO)与基于代价的优化器(CBO)
RBO: Rule-Based Optimization 基于规则的优化器
CBO: Cost-Based Optimization 基于代价的优化器
RBO自ORACLE 6以来被采用,一直沿用至ORACLE 9i. ORACLE 10g开始,ORACLE已经彻底丢弃了RBO,它有着一套严格的使用规则,只要你按照它去写SQL语句,无论数据表中的内容怎样,也不会影响到你的“执行计划”,也就是说RBO对数据不“敏感”;它根据ORACLE指定的优先顺序规则,对指定的表进行执行计划的选择。比如在规则中,索引的优先级大于全表扫描;RBO是根据可用的访问路径以及访问路径等级来选择执行计划,在RBO中,SQL的写法往往会影响执行计划,它要求开发人员非常了解RBO的各项细则,菜鸟写出来的SQL脚本性能可能非常差。随着RBO的被遗弃,渐渐不为人所知。也许只有老一辈的DBA对其了解得比较深入。关于RBO的访问路径,官方文档做了详细介绍:
RBO Path 1: SingleRow by Rowid
RBO Path 2: Single Row by Cluster Join
RBO Path 3: Single Row by Hash Cluster Key with Unique or Primary Key
RBO Path 4: Single Row by Unique or Primary Key
RBO Path 5: Clustered Join
RBO Path 6: Hash Cluster Key
RBO Path 7: Indexed Cluster Key
RBO Path 8: Composite Index
RBO Path 9: Single-Column Indexes
RBO Path 10: Bounded Range Search on Indexed Columns
RBO Path 11: Unbounded Range Search on Indexed Columns
RBO Path 12: Sort Merge Join
RBO Path 13: MAX or MIN of Indexed Column
RBO Path 14: ORDER BY on Indexed Column
RBO Path 15: Full Table Scan
CBO是一种比RBO更加合理、可靠的优化器,它是从ORACLE 8中开始引入,但到ORACLE 9i 中才逐渐成熟,在ORACLE 10g中完全取代RBO, CBO是计算各种可能“执行计划”的“代价”,即COST,从中选用COST最低的执行方案,作为实际运行方案。它依赖数据库对象的统计信息,统计信息的准确与否会影响CBO做出最优的选择。如果对一次执行SQL时发现涉及对象(表、索引等)没有被分析、统计过,那么ORACLE会采用一种叫做动态采样的技术,动态的收集表和索引上的一些数据信息。
关于RBO与CBO,我有个形象的比喻:大数据时代到来以前,做生意或许凭借多年累计下来的经验(RBO)就能够很好的做出决策,跟随市场变化。但是大数据时代,如果做生意还是靠以前凭经验做决策,而不是靠大数据、数据分析、数据挖掘做决策,那么就有可能做出错误的��策。这也就是越来越多的公司对BI、数据挖掘越来越重视的缘故,像电商、游戏、电信等行业都已经大规模的应用,以前在一家游戏公司数据库部门做BI分析,挖掘潜在消费用户简直无所不及。至今映像颇深。
CBO与RBO的优劣
CBO优于RBO是因为RBO是一种呆板、过时的优化器,它只认规则,对数据不敏感。毕竟规则是死的,数据是变化的,这样生成的执行计划往往是不可靠的,不是最优的,CBO由于RBO可以从很多方面体现。下面请看一个例子,此案例来自于《让Oracle跑得更快》。
SQL> create table test as
select 1 id ,object_name from dba_objects;
Table created.
SQL> create index idx_test
on test(id);
Index created.
SQL> update test set
id=100 where rownum =1;
1 row updated.
SQL> select id, count(1)
from test group by id;
ID
COUNT(1)
---------- ----------
100 1
1
50314
从上面可以看出,该测试表的数据分布极其不均衡,ID=100的记录只有一条,而ID=1的记录有50314条。我们先看看RBO下两条SQL的执行计划.
SQL> select /*+ rule */ *
from test where id =100;
Execution Plan
----------------------------------------------------------
Plan hash value: 2473784974
------------------------------------------------
| Id | Operation | Name |
------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 |
TABLE ACCESS BY INDEX ROWID| TEST
|
|* 2 | INDEX
RANGE SCAN | IDX_TEST |
------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
2 - access("ID"=100)
Note
-----
- rule based optimizer used (consider using
cbo)
Statistics
----------------------------------------------------------
1
recursive calls
0
db block gets
3
consistent gets
0
physical reads
0
redo size
588
bytes sent via SQL*Net to client
469
bytes received via SQL*Net from client
2
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
1
rows processed
SQL>
SQL> select /*+ rule */ *
from test where id=1;
50314 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2473784974
------------------------------------------------
| Id | Operation | Name |
------------------------------------------------
| 0 | SELECT STATEMENT | |
| 1 |
TABLE ACCESS BY INDEX ROWID| TEST
|
|* 2 |
INDEX RANGE SCAN |
IDX_TEST |
------------------------------------------------
Predicate Information
(identified by operation id):
---------------------------------------------------
2 - access("ID"=1)
Note
-----
- rule based optimizer used (consider using
cbo)
Statistics
----------------------------------------------------------
1
recursive calls
0
db block gets
7012
consistent gets
97
physical reads
0
redo size
2243353
bytes sent via SQL*Net to client
37363
bytes received via SQL*Net from client
3356
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
50314
rows processed
从执行计划可以看出,RBO的执行计划让人有点失望,对于ID=1,几乎所有的数据全部符合谓词条件,走索引只能增加额外的开销(因为ORACLE首先要访问索引数据块,在索引上找到了对应的键值,然后按照键值上的ROWID再去访问表中相应数据),既然我们几乎要访问所有表中的数据,那么全表扫描自然是最优的选择。而RBO选择了错误的执行计划。可以对比一下CBO下SQL的执行计划,显然它对数据敏感,执行计划及时的根据��据量做了调整,当查询条件为1时,它走全表扫描;当查询条件为100时,它走区间索引扫描。如下所示:
SQL> select * from test where id=1;
50314 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1357081020
--------------------------------------------------------------------------
| Id |
Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | | 49075 |
3786K| 52 (2)| 00:00:01 |
|* 1 | TABLE ACCESS FULL| TEST | 49075 | 3786K|
52 (2)| 00:00:01 |
--------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 -
filter("ID"=1)
Note
-----
- dynamic
sampling used for this statement
Statistics
----------------------------------------------------------
32 recursive calls
0 db block gets
3644 consistent gets
0 physical reads
0 redo size
1689175 bytes sent via SQL*Net to
client
37363 bytes received via SQL*Net
from client
3356 SQL*Net roundtrips to/from
client
0
sorts (memory)
0 sorts (disk)
50314 rows processed
SQL> select * from test where id =100;
Execution Plan
----------------------------------------------------------
Plan hash value: 2473784974
----------------------------------------------------------------------------------------
| Id |
Operation | Name | Rows
| Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT
STATEMENT | |
1 | 79 | 2
(0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID| TEST |
1 | 79 | 2
(0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_TEST | 1 |
| 1 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 -
access("ID"=100)
Note
-----
- dynamic
sampling used for this statement
Statistics
----------------------------------------------------------
9 recursive calls
0 db block gets
73 consistent gets
0 physical reads
0 redo size
588
bytes sent via SQL*Net to client
469 bytes received via SQL*Net
from client
2 SQL*Net roundtrips to/from
client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
仅此一项就可以看出为什么ORACLE极力推荐使用CBO,从ORACLE
10g开始不支持RBO的缘故。所谓长江后浪推前浪,前浪死在沙滩上。
CBO知识点的总结
CBO优化器根据SQL语句生成一组可能被使用的执行计划,估算出每个执行计划的代价,并调用计划生成器(Plan Generator)生成执行计划,比较执行计划的代价,最终选择选择一个代价最小的执行计划。查询优化器由查询转换器(Query Transform)、代价估算器(Estimator)和计划生成器(Plan Generator)组成。
CBO优化器组件
CBO由以下组件构成:
·
查询转化器(Query Transformer)
查询转换器的作用就是等价改变查询语句的形式,以便产生更好的执行计划。它决定是否重写用户的查询(包括视图合并、谓词推进、非嵌套子查询/子查询反嵌套、物化视图重写),以生成更好的查询计划。
The input to the query
transformer is a parsed query, which is represented by a set of
query blocks. The query blocks
are nested or interrelated to each other. The form of the
query determines how the query
blocks are interrelated to each other. The main
objective of the query
transformer is to determine if it is advantageous to change the
form of the query so that it
enables generation of a better query plan. Several different
query transformation techniques
are employed by the query transformer, including:
■ View Merging
■ Predicate Pushing
■ Subquery Unnesting
■ Query Rewrite with
Materialized Views
Any combination of these
transformations can be applied to a given query.
·
代价评估器(Estimator)
评估器通过复杂的算法结合来统计信息的三个值来评估各个执行计划的总体成本:选择性(Selectivity)、基数(Cardinality)、成本(Cost)
计划生成器会考虑可能的访问路径(Access Path)、关联方法和关联顺序,生成不同的执行计划,让查询优化器从这些计划中选择出执行代价最小的一个计划。
·
计划生成器(Plan Generator)
计划生成器就是生成大量的执行计划,然后选择其总体代价或总体成本最低的一个执行计划。
由于不同的访问路径、连��方式和连接顺序可以组合,虽然以不同的方式访问和处理数据,但是可以产生同样的结果
下图是我自己为了加深理解,用工具画的图
查看ORACLE优化器
mso-hansi-font-family:Calibri;mso-hansi-theme-font:minor-latin;mso-bidi-font-family:
宋体;mso-font-kerning:0pt;mso-ansi-language:X-NONE">SQL> show parameter
optimizer_mode;
mso-hansi-font-family:Calibri;mso-hansi-theme-font:minor-latin;mso-bidi-font-family:
宋体;mso-font-kerning:0pt;mso-ansi-language:X-NONE">
mso-hansi-font-family:Calibri;mso-hansi-theme-font:minor-latin;mso-bidi-font-family:
宋体;mso-font-kerning:0pt;mso-ansi-language:X-NONE">NAME TYPE VALUE
mso-hansi-font-family:Calibri;mso-hansi-theme-font:minor-latin;mso-bidi-font-family:
宋体;mso-font-kerning:0pt;mso-ansi-language:X-NONE">------------------------------------
----------- ------------------------------
mso-hansi-font-family:Calibri;mso-hansi-theme-font:minor-latin;mso-bidi-font-family:
宋体;mso-font-kerning:0pt;mso-ansi-language:X-NONE">optimizer_mode string ALL_ROWS
修改ORACLE优化器
ORACLE 10g 优化器可以从系统级别、会话级别、语句级别三种方式修改优化器模式,非常方便灵活。
其中optimizer_mode可以选择的值有: first_rows,all_rows, choose, rule
其中first_rows_n又有first_rows_1000, first_rows_100, first_rows_10, first_rows_1
在Oracle 9i中,优化器模式有下面一些
Rule:基于规则的方式。
Choolse:指的是当一个表或或索引有统计信息,则走CBO的方式,如果表或索引没统计信息,表又不是特别的小,而且相应的列有索引时,那么就走索引,走RBO的方式。
If OPTIMIZER_MODE=CHOOSE,
if statistics do not exist, and if you do not add hints to SQL statements, then
SQL statements use the RBO. You can use the RBO to access both relational data
and object types. If OPTIMIZER_MODE=FIRST_ROWS, FIRST_ROWS_n, or ALL_ROWS and
no statistics exist, then the CBO uses default statistics. Migrate existing
applications to use the cost-based approach.
First Rows:它与Choose方���是类似的,所不同的是当一个表有统计信息时,它将是以最快的方式返回查询的最先的几行,从总体上减少了响应时间。
All Rows: 10g中的默认值,也就是我们所说的Cost的方式,当一个表有统计信息时,它将以最快的方式返回表的所有的行,从总体上提高查询的吞吐
虽然Oracle 10g中不再支持RBO,Oracle g官方文档关于optimizer_mode参数的只有first_rows和all_rows.但是依然可以设置 optimizer_mode为rule或choose,估计是ORACLE为了过渡或向下兼容考虑。如下所示。
下面是ORACLE 10g中关于优化器模式的介绍:
OPTIMIZER_MODE
Property
|
Description
|
Parameter type
|
String
|
Syntax
|
OPTIMIZER_MODE =
{ first_rows_[1 10 100 1000] first_rows all_rows }
|
Default value
|
all_rows
|
Modifiable
|
ALTER SESSION,ALTER SYSTEM
|
4.
OPTIMIZER_MODEestablishes the default behavior. for choosing an
optimization approach for the instance.
Values:first_rows_n
The optimizer uses a
cost-based approach and optimizes with a goal of best response time to return
the firstnrows (wheren= 1, 10, 100, 1000).
first_rows
The optimizer uses a mix of
costs and heuristics to find a best plan for fast delivery of the first few
rows.
all_rows
The optimizer uses a
cost-based approach for all SQL statements in the session and optimizes with a goal
of best throughput (minimum resource use to complete the entire statement).
系统级别
SQL> alter system set optimizer_mode=rule scope=both;
System altered.
SQL> show parameter optimizer_mode
NAME TYPE VALUE
------------------------------------ -----------
------------------------------
optimizer_mode string RULE
会话级别
会话级别修改优化器模式,只对当前会话有效,其它会话依然使用系统优化器模式。
SQL> alter session set
optimizer_mode=first_rows_100;
Session altered.
语句级别
语句级别通过使用提示hints来实现。
SQL> select /*+ rule */ * from dba_objects where
rownum <= 10;
-------------------------------------------------------------------------------------------------------------------------------------
ORACLE提供了CBO、RBO两种SQL优化器。CBO在ORACLE
7 引入,但在ORACLE 8i 中才成熟。ORACLE 已经明确声明在ORACLE 9i之后的版本中(ORACLE 10G ),RBO将不再支持。因此选择CBO 是必然的趋势。
CBO和 RBO作为不同的SQL优化器,对SQL语句的执行计划产生重大影响,如果要对现有的应用程序从RBO向CBO移植,则必须充分考虑这些影响,避免SQL语句性能急剧下降;但是,对新的应用系统,则可以考虑直接使用CBO,在CBO模式下进行SQL语句编写、分析执行计划、性能测试等工作,这需要开发者对CBO的特性比较熟悉。以下小结几点在CBO下写SQL语句的注意事项:
1、RBO自ORACLE
6以来被采用,有着一套严格的使用规则,只要你按照它去写SQL语句,无论数据表中的内容怎样,也不会影响到你的“执行计划”,也就是说对数据不“敏感”;CBO计算各种可能“执行计划”的“代价”,即COST,从中选用cost最低的方案,作为实际运行方案。各“执行计划”的COST的计算根据,依赖于数据表中数据的统计分布,ORACLE数据库本身对该统计分布并不清楚,必须要分析表和相关的索引(使用ANALYZE
命令),才能搜集到CBO所需的数据。
2���使用CBO 时,编写SQL语句时,不必考虑"FROM" 子句后面的表或视图的顺序和"WHERE" 子句后面的条件顺序;ORACLE自7版以来采用的许多新技术都是基于CBO的,如星型连接排列查询,哈希连接查询,函数索引,和并行查询等。
3、一般而言,CBO所选择的“执行计划”都不会比RBO的“执行计划”差,而且相对而言,CBO对程序员的要求没有RBO那么苛刻,节省了程序员为了从多个可能的“执行计划”中选择一个最优的方案而花费的调试时间,但在某些场合下也会存在问题。较典型的问题有:有时,表明明建有索引,但查询过程显然没有用到相关的索引,导致查询过程耗时漫长,占用资源巨大,这时就需要仔细分析执行计划,找出原因。例如,可以看连接顺序是否允许使用相关索引。假设表emp的deptno列上有索引,表dept的列deptno上无索引,WHERE语句有emp.deptno=dept.deptno条件。在做NL连接时,emp做为外表,先被访问,由于连接机制原因,外表的数据访问方式是全表扫描,emp.deptno上的索引显然是用不上,最多在其上做索引全扫描或索引快速全扫描。
4、如果一个语句使用 RBO的执行计划确实比CBO 好,则可以通过添加提示"RULE" 提示,强制使用RBO。
5、使用CBO 时,SQL语句 "FROM" 子句后面的表,必须全部使用ANALYZE 命令分析过,如果"FROM" 子句后面的是视图,则此视图的基础表,也必须全部使用ANALYZE 命令分析过;否则,ORACLE 会在执行此SQL语句之前,自动进行ANALYZE 命令分析,这会极大导致SQL语句执行极其缓慢。
6、使用CBO 时,SQL语句 "FROM" 子句后面的表的个数不宜太多,因为CBO在选择表连接顺序时,会对"FROM" 子句后面的表进行阶乘运算,选择最好的一个连接顺序。假如"FROM" 子句后有6个表,则其可选择的连接顺序就是6*5*4*3*2*1 = 720 种,CBO 选择其中一种,而如果"FROM" 子句后有12个表,则其可选择的连接顺序就是12*11*10*9*8*7*6*5*4*3*2*1= 479001600 种,可以想象从中选择一种,会消耗多少CPU 时间?如果实在是要访问很多表,则最好使用 ORDER 提示,强制使用"FROM" 子句表固定的访问顺序。
7、使用CBO 时,SQL语句中不能引用系统数据字典表或视图,因为系统数据字典表都未被分析过,可能导致极差的“执行计划”。但是不要擅自对数据字典表做分析,否则可能导致死锁,或系统性能严重下降。
8、使用CBO 时,要注意看采用了哪种类型的表连接方式。ORACLE的共有Sort Merge Join(SMJ)、Hash Join(HJ)和Nested Loop Join(NL)。CBO有时会偏重于SMJ 和 HJ,但在OLTP 系统中,NL 一般会更好,因为它高效的使用了索引。在两张表连接,且内表的目标列上建有索引时,只有Nested Loop才能有效地利用到该索引。SMJ即使相关列上建有索引,最多只能因索引的存在,避免数据排序过程。HJ由于须做HASH运算,索引的存在对数据查询速度几乎没有影响。
9、使用CBO 时,必须保证为表和相关的索引搜集足够的统计数据。对数据经常有增、删、改的表最好定期对表和索引进行分析,可用SQL语句“analyze table xxx compute
statistics for all indexes;"ORACLE掌握了充分反映实际的统计数据,才有可能做出正确的选��。
10、使用CBO 时,要注意被索引的字段的值的数据分布,会影响SQL语句的执行计划。例如:表emp,共有一百万行数据,但其中的emp.deptno列,数据只有4种不同的值,如10、20、30、40。虽然emp数据行有很多,ORACLE缺省认定表中列的值是在所有数据行均匀分布的,也就是说每种deptno值各有25万数据行与之对应。假设SQL搜索条件DEPTNO=10,利用deptno列上的索引进行数据搜索效率,往往不比全表扫描的高,ORACLE理所当然对索引“视而不见”,认为该索引的选择性不高。我们考虑另一种情况,如果一百万数据行实际不是在4种deptno值间平均分配,其中有99万行对应着值10,5000行对应值20,3000行 对应值30,2000行对应值40。在这种数据分布图案中对除值为10外的其它deptno值搜索时,毫无疑问,如果索引能被应用,那么效率会高出很多。我们可以采用对该索引列进行单独分析,或用analyze语句对该列建立直方图,对该列搜集足够的统计数据,使ORACLE在搜索选择性较高的值能用上索引。
实验例子
1:在CBO模式下,字段有索引但是不走索引的例子
如下所示,在DM.TM_WGCRNT_MSCIORELAY_DAY的DATE_CD, CITY_ID字段上建有索引:
SQL> SELECT INDEX_OWNER, INDEX_NAME,
TABLE_OWNER, TABLE_NAME,COLUMN_NAME FROM DBA_IND_COLUMNS WHERE
TABLE_NAME='TM_WGCRNT_MSCIORELAY_DAY';
INDEX_OWNE INDEX_NAME TABLE_OWNE TABLE_NAME COLUMN_NAME
---------- ------------------------------
---------- ------------------------------
----------------------------------------
DM
PK_TM_WGCRNT_MSCIORELAY_DAY
DM
TM_WGCRNT_MSCIORELAY_DAY
DATE_CD
DM
PK_TM_WGCRNT_MSCIORELAY_DAY
DM TM_WGCRNT_MSCIORELAY_DAY CITY_ID
SQL>
但是下面脚本,不走DM.TM_WGCRNT_MSCIORELAY_DAY表的索引,走全表扫描。
SQL> set autotrace traceonly
SQL> SELECT '2012-01-13' AS START_DATE,
2 '2012-02-19' AS END_DATE,
3 TO_CHAR(TO_DATE(DATE_CD,
'YYYY-MM-DD'), 'YYYY-MM-DD') AS DATE_CD,
4 DM.CITY_ID,
5 INCALL_ERL AS INCALL_ERL,
OUTCALL_ERL AS OUTCALL_ERL,
ALL_ERL AS ALL_ERL,
MAX_ERL_HR AS HR_CD,
MAX_ERL AS MAX_ERL
FROM
REF.TR_WG_GD_CITY B, DM.TM_WGCRNT_MSCIORELAY_DAY DM
WHERE
DM.CITY_ID = B.CITY_ID
AND
DM.DATE_CD BETWEEN
TRANSLATE('2012-01-13', '0123456789年月日?-', '0123456789') AND
TRANSLATE('2012-02-19', '0123456789年月日?-', '0123456789')
AND
DM.CITY_ID = 9999;
38 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1848119847
-----------------------------------------------------------------------------------------------
| Id |
Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------
| 0 |
SELECT STATEMENT | | 36 |
1368 | 6 (0)| 00:00:01 |
| 1
| NESTED LOOPS | |
36 | 1368 | 6
(0)| 00:00:01 |
|* 2
| INDEX UNIQUE SCAN|
PK_TR_WG_GD_CITY | 1 |
4 | 0 (0)| 00:00:01 |
|* 3
| TABLE ACCESS FULL|
TM_WGCRNT_MSCIORELAY_DAY | 36 | 1224 |
6 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Predicate Information (identified by operation
id):
---------------------------------------------------
2 -
access("B"."CITY_ID"=9999)
3 -
filter("DM"."CITY_ID"=9999 AND
"DM"."DATE_CD">=20120113 AND
"DM"."DATE_CD"<=20120219)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
27 consistent gets
0 physical reads
0 redo size
2944 bytes sent via SQL*Net to
client
360 bytes received via SQL*Net
from client
4 SQL*Net roundtrips to/from
client
0 sorts (memory)
0 sorts (disk)
38 rows processed
SQL>
刚开始以为是没有收集表信息或表信息太老旧,导致CBO选择执行计划不准确,那么收集一下表、索引的统计信息
SQL>
exec dbms_stats.gather_table_stats(ownname => 'DM', tabname =>
'TM_WGCRNT_MSCIORELAY_DAY', estimate_percent =>
dbms_stats.AUTO_SAMPLE_SIZE);
PL/SQL
procedure successfully completed
SQL>
exec dbms_stats.gather_index_stats(ownname => 'DM', indname =>
'PK_TM_WGCRNT_MSCIORELAY_DAY');
PL/SQL
procedure successfully completed
结果发现执行计划还是原来的情况,根本没有丝毫变化。于是用下面脚本测试了一下,发现它会走INDEX UNIQUE SCAN扫描。
SQL>
SELECT * FROM DM.TM_WGCRNT_MSCIORELAY_DAY WHERE DATE_CD =
20120216 AND CITY_ID = 9999;
Execution
Plan
----------------------------------------------------------
Plan
hash value: 1126807173
-----------------------------------------------------------------------------------------------------------
|
Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 |
34 | 2 (0)| 00:00:01 |
| 1 |
TABLE ACCESS BY INDEX ROWID| TM_WGCRNT_MSCIORELAY_DAY |
1 | 34 | 2
(0)| 00:00:01 |
|* 2 |
INDEX UNIQUE SCAN | PK_TM_WGCRNT_MSCIORELAY_DAY | 1 |
| 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
2 - access("DATE_CD"=20120216 AND
"CITY_ID"=9999)
Statistics
----------------------------------------------------------
215
recursive calls
0
db block gets
52
consistent gets
2
physical reads
0
redo size
641
bytes sent via SQL*Net to client
327
bytes received via SQL*Net from client
1
SQL*Net roundtrips to/from client
5
sorts (memory)
0
sorts (disk)
1
rows processed
但是如果是查询一段时间的数据,CBO则会选择走全表扫描,如下所示
SQL>
SELECT * FROM DM.TM_WGCRNT_MSCIORELAY_DAY WHERE
DATE_CD BETWEEN 20120113 AND 20120219
AND CITY_ID = 9999;
38 rows
selected.
Execution
Plan
----------------------------------------------------------
Plan
hash value: 340989980
----------------------------------------------------------------------------------------------
|
Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 36 |
1224 | 6 (0)| 00:00:01 |
|* 1 |
TABLE ACCESS FULL| TM_WGCRNT_MSCIORELAY_DAY | 36 |
1224 | 6 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
1 - filter("CITY_ID"=9999 AND
"DATE_CD">=20120113 AND "DATE_CD"<=20120219)
Statistics
----------------------------------------------------------
1
recursive calls
0
db block gets
26
consistent gets
0
physical reads
0
redo size
2539
bytes sent via SQL*Net to client
360
bytes received via SQL*Net from client
4
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
38
rows processed
SQL>
查看数据情况,才发现从20120113到20120219这段时间,有不少记录。所有估计CBO由于查询记录比较多,所以从代价最小的选择出发才会选择全表扫描。如果,我查询的天数比较少,是否会走索引呢?
如下所示,我只查询13号到17号的数据,执行计划果真走索引(如下所示),看来CBO果真会选择代价最小的执行计划,所以在CBO模式下,突然不走索引,走全表扫描,也不要大惊小怪。这是CBO选择代价最小的执行计划的缘故。
SQL>
SELECT '2012-01-13' AS START_DATE,
2
'2012-02-19' AS END_DATE,
3
TO_CHAR(TO_DATE(DATE_CD, 'YYYY-MM-DD'), 'YYYY-MM-DD') AS DATE_CD,
4
DM.CITY_ID,
INCALL_ERL AS INCALL_ERL,
5
6 OUTCALL_ERL AS
OUTCALL_ERL,
7
ALL_ERL AS ALL_ERL,
8
MAX_ERL_HR AS HR_CD,
9
MAX_ERL AS MAX_ERL
FROM REF.TR_WG_GD_CITY B,
DM.TM_WGCRNT_MSCIORELAY_DAY DM
10
11 WHERE DM.CITY_ID = B.CITY_ID
12
AND DM.DATE_CD BETWEEN
13
TRANSLATE('2012-01-13', '0123456789年月日?-', '0123456789') AND
14
TRANSLATE('2012-01-17', '0123456789年月日?-', '0123456789')
15
AND DM.CITY_ID = 9999;
Execution
Plan
----------------------------------------------------------
Plan
hash value: 1152910450
------------------------------------------------------------------------------------------------------------
|
Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5 |
190 | 6 (0)| 00:00:01 |
| 1 |
NESTED LOOPS | | 5 |
190 | 6 (0)| 00:00:01 |
|* 2 |
INDEX UNIQUE SCAN |
PK_TR_WG_GD_CITY | 1 |
4 | 0 (0)| 00:00:01 |
| 3 |
TABLE ACCESS BY INDEX ROWID| TM_WGCRNT_MSCIORELAY_DAY |
5 | 170 | 6
(0)| 00:00:01 |
|* 4 |
INDEX RANGE SCAN |
PK_TM_WGCRNT_MSCIORELAY_DAY | 5 | |
1 (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------
Predicate
Information (identified by operation id):
---------------------------------------------------
2 -
access("B"."CITY_ID"=9999)
4 - access("DM"."DATE_CD">=20120113
AND "DM"."CITY_ID"=9999 AND
"DM"."DATE_CD"<=20120117)
filter("DM"."CITY_ID"=9999)
Statistics
----------------------------------------------------------
1
recursive calls
0
db block gets
6 consistent gets
0
physical reads
0
redo size
1058
bytes sent via SQL*Net to client
338
bytes received via SQL*Net from client
2
SQL*Net roundtrips to/from client
0
sorts (memory)
0
sorts (disk)
5
rows processed
Oracle has announced that with Oracle 11g, the
rule-based optimizer (RBO) will no longer be supported, and you need to move
quickly to migrate to the cost-based optimizer (CBO).
OPTIMIZER_MODE=RULE is not supported in Oracle
Database 11g; the code has only been left in Oracle to enable easy migrations
from RBO to CBO. We do not recommended you use OPTIMIZER_MODE=RULE as
a long term strategy in Oracle Database 11g as the code can be removed at any
time now that it is de-supported.
|
Migrating to the cost-based optimizer is tricky and dangerous, and you need to hire experts who have experience doing an RBO to CBO migration.
Don't risk an unplanned outage. |
For those experienced DBA's who wish to migrate
away from rule-based optimization, here are some tips.
With a large number of Oracle shops using the
rule-based optimizer (RBO), migration to cost-based optimization (CBO) will
become a important task when migrating to Oracle11g, where the RBO will
disappear.
As you may know, Oracle provides several
parameters that can adjust the behavior of the CBO to make it more like
rule-based optimization:
·
·
optimizer_index_caching
·
·
optimizer_index_cost_adj
·
·
optimizer_max_permutations
·
·
optimizer_search_limit
Today we examine optimizer_index_caching,
and we will cover the other parameters in later tips.
Important Note: Prior to Oracle 10g, adjusting
these optimizer parameters was the only way to compensate for sample size
issues with dbms_stats. As of 10g, the use ofdbms_stats.gather_system_stats and
improved sampling within dbms_stats had made adjustments to
these parameters far less important. Ceteris Parabus, always adjust CBO
statistics before adjusting optimizer parms. For more details on
optimizer parameters, see my latest book "Oracle
Tuning: The Definitive Reference".
optimizer_index_cost_adj
This is the most important parameter of all, and
the default setting of 100 is incorrect for most Oracle systems. For some
OLTP systems, re-setting this parameter to a smaller value (between 10- to 30)
may result in huge performance gains!
If you are having slow performance because the
CBO first_rows optimizer
mode is favoring too many full-table scans, you can reduce the value of theoptimizer_index_cost_adj parameter
to immediately tune all of the SQL in your database to favor index scans over
full-table scans. This is sometimes a “silver bullet” that can improve the
performance of an entire database in cases where the database is OLTP and you
have verified that the full-table scan costing is too low.
Even in Oracle, the CBO sometimes falsely
determines that the cost of full-table scan is less than the cost of an index
access. The optimizer_index_cost_adj parameter
is a great approach to whole-system SQL tuning, but you will need to evaluate
the overall effect by slowly resetting the value down from 100 and observing
the percentage of full-tale scans. You can also slowly bump down the value ofoptimizer_index_cost_adj when
you bounce the database and then either use theaccess.sql or plan.sql scripts
or reexamine SQL from the STATSPACKstats$sql_summary table to see
the net effect of index scans on the whole database.
The plan.sql script (see code depot from book below) uses the v$sql_plan view
and a quickly the reduction in sub-optimal, large-table full-table scans:
Full table scans and counts
Note that "K" indicates in the table is in the KEEP pool.
OWNER
NAME
NUM_ROWS C K BLOCKS NBR_FTS
-------------- ------------------------
--------- - - -------- --------
SYS
DUAL
N 2 97,237
SYSTEM
SQLPLUS_PRODUCT_PROFILE
N K 2 16,178
DONALD
PAGE 3,450,209
N 932,120 9,999
DONALD
RWU_PAGE
434 N 8
7,355
DONALD
PAGE_IMAGE
18,067 N 1,104 5,368
DONALD
SUBSCRIPTION
476
N K 192 2,087
DONALD
PRINT_PAGE_RANGE
10 N K
32 874
The optimizer_index_caching parameter
The optimizer_index_caching parameter
is a percentage parameter with valid values between zero and 100. This
parameter lets you adjust the behavior of the cost-based optimizer to select
nested loop joins more often or less often. The cost of executing a nested loop
join where an index is used to access the inner table is highly dependent on
the caching of that index in the buffer cache. The amount of index caching
depends on factors, such as the load on the system and the block access
patterns of different users, that the optimizer cannot predict. Of course, you
may cache an index by placing the data block in the KEEP pool, thereby ensuring
that the blocks are always cached.
Setting optimizer_index_caching to
a higher percentage makes nested loop joins look less expensive to the
optimizer, which will be more likely to pick nested loop joins over hash or
sort merge joins.
The default value for the optimizer_index_caching parameter
is 0, which gives the highest preference to hash joins and sort merge joins.
Resetting this parameter can be very dangerous if you are not using stored
outlines because it could change the execution plans for thousands of SQL
statements. Also, because the cost-based optimizer will generally only invoke
sort merge joins when there are no indexes on the joined tables, this parameter
has the most effect on the invocation of hash joins.
According to Oracle, selective indexes are favored
by optimizier_index_caching. The result of using lower values
for this parameter will be the optimizer effectively modeling the caches of
non-leaf index blocks.
For this situation, the cost of using this index will
be based mostly on its selectivity. Using a lower valuse of optimizer_index_caching will
result in an index caching model that is less likely to overuse potentially
lesser desirable indexes with poorer selectivity.
Even though Oracle has deprecated the rule-based
optimizer, Oracle continues to use the rule hint in Oracle 11g, as shown by
this Data Pump internal SQL:
Module: Data Pump Worker
SELECT /*+rule*/
SYS_XMLGEN(VALUE(KU$),XMLFORMAT.createFormat2(
'TABLE_T', '7')), KU$.OBJ_NUM ,KU$.ANC_OBJ.NAME
. . .
More information is available on the following pages:
·
·
Oracle rule-based SQL Optimizer
Migration
·
·
Migrate optimizer from RBO to
CBO
参考资料:
http://docs.oracle.com/cd/B10501_01/server.920/a96533/rbo.htm
http://oxiaobai.blog.51cto.com/3369332/752142
http://book.2cto.com/201212/12731.html
http://wenku.baidu.com/view/03cb78f79e314332396893b9.html
该贴被潇湘隐者编辑于2014-7-27 15:36:54
该贴由hui.chen转至本版2014-11-5 16:24:15