- 浏览: 11714986 次
- 性别:
- 来自: 深圳
文章分类
最新评论
-
笨蛋咯:
获取不到信息?
C#枚举硬件设备 -
guokaiwhu:
能把plan的数据结构图画出来,博主的耐心和细致令人佩服。
PostgreSQL服务过程中的那些事二:Pg服务进程处理简单查询五:规划成plantree -
gao807877817:
学习
BitmapFactory.Options详解 -
GB654:
楼主,我想问一下,如何在创建PPT时插入备注信息,虽然可以解析 ...
java转换ppt,ppt转成图片,获取备注,获取文本 -
jpsb:
多谢 ,不过我照搬你的sql查不到,去掉utl_raw.cas ...
关于oracle中clob字段查询的问题
HASH JOIN ,MERGE JOIN ,NESTED LOOP用法效率比较
概述:
NESTEDLOOP:
对于被连接的数据子集较小的情况,嵌套循环连接是个较好的选择。在嵌套循环中,内表被外表驱动,外表返回的每一行都要在内表中检索找到与它匹配的行,因此整个查询返回的结果集不能太大(大于1万不适合),要把返回子集较小表的作为外表(CBO默认外表是驱动表),而且在内表的连接字段上一定要有索引。当然也可以用ORDERED提示来改变CBO默认的驱动表,使用USE_NL(table_name1 table_name2)可是强制CBO执行嵌套循环连接。
HASH JOIN :
散列连接是CBO做大数据集连接时的常用方式,优化器使用两个表中较小的表(或数据源)利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。
这种方式适用于较小的表完全可以放于内存中的情况,这样总成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器会将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段,此时要有较大的临时段从而尽量提高I/O的性能。
也可以用USE_HASH(table_name1 table_name2)提示来强制使用散列连接。如果使用散列连接HASH_AREA_SIZE初始化参数必须足够的大,如果是9i,Oracle建议使用SQL工作区自动管理,设置WORKAREA_SIZE_POLICY为AUTO,然后调整PGA_AGGREGATE_TARGET即可。
MERGE JOIN排序合并连接
通常情况下散列连接的效果都比排序合并连接要好,然而如果行源已经被排过序,在执行排序合并连接时不需要再排序了,这时排序合并连接的性能会优于散列连接。可以使用USE_MERGE(table_name1 table_name2)来强制使用排序合并连接
几种方式的操作方式
merge join
merge join的操作通常分三步:
1、对连接的每个表做table access full;
2、对table access full的结果进行排序。
3、进行merge join对排序结果进行合并。
在全表扫描比索引范围扫描再通过rowid进行表访问更可取的情况下,merge join会比nested loops性能更佳。当表特别小或特别巨大的时候,实行全表访问可能会比索引范围扫描更有效。mrege join的性能开销几乎都在前两步。
hash join
对两个表进行全表扫描,然后oracle读取涉及连接的其中一个表,并且在内存里创建来自表的连接列的唯一关键字的位图。当读取和处理第二个表的行时,创建值的位图被用做过滤器。如果一个行成功的通过位图过滤,则hash算法用于数据查找和后来的连接。(这里涉及数学问题,我也弄的不是很清楚)。
以下条件下hash join可能有优势:
两个巨大的表之间的连接。
在一个巨大的表和一个小表之间的连接。
Nested Loops
会循环外表(驱动表),逐个比对和内表的连接是否符合条件。在驱动表比较小,内表比较大,而且内外表的连接列有索引的时候比较好。当SORT_AREA空间不足的时候,Oracle也会选择使用NL。基于Cost的Oracle优化器(CBO)会自动选择较小的表做外表。
连接方式总结:
1))嵌套循环(nest loop):
对于被连接的数据子集较小的情况,嵌套循环连接是较好的选择。在嵌套循环中,外表驱动内表,外表返回的每一行都要在内表中检索找到它匹配的行,因此整个查询返回的结果集不能太大(大于10000不合适),要把返回子集较小的表作为外表(驱动表),而且在内表的连接字段上一定要有索引。
2)哈希连接(hash join):
哈希连接是大数据集连接时常用的方式,优化器使用两个表中较小的表,利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。
这种方式适用于较小的表完全可以放入内存的情况,这样成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段。
哈希连接只能应用于等值连接(如WHERE A.COL3 = B.COL4)、非等值连接(WHERE A.COL3 > B.COL4)、外连接(WHERE A.COL3 = B.COL4(+))。
3)排序合并连接(Sort Merge Join )
通常情况下哈希连接的效果都比排序合并连接要好。然而如果行源已经被排过序,在执行排序合并连接时不需要再排序了,这时排序归并连接的性能会忧于哈希连接。
测试实验:
--创建测试表
SQL> Create Table t As Select * From dba_objects;
SQL> Insert /*+ append*/ t Select * From t;
SQL> Insert /*+ append*/ t Select * From t;
......
SQL> commit;
SQL> Create Table t1 As Select * From user_objects;
SQL> Create Table t2 As Select * From dba_objects;
SQL> Insert /*+ append*/ t2 Select * From t2;
......
SQL> commit;
SQL> select count(1) from t;
COUNT(1)
----------
3199936
SQL> select count(1) from t2;
COUNT(1)
----------
101220
SQL> select count(1) from t1;
COUNT(1)
----------
13
SQL> Create Index ind_id On t(object_id);
SQL> Create Index ind1_id On t1(object_id);
SQL> Create Index ind2_id On t2(object_id);
1.默认使用Nested loops的SQL语句的情况
--不加hints会默认使用Nested loops join
SQL> Select * From t,t1 Where t.object_id=t1.object_id;
576 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1308275697
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 833 | 208K| 862 (1)| 00:00:11 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 64 | 6144 | 66 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 833 | 208K| 862 (1)| 00:00:11 |
| 3 | TABLE ACCESS FULL | T1 | 13 | 2080 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IND_ID | 64 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
687 consistent gets
0 physical reads
1484 redo size
14877 bytes sent via SQL*Net to client
910 bytes received via SQL*Net from client
40 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
576 rows processed
--强制使用Hash Jion
SQL> Select /*+ use_hash(t1,t)*/ * From t1,t Where t.object_id=t1.object_id;
576 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1444793974
---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 833 | 208K| 9931 (3)| 00:02:00 |
|* 1 | HASH JOIN | | 833 | 208K| 9931 (3)| 00:02:00 |
| 2 | TABLE ACCESS FULL| T1 | 13 | 2080 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T | 3204K| 293M| 9890 (2)| 00:01:59 |
---------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
44124 consistent gets
0 physical reads
524 redo size
71981 bytes sent via SQL*Net to client
910 bytes received via SQL*Net from client
40 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
576 rows processed
--强制使用Merge join
SQL> Select /*+ use_merge(t1,t)*/ * From t1,t Where t.object_id=t1.object_id;
576 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2036048135
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 833 | 208K| | 80548 (1)| 00:16:07 |
| 1 | MERGE JOIN | | 833 | 208K| | 80548 (1)| 00:16:07 |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 13 | 2080 | | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | IND1_ID | 13 | | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 3204K| 293M| 794M| 80546 (1)| 00:16:07 |
| 5 | TABLE ACCESS FULL | T | 3204K| 293M| | 9890 (2)| 00:01:59 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")
filter("T"."OBJECT_ID"="T1"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
44097 consistent gets
0 physical reads
524 redo size
14899 bytes sent via SQL*Net to client
910 bytes received via SQL*Net from client
40 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
576 rows processed
结论:
逻辑读分别为:
Nested loops join 687
Hash Jion 44124
Merge join 44097
可以看到,大表t和小表t1关联使用Nested loops join关联性能会比较好。
2.默认使用Hash Jion的SQL语句的情况
--不加hints会默认使用Hash Jion
SQL> set timing on
SQL> Select t.object_id From t,t2 Where t.object_id=t2.object_id;
6364032 rows selected.
Elapsed: 00:00:31.58
Execution Plan
----------------------------------------------------------
Plan hash value: 3237705646
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5713K| 98M| | 4442 (4)| 00:00:54 |
|* 1 | HASH JOIN | | 5713K| 98M| 2224K| 4442 (4)| 00:00:54 |
| 2 | INDEX FAST FULL SCAN| IND2_ID | 90908 | 1154K| | 55 (4)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IND_ID | 3204K| 15M| | 1596 (4)| 00:00:20 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
430719 consistent gets
0 physical reads
0 redo size
109700403 bytes sent via SQL*Net to client
4667440 bytes received via SQL*Net from client
424270 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6364032 rows processed
--强制使用Nested loops join
SQL> Select /*+ use_nl(t,t2)*/ t.object_id From t,t2 Where t.object_id=t2.object_id;
6364032 rows selected.
Elapsed: 00:00:31.45
Execution Plan
----------------------------------------------------------
Plan hash value: 3551542035
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5713K| 98M| 98323 (1)| 00:19:40 |
| 1 | NESTED LOOPS | | 5713K| 98M| 98323 (1)| 00:19:40 |
| 2 | INDEX FAST FULL SCAN| IND2_ID | 90908 | 1154K| 55 (4)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IND_ID | 63 | 315 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
838760 consistent gets
0 physical reads
0 redo size
109700403 bytes sent via SQL*Net to client
4667440 bytes received via SQL*Net from client
424270 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6364032 rows processed
--强制使用merge join
SQL> Select /*+ use_merge(t,t2)*/ t.object_id From t,t2 Where t.object_id=t2.object_id;
6364032 rows selected.
Elapsed: 00:00:31.78
Execution Plan
----------------------------------------------------------
Plan hash value: 1712472574
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5713K| 98M| | 7661 (2)| 00:01:32 |
| 1 | MERGE JOIN | | 5713K| 98M| | 7661 (2)| 00:01:32 |
| 2 | INDEX FULL SCAN | IND_ID | 3204K| 15M| | 7137 (2)| 00:01:26 |
|* 3 | SORT JOIN | | 90908 | 1154K| 3576K| 495 (3)| 00:00:06 |
| 4 | INDEX FAST FULL SCAN| IND2_ID | 90908 | 1154K| | 55 (4)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
filter("T"."OBJECT_ID"="T2"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
430689 consistent gets
0 physical reads
0 redo size
109700403 bytes sent via SQL*Net to client
4667440 bytes received via SQL*Net from client
424270 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
6364032 rows processed
结论:
逻辑读分别为:
Nested loops join 838760
Hash Jion 430719
Merge join 430689
可以看到,大表t和大表t2关联使用Hash Jion和Merge join关联性能差不多,Nested loops join性能最差。
3.默认使用Merge Jion的SQL语句的情况
--不加hints会默认使用Merge Jion
SQL> Select t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;
2910592 rows selected.
Elapsed: 00:00:15.57
Execution Plan
----------------------------------------------------------
Plan hash value: 2780935651
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2589K| 59M| | 16553 (2)| 00:03:19 |
| 1 | MERGE JOIN | | 2589K| 59M| | 16553 (2)| 00:03:19 |
| 2 | INDEX FULL SCAN | IND2_ID | 90908 | 1154K| | 240 (2)| 00:00:03 |
|* 3 | SORT JOIN | | 1452K| 15M| 55M| 16300 (2)| 00:03:16 |
|* 4 | TABLE ACCESS FULL| T | 1452K| 15M| | 9822 (2)| 00:01:58 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
filter("T"."OBJECT_ID"="T2"."OBJECT_ID")
4 - filter("OWNER"='SYS')
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
89745 consistent gets
0 physical reads
124 redo size
50169586 bytes sent via SQL*Net to client
2134921 bytes received via SQL*Net from client
194041 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
2910592 rows processed
--强制使用Nested join
SQL> Select /*+ use_nl(t2,t)*/ t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;
2910592 rows selected.
Elapsed: 00:00:17.18
Execution Plan
----------------------------------------------------------
Plan hash value: 1858129701
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2589K| 59M| | 1482K (1)| 04:56:28 |
| 1 | SORT ORDER BY | | 2589K| 59M| 79M| 1482K (1)| 04:56:28 |
| 2 | NESTED LOOPS | | 2589K| 59M| | 1463K (1)| 04:52:47 |
|* 3 | TABLE ACCESS FULL| T | 1452K| 15M| | 9822 (2)| 00:01:58 |
|* 4 | INDEX RANGE SCAN | IND2_ID | 2 | 26 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("OWNER"='SYS')
4 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
1523249 consistent gets
0 physical reads
568 redo size
50169586 bytes sent via SQL*Net to client
2134921 bytes received via SQL*Net from client
194041 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
2910592 rows processed
--强制使用Hash join
SQL> Select /*+ use_hash(t2,t)*/ t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;
2910592 rows selected.
Elapsed: 00:00:16.34
Execution Plan
----------------------------------------------------------
Plan hash value: 2900034259
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2589K| 59M| | 30029 (2)| 00:06:01 |
| 1 | SORT ORDER BY | | 2589K| 59M| 79M| 30029 (2)| 00:06:01 |
|* 2 | HASH JOIN | | 2589K| 59M| 2224K| 11615 (2)| 00:02:20 |
| 3 | INDEX FAST FULL SCAN| IND2_ID | 90908 | 1154K| | 55 (4)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | T | 1452K| 15M| | 9822 (2)| 00:01:58 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
4 - filter("OWNER"='SYS')
Note
-----
- dynamic sampling used for this statement
Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
44373 consistent gets
0 physical reads
524 redo size
50169586 bytes sent via SQL*Net to client
2134921 bytes received via SQL*Net from client
194041 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
2910592 rows processed
结论:
逻辑读分别为:
Nested loops join 1523249
Hash Jion 44373
Merge join 89745
可以看到,T2在和t的排序的限定结果集关联时,执行计划使用了merge join,但此时执行效率最的还是hash join,因为此时同样是大表之间关联,采用hash join效果会更好一些。
相关推荐
数据库中JOIN操作的实现主要有三种:嵌套循环连接(Nested Loop Join),归并连接(Merge Join)和散列连接或者哈稀连接(Hash Join)。其中嵌套循环连接又视情况又有两种变形:块嵌套循环连接和索引嵌套循环连接。
Sql中的三种物理连接操作 嵌套循环连接(Nested Loop Join) 合并连接(Merge Join) 哈希匹配(Hash Join)
hash join 原理和算法 1.Hash Join概述 2.Hash Join原理 3.Hash Join算法 4.Hash Join的成本
Hash join算法的一个基本思想就是根据小的row sources(称作build input,我们记较小的表为S,较大的表为B) 建立一个可以存在于hash area内存中的hash table,然后用大的row sources(称作probe input) 来探测前面所建...
数据库 我自己在 Java 中实现了 SortMergeJoin 和 HashJoin(来自 SQL 的著名 INNER JOIN)。 在更多信息。
oracle hash join算法原理
Hash join算法原理 详细讲述了oracle sql语句的连接方式 对于sql调优提高有很大帮助
NULL 博文链接:https://juji1010.iteye.com/blog/1535638
MySQL 开发组于 2019 年 10 月 14 日 正式发布了 MySQL 8.0.18 GA 版本,带来了一些新特性和增强功能。其中最引人注目的莫过于多表连接查询支持 hash join 方式...大多数情况下,hash join 比之前的 Block Nested-Loop
在MySQL 8.0.18中,增加了Hash Join新功能,它适用于未创建索引的字段,...我们用explain format=tree命令可以查看到已经使用到hash join算法。 但目前8.0.18版本,仅支持join。left join和right join失效,这里请注意
hash join/merge join/nest loop(cluster join)/index join 2:不借助第三方工具,怎样查看sql的执行计划 set autot on explain plan set statement_id = &item_id for &sql; select * from table(dbms_...
Oracle中hash join研究.pdf
给大家介绍Mysql 8.0.18 hash join测试的相关知识,本文通过实例代码讲解,非常不错,具有一定的参考借鉴价值,需要的朋友参考下吧
1. 生成Hash Table:将参与join的左表生成Hash Table 3. FILTER处理:按照不等值条件进行过滤,进而输出最终结果 1. 对语法树生
Oracle CBO 学习笔记之(1) : 深入理解Oracle Hash Join的代价模型及其执行流程:word,pdf,图例
第一次作业