`
wsql
  • 浏览: 11714986 次
  • 性别: Icon_minigender_1
  • 来自: 深圳
文章分类
社区版块
存档分类
最新评论

HASH JOIN ,MERGE JOIN ,NESTED LOOP用法效率比较

 
阅读更多

概述:

NESTEDLOOP:

对于被连接的数据子集较小的情况,嵌套循环连接是个较好的选择。在嵌套循环中,内表被外表驱动,外表返回的每一行都要在内表中检索找到与它匹配的行,因此整个查询返回的结果集不能太大(大于1万不适合),要把返回子集较小表的作为外表(CBO默认外表是驱动表),而且在内表的连接字段上一定要有索引。当然也可以用ORDERED提示来改变CBO默认的驱动表,使用USE_NL(table_name1 table_name2)可是强制CBO执行嵌套循环连接。

HASH JOIN :

散列连接是CBO做大数据集连接时的常用方式,优化器使用两个表中较小的表(或数据源)利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。

这种方式适用于较小的表完全可以放于内存中的情况,这样总成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器会将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段,此时要有较大的临时段从而尽量提高I/O的性能。

也可以用USE_HASH(table_name1 table_name2)提示来强制使用散列连接。如果使用散列连接HASH_AREA_SIZE初始化参数必须足够的大,如果是9i,Oracle建议使用SQL工作区自动管理,设置WORKAREA_SIZE_POLICY为AUTO,然后调整PGA_AGGREGATE_TARGET即可。

MERGE JOIN排序合并连接

通常情况下散列连接的效果都比排序合并连接要好,然而如果行源已经被排过序,在执行排序合并连接时不需要再排序了,这时排序合并连接的性能会优于散列连接。可以使用USE_MERGE(table_name1 table_name2)来强制使用排序合并连接

几种方式的操作方式

merge join

merge join的操作通常分三步:

1、对连接的每个表做table access full;

2、对table access full的结果进行排序。

3、进行merge join对排序结果进行合并。

在全表扫描比索引范围扫描再通过rowid进行表访问更可取的情况下,merge join会比nested loops性能更佳。当表特别小或特别巨大的时候,实行全表访问可能会比索引范围扫描更有效。mrege join的性能开销几乎都在前两步。

hash join

对两个表进行全表扫描,然后oracle读取涉及连接的其中一个表,并且在内存里创建来自表的连接列的唯一关键字的位图。当读取和处理第二个表的行时,创建值的位图被用做过滤器。如果一个行成功的通过位图过滤,则hash算法用于数据查找和后来的连接。(这里涉及数学问题,我也弄的不是很清楚)。

以下条件下hash join可能有优势:

两个巨大的表之间的连接。

在一个巨大的表和一个小表之间的连接。

Nested Loops

会循环外表(驱动表),逐个比对和内表的连接是否符合条件。在驱动表比较小,内表比较大,而且内外表的连接列有索引的时候比较好。当SORT_AREA空间不足的时候,Oracle也会选择使用NL。基于Cost的Oracle优化器(CBO)会自动选择较小的表做外表。

连接方式总结:

1))嵌套循环(nest loop):

对于被连接的数据子集较小的情况,嵌套循环连接是较好的选择。在嵌套循环中,外表驱动内表,外表返回的每一行都要在内表中检索找到它匹配的行,因此整个查询返回的结果集不能太大(大于10000不合适),要把返回子集较小的表作为外表(驱动表),而且在内表的连接字段上一定要有索引。

2)哈希连接(hash join):

哈希连接是大数据集连接时常用的方式,优化器使用两个表中较小的表,利用连接键在内存中建立散列表,然后扫描较大的表并探测散列表,找出与散列表匹配的行。

这种方式适用于较小的表完全可以放入内存的情况,这样成本就是访问两个表的成本之和。但是在表很大的情况下并不能完全放入内存,这时优化器将它分割成若干不同的分区,不能放入内存的部分就把该分区写入磁盘的临时段。

哈希连接只能应用于等值连接(如WHERE A.COL3 = B.COL4)、非等值连接(WHERE A.COL3 > B.COL4)、外连接(WHERE A.COL3 = B.COL4(+))。

3)排序合并连接(Sort Merge Join )

通常情况下哈希连接的效果都比排序合并连接要好。然而如果行源已经被排过序,在执行排序合并连接时不需要再排序了,这时排序归并连接的性能会忧于哈希连接。

测试实验:

--创建测试表

SQL> Create Table t As Select * From dba_objects;
SQL> Insert /*+ append*/ t Select * From t;
SQL> Insert /*+ append*/ t Select * From t;
......
SQL> commit;

SQL> Create Table t1 As Select * From user_objects;

SQL> Create Table t2 As Select * From dba_objects;
SQL> Insert /*+ append*/ t2 Select * From t2;
......
SQL> commit;

SQL> select count(1) from t;

COUNT(1)
----------
3199936

SQL> select count(1) from t2;

COUNT(1)
----------
101220

SQL> select count(1) from t1;

COUNT(1)
----------
13

SQL> Create Index ind_id On t(object_id);
SQL> Create Index ind1_id On t1(object_id);
SQL> Create Index ind2_id On t2(object_id);


1.默认使用Nested loops的SQL语句的情况
--不加hints会默认使用Nested loops join
SQL> Select * From t,t1 Where t.object_id=t1.object_id;

576 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1308275697

--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 833 | 208K| 862 (1)| 00:00:11 |
| 1 | TABLE ACCESS BY INDEX ROWID| T | 64 | 6144 | 66 (0)| 00:00:01 |
| 2 | NESTED LOOPS | | 833 | 208K| 862 (1)| 00:00:11 |
| 3 | TABLE ACCESS FULL | T1 | 13 | 2080 | 3 (0)| 00:00:01 |
|* 4 | INDEX RANGE SCAN | IND_ID | 64 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
5 recursive calls
0 db block gets
687 consistent gets
0 physical reads
1484 redo size
14877 bytes sent via SQL*Net to client
910 bytes received via SQL*Net from client
40 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
576 rows processed

--强制使用Hash Jion
SQL> Select /*+ use_hash(t1,t)*/ * From t1,t Where t.object_id=t1.object_id;

576 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 1444793974

---------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 833 | 208K| 9931 (3)| 00:02:00 |
|* 1 | HASH JOIN | | 833 | 208K| 9931 (3)| 00:02:00 |
| 2 | TABLE ACCESS FULL| T1 | 13 | 2080 | 3 (0)| 00:00:01 |
| 3 | TABLE ACCESS FULL| T | 3204K| 293M| 9890 (2)| 00:01:59 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
44124 consistent gets
0 physical reads
524 redo size
71981 bytes sent via SQL*Net to client
910 bytes received via SQL*Net from client
40 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
576 rows processed


--强制使用Merge join
SQL> Select /*+ use_merge(t1,t)*/ * From t1,t Where t.object_id=t1.object_id;

576 rows selected.


Execution Plan
----------------------------------------------------------
Plan hash value: 2036048135

------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 833 | 208K| | 80548 (1)| 00:16:07 |
| 1 | MERGE JOIN | | 833 | 208K| | 80548 (1)| 00:16:07 |
| 2 | TABLE ACCESS BY INDEX ROWID| T1 | 13 | 2080 | | 2 (0)| 00:00:01 |
| 3 | INDEX FULL SCAN | IND1_ID | 13 | | | 1 (0)| 00:00:01 |
|* 4 | SORT JOIN | | 3204K| 293M| 794M| 80546 (1)| 00:16:07 |
| 5 | TABLE ACCESS FULL | T | 3204K| 293M| | 9890 (2)| 00:01:59 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("T"."OBJECT_ID"="T1"."OBJECT_ID")
filter("T"."OBJECT_ID"="T1"."OBJECT_ID")

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
44097 consistent gets
0 physical reads
524 redo size
14899 bytes sent via SQL*Net to client
910 bytes received via SQL*Net from client
40 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
576 rows processed

结论:
逻辑读分别为:
Nested loops join 687
Hash Jion 44124
Merge join 44097
可以看到,大表t和小表t1关联使用Nested loops join关联性能会比较好。

2.默认使用Hash Jion的SQL语句的情况

--不加hints会默认使用Hash Jion
SQL> set timing on
SQL> Select t.object_id From t,t2 Where t.object_id=t2.object_id;

6364032 rows selected.

Elapsed: 00:00:31.58

Execution Plan
----------------------------------------------------------
Plan hash value: 3237705646

-----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5713K| 98M| | 4442 (4)| 00:00:54 |
|* 1 | HASH JOIN | | 5713K| 98M| 2224K| 4442 (4)| 00:00:54 |
| 2 | INDEX FAST FULL SCAN| IND2_ID | 90908 | 1154K| | 55 (4)| 00:00:01 |
| 3 | INDEX FAST FULL SCAN| IND_ID | 3204K| 15M| | 1596 (4)| 00:00:20 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
430719 consistent gets
0 physical reads
0 redo size
109700403 bytes sent via SQL*Net to client
4667440 bytes received via SQL*Net from client
424270 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6364032 rows processed

--强制使用Nested loops join
SQL> Select /*+ use_nl(t,t2)*/ t.object_id From t,t2 Where t.object_id=t2.object_id;

6364032 rows selected.

Elapsed: 00:00:31.45

Execution Plan
----------------------------------------------------------
Plan hash value: 3551542035

---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5713K| 98M| 98323 (1)| 00:19:40 |
| 1 | NESTED LOOPS | | 5713K| 98M| 98323 (1)| 00:19:40 |
| 2 | INDEX FAST FULL SCAN| IND2_ID | 90908 | 1154K| 55 (4)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | IND_ID | 63 | 315 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
838760 consistent gets
0 physical reads
0 redo size
109700403 bytes sent via SQL*Net to client
4667440 bytes received via SQL*Net from client
424270 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
6364032 rows processed

--强制使用merge join
SQL> Select /*+ use_merge(t,t2)*/ t.object_id From t,t2 Where t.object_id=t2.object_id;

6364032 rows selected.

Elapsed: 00:00:31.78

Execution Plan
----------------------------------------------------------
Plan hash value: 1712472574

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 5713K| 98M| | 7661 (2)| 00:01:32 |
| 1 | MERGE JOIN | | 5713K| 98M| | 7661 (2)| 00:01:32 |
| 2 | INDEX FULL SCAN | IND_ID | 3204K| 15M| | 7137 (2)| 00:01:26 |
|* 3 | SORT JOIN | | 90908 | 1154K| 3576K| 495 (3)| 00:00:06 |
| 4 | INDEX FAST FULL SCAN| IND2_ID | 90908 | 1154K| | 55 (4)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
filter("T"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
430689 consistent gets
0 physical reads
0 redo size
109700403 bytes sent via SQL*Net to client
4667440 bytes received via SQL*Net from client
424270 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
6364032 rows processed

结论:
逻辑读分别为:
Nested loops join 838760
Hash Jion 430719
Merge join 430689
可以看到,大表t和大表t2关联使用Hash Jion和Merge join关联性能差不多,Nested loops join性能最差。

3.默认使用Merge Jion的SQL语句的情况
--不加hints会默认使用Merge Jion
SQL> Select t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;

2910592 rows selected.

Elapsed: 00:00:15.57

Execution Plan
----------------------------------------------------------
Plan hash value: 2780935651

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2589K| 59M| | 16553 (2)| 00:03:19 |
| 1 | MERGE JOIN | | 2589K| 59M| | 16553 (2)| 00:03:19 |
| 2 | INDEX FULL SCAN | IND2_ID | 90908 | 1154K| | 240 (2)| 00:00:03 |
|* 3 | SORT JOIN | | 1452K| 15M| 55M| 16300 (2)| 00:03:16 |
|* 4 | TABLE ACCESS FULL| T | 1452K| 15M| | 9822 (2)| 00:01:58 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
filter("T"."OBJECT_ID"="T2"."OBJECT_ID")
4 - filter("OWNER"='SYS')

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
89745 consistent gets
0 physical reads
124 redo size
50169586 bytes sent via SQL*Net to client
2134921 bytes received via SQL*Net from client
194041 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
2910592 rows processed

--强制使用Nested join
SQL> Select /*+ use_nl(t2,t)*/ t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;

2910592 rows selected.

Elapsed: 00:00:17.18

Execution Plan
----------------------------------------------------------
Plan hash value: 1858129701

---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2589K| 59M| | 1482K (1)| 04:56:28 |
| 1 | SORT ORDER BY | | 2589K| 59M| 79M| 1482K (1)| 04:56:28 |
| 2 | NESTED LOOPS | | 2589K| 59M| | 1463K (1)| 04:52:47 |
|* 3 | TABLE ACCESS FULL| T | 1452K| 15M| | 9822 (2)| 00:01:58 |
|* 4 | INDEX RANGE SCAN | IND2_ID | 2 | 26 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter("OWNER"='SYS')
4 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
1523249 consistent gets
0 physical reads
568 redo size
50169586 bytes sent via SQL*Net to client
2134921 bytes received via SQL*Net from client
194041 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
2910592 rows processed

--强制使用Hash join
SQL> Select /*+ use_hash(t2,t)*/ t.object_id From t2,(Select * From t Where owner='SYS' Order By object_id) t Where t.object_id=t2.object_id;

2910592 rows selected.

Elapsed: 00:00:16.34

Execution Plan
----------------------------------------------------------
Plan hash value: 2900034259

------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2589K| 59M| | 30029 (2)| 00:06:01 |
| 1 | SORT ORDER BY | | 2589K| 59M| 79M| 30029 (2)| 00:06:01 |
|* 2 | HASH JOIN | | 2589K| 59M| 2224K| 11615 (2)| 00:02:20 |
| 3 | INDEX FAST FULL SCAN| IND2_ID | 90908 | 1154K| | 55 (4)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | T | 1452K| 15M| | 9822 (2)| 00:01:58 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("T"."OBJECT_ID"="T2"."OBJECT_ID")
4 - filter("OWNER"='SYS')

Note
-----
- dynamic sampling used for this statement


Statistics
----------------------------------------------------------
4 recursive calls
0 db block gets
44373 consistent gets
0 physical reads
524 redo size
50169586 bytes sent via SQL*Net to client
2134921 bytes received via SQL*Net from client
194041 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
2910592 rows processed

结论:
逻辑读分别为:
Nested loops join 1523249
Hash Jion 44373
Merge join 89745
可以看到,T2在和t的排序的限定结果集关联时,执行计划使用了merge join,但此时执行效率最的还是hash join,因为此时同样是大表之间关联,采用hash join效果会更好一些。

分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics