
Oracle 11g: Using dbms_parallel_execute to update a large table in parallel

 

1. About dbms_parallel_execute

Updating Large Tables in Parallel

The DBMS_PARALLEL_EXECUTE package enables you to incrementally update the data in a large table in parallel, in two high-level steps:

(1) Group sets of rows in the table into smaller chunks.

(2) Apply the desired UPDATE statement to the chunks in parallel, committing each time you have finished processing a chunk.

-- In other words, the dbms_parallel_execute package works in two steps: first it splits the large table into many small chunks, then it processes those chunks in parallel.

This technique is recommended whenever you are updating a lot of data. Its advantages are:

(1) You lock only one set of rows at a time, for a relatively short time, instead of locking the entire table.

(2) You do not lose work that has been done if something fails before the entire operation finishes.

(3) You reduce rollback space consumption.

(4) You improve performance.
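
Advantages (1) and (2) can be illustrated with a small model. The sketch below is plain Python, not Oracle code: a list stands in for the table, writing a slice back stands in for a per-chunk commit, and `fail_on_chunk` is a hypothetical switch that simulates an error partway through.

```python
def update_in_chunks(rows, chunk_size, fail_on_chunk=None):
    """Add 10 to every value, one chunk at a time, "committing" after each chunk.

    rows is a plain list standing in for the table. fail_on_chunk is a
    hypothetical switch that simulates an error on the chunk with that index.
    Returns the number of chunks committed before any failure.
    """
    committed = 0
    for index, start in enumerate(range(0, len(rows), chunk_size)):
        end = min(start + chunk_size, len(rows))
        if index == fail_on_chunk:
            # The failing chunk is left untouched; earlier chunks stay committed.
            break
        # Only this slice is touched, then written back in one step ("commit").
        rows[start:end] = [value + 10 for value in rows[start:end]]
        committed += 1
    return committed


table = [30] * 10
done = update_in_chunks(table, chunk_size=4, fail_on_chunk=2)
print(done, table)
```

Here the first two chunks keep their updated values even though the third chunk fails, so a retry only needs to handle what is left.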

See Also:

Oracle Database PL/SQL Packages and Types Reference for more information about the DBMS_PARALLEL_EXECUTE package

http://download.oracle.com/docs/cd/E11882_01/appdev.112/e16760/d_parallel_ex.htm#ARPLS233

-- This link has the detailed usage documentation for the package.

Parallel execution can improve SQL performance to some extent. My blog covers parallel execution in this post:

Oracle Parallel Execution

http://blog.csdn.net/tianlesoftware/article/details/5854583

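I mention that post because of one point it raises:

In older Oracle releases, DELETE, UPDATE, and MERGE ran in parallel only when the target was a partitioned table. For a partitioned table Oracle started one parallel server process per partition to process the data concurrently, which has no equivalent for a non-partitioned table. (Later releases relaxed this restriction, so check the parallel DML rules for your version.)

If we need to update a large table that is not partitioned, we can use the dbms_parallel_execute package to run the update in parallel.

The dbms_parallel_execute package splits the large table into many small chunks and then processes those chunks in parallel, which is similar to turning a non-partitioned table into a partitioned one.

Note that this package is available only from Oracle 11g Release 2 onward.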

2. Usage

The following content is taken from:

http://www.oracle-base.com/articles/11g/dbms_parallel_execute_11gR2.php

2.1 The operation requires the CREATE JOB privilege, so grant it first

SQL> conn / as sysdba;

Connected.

SQL> grant create job to icd;

Grant succeeded.

SQL> conn icd/icd;

Connected.

2.2 Create the test table and insert data

SQL> CREATE TABLE test_tab (
       id          NUMBER,
       description VARCHAR2(50),
       num_col     NUMBER,
       CONSTRAINT test_tab_pk PRIMARY KEY (id)
     );
Table created.

SQL> INSERT /*+ APPEND */ INTO test_tab
     SELECT level,
            'Description for ' || level,
            CASE
              WHEN MOD(level, 5) = 0 THEN 10
              WHEN MOD(level, 3) = 0 THEN 20
              ELSE 30
            END
     FROM   dual
     CONNECT BY level <= 500000;
500000 rows created.

SQL> commit;
Commit complete.

2.3 Gather statistics

SQL> EXEC DBMS_STATS.gather_table_stats(USER, 'TEST_TAB', cascade => TRUE);
PL/SQL procedure successfully completed.

SQL> SELECT num_col, COUNT(*)
     FROM   test_tab
     GROUP BY num_col
     ORDER BY num_col;

   NUM_COL   COUNT(*)
---------- ----------
        10     100000
        20     133333
        30     266667

2.4 Create the task

The CREATE_TASK procedure is used to create a new task. It requires a task name to be specified, but can also include an optional task comment.

SQL> BEGIN
       DBMS_PARALLEL_EXECUTE.create_task (task_name => 'test_task');
     END;
     /

PL/SQL procedure successfully completed.

Information about existing tasks is displayed using the [DBA|USER]_PARALLEL_EXECUTE_TASKS views.

SQL> COLUMN task_name FORMAT A10

SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  CREATED


The GENERATE_TASK_NAME function returns a unique task name if you do not want to name the task manually.

SQL> SELECT DBMS_PARALLEL_EXECUTE.generate_task_name FROM dual;

GENERATE_TASK_NAME
-----------------------------------------------------
TASK$_1

2.5 Split the workload into chunks

There are three ways to split a large table into chunks:

(1) CREATE_CHUNKS_BY_ROWID

(2) CREATE_CHUNKS_BY_NUMBER_COL

(3) CREATE_CHUNKS_BY_SQL

Chunks that have already been created can be removed with DROP_CHUNKS.

2.5.1 CREATE_CHUNKS_BY_ROWID

The CREATE_CHUNKS_BY_ROWID procedure splits the data by rowid into chunks specified by the CHUNK_SIZE parameter. If the BY_ROW parameter is set to TRUE, the CHUNK_SIZE refers to the number of rows, otherwise it refers to the number of blocks.

SQL> BEGIN
       dbms_parallel_execute.create_chunks_by_rowid(task_name   => 'test_task',
                                                    table_owner => 'ICD',
                                                    table_name  => 'TEST_TAB',
                                                    by_row      => TRUE,
                                                    chunk_size  => 10000);
     END;
     /

PL/SQL procedure successfully completed.

Once the chunks have been created, the status of the task changes to CHUNKED.

SQL> COLUMN task_name FORMAT A10

SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  CHUNKED

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SQL> SELECT chunk_id, status, start_rowid, end_rowid
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;

  CHUNK_ID STATUS               START_ROWID        END_ROWID
---------- -------------------- ------------------ ------------------
         2 UNASSIGNED           AAATMCAAMAABSMIAAA AAATMCAAMAABSMPCcP
         3 UNASSIGNED           AAATMCAAMAABSMgAAA AAATMCAAMAABSMnCcP
         4 UNASSIGNED           AAATMCAAMAABSMoAAA AAATMCAAMAABSMvCcP
...
        73 UNASSIGNED           AAATMCAAMAABS0yAAA AAATMCAAMAABS1jCcP
        74 UNASSIGNED           AAATMCAAMAABS1kAAA AAATMCAAMAABS1/CcP

73 rows selected.

Drop the chunks:

SQL> BEGIN
       dbms_parallel_execute.drop_chunks('test_task');
     END;
     /

PL/SQL procedure successfully completed.

Check the task status again; it has gone back to CREATED.

SQL> SELECT task_name,
            status
     FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  CREATED

2.5.2 CREATE_CHUNKS_BY_NUMBER_COL

The CREATE_CHUNKS_BY_NUMBER_COL procedure divides the workload up based on a number column. It uses the specified column's min and max values along with the chunk size to split the data into approximately equal chunks. For the chunks to be equally sized the column must contain a continuous sequence of numbers, like that generated by a sequence.

BEGIN
  dbms_parallel_execute.create_chunks_by_number_col(task_name    => 'test_task',
                                                    table_owner  => 'ICD',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);
END;
/

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SQL> SELECT chunk_id, status, start_id, end_id
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;

  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
        75 UNASSIGNED                    1      10000
        76 UNASSIGNED                10001      20000
        77 UNASSIGNED                20001      30000
        78 UNASSIGNED                30001      40000
...
       122 UNASSIGNED               470001     480000
       123 UNASSIGNED               480001     490000
       124 UNASSIGNED               490001     500000

50 rows selected.
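
The range arithmetic behind this output can be sketched in a few lines. This is a plain Python model rather than Oracle's implementation; the function name is hypothetical, and capping the last range at the maximum value is an assumption that does not matter here because 500000 divides evenly by 10000.

```python
def chunks_by_number_col(min_id, max_id, chunk_size):
    """Return (start_id, end_id) pairs covering min_id..max_id in steps of chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + chunk_size - 1, max_id)  # assumption: last range is capped
        ranges.append((start, end))
        start = end + 1
    return ranges


ranges = chunks_by_number_col(1, 500000, 10000)
print(len(ranges), ranges[0], ranges[-1])

# With gaps in the column, the ranges stay the same width but hold very
# different numbers of rows, which is why a continuous sequence is preferred.
sparse_ids = [1, 2, 3, 25000]
counts = [sum(start <= i <= end for i in sparse_ids)
          for start, end in chunks_by_number_col(1, 25000, 10000)]
print(counts)
```

The first call reproduces the 50 ranges listed above; the second shows how a sparse column leaves some ranges nearly empty.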

2.5.3 CREATE_CHUNKS_BY_SQL

The CREATE_CHUNKS_BY_SQL procedure divides the workload based on a user-defined query. If the BY_ROWID parameter is set to TRUE, the query must return a series of start and end rowids. If it's set to FALSE, the query must return a series of start and end IDs.

First drop the chunks created earlier:

SQL> exec dbms_parallel_execute.drop_chunks('test_task');

PL/SQL procedure successfully completed.

DECLARE
  l_stmt CLOB;
BEGIN
  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => 'test_task',
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);
END;
/

The [DBA|USER]_PARALLEL_EXECUTE_CHUNKS views display information about the individual chunks.

SQL> SELECT chunk_id, status, start_id, end_id
     FROM   user_parallel_execute_chunks
     WHERE  task_name = 'test_task'
     ORDER BY chunk_id;

  CHUNK_ID STATUS                 START_ID     END_ID
---------- -------------------- ---------- ----------
       141 UNASSIGNED                   10         10
       142 UNASSIGNED                   30         30
       143 UNASSIGNED                   20         20
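
To see what these three chunks mean for the workload, the following plain Python model rebuilds the test data from section 2.2 and counts the rows behind each (start, end) pair. It is only an illustration: the helper name is hypothetical, and the values are sorted here, whereas the DISTINCT query above returns them in no particular order.

```python
from collections import Counter


def num_col(level):
    """Same CASE expression as the test data in section 2.2."""
    if level % 5 == 0:
        return 10
    if level % 3 == 0:
        return 20
    return 30


# One (start_id, end_id) pair per distinct value, as the DISTINCT query returns.
rows_per_value = Counter(num_col(level) for level in range(1, 500001))
chunks = [(value, value) for value in sorted(rows_per_value)]
rows_per_chunk = [rows_per_value[start] for start, _ in chunks]

print(chunks)
print(rows_per_chunk)
```

The row counts match the GROUP BY output in section 2.3. With only three unevenly sized chunks, at most three jobs can be busy at once, whatever degree of parallelism is requested.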

2.6 Run the task

Running a task involves running a specific statement for each defined chunk of work. The documentation only shows examples using updates of the base table, but this is not the only use of this functionality. The statement associated with the task can be a procedure call, as shown in one of the examples at the end of the article.

There are two ways to run a task and several procedures to control a running task.

2.6.1 RUN_TASK

The RUN_TASK procedure runs the specified statement in parallel by scheduling jobs to process the workload chunks. The statement specifying the actual work to be done must include a reference to the ':start_id' and ':end_id', which represent a range of rowids or column IDs to be processed, as specified in the chunk definitions. The degree of parallelism is controlled by the number of scheduled jobs, not the number of chunks defined. The scheduled jobs take an unassigned workload chunk, process it, then move on to the next unassigned chunk.

DECLARE
  l_sql_stmt VARCHAR2(32767);
BEGIN
  -- The hint alias must match the table alias (t) to have any effect.
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => 'test_task',
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);
END;
/

The RUN_TASK procedure waits for the task to complete. On completion, the status of the task must be assessed to know what action to take next.
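
The scheduling behaviour described in this section (a fixed number of jobs, each taking the next unassigned chunk until none remain) can be modelled with a queue and worker threads. This is a plain Python sketch under that assumption, not how Oracle implements its scheduler jobs; `run_task` and `process_chunk` here are hypothetical Python names.

```python
import queue
import threading


def run_task(chunks, process_chunk, parallel_level):
    """Process every chunk using parallel_level worker threads.

    Each worker repeatedly takes the next unassigned chunk until none are left.
    Returns a dict mapping worker number to the chunks that worker handled.
    """
    unassigned = queue.Queue()
    for chunk in chunks:
        unassigned.put(chunk)

    handled = {worker_id: [] for worker_id in range(parallel_level)}

    def worker(worker_id):
        while True:
            try:
                chunk = unassigned.get_nowait()
            except queue.Empty:
                return  # no unassigned chunks remain
            process_chunk(chunk)
            handled[worker_id].append(chunk)

    threads = [threading.Thread(target=worker, args=(worker_id,))
               for worker_id in range(parallel_level)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


# 50 chunks of 10000 ids each, processed by 10 workers.
chunks = [(start, start + 9999) for start in range(1, 500001, 10000)]
rows_done = []
lock = threading.Lock()


def process_chunk(chunk):
    start_id, end_id = chunk
    with lock:
        rows_done.append(end_id - start_id + 1)


handled = run_task(chunks, process_chunk, parallel_level=10)
print(len(handled), sum(len(c) for c in handled.values()), sum(rows_done))
```

The number of workers stays at 10 regardless of how many chunks exist, and how the chunks are shared between workers varies from run to run.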

2.6.2 User-defined framework

The DBMS_PARALLEL_EXECUTE package allows you to manually code the task run. The GET_ROWID_CHUNK and GET_NUMBER_COL_CHUNK procedures return the next available unassigned chunk. You can then manually process the chunk and set its status. The example below shows the processing of a workload chunked by rowid.

DECLARE
  l_sql_stmt    VARCHAR2(32767);
  l_chunk_id    NUMBER;
  l_start_rowid ROWID;
  l_end_rowid   ROWID;
  l_any_rows    BOOLEAN;
BEGIN
  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_rowid_chunk(task_name   => 'test_task',
                                          chunk_id    => l_chunk_id,
                                          start_rowid => l_start_rowid,
                                          end_rowid   => l_end_rowid,
                                          any_rows    => l_any_rows);

    EXIT WHEN l_any_rows = FALSE;

    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_rowid, l_end_rowid;

      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
    EXCEPTION
      WHEN OTHERS THEN
        -- Record chunk error.
        DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => 'test_task',
                                               chunk_id  => l_chunk_id,
                                               status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                               err_num   => SQLCODE,
                                               err_msg   => SQLERRM);
    END;

    -- Commit work.
    COMMIT;
  END LOOP;
END;
/
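
The control flow of this loop (fetch a chunk, do the work, record the outcome, move on) can be followed without a database. The sketch below is a plain Python model in which a list of dicts stands in for the chunk table; the helper names and the `error` key are hypothetical, and only the status names are taken from the package.

```python
UNASSIGNED = "UNASSIGNED"
ASSIGNED = "ASSIGNED"
PROCESSED = "PROCESSED"
PROCESSED_WITH_ERROR = "PROCESSED_WITH_ERROR"


def get_next_chunk(chunks):
    """Return the next unassigned chunk and mark it assigned, or None if none remain."""
    for chunk in chunks:
        if chunk["status"] == UNASSIGNED:
            chunk["status"] = ASSIGNED
            return chunk
    return None


def process_all(chunks, work):
    """Run work(start_id, end_id) for each chunk and record the outcome on the chunk."""
    while True:
        chunk = get_next_chunk(chunks)
        if chunk is None:
            break
        try:
            work(chunk["start_id"], chunk["end_id"])
            chunk["status"] = PROCESSED
        except Exception as exc:
            # A failure is recorded on the chunk; the loop carries on with the rest.
            chunk["status"] = PROCESSED_WITH_ERROR
            chunk["error"] = str(exc)


chunks = [{"chunk_id": n, "start_id": v, "end_id": v, "status": UNASSIGNED}
          for n, v in enumerate((10, 20, 30), start=1)]


def work(start_id, end_id):
    if start_id == 20:
        raise RuntimeError("simulated failure")


process_all(chunks, work)
print([(c["chunk_id"], c["status"]) for c in chunks])
```

One failing chunk does not stop the others from being processed, which is the point of recording the error per chunk instead of letting the exception propagate.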

2.6.3 Task control

A running task can be stopped and restarted using the STOP_TASK and RESUME_TASK procedures respectively.

The PURGE_PROCESSED_CHUNKS procedure deletes all chunks with a status of 'PROCESSED' or 'PROCESSED_WITH_ERROR'.

The ADM_DROP_CHUNKS, ADM_DROP_TASK, ADM_TASK_STATUS and ADM_STOP_TASK routines have the same function as their namesakes, but they allow the operations to be performed on tasks owned by other users. In order to use these routines the user must have been granted the ADM_PARALLEL_EXECUTE_TASK role.

2.7 Check the task status

The simplest way to check the status of a task is to use the TASK_STATUS function. After execution of the task, the only possible return values are the 'FINISHED' or 'FINISHED_WITH_ERROR' constants. If the status is not 'FINISHED', then the task can be resumed using the RESUME_TASK procedure.

DECLARE
  l_try    NUMBER;
  l_status NUMBER;
BEGIN
  -- If there is an error, resume the task at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');

  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task('test_task');
    l_status := DBMS_PARALLEL_EXECUTE.task_status('test_task');
  END LOOP;
END;
/
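
The bounded retry in this block is easy to misread, so here is the same loop as a plain Python model. `FakeTask` is a hypothetical stand-in that needs a fixed number of resumes before it reports FINISHED; nothing here calls Oracle.

```python
FINISHED = "FINISHED"
FINISHED_WITH_ERROR = "FINISHED_WITH_ERROR"


def resume_until_finished(task_status, resume_task, max_tries=2):
    """Resume a task until it reports FINISHED or max_tries resumes have been used.

    Returns the final status and the number of resumes performed.
    """
    tries = 0
    status = task_status()
    while tries < max_tries and status != FINISHED:
        tries += 1
        resume_task()
        status = task_status()
    return status, tries


class FakeTask:
    """Hypothetical task that needs a fixed number of resumes to finish."""

    def __init__(self, resumes_needed):
        self.resumes_needed = resumes_needed

    def status(self):
        return FINISHED if self.resumes_needed <= 0 else FINISHED_WITH_ERROR

    def resume(self):
        self.resumes_needed -= 1


recoverable = FakeTask(resumes_needed=1)
stubborn = FakeTask(resumes_needed=5)
print(resume_until_finished(recoverable.status, recoverable.resume))
print(resume_until_finished(stubborn.status, stubborn.resume))
```

The second task is still unfinished after two resumes, so code that follows such a loop should check the final status before assuming the work is complete.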

The status of the task and the chunks can also be queried.

COLUMN task_name FORMAT A10

SELECT task_name,
       status
FROM   user_parallel_execute_tasks;

TASK_NAME  STATUS
---------- -------------------
test_task  FINISHED


If there were errors, the chunks can be queried to identify the problems.

SELECT status, COUNT(*)
FROM   user_parallel_execute_chunks
GROUP BY status
ORDER BY status;

STATUS                 COUNT(*)
-------------------- ----------
PROCESSED_WITH_ERROR          3

The [DBA|USER]_PARALLEL_EXECUTE_TASKS views contain a record of the JOB_PREFIX used when scheduling the chunks of work.

SELECT job_prefix
FROM   user_parallel_execute_tasks
WHERE  task_name = 'test_task';

JOB_PREFIX
------------------------------
TASK$_368


This value can be used to query information about the individual jobs used during the process. The number of jobs scheduled should match the degree of parallelism specified in the RUN_TASK procedure.

COLUMN job_name FORMAT A20

SELECT job_name, status
FROM   user_scheduler_job_run_details
WHERE  job_name LIKE (SELECT job_prefix || '%'
                      FROM   user_parallel_execute_tasks
                      WHERE  task_name = 'test_task');

JOB_NAME             STATUS
-------------------- ------------------------------
TASK$_205_3          SUCCEEDED
TASK$_205_9          SUCCEEDED
TASK$_205_5          SUCCEEDED
TASK$_205_7          SUCCEEDED
TASK$_205_1          SUCCEEDED
TASK$_205_2          SUCCEEDED
TASK$_205_6          SUCCEEDED
TASK$_205_8          SUCCEEDED
TASK$_205_4          SUCCEEDED
TASK$_205_10         SUCCEEDED


2.8 Drop the task

Once the job is complete you can drop the task, which will drop the associated chunk information also.

BEGIN
  DBMS_PARALLEL_EXECUTE.drop_task('test_task');
END;
/

3. Examples

3.1 Test 1

The following example shows the processing of a workload chunked by rowid.

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  DBMS_PARALLEL_EXECUTE.create_chunks_by_rowid(task_name   => l_task,
                                               table_owner => 'TEST',
                                               table_name  => 'TEST_TAB',
                                               by_row      => TRUE,
                                               chunk_size  => 10000);

  l_sql_stmt := 'UPDATE /*+ ROWID (t) */ test_tab t
                 SET    t.num_col = t.num_col + 10
                 WHERE  rowid BETWEEN :start_id AND :end_id';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, resume the task at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);

  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

3.2 Test 2

The following example shows the processing of a workload chunked by a number column. Notice that the workload is actually a stored procedure in this case.

CREATE OR REPLACE PROCEDURE process_update (p_start_id IN NUMBER, p_end_id IN NUMBER) AS
BEGIN
  -- The rows are selected by id range, so no ROWID hint is needed here.
  UPDATE test_tab t
  SET    t.num_col = t.num_col + 10
  WHERE  id BETWEEN p_start_id AND p_end_id;
END;
/


DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_sql_stmt VARCHAR2(32767);
  l_try      NUMBER;
  l_status   NUMBER;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  DBMS_PARALLEL_EXECUTE.create_chunks_by_number_col(task_name    => l_task,
                                                    table_owner  => 'TEST',
                                                    table_name   => 'TEST_TAB',
                                                    table_column => 'ID',
                                                    chunk_size   => 10000);

  l_sql_stmt := 'BEGIN process_update(:start_id, :end_id); END;';

  DBMS_PARALLEL_EXECUTE.run_task(task_name      => l_task,
                                 sql_stmt       => l_sql_stmt,
                                 language_flag  => DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, resume the task at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);

  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.resume_task(l_task);
    l_status := DBMS_PARALLEL_EXECUTE.task_status(l_task);
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

3.3 Test 3

The following example shows a workload chunked by an SQL statement and processed by a user-defined framework.

DECLARE
  l_task     VARCHAR2(30) := 'test_task';
  l_stmt     CLOB;
  l_sql_stmt VARCHAR2(32767);
  l_chunk_id NUMBER;
  l_start_id NUMBER;
  l_end_id   NUMBER;
  l_any_rows BOOLEAN;
BEGIN
  DBMS_PARALLEL_EXECUTE.create_task (task_name => l_task);

  l_stmt := 'SELECT DISTINCT num_col, num_col FROM test_tab';

  DBMS_PARALLEL_EXECUTE.create_chunks_by_sql(task_name => l_task,
                                             sql_stmt  => l_stmt,
                                             by_rowid  => FALSE);

  -- The rows are selected by num_col range, so no ROWID hint is needed here.
  l_sql_stmt := 'UPDATE test_tab t
                 SET    t.num_col = t.num_col
                 WHERE  num_col BETWEEN :start_id AND :end_id';

  LOOP
    -- Get next unassigned chunk.
    DBMS_PARALLEL_EXECUTE.get_number_col_chunk(task_name => l_task,
                                               chunk_id  => l_chunk_id,
                                               start_id  => l_start_id,
                                               end_id    => l_end_id,
                                               any_rows  => l_any_rows);

    EXIT WHEN l_any_rows = FALSE;

    BEGIN
      -- Manually execute the work.
      EXECUTE IMMEDIATE l_sql_stmt USING l_start_id, l_end_id;

      -- Set the chunk status as processed.
      DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
                                             chunk_id  => l_chunk_id,
                                             status    => DBMS_PARALLEL_EXECUTE.PROCESSED);
    EXCEPTION
      WHEN OTHERS THEN
        -- Record chunk error.
        DBMS_PARALLEL_EXECUTE.set_chunk_status(task_name => l_task,
                                               chunk_id  => l_chunk_id,
                                               status    => DBMS_PARALLEL_EXECUTE.PROCESSED_WITH_ERROR,
                                               err_num   => SQLCODE,
                                               err_msg   => SQLERRM);
    END;

    -- Commit work.
    COMMIT;
  END LOOP;

  DBMS_PARALLEL_EXECUTE.drop_task(l_task);
END;
/

-------------------------------------------------------------------------------------------------------

Blog: http://blog.csdn.net/tianlesoftware

Email: dvd.dba@gmail.com
