PQ_DISTRIBUTE Hint
Introduction and Warning
Before discussing how to use the parallel PQ_DISTRIBUTE hint, it is important to note, that parallel hints should be used sparingly, with caution and only when your system has enough IO and CPU resources to do so. It is exceptionally easy to satuarte system resources using these hints. If you plan to use parallel hints, heed the warning and test before implementation. I highly recommend reading the documentation on parallel query, a link to which can be found in the notes section below. That said, lets take a look at how to use the PQ_DISTRIBUTE hint.
PQ_DISTRIBUTE Hint Description
The PQ_DISTRIBUTE hint in Oracle is used to manage the distribution of data across multiple query servers during parallel execution in a parallel query (PQ) environment. This hint helps specify how the data should be divided among the parallel execution servers, aiming to improve performance, especially when dealing with large datasets.
There are two scenarios where the PQ_DISTRIBUTE is used; During LOADS and during JOINS. As such, there are a number of different ways to specify the hint. The information below. deals with the LOADS and JOINS separately.
PQ_DISTRIBUTE Load Usage
When loading a table or creating a table using statements like insert into ..... select or create table as select... the PQ_DISTRIBUTE hint can take various "load distribution" values. The syntax is below followed by the distribution method descriptions and meanings
Example
CREATE /*+ PQ_DISTRIBUTE(myTab1, distribute_method) */ TABLE MyTab1 AS SELECT col1
FROM myTab2
/
In this scenario, the following distribution methods are allowed.
NONE
RANDOM
RANDOM_LOCAL
PARTITION
The following describes each method
NONE
No distribution. Effectively, all parallel servers will load all parallel partitions. You would use this to avoid any cost overhead of distributing rows when the data is not skewed. Note there is a PGA overhead for using this method, which depending on the number of parallel processes, can be considerable.
RANDOM
Using random will ensure that the rows are distributed from the producers in a round-robin fashion to the consumers. This should be used if the data highly skewed.
RANDOM_LOCAL
Using random_local ensures the rows from the producers are distributed to a given set of partition parallel processes. Note that two or more parallel processes can be loading the same parallel partition, but cannot load all parallel partitions. Useful when the data is skewed.
PARTITION
This method is only for partitioned tables. When partition is used, the partitioning details of the table are considered when distributing the rows between parallel processes. This method is useful when the number of partitions being loaded is greater than or equal to the number of parallel processes and when there is no data skew.
PQ_DISTRIBUTE Join Usage
The PQ_DISTRIBUTE hint is useful for optimizing parallel joins in Oracle. It controls how the outer and inner tables are distributed among parallel processes. By specifying distribution methods like HASH, BROADCAST, or NONE for each table, you can minimize data movement and maximize parallel processing efficiency. Correctly identifying the outer and inner tables is key to using PQ_DISTRIBUTE effectively. This often requires examination of the query plan or using the LEADING or ORDERED hint to explicitly define the join order, thereby determining the outer and inner tables. The syntax example is below, followed by the distribution methods and meanings
Example
SELECT /*+ PQ_DISTRIBUTE(myTab1 distribute_method, myTab2 distribution_method) */ col1, col3
FROM myTab1, myTab2
WHERER myTab1.col1 = myTab2.col2
/
HASH, HASH
Both the outer and inner tables are distributed using a hash function applied to the join key. This ensures that rows with matching join keys are processed by the same parallel process, minimizing data transfer between processes.
This is often the most efficient method for large joins, especially when neither table is significantly smaller than the other. It's the default and generally recommended approach for large-scale joins.
BROADCAST, NONE
The outer table is "broadcast", meaning a complete copy of it is sent to every parallel process. The inner table is not redistributed (NONE).
This is most effective when the outer table is very small compared to the inner table. Broadcasting the smaller table avoids the overhead of redistributing the larger table.
NONE, BROADCAST
This is the reverse of the previous method. The outer table is not redistributed (NONE), and the inner table is broadcast.
This is suitable when the inner table is is very small compared to the outer table.
PARTITION, NONE or NONE, PARTITION
When one of the tables is already partitioned, you can use PARTITION for that table and NONE for the other. This leverages the existing partitioning scheme to minimize or eliminate data redistribution.
This is highly efficient when joining a partitioned table with a non-partitioned table, especially if the join condition is based on the partitioning key. In such cases, there might be no redistribution at all.
Notes
You can find more information on using parallel query in the documentation
Published on
Published 10th January 2025