文档首页

租户端数据库 TDSQL PostgreSQL 版开发指南分布式执行计划示例

分布式执行计划示例

最近更新时间: 2025-02-18 16:02:00

下面是分布式架构下，不同类型的关联（join)的执行计划的示例。

Local Join

复制

复制成功

--关联的两张表        
create table t1 (                                         
    c1 integer,                                           
    c2 integer,                                                 
    c3 integer,                                                  
    c4 integer,                                                     
    c5 integer                                                       
) distribute by shard(c1);

create table t2 (                                         
    c1 integer,                                           
    c2 integer,                                                 
    c3 integer,                                                  
    c4 integer,                                                     
    c5 integer                                                       
) distribute by shard(c1);


tbase=# explain select t1.c3, t2.c3 from t1, t2 where t1.c1=t2.c1;

结果为

复制

复制成功

                                   QUERY PLAN                                    
--------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode_1, datanode_2, datanode_3)  (cost=17.71..85.28 rows=3192 width=8)
   ->  Hash Join  (cost=17.71..85.28 rows=3192 width=8)
         Hash Cond: (t1.c1 = t2.c1)
         ->  Seq Scan on t1  (cost=0.00..10.65 rows=565 width=8)
         ->  Hash  (cost=10.65..10.65 rows=565 width=8)
               ->  Seq Scan on t2  (cost=0.00..10.65 rows=565 width=8)

EXPLAIN对应的执行计划流程

Distributed Join（1）

下面这个例子中只有Join的一端（t2表）需要重新分布

复制

复制成功

tbase=# explain select t1.c3, t2.c3 from t1, t2 where t1.c1=t2.c2;

结果为

复制

复制成功

                                      QUERY PLAN                                             
--------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode_1, datanode_2, datanode_3)  (cost=3285.00..7395.00 rows=100000 width=8)
   ->  Hash Join  (cost=3285.00..7395.00 rows=100000 width=8)
         Hash Cond: (t2.c2 = t1.c1)
         ->  Remote Subquery Scan on all (datanode_1, datanode_2, datanode_3)  (cost=100.00..3335.00 rows=100000 width=8)
               Distribute results by S: c2
               ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=8)
         ->  Hash  (cost=1935.00..1935.00 rows=100000 width=8)
               ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=8)

EXPLAIN对应的执行流程：其中t2表的扫描结果会按照c2列值重分布。

Distributed Join（2）

下面这个例子中Join的两个表（t1和t2）都需要重新分布，因为关联条件都不是分布列。

复制

复制成功

tbase =# explain select t1.c3, t2.c3 from t1, t2 where t1.c2=t2.c2;

结果为

复制

复制成功

                                QUERY PLAN                                                
--------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode_1, datanode_2, datanode_3)  (cost=4685.00..8795.00 rows=100000 width=8)
   ->  Hash Join  (cost=4685.00..8795.00 rows=100000 width=8)
         Hash Cond: (t1.c2 = t2.c2)
         ->  Remote Subquery Scan on all (datanode_1, datanode_2, datanode_3)  (cost=100.00..3335.00 rows=100000 width=8)
               Distribute results by S: c2
               ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=8)
         ->  Hash  (cost=3335.00..3335.00 rows=100000 width=8)
               ->  Remote Subquery Scan on all (datanode_1, datanode_2, datanode_3)  (cost=100.00..3335.00 rows=100000 width=8)
                     Distribute results by S: c2
                     ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=8)

EXPLAIN对应的执行流程：其中t1表和t2表的扫描结果会按照各自c2列值重分布。

Broadcast

下面这个例子中，当表t1是大表， t2是小表的时候，虽然两张表的关联条件都不是分布列，优化器可能选择broadcast join, 把小表的扫描结果发送到各个数据节点。

复制

复制成功

tbase =# explain select t1.c3, t2.c3 from t1, t2 where t1.c2=t2.c2;

结果为

复制

复制成功

                                QUERY PLAN                                                
--------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode_1, datanode_2, datanode_3)  (cost=6635.00..10070.00 rows=100000 width=8)
   ->  Hash Join  (cost=6635.00..10070.00 rows=100000 width=8)
         Hash Cond: (t1.c2 = t2.c2)
         ->  Seq Scan on t1  (cost=0.00..1935.00 rows=100000 width=8)
         ->  Hash  (cost=4135.00..4135.00 rows=200000 width=8)
               ->  Remote Subquery Scan on all (datanode_1, datanode_2, datanode_3)  (cost=100.00..4135.00 rows=200000 width=8)
                     ->  Seq Scan on t2  (cost=0.00..1935.00 rows=100000 width=8)

EXPLAIN对应的执行流程：