分布key jon+limit优化

最近更新时间: 2025-02-18 16:02:00

数据准备。

  • 复制
    复制成功
postgres=# CREATE TABLE t1(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1); 
CREATE TABLE

postgres=# CREATE TABLE t2(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1); 
CREATE TABLE

postgres=# insert into t1 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t;
INSERT 0 1000000

postgres=# insert into t2 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t; 
INSERT 0 1000000

postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE
postgres=# 
postgres=# \timing 
Timing is on.
postgres=# explain  select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.25..1.65 rows=10 width=367)
   ->  Merge Join  (cost=0.25..140446.26 rows=1000000 width=367)
         Merge Cond: (t1.f1 = t2.f1)
         ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.12..434823.13 rows=1000000 width=367)
               ->  Index Scan using t1_f1_key on t1  (cost=0.12..62723.13 rows=1000000 width=367)
         ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.12..71823.13 rows=1000000 width=4)
               ->  Index Only Scan using t2_f1_key on t2  (cost=0.12..62723.13 rows=1000000 width=4)
(7 rows)

Time: 1.372 ms
postgres=# explain analyze  select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;       
                                                                                         QUERY PLAN                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.25..1.65 rows=10 width=367) (actual time=2675.437..2948.199 rows=10 loops=1)
   ->  Merge Join  (cost=0.25..140446.26 rows=1000000 width=367) (actual time=2675.431..2675.508 rows=10 loops=1)
         Merge Cond: (t1.f1 = t2.f1)
         ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.12..434823.13 rows=1000000 width=367) (actual time=1.661..1.704 rows=10 loops=1)
         ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.12..71823.13 rows=1000000 width=4) (actual time=2673.761..2673.783 rows=10 loops=1)
 Planning time: 0.358 ms
 Execution time: 2973.948 ms
(7 rows)
Time: 2976.008 ms (00:02.976)
postgres=# 

看执行计划是在cn上面执行,merge join需要把要join的数据拉回cn再排序,然后再join,这里主切的开销在于网络,优化的话方法就是让语句其推下去计算。

  • 复制
    复制成功
postgres=# set prefer_olap to on;
SET
Time: 0.291 ms
postgres=#  explain  select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.25..101.70 rows=10 width=367)
   ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.25..101.70 rows=10 width=367)
         ->  Limit  (cost=0.25..1.65 rows=10 width=367)
               ->  Merge Join  (cost=0.25..140446.26 rows=1000000 width=367)
                     Merge Cond: (t1.f1 = t2.f1)
                     ->  Index Scan using t1_f1_key on t1  (cost=0.12..62723.13 rows=1000000 width=367)
                     ->  Index Only Scan using t2_f1_key on t2  (cost=0.12..62723.13 rows=1000000 width=4)
(7 rows)
Time: 1.061 ms
postgres=# explain analyze  select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;   
                                             QUERY PLAN                                                                                 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100.25..101.70 rows=10 width=367) (actual time=1.527..3.899 rows=10 loops=1)
   ->  Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10)  (cost=100.25..101.70 rows=10 width=367) (actual time=1.525..1.529 rows=10 loops=1)
 Planning time: 0.360 ms
 Execution time: 18.193 ms
(4 rows)
Time: 19.921 ms

相差150倍的性能,一般情况下,如果需要拉大量的数据回cn计算,则下推执行的效率会更好。