分布key jon+limit优化
最近更新时间: 2025-02-18 16:02:00
数据准备。
postgres=# CREATE TABLE t1(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
CREATE TABLE
postgres=# CREATE TABLE t2(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
CREATE TABLE
postgres=# insert into t1 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t;
INSERT 0 1000000
postgres=# insert into t2 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000000) as t;
INSERT 0 1000000
postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE
postgres=#
postgres=# \timing
Timing is on.
postgres=# explain select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.25..1.65 rows=10 width=367)
-> Merge Join (cost=0.25..140446.26 rows=1000000 width=367)
Merge Cond: (t1.f1 = t2.f1)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.12..434823.13 rows=1000000 width=367)
-> Index Scan using t1_f1_key on t1 (cost=0.12..62723.13 rows=1000000 width=367)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.12..71823.13 rows=1000000 width=4)
-> Index Only Scan using t2_f1_key on t2 (cost=0.12..62723.13 rows=1000000 width=4)
(7 rows)
Time: 1.372 ms
postgres=# explain analyze select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.25..1.65 rows=10 width=367) (actual time=2675.437..2948.199 rows=10 loops=1)
-> Merge Join (cost=0.25..140446.26 rows=1000000 width=367) (actual time=2675.431..2675.508 rows=10 loops=1)
Merge Cond: (t1.f1 = t2.f1)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.12..434823.13 rows=1000000 width=367) (actual time=1.661..1.704 rows=10 loops=1)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.12..71823.13 rows=1000000 width=4) (actual time=2673.761..2673.783 rows=10 loops=1)
Planning time: 0.358 ms
Execution time: 2973.948 ms
(7 rows)
Time: 2976.008 ms (00:02.976)
postgres=#
看执行计划是在cn上面执行,merge join需要把要join的数据拉回cn再排序,然后再join,这里主切的开销在于网络,优化的话方法就是让语句其推下去计算。
postgres=# set prefer_olap to on;
SET
Time: 0.291 ms
postgres=# explain select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Limit (cost=100.25..101.70 rows=10 width=367)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.25..101.70 rows=10 width=367)
-> Limit (cost=0.25..1.65 rows=10 width=367)
-> Merge Join (cost=0.25..140446.26 rows=1000000 width=367)
Merge Cond: (t1.f1 = t2.f1)
-> Index Scan using t1_f1_key on t1 (cost=0.12..62723.13 rows=1000000 width=367)
-> Index Only Scan using t2_f1_key on t2 (cost=0.12..62723.13 rows=1000000 width=4)
(7 rows)
Time: 1.061 ms
postgres=# explain analyze select t1.* from t1,t2 where t1.f1=t2.f1 limit 10;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=100.25..101.70 rows=10 width=367) (actual time=1.527..3.899 rows=10 loops=1)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=100.25..101.70 rows=10 width=367) (actual time=1.525..1.529 rows=10 loops=1)
Planning time: 0.360 ms
Execution time: 18.193 ms
(4 rows)
Time: 19.921 ms
相差150倍的性能,一般情况下,如果需要拉大量的数据回cn计算,则下推执行的效率会更好。