exists的优化
最近更新时间: 2025-02-18 16:02:00
exists在数据量比较大情况下,一般使用的是Semi Join ,在work_mem足够大的情况下走的是hash join,性能会更好。
postgres=# show work_mem;
work_mem
----------
4MB
(1 row)
Time: 0.298 ms
postgres=# explain select count(1) from t1 where exists(select 1 from t2 where t2.t1_f1=t1.f1);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=242218.32..242218.33 rows=1 width=8)
-> Remote Subquery Scan on all (dn001,dn002) (cost=242218.30..242218.32 rows=1 width=0)
-> Partial Aggregate (cost=242118.30..242118.31 rows=1 width=8)
-> Hash Semi Join (cost=110248.00..242118.30 rows=505421 width=0)
Hash Cond: (t1.f1 = t2.t1_f1)
-> Seq Scan on t1 (cost=0.00..17420.00 rows=1000000 width=4)
-> Hash (cost=79340.00..79340.00 rows=3000000 width=4)
-> Remote Subquery Scan on all (dn001,dn002) (cost=100.00..79340.00 rows=3000000 width=4)
Distribute results by S: t1_f1
-> Seq Scan on t2 (cost=0.00..52240.00 rows=3000000 width=4)
(10 rows)
Time: 1.091 ms
postgres=# select count(1) from t1 where exists(select 1 from t2 where t2.t1_f1=t1.f1);
count
--------
500000
(1 row)
Time: 3779.401 ms (00:03.779)
postgres=# set work_mem to '128MB';
SET
Time: 0.368 ms
postgres=# explain select count(1) from t1 where exists(select 1 from t2 where t2.t1_f1=t1.f1);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Finalize Aggregate (cost=101763.76..101763.77 rows=1 width=8)
-> Remote Subquery Scan on all (dn001,dn002) (cost=101763.75..101763.76 rows=1 width=0)
-> Partial Aggregate (cost=101663.75..101663.76 rows=1 width=8)
-> Hash Join (cost=89660.00..101663.75 rows=505421 width=0)
Hash Cond: (t2.t1_f1 = t1.f1)
-> Remote Subquery Scan on all (dn001,dn002) (cost=59840.00..69443.00 rows=505421 width=4)
Distribute results by S: t1_f1
-> HashAggregate (cost=59740.00..64794.21 rows=505421 width=4)
Group Key: t2.t1_f1
-> Seq Scan on t2 (cost=0.00..52240.00 rows=3000000 width=4)
-> Hash (cost=17420.00..17420.00 rows=1000000 width=4)
-> Seq Scan on t1 (cost=0.00..17420.00 rows=1000000 width=4)
(12 rows)
Time: 4.739 ms
postgres=# select count(1) from t1 where exists(select 1 from t2 where t2.t1_f1=t1.f1);
count
--------
500000
(1 row)
Time: 1942.037 ms (00:01.942)
postgres=#
大约有一倍性能的提升。