增大work_mem减少io访问
最近更新时间: 2025-02-18 16:02:00
postgres=# CREATE TABLE t1(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
CREATE TABLE
Time: 70.545 ms
postgres=# CREATE TABLE t2(f1 serial not null unique,f2 text,f3 text,f4 text,f5 text,f6 text,f7 text,f8 text,f9 text,f10 text,f11 text,f12 text) distribute by shard(f1);
CREATE TABLE
Time: 61.913 ms
postgres=# insert into t1 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,1000) as t;
INSERT 0 1000
Time: 48.866 ms
postgres=# insert into t2 select t,md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text),md5(t::text) from generate_series(1,50000) as t;
INSERT 0 50000
Time: 792.858 ms
postgres=# analyze t1;
ANALYZE
Time: 175.946 ms
postgres=# analyze t2;
ANALYZE
Time: 318.802 ms
postgres=# explain select * from t1 where f2 not in (select f2 from t2);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=0.00..2076712.50 rows=500 width=367)
-> Seq Scan on t1 (cost=0.00..2076712.50 rows=500 width=367)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.00..4028.00 rows=50000 width=33)
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=0.00..3240.00 rows=50000 width=33)
-> Seq Scan on t2 (cost=0.00..3240.00 rows=50000 width=33)
(7 rows)
Time: 0.916 ms
postgres=# select * from t1 where f2 not in (select f2 from t2);
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12
----+----+----+----+----+----+----+----+----+-----+-----+-----
(0 rows)
Time: 4226.825 ms (00:04.227)
postgres=# set work_mem to '8MB';
SET
Time: 0.289 ms
postgres=# explain select * from t1 where f2 not in (select f2 from t2);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=3365.00..3577.50 rows=500 width=367)
-> Seq Scan on t1 (cost=3365.00..3577.50 rows=500 width=367)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> Remote Subquery Scan on all (dn01,dn02,dn03,dn04,dn05,dn06,dn07,dn08,dn09,dn10) (cost=0.00..3240.00 rows=50000 width=33)
-> Seq Scan on t2 (cost=0.00..3240.00 rows=50000 width=33)
(6 rows)
Time: 0.890 ms
postgres=# select * from t1 where f2 not in (select f2 from t2);
f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12
----+----+----+----+----+----+----+----+----+-----+-----+-----
(0 rows)
Time: 105.249 ms
postgres=#
增大work_mem后,性能提高了40倍,因为work_mem足够放下filter的数据,不需要再做 Materialize物化,filter由原来的subplan变成了hash subplan,直接在内存hash表中filter,性能就上去了。
注意:
work_mem默认不宜过大,建议在某个具体的查询语句中再根据需要进行调整即可。