开发相关规范
最近更新时间: 2024-06-12 15:06:00
建议对DB object 尤其是COLUMN 加COMMENT,便于后续新人了解业务及维护。
注释前后的数据表可读性对比,有注释的一看就明白。
postgres=# \d+ TBase_main Table "public.tbase_main" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+-----------+----------+--------------+------------- id | integer | | plain | | mc | text | | extended | | Indexes: "TBase_main_id_uidx" UNIQUE, btree (id) Has OIDs: no Distribute By SHARD(id) Location Nodes: ALL DATANODES postgres=# comment on column TBase_main.id is 'id号'; COMMENT postgres=# comment on column TBase_main.mc is '产品名称'; COMMENT postgres=# \d+ TBase_main Table "public.tbase_main" Column | Type | Modifiers | Storage | Stats target | Description --------+---------+-----------+----------+--------------+------------- id | integer | | plain | | id号 mc | text | | extended | | 产品名称 Indexes: "TBase_main_id_uidx" UNIQUE, btree (id) Has OIDs: no Distribute By SHARD(id) Location Nodes: ALL DATANODES
建议非必须时避免select *,只取所需字段,以减少包括不限于网络带宽消耗。
postgres=# explain (verbose) select * from tbase_main where id=1; QUERY PLAN --------------------------------------------------------------------------------------------- Index Scan using TBase_main_id_uidx on public.tbase_main (cost=0.15..8.17 rows=1 width=36) Output: id, mc Index Cond: (TBase_main.id = 1) (3 rows) postgres=# explain (verbose) select tableoid from tbase_main where id=1; QUERY PLAN -------------------------------------------------------------------------------------------- Index Scan using TBase_main_id_uidx on public.tbase_main (cost=0.15..8.17 rows=1 width=4) Output: tableoid Index Cond: (TBase_main.id = 1) (3 rows)
*是返回36个字符,而另一个一条记录只能4个字段的长度。
建议update 时尽量做
postgres=# update tbase_main set mc='TBase' ; UPDATE 1 postgres=# select xmin,* from tbase_main; xmin | id | mc ------+----+------- 2562 | 1 | TBase (1 row) postgres=# update tbase_main set mc='TBase' ; UPDATE 1 postgres=# select xmin,* from tbase_main; xmin | id | mc ------+----+------- 2564 | 1 | TDSQL PG (1 row) postgres=# update tbase_main set mc='TBase' where mc!='TBase'; UPDATE 0 postgres=# select xmin,* from tbase_main; xmin | id | mc ------+----+------- 2564 | 1 | TDSQL PG (1 row)
上面的效果是一样的,但带条件的更新不会产生一个新的版本记录,不需要系统执行vacuum回收垃圾数据。
建议将单个事务的多条SQL操作,分解、拆分,或者不放在一个事务里,让每个事务的粒度尽可能小,尽量lock少的资源,避免lock 、dead lock的产生。
postgres=# begin; BEGIN postgres=# update tbase_main set mc='TBase_1.3'; UPDATE 200000000 #sesseion2 等待 postgres=# update tbase_main set mc='TBase_1.4' where id=1; #sesseion3 等待 postgres=# update tbase_main set mc='TBase_1.5' where id=2; 如果#sesseion1分布批更新的话,如下所示 postgres=# begin; BEGIN postgres=# update tbase_main set mc='TBase_1.3' where id>0 and id <=100000; UPDATE 100000 postgres=#COMMIT; postgres=# begin; BEGIN postgres=# update tbase_main set mc='TBase_1.3' where id>100000 and id <=200000; UPDATE 100000 postgres=#COMMIT;
则session2和session3中就能部分提前完成,这样可以避免大量的锁等待和出现大量的session占用系统资源,在做全表更新时请使用这种方法来执行:
建议大批量的数据入库时,使用copy ,不建议使用insert,以提高写入速度;
postgres=# insert into tbase_main select t,'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' from generate_series(1,100000) as t; INSERT 0 100000 Time: 9511.755 ms postgres=# copy TBase_main to '/data/pgxz/TBase_main.txt'; COPY 100002 Time: 179.428 ms postgres=# copy TBase_main from '/data/pgxz/TBase_main.txt'; COPY 100002 Time: 1625.803 ms postgres=#
性能相差5倍。
建议对报表类的或生成基础数据的查询,使用物化视图(MATERIALIZEDVIEW)定期固化数据快照,避免对多表(尤其多写频繁的表)重复跑相同的查询,且物化视图支持REFRESH MATERIALIZED VIEW CONCURRENTLY, 支持并发更新;
如有一个程序需要不断查询TBase_main的总记录数,那么我们这样做:
postgres=# select count(1) from tbase_main; count -------- 200004 (1 row) Time: 27.948 ms postgres=# create MATERIALIZED VIEW TBase_main_count as select count(1) as num from tbase_main; SELECT 1 Time: 322.372 ms postgres=# select num from TBase_main_count ; num -------- 200004 (1 row) Time: 0.421 ms
性能提高上百倍。
有数据变化时刷新方法 。
postgres=# copy TBase_main from '/data/pgxz/TBase_main.txt'; COPY 100002 Time: 1201.774 ms postgres=# select count(1) from tbase_main; count -------- 300006 (1 row) Time: 23.164 ms postgres=# REFRESH MATERIALIZED VIEW TBase_main_count; REFRESH MATERIALIZED VIEW Time: 49.486 ms postgres=# select num from tbase_main_count ; num -------- 300006 (1 row) Time: 0.301 ms
建议复杂的统计查询可以尝试窗口函数 。
两表join时尽量的使用分布key进行join。
所以在建立业务的主表,明细表时,就需要使用他们的关联键来做分布键,如下所示:
[pgxz@VM_0_29_centos pgxz]$ psql -p 15001 psql (PostgreSQL 10 (TBase 2.01)) Type "help" for help. postgres=# create table tbase_main(id integer,mc text) distribute by shard(id); CREATE TABLE postgres=# create table tbase_detail(id integer,TBase_main_id integer,mc text) distribute by shard(TBase_main_id); CREATE TABLE postgres=# explain select TBase_detail.* from tbase_main,TBase_detail where TBase_main.id=TBase_detail.TBase_main_id; QUERY PLAN ---------------------------------------------------------------------------- Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) Node/s: dn001, dn002 (2 rows) postgres=# explain (verbose) select TBase_detail.* from tbase_main,TBase_detail where TBase_main.id=TBase_detail.TBase_main_id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Data Node Scan on "__REMOTE_FQS_QUERY__" (cost=0.00..0.00 rows=0 width=0) Output: TBase_detail.id, TBase_detail.TBase_main_id, TBase_detail.mc Node/s: dn001, dn002 Remote query: SELECT TBase_detail.id, TBase_detail.TBase_main_id, TBase_detail.mc FROM public.tbase_main, public.tbase_detail WHERE (TBase_main.id = TBase_detail.TBase_main_id) (4 rows) postgres=#
分布键用唯一索引代替主键。
postgres=# create unique index TBase_main_id_uidx on TBase_main using btree(id); CREATE INDEX
因为唯一索引后期的维护成本比主键要低很多。
分布键无法建立唯一索引则要建立普通索引,提高查询的效率。
postgres=# create index TBase_detail_TBase_main_id_idx on TBase_detail using btree(TBase_main_id); CREATE INDEX
这样两表在join查询时返回少量数据时的效率才会高。
不要对字段建立外键。
目前TDSQL PG还不支持多dn外键约束,除非你能确定数据关联键的数据全部落在同一个dn上面。