PostgreSQL

PostgreSQL

Tips

  • Postgres vs. PostgreSQL
    • PostgreSQL is the official name; Postgres is an accepted alias
  • Manual
  • Current Document
  • PostgREST
    • REST API for any Postgres database
  • pRest
    • Golang
  • Awesome PostgreSQL
  • Trees In The Database
  • Default port: 5432
  • Binaries shipped with PG
    • pgbench
    • pg_resetxlog
    • pg_controldata
    • pg_test_fsync
    • pg_upgrade
    • pg_test_timing
    • pg_xlogdump
    • pg_rewind
    • initdb
    • pg_ctl
    • postgres
    • postmaster
    • pg_archivecleanup
  • Other tools
  • Resources
  • Data structures
    • Database / Schema / Table
  • Data migration
  • References
    • Why PostgreSQL doesn't have query hints
    • A PostgreSQL Response to Uber
      • Write Amplification
        • PostgreSQL must update every index if a change to the row updates an index.
        • PostgreSQL keeps each version of the tuple on disk until it is vacuumed.
        • Each page changed here must be pushed down the binary replication link.
      • Replication
        • PostgreSQL does not have logical replication in core. (Coming in 10!)
        • Existing logical replication tools (Slony, Bucardo, etc.) are somewhat fiddly to set up and manage.
      • Replica MVCC
        • Incoming streaming replication activity can be blocked by queries, or queries can be cancelled.
        • Naïve users can be surprised by query cancellation messages.
      • Upgrade
        • PostgreSQL does not have in-place major version upgrade.
        • You have to do some kind of process to get low-downtime upgrades.
        • pg_upgrade, while a big improvement, is not a panacea.
          • PostGIS, for example, is a huge pain.
      • Buffer Pool
        • PostgreSQL’s shared buffer management performance peaks at 8-32GB.
          • [citation required]
        • Larger shared_buffers than that (usually) mean diminishing returns.
        • Retrieving things from file system cache is slower than from shared buffers.
      • Connection Management
        • The PostgreSQL forking model is not efficient for lots of connections, or fast connection cycling.
        • While basic RAM statistics can be misleading, each backend does consume a notable amount of memory.
    • Postgres Hidden Gems
brew install postgresql
brew postgresql-upgrade-database
# POSTGRES_USER=postgres
# POSTGRES_PASSWORD
# PGDATA=/var/lib/postgresql/data
# can be started as any --user
docker run -it --rm -v $PWD/pg:/var/lib/postgresql/data postgres:alpine
# set the data directory so it need not be passed to every command
export PGDATA=$HOME/data/pg/data
# show internal settings only (-s); -W would prompt for a superuser password
initdb -kU postgres -s
# initialize the database cluster
initdb -kU postgres
# start the server
pg_ctl -D $PGDATA -l $HOME/data/pg/logfile start
pg_ctl -D $PGDATA -l $PGDATA/pg.log start
# check server status
pg_ctl -D $PGDATA status
-- show the current version
select version();
-- split a string into an array of characters
select regexp_split_to_array('abc','');

Notes

Upgrade

pg_dumpall -U postgres > dumpfile
/etc/init.d/postgresql stop
apk add -u postgresql
/etc/init.d/postgresql setup
/etc/init.d/postgresql start
psql -U postgres -f dumpfile

Backup

# dump a single database
pg_dump dbname > outfile
psql dbname < infile
# -j 8: number of parallel jobs
# -F d: directory format (parallel dumps require it)
# -t: dump only the given tables
# -O --no-owner
# -f: output file/directory name
pg_dump -F d -f backup -j 8 db -t a -t b -O
# with compression
pg_dump dbname | gzip > filename.gz
gunzip -c filename.gz | psql dbname
# custom format; must be restored with pg_restore
pg_dump -Fc dbname > filename
pg_restore -d dbname filename
# dump everything (all databases, roles, tablespaces)
pg_dumpall > outfile
psql -f infile postgres
-- Importing CSV
-- COPY
-- https://www.postgresql.org/docs/current/static/sql-copy.html
-- COPY table_name [ ( column_name [, ...] ) ] FROM { 'filename' | PROGRAM 'command' | STDIN } [ [ WITH ] ( option [, ...] ) ]
-- COPY { table_name [ ( column_name [, ...] ) ] | ( query ) } TO { 'filename' | PROGRAM 'command' | STDOUT } [ [ WITH ] ( option [, ...] ) ]
-- requires superuser privileges
-- supports the text, binary and csv formats
-- Windows users might need to use an E'' string and double any backslashes used in the path name.
-- the file must reside on the server
COPY phonebook (id,name, phone) FROM '/tmp/phonebook.csv' DELIMITER ',' CSV;
-- export with a header row
COPY phonebook TO '/tmp/data.csv' DELIMITER ',' CSV HEADER;
-- \copy
-- https://www.postgresql.org/docs/current/static/app-psql.html#APP-PSQL-META-COMMANDS-COPY
-- \copy { table [ ( column_list ) ] | ( query ) } { from | to } { 'filename' | program 'command' | stdin | stdout | pstdin | pstdout } [ [ with ] ( option [, ...] ) ]
-- imports from the client side
-- relative paths work; the file is read locally and streamed to the server
\copy out_tmp (id,name) from 'out.csv' DELIMITER ',' CSV;
-- export
\copy my_table to 'filename' csv header
\copy (select id,name from out_tmp) to 'exp.csv' DELIMITER ',' CSV;
-- with a header row
\copy (select id,name from out_tmp) to 'exp.csv' DELIMITER ',' CSV HEADER;
# Importing JSON
# JSON containing escape sequences (e.g. \n) can be mangled by COPY
cat data.json | psql -h localhost -p 5432 feeds -c "COPY news_feed (data) FROM STDIN;"
# pgfutter supports importing both CSV and JSON
# https://github.com/lukasmartinelli/pgfutter
# create table raw_data(id serial primary key, data jsonb);
pgfutter --host localhost --db db-name --user myuser --schema myschema --table raw_data --jsonb json data.json
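The escaping problem noted above comes from COPY's text format, which treats backslashes in the input as escape introducers, so raw JSON lines containing sequences like \n can be corrupted. One workaround is to pre-escape each line before piping it to COPY FROM STDIN; a minimal sketch (function name is illustrative):

```python
def escape_copy_text(line: str) -> str:
    """Escape one raw line for COPY ... FROM STDIN (text format).

    Backslashes must be doubled first, then control characters
    replaced by their backslash sequences, so COPY reproduces the
    original bytes instead of interpreting them as escapes.
    """
    return (line.replace('\\', '\\\\')
                .replace('\t', '\\t')
                .replace('\n', '\\n')
                .replace('\r', '\\r'))

# a JSON string containing the two characters '\' 'n' survives the round trip
print(escape_copy_text('{"msg": "a\\nb"}'))
```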

Optimize

-- updating a whole table in place can be very slow
update a set name = b.name from b where a.id = b.id;
-- creating a new table from a join is faster than a full update
create table c as (select a.age, b.name from a left join b on a.id = b.id);
-- so for a full-table rewrite, prefer:
begin;
create table T as select col1, col2, colN from orders;
drop table orders;
alter table T rename to orders;
commit;
-- nulling out a column
-- very slow once the table is large
UPDATE orders SET status = null;
-- so consider this instead
ALTER TABLE orders DROP column status
, ADD column status text;

Install

Data Types

JSON

  • hstore, json and jsonb types are supported
    • JSONB is the right choice most of the time
    • JSON can be chosen for write-only or write-heavy workloads with few queries
  • https://www.postgresql.org/docs/current/static/datatype-json.html
  • json is stored as text: operations are slower since every access reparses it; duplicate keys are kept, with the last one winning
  • jsonb is binary: usually larger on disk and slower to write, but supports indexing
  • Operators
    • -> fetch a field (or array element by index)
    • ->> the same, but always returns text, without quotes
    • #> fetch at the given multi-element path; #>> returns text
      • SELECT '{"a":[1,2,3],"b":[4,5,6]}'::json#>>'{a,2}';
    • @> jsonb: does the left value contain the right?
    • <@ jsonb: does the right value contain the left?
    • ? jsonb: does the key (or array element) exist?
      • SELECT '{"a":1, "b":2}'::jsonb ? 'b';
    • ?| jsonb: does any of the given keys exist?
      • SELECT '{"a":1, "b":2, "c":3}'::jsonb ?| array['b', 'ceeee', 'e'];
    • ?& jsonb: do all of the given keys exist?
    • || jsonb: concatenate two jsonb values
    • - delete a key/value pair or array element
      • SELECT '{"a": "b"}'::jsonb - 'a'; SELECT '["a", "b"]'::jsonb - 'a';
      • SELECT '["a", "b"]'::jsonb - (-1); SELECT '["a", "b"]'::jsonb - (1);
    • #- delete at the given path
  • FAQ
SELECT '5'::json;
SELECT '[1, 2, "foo", null]'::json;
SELECT '{"bar": "baz", "balance": 7.77, "active": false}'::json;
-- containment checks
SELECT '"foo"'::jsonb @> '"foo"'::jsonb;
SELECT '[1, 2, 3]'::jsonb @> '[3, 1]'::jsonb;
-- fetch a single field
SELECT data->'field' FROM doc;
-- create indexes
CREATE INDEX idxgin ON api USING GIN (jdoc);
CREATE INDEX idxginp ON api USING GIN (jdoc jsonb_path_ops);
-- expand an array into rows
SELECT
s ->> 'name'
FROM tab t, jsonb_array_elements(t.family -> 'children') s;
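To make the @> semantics concrete, here is a rough Python model of jsonb containment; it is a simplification (Postgres's top-level "array contains scalar" special case is deliberately omitted):

```python
def jsonb_contains(left, right):
    """Rough model of jsonb's @> containment.

    An object contains an object when every key/value on the right is
    contained in the left; an array contains an array when every right
    element is contained in some left element; scalars must be equal.
    """
    if isinstance(right, dict):
        return isinstance(left, dict) and all(
            k in left and jsonb_contains(left[k], v) for k, v in right.items()
        )
    if isinstance(right, list):
        return isinstance(left, list) and all(
            any(jsonb_contains(l, r) for l in left) for r in right
        )
    return left == right
```

This reproduces the examples above: '[1, 2, 3]' @> '[3, 1]' holds because order and duplication are ignored.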

Admin

-- effectively gives someone superuser privileges
grant postgres to someone;
GRANT ALL PRIVILEGES ON DATABASE "my_db" to my_user;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO myuser;
-- What GRANT USAGE ON SCHEMA exactly do? https://stackoverflow.com/q/17338621/1870054
-- ACCESS DB
REVOKE CONNECT ON DATABASE nova FROM PUBLIC;
GRANT CONNECT ON DATABASE nova TO user;
-- ACCESS SCHEMA
REVOKE ALL ON SCHEMA public FROM PUBLIC;
GRANT USAGE ON SCHEMA public TO user;
-- ACCESS TABLES
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA s TO u;
REVOKE ALL ON ALL TABLES IN SCHEMA public FROM PUBLIC;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO read_only;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO read_write;
GRANT ALL ON ALL TABLES IN SCHEMA public TO admin;
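Note that GRANT ... ON ALL TABLES only affects tables that already exist; for tables created later, default privileges have to be set up as well. A sketch (role names reused from above, objects created by the current role):

```sql
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO read_only;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO read_write;
```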

NOTIFY

LISTEN virtual;
NOTIFY virtual;
NOTIFY virtual, 'This is the payload';
LISTEN foo;
SELECT pg_notify('fo' || 'o', 'pay' || 'load');
-- NOTIFY on insert
-- Send Notify Procedure
CREATE OR REPLACE FUNCTION my_tab_notify_insert()
RETURNS TRIGGER AS $$
DECLARE
BEGIN
PERFORM pg_notify('my_tab_insert_notify', new.id::TEXT);
RETURN new;
END;
$$ LANGUAGE plpgsql;
-- Trigger
CREATE TRIGGER my_tab_after_insert
AFTER INSERT ON my_tab
FOR EACH ROW EXECUTE PROCEDURE my_tab_notify_insert();

Graph

Extension

-- list installed extensions
\dx
select * from pg_extension;
-- list available extensions
SELECT * FROM pg_available_extensions;
-- install an extension
CREATE EXTENSION IF NOT EXISTS file_fdw;

postgres_fdw

https://wiki.postgresql.org/wiki/Autonomous_subtransactions

Version

11

  • POSTGRESQL 11 BETA 1 RELEASED!
  • Appendix E. Release Notes
    • Partitioning
      • partitioning by a hash key
      • an UPDATE that changes the partition key now moves the row to the correct partition
      • improved SELECT performance
      • PRIMARY KEY, FOREIGN KEY, indexes and triggers on partitioned tables
    • Parallelism
      • parallel hash joins
      • parallel CREATE INDEX for B-tree indexes
      • parallel CREATE TABLE .. AS, CREATE MATERIALIZED VIEW and certain UNION queries
    • SQL stored procedures
      • support embedded transactions
      • invoked with CALL; managed with CREATE/DROP/ALTER PROCEDURE
      • DROP/ALTER ROUTINE also works, and covers functions and aggregates too
      • stored functions now run inside a transaction
      • transaction control is only allowed at the top level of a procedure
      • https://www.postgresql.org/docs/11/static/xproc.html
    • JIT compilation for parts of SQL execution
    • window functions now support all SQL:2011 options, including RANGE distance PRECEDING/FOLLOWING, GROUPS mode, and frame exclusion options
    • ALTER TABLE .. ADD COLUMN with a non-null DEFAULT used to rewrite the whole table; in most cases it no longer does, so it is considerably faster
    • quit and exit can be used to leave psql
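Two of the features above, hash partitioning and procedures with CALL, sketched in SQL (table and procedure names are made up):

```sql
-- hash partitioning (new in 11)
CREATE TABLE events (id bigint, payload text) PARTITION BY HASH (id);
CREATE TABLE events_p0 PARTITION OF events FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- a procedure with embedded transaction control, invoked with CALL
CREATE PROCEDURE purge_events()
LANGUAGE plpgsql AS $$
BEGIN
DELETE FROM events;
COMMIT;
END;
$$;
CALL purge_events();
```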

10

-- before PostgreSQL 10
-- a serial column is not faithfully reproduced from the statement alone
-- and has sequence-ownership quirks
CREATE TABLE test_old (
id serial PRIMARY KEY,
payload text
);
INSERT INTO test_old (payload) VALUES ('a'), ('b'), ('c') RETURNING *;
-- since PostgreSQL 10
-- SQL standard, compatible with DB2 and Oracle
-- reproducible from the statement
CREATE TABLE test_new (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
payload text
);
INSERT INTO test_new (payload) VALUES ('a'), ('b'), ('c') RETURNING *;

9.x

FAQ

varchar vs text

int vs bigint

  • On a 64-bit server, alignment padding means the two usually occupy the same space in a row
  • so using bigint is generally recommended
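The equal-space claim comes from 8-byte alignment padding. On a typical 64-bit ABI this can be illustrated with Python's struct module (an approximation of row layout, not Postgres's actual tuple format):

```python
import struct

# '@' uses native alignment; 'i' = 4-byte int, 'q' = 8-byte int.
# An int4 followed by an 8-byte-aligned field gets padded to 8 bytes,
# so the pair occupies the same space as bigint plus the same field.
int_then_int8 = struct.calcsize('@iq')
big_then_int8 = struct.calcsize('@qq')
print(int_then_int8, big_then_int8)  # 16 16 on common 64-bit platforms
```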

数组索引

GIN indexes are inverted indexes, suited to values that contain multiple component values (such as arrays).

  • Supported operators
  • <@
  • @>
  • =
  • &&
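A sketch of a GIN index on an array column using the operators listed (table and index names are illustrative):

```sql
CREATE TABLE posts (id bigserial PRIMARY KEY, tags text[]);
CREATE INDEX posts_tags_gin ON posts USING GIN (tags);
-- both of these predicates can use the index
SELECT * FROM posts WHERE tags @> ARRAY['postgres'];
SELECT * FROM posts WHERE tags && ARRAY['sql', 'db'];
```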

Array foreign keys

  • Not supported

ERROR: cannot alter type of a column used by a view or rule

The view must be dropped first and recreated afterwards; there is no nicer way at present, but the drop and the change can run inside a single transaction.

Some changes can be achieved by editing pg_attribute directly, but this is strongly discouraged.

Handling milliseconds in timestamps

There is no particularly clean way to handle this.

-- convert a millisecond epoch column to a TIMESTAMP
ALTER TABLE my_info
ALTER COLUMN tstmp TYPE TIMESTAMP USING to_timestamp(tstmp / 1000) + ((tstmp % 1000) || ' milliseconds') :: INTERVAL;
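The same conversion, modeled in Python for cross-checking; it mirrors the seconds-plus-milliseconds split in the SQL above:

```python
from datetime import datetime, timezone

def ms_to_dt(ms: int) -> datetime:
    """Millisecond Unix timestamp -> aware UTC datetime.

    Splits into whole seconds and a millisecond remainder, just like
    to_timestamp(tstmp / 1000) + (tstmp % 1000) milliseconds.
    """
    sec, msec = divmod(ms, 1000)
    return datetime.fromtimestamp(sec, tz=timezone.utc).replace(microsecond=msec * 1000)

print(ms_to_dt(1500000000123).isoformat())  # 2017-07-14T02:40:00.123000+00:00
```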

Maximum size of a query

#define MaxAllocSize ((Size) 0x3fffffff) /* 1 gigabyte - 1 */

IN vs ANY

CREATE TABLE test (
id BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
val TEXT
);
-- insert test data
DO
$$
BEGIN
FOR i IN 1..100000 LOOP
INSERT INTO test (val) VALUES ('val#' || i);
END LOOP;
END
$$;
EXPLAIN SELECT *
FROM test
WHERE id IN (1, 2, 3);
EXPLAIN SELECT *
FROM test
WHERE id = ANY (VALUES (1), (2), (3));
# IN
Bitmap Heap Scan on test (cost=12.86..19.97 rows=3 width=40)
Recheck Cond: (id = ANY ('{1,2,3}'::bigint[]))
-> Bitmap Index Scan on test_pkey (cost=0.00..12.86 rows=3 width=0)
Index Cond: (id = ANY ('{1,2,3}'::bigint[]))
# ANY
Nested Loop (cost=0.32..25.00 rows=3 width=40)
-> HashAggregate (cost=0.05..0.08 rows=3 width=4)
Group Key: "*VALUES*".column1
-> Values Scan on "*VALUES*" (cost=0.00..0.04 rows=3 width=4)
-> Index Scan using test_pkey on test (cost=0.28..8.29 rows=1 width=40)
Index Cond: (id = "*VALUES*".column1)
-- benchmark IN with 1000 values
DO
$$
DECLARE
x TEXT = '';
r REFCURSOR;
rec RECORD;
BEGIN
x = '0';
FOR i IN 1..1000 LOOP
x = x || ',' || i;
END LOOP;
OPEN r FOR EXECUTE 'EXPLAIN ANALYSE SELECT *
FROM test
WHERE id IN (' || x || ')';
FOR i IN 1..6 LOOP
FETCH r INTO rec;
RAISE NOTICE 'ROW %', rec;
END LOOP;
END
$$;
-- benchmark ANY with 1000 values
DO
$$
DECLARE
x TEXT;
r REFCURSOR;
rec RECORD;
BEGIN
x = '(0)';
FOR i IN 1..1000 LOOP
x = x || ',(' || i || ')';
END LOOP;
OPEN r FOR EXECUTE 'EXPLAIN ANALYSE SELECT *
FROM test
WHERE id = any (VALUES ' || x || ')';
FOR i IN 1..10 LOOP
FETCH r INTO rec;
RAISE NOTICE 'ROW %', rec;
END LOOP;
END
$$;