六狼论坛

 找回密码
 立即注册

QQ登录

只需一步,快速开始

新浪微博账号登陆

只需一步,快速开始

搜索
查看: 137|回复: 0

PostgreSQL的分区表

[复制链接]

升级  4%

12

主题

12

主题

12

主题

秀才

Rank: 2

积分
56
 楼主| 发表于 2013-1-30 01:33:03 | 显示全部楼层 |阅读模式
1.  version.
                               version                                                   ------------------------------------------------------------------------------------------------------------ PostgreSQL 8.4.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-10), 32-bit 
2.  和Oracle的分区一样,PostgresSQL(以下简称PG)的分区的意思是把逻辑上的一个大表分割成物理上的几块儿。分区不仅仅能带来访问速度的提升,关键的它带来的是管理和维护的方便。想想一张上T的表,管理起来是一件多么费劲的事情。小表分区不实际。表在多大情况下才考虑分区呢?PostgresSQL官方给出的建议是当表本身大小超过了机器物理内存的实际大小时,可以考虑分区。PG目前(8.4.4)仅支持范围分区和列表分区,尚未支持散列分区。
 
3.  PG的分区是基于表继承的方式实现的,无论何时,都应保证父表里面是空的。例子:
--建父表CREATE TABLE measurement (    city_id         int not null,    logdate         date not null,    peaktemp        int,    unitsales       int); 
--建分区表CREATE TABLE measurement_y2006m01 (    CHECK ( logdate >= DATE '2006-01-01' AND logdate < DATE '2006-02-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m02 (    CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m03 (    CHECK ( logdate >= DATE '2006-03-01' AND logdate < DATE '2006-04-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m04 (    CHECK ( logdate >= DATE '2006-04-01' AND logdate < DATE '2006-05-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m05 (    CHECK ( logdate >= DATE '2006-05-01' AND logdate < DATE '2006-06-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m06 (    CHECK ( logdate >= DATE '2006-06-01' AND logdate < DATE '2006-07-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m07 (    CHECK ( logdate >= DATE '2006-07-01' AND logdate < DATE '2006-08-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m08 (    CHECK ( logdate >= DATE '2006-08-01' AND logdate < DATE '2006-09-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m09 (    CHECK ( logdate >= DATE '2006-09-01' AND logdate < DATE '2006-10-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m10 (    CHECK ( logdate >= DATE '2006-10-01' AND logdate < DATE '2006-11-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m11 (    CHECK ( logdate >= DATE '2006-11-01' AND logdate < DATE '2006-12-01' )) INHERITS (measurement);CREATE TABLE measurement_y2006m12 (    CHECK ( logdate >= DATE '2006-12-01' AND logdate < DATE '2007-01-01' )) INHERITS (measurement); 
 分区键上建索引:
CREATE INDEX measurement_y2006m02_logdate ON measurement_y2006m02 (logdate);CREATE INDEX measurement_y2006m03_logdate ON measurement_y2006m03 (logdate);... 
4.   对于开发人员来说,希望数据库是透明的,只管insert into measurement.对于数据插向哪个分区,则希望由DB决定。这点,ORACLE实现了,但是PG不行,毕竟是免费的。需要前期人工处理下:
首先建立个函数
CREATE OR REPLACE FUNCTION measurement_insert_trigger()RETURNS TRIGGER AS $$BEGIN    IF ( NEW.logdate >= DATE '2006-01-01' AND         NEW.logdate < DATE '2006-02-01' ) THEN        INSERT INTO measurement_y2006m01 VALUES (NEW.*);    ELSIF ( NEW.logdate >= DATE '2006-02-01' AND            NEW.logdate < DATE '2006-03-01' ) THEN        INSERT INTO measurement_y2006m02 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-03-01' AND            NEW.logdate < DATE '2006-04-01' ) THEN        INSERT INTO measurement_y2006m03 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-04-01' AND            NEW.logdate < DATE '2006-05-01' ) THEN        INSERT INTO measurement_y2006m04 VALUES (NEW.*);    ELSIF ( NEW.logdate >= DATE '2006-05-01' AND            NEW.logdate < DATE '2006-06-01' ) THEN        INSERT INTO measurement_y2006m05 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-06-01' AND            NEW.logdate < DATE '2006-07-01' ) THEN        INSERT INTO measurement_y2006m06 VALUES (NEW.*); ELSIF ( NEW.logdate >= DATE '2006-07-01' AND            NEW.logdate < DATE '2006-08-01' ) THEN        INSERT INTO measurement_y2006m07 VALUES (NEW.*);    ELSIF ( NEW.logdate >= DATE '2006-08-01' AND            NEW.logdate < DATE '2006-09-01' ) THEN        INSERT INTO measurement_y2006m08 VALUES (NEW.*);       ELSIF ( NEW.logdate >= DATE '2006-09-01' AND            NEW.logdate < DATE '2006-10-01' ) THEN        INSERT INTO measurement_y2006m09 VALUES (NEW.*);    ELSIF ( NEW.logdate >= DATE '2006-10-01' AND            NEW.logdate < DATE '2006-11-01' ) THEN        INSERT INTO measurement_y2006m10 VALUES (NEW.*);    ELSIF ( NEW.logdate >= DATE '2006-11-01' AND            NEW.logdate < DATE '2006-12-01' ) THEN        INSERT INTO measurement_y2006m11 VALUES (NEW.*);    ELSIF ( NEW.logdate >= DATE '2006-12-01' AND            NEW.logdate < DATE '2007-01-01' ) THEN        INSERT INTO measurement_y2006m12 VALUES (NEW.*);    ELSE        RAISE EXCEPTION 'Date out of range.  Fix the measurement_insert_trigger() function!';    END IF;    RETURN NULL;END;$$LANGUAGE plpgsql; 建立触发器:
CREATE TRIGGER insert_measurement_trigger    BEFORE INSERT ON measurement    FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger(); 测试下:
mydb=> \d measurement    Table "sch2.measurement"  Column   |  Type   | Modifiers -----------+---------+----------- city_id   | integer | not null logdate   | date    | not null peaktemp  | integer |  unitsales | integer | Triggers:    insert_measurement_trigger BEFORE INSERT ON measurement FOR EACH ROW EXECUTE PROCEDURE measurement_insert_trigger()mydb=> insert into measurement values (1,date '2006-02-10',1,1);INSERT 0 0mydb=> insert into measurement values (1,date '2006-03-10',1,1);INSERT 0 0mydb=> insert into measurement values (1,date '2006-04-10',1,1);INSERT 0 0mydb=> select * from measurement; city_id |  logdate   | peaktemp | unitsales ---------+------------+----------+-----------       1 | 2006-02-10 |        1 |         1       1 | 2006-03-10 |        1 |         1       1 | 2006-04-10 |        1 |         1(3 rows)mydb=> select * from measurement_y2006m03; city_id |  logdate   | peaktemp | unitsales ---------+------------+----------+-----------       1 | 2006-03-10 |        1 |         1(1 row) 
mydb=> select * from measurement; city_id |  logdate   | peaktemp | unitsales ---------+------------+----------+-----------       1 | 2006-02-10 |        1 |         1       1 | 2006-03-10 |        1 |         1       1 | 2006-04-10 |        1 |         1(3 rows)mydb=>  删除一个分区
DROP TABLE measurement_y2006m02;将一个表从分区表里删除,但是保留表,使其继续可以被使用:
ALTER TABLE measurement_y2006m02 NO INHERIT measurement; 
添加分区的最快方式:
CREATE TABLE measurement_y2006m02  (LIKE measurement INCLUDING DEFAULTS INCLUDING CONSTRAINTS);ALTER TABLE measurement_y2006m02 ADD CONSTRAINT y2006m02   CHECK ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' );--从文件中拷贝数据\copy measurement_y2006m02 from '/home/postgres/measurement_y2006m02'ALTER TABLE measurement_y2008m02 INHERIT measurement; 
5.分区排除和查询优化。默认情况下,是不会开启分区排除的,索引针对一个基于分区键条件的检索,要扫描所有的分区:
 
mydb=> SET constraint_exclusion = off;SETmydb=> EXPLAIN SELECT count(*) FROM measurement WHERE logdate <= DATE '2006-02-01';                                          QUERY PLAN                                           ----------------------------------------------------------------------------------------------- Aggregate  (cost=436.80..436.81 rows=1 width=0)   ->  Append  (cost=0.00..417.62 rows=7670 width=0)         ->  Seq Scan on measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m01 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m02 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m03 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m04 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m05 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m06 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m07 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m08 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m09 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m10 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m11 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m12 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)(28 rows) 
开启:
mydb=> SET constraint_exclusion = on;SETmydb=> EXPLAIN SELECT count(*) FROM measurement WHERE logdate <= DATE '2006-02-01';                                          QUERY PLAN                                           ----------------------------------------------------------------------------------------------- Aggregate  (cost=100.80..100.81 rows=1 width=0)   ->  Append  (cost=0.00..96.38 rows=1770 width=0)         ->  Seq Scan on measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m01 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)         ->  Seq Scan on measurement_y2006m02 measurement  (cost=0.00..32.12 rows=590 width=0)               Filter: (logdate <= '2006-02-01'::date)(8 rows) 
还可以通过建立规则的方式来决定向哪个分区插入数据:
CREATE RULE measurement_insert_y2006m02 ASON INSERT TO measurement WHERE    ( logdate >= DATE '2006-02-01' AND logdate < DATE '2006-03-01' )DO INSTEAD    INSERT INTO measurement_y2006m02 VALUES (NEW.*);... 
此外:
VACUUM or ANALYZE  measurement 只会对父表起作用,要想分析表,需要一次分析每个分区表。
 
 熟悉ORACLE的同志应该觉得PG的分区表比ORACLE是要逊色很多吧。
 
 
您需要登录后才可以回帖 登录 | 立即注册 新浪微博账号登陆

本版积分规则

快速回复 返回顶部 返回列表