Table Partitions / Partial Indexes

Поиск
Список
Период
Сортировка
От Mike C
Тема Table Partitions / Partial Indexes
Дата
Msg-id bd0eabd0512111807q6b54115q2477c7c2a37f1c1a@mail.gmail.com
обсуждение исходный текст
Ответы Re: Table Partitions / Partial Indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Table Partitions / Partial Indexes  (Simon Riggs <simon@2ndquadrant.com>)
Список pgsql-performance
Hello,

I've got a table with ~60 Million rows and am having performance problems querying it. Disks are setup as 4x10K SCSI 76GB, RAID 1+0. The table is being inserted into multiple times every second of the day, with no updates and every 2nd day we delete 1/60th of the data (as it becomes old). Vacuum analyze is scheduled to run 3 times a day.

Query:

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC from PC_TRAFFIC where FK_DEVICE = 996 and TRAFFIC_DATE >= '2005-10-14 00:00:00' and TRAFFIC_DATE <= '2005-11-13 23:59:59' group by SOURCE_MAC order by 1 desc

Table:

CREATE TABLE PC_TRAFFIC (
     PK_PC_TRAFFIC  INTEGER NOT NULL,
     TRAFFIC_DATE   TIMESTAMP NOT NULL,
     SOURCE_MAC     CHAR(20) NOT NULL,
     DEST_IP        CHAR(15),
     DEST_PORT      INTEGER,
     TOTAL_TO       DOUBLE PRECISION,
     TOTAL_FROM     DOUBLE PRECISION,
     FK_DEVICE      SMALLINT,
     PROTOCOL_TYPE  SMALLINT
);

CREATE INDEX pc_traffic_pkidx ON pc_traffic (pk_pc_traffic);
CREATE INDEX pc_traffic_idx3 ON pc_traffic (fk_device, traffic_date);

Plan:
Sort  (cost=76650.58..76650.58 rows=2 width=40)
   Sort Key: sum(total_from)
   ->  HashAggregate  (cost=76650.54..76650.57 rows=2 width=40)
         ->  Bitmap Heap Scan on pc_traffic  (cost=534.64..76327.03 rows=43134 width=40)
               Recheck Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
               ->  Bitmap Index Scan on pc_traffic_idx3  (cost=0.00..534.64 rows=43134 width=0)
                     Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(7 rows)

CLUSTER on PC_TRAFFIC_IDX3 gives me significantly improved performance:

Sort  (cost=39886.65..39886.66 rows=2 width=40)
   Sort Key: sum(total_from)
   ->  HashAggregate  (cost=39886.61..39886.64 rows=2 width=40)
         ->  Index Scan using pc_traffic_idx3 on pc_traffic  (cost=0.00..39551.26 rows=44714 width=40)
               Index Cond: ((fk_device = 996) AND (traffic_date >= '2005-10-01 00:00:00'::timestamp without time zone) AND (traffic_date <= '2005-10-31 23:59:59'::timestamp without time zone))
(5 rows)

However the clustering is only effective on the first shot. Because of the constant usage of the table we can't perform a vacuum full nor any exclusive lock function.

Would table partitioning/partial indexes help much? Partitioning on date range doesn't make much sense for this setup, where a typical 1-month query spans both tables (as the billing month for the customer might start midway through a calendar month).

Noting that the index scan was quicker than the bitmap, I'm trying to make the indexes smaller/more likely to index scan. I have tried partitioning against fk_device, with 10 child tables. I'm using fk_device % 10 = 1, fk_device % 10 = 2, fk_device % 10 = 3, etc... as the check constraint.

CREATE TABLE pc_traffic_0 (CHECK(FK_DEVICE % 10 = 0)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_1 (CHECK(FK_DEVICE % 10 = 1)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_2 (CHECK(FK_DEVICE % 10 = 2)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_3 (CHECK(FK_DEVICE % 10 = 3)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_4 (CHECK(FK_DEVICE % 10 = 4)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_5 (CHECK(FK_DEVICE % 10 = 5)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_6 (CHECK(FK_DEVICE % 10 = 6)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_7 (CHECK(FK_DEVICE % 10 = 7)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_8 (CHECK(FK_DEVICE % 10 = 8)) INHERITS (pc_traffic);
CREATE TABLE pc_traffic_9 (CHECK(FK_DEVICE % 10 = 9)) INHERITS (pc_traffic);

... indexes now look like:
CREATE INDEX pc_traffic_6_idx3 ON pc_traffic_6 (fk_device, traffic_date);

To take advantage of the query my SQL now has to include the mod operation (so the query planner picks up the correct child tables):

select sum(TOTAL_FROM) as TOTAL_IN, sum(TOTAL_TO) as TOTAL_OUT, SOURCE_MAC from PC_TRAFFIC where FK_DEVICE = 996 and FK_DEVICE % 10 = 6 and TRAFFIC_DATE >= '2005-10-14 00:00:00' and TRAFFIC_DATE <= '2005-11-13 23:59:59' group by SOURCE_MAC order by 1 desc

Sorry I would show the plan but I'm rebuilding the dev database atm. It was faster though and did pick up the correct child table. It was also a bitmap scan on the index IIRC.

Would I be better off creating many partial indexes instead of multiple tables AND multiple indexes?
Am I using a horrid method for partitioning the data? (% 10)
Should there be that big of an improvement for multiple tables given that all the data is still stored on the same filesystem?
Any advice on table splitting much appreciated.

Cheers,

Mike C.

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Luke Lonergan"
Дата:
Сообщение: Re: Should Oracle outperform PostgreSQL on a complex
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Table Partitions / Partial Indexes