Обсуждение: [GENERAL] Question about paritioning

Поиск
Список
Период
Сортировка

[GENERAL] Question about paritioning

От
Alex Samad
Дата:
Hi

I have a large table about 3B rows, that I would like to partition on a column called _received which is  of type timestamp 

I followed this


and came up with 
CREATE OR REPLACE FUNCTION create_partition_and_insert() RETURNS trigger AS
  $BODY$
    DECLARE
      partition_date TEXT;
      partition TEXT;
    BEGIN
      partition_date := to_char(NEW._received,'YYYYMM');
      partition := TG_TABLE_NAME || '_' || partition_date;
      IF NOT EXISTS(SELECT relname FROM pg_class WHERE relname=partition) THEN
        RAISE NOTICE 'A partition has been created %',partition;
        EXECUTE 'CREATE TABLE ' || partition || ' (check ( to_char(_received, '''YYYYMM''') = ''' || partition_date || ''')) INHERITS (' || TG_TABLE_NAME || ');';
      END IF;
      EXECUTE 'INSERT INTO ' || partition || ' SELECT(' || TG_TABLE_NAME || ' ' || quote_literal(NEW) || ').* RETURNING patent_id;';
      RETURN NULL;
    END;
  $BODY$
LANGUAGE plpgsql VOLATILE
COST 100;


Which I believe generates something like this
CREATE TABLE feedintra_201707( check (to_char(_received,'YYYYMM') = '201707' )) INHERITS (XXXXXX);

My concern is the cost of doing to_char on every row.

So 
Is this going to work 
Is this the right way to do it.

I was thinking if I could change the check to be something like 

check ( _recieved >= YYYYMM 1 00:00 and _recieved <= YYYYMM <last day of the month> 23:59:59.999 )

so I am not sure how to make up the above line based on a timestamp

Thanks
Alex










Re: [GENERAL] Question about paritioning

От
John R Pierce
Дата:
On 7/26/2017 10:08 PM, Alex Samad wrote:
> I have a large table about 3B rows, that I would like to partition on
> a column called _received which is  of type timestamp
>

a good goal is to have no more than about 100 partitions max, and
ideally more like 25.

when we partition on time stamp, we typically do it by the week, as
we're doing 6 month data retention.

IIRC, we're using DATE_TRUNC('week', timestamp)::DATE  for use as the
partition label and key.



--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Question about paritioning

От
Alex Samad
Дата:
So is date_trunc better than to_char ? I'm thinking it probably is 

as for the number of partitions, well we don't plan on deleting anything, but from my reading as long as the queries stay on a small amount of parts that we should be okay.

A

On 27 July 2017 at 15:33, John R Pierce <pierce@hogranch.com> wrote:
On 7/26/2017 10:08 PM, Alex Samad wrote:
I have a large table about 3B rows, that I would like to partition on a column called _received which is  of type timestamp


a good goal is to have no more than about 100 partitions max, and ideally more like 25.

when we partition on time stamp, we typically do it by the week, as we're doing 6 month data retention.

IIRC, we're using DATE_TRUNC('week', timestamp)::DATE  for use as the partition label and key.



--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Question about paritioning

От
John R Pierce
Дата:
On 7/27/2017 12:43 AM, Alex Samad wrote:
>
> ... as long as the queries stay on a small amount of parts that we
> should be okay.

thats true as long as the planner can restrict the queries to the
correct partition...  but there's lots of queries that end up having to
hit all partitions because the planner can't guess correctly.


--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Question about paritioning

От
Alex Samad
Дата:
so I have a pgsql function that will create tables based on YYYYMD for x years in advance and inherits of base table.
with this
CHECK (_received >= '2013-01-01 00:00:00+11'::timestamp with time zone AND _received < '2013-02-01 01:00:00+11'::timestamp with time zone)

now for the insert, do I create / update this monthly have had a 2 or 3 level if then check before inserting

or do I create a programatic insert that works out the table name

On 27 July 2017 at 18:36, John R Pierce <pierce@hogranch.com> wrote:
On 7/27/2017 12:43 AM, Alex Samad wrote:

... as long as the queries stay on a small amount of parts that we should be okay.

thats true as long as the planner can restrict the queries to the correct partition...  but there's lots of queries that end up having to hit all partitions because the planner can't guess correctly.



--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general