Обсуждение: [GENERAL] Table partition - parent table use

Поиск
Список
Период
Сортировка

[GENERAL] Table partition - parent table use

От
"Luiz Hugo Ronqui"
Дата:
Hello everybody!

I have searched quite a bit, but haven't found a setup like the ours, so I
decided to ask if I'm missing something:

We have a database with data being inserted for almost 10 years and no
policy defined to get rid of old records, even though we mostly use only the
current and last year's data. Some etl processes run on older data from time
to time.

After this time, some tables have grown to a point where even their indexes
are bigger than the server's available RAM. Because some queries were
getting slower, despite optimizations, we started experimenting with table
partitioning. 
The idea was creating 3 partitions for each table of interest: the "hot",
the "warm" and the "cold". The first would have the last 2 years. The
second, data from 3 to 5 years and the third, all the rest.

Our current production server is 9.4 and the main application uses hybernate
as persistence mechanism to access the database. Some of the main points of
our environment are:
1)    There are a lot of inserts (millions, some days) but no updates or
deletes;
2)    There are a lot of foreign keys between most tables;
3)    There are some very long select statements;
4)    We can't touch any point of the application.

Taking into account those, the "usual" partitioning path would bring more
problems than solutions because:
1)    The issue between trigger redirection and hybernate;    The foreign keys dont propagate down the hierarchy    
Then one thing came to mind: Why not to use the "parent" table as the hot
one, without doing any redirection at all? That way we could:
1)    Keep the referential integrity of the current model untouched;
2)    Dismiss the trigger redirection along with the hybernate issue;
3)    Have a much smaller dataset to use in most of our queries;
4)    Have all the historic data when needed 

Sure, we would have to move data between partitions and update the
constraints of the tables every year but, it doesn't look like that much of
a burden, since there will be only a few tables. 

The question is: is there something wrong that I am not seeing here?

I have run some basic tests and all seemed to work as expected, but since I
couldn't find any use of  the parent table besides  being the head of the
hierarchy, I am affraid of doing something that could stop  because it wasnt
designed to work like that to begin with...

Thanks for your time and sorry for such a big bunch of words


Luiz Hugo Ronqui


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Table partition - parent table use

От
Francisco Olarte
Дата:
Hi Luiz:

On Thu, Sep 14, 2017 at 11:06 PM, Luiz Hugo Ronqui
<lronqui@tce.sp.gov.br> wrote:
...
> We have a database with data being inserted for almost 10 years and no
> policy defined to get rid of old records, even though we mostly use only the
> current and last year's data. Some etl processes run on older data from time
> to time.
> After this time, some tables have grown to a point where even their indexes
> are bigger than the server's available RAM. Because some queries were
> getting slower, despite optimizations, we started experimenting with table
> partitioning.
> The idea was creating 3 partitions for each table of interest: the "hot",
> the "warm" and the "cold". The first would have the last 2 years. The
> second, data from 3 to 5 years and the third, all the rest.

I would consider using more than one cold partition, and maybe moving
them AND warm to a different schema. Maybe 5 years in each, something
like cold-2000-2009, cold-2010-2019. You can update the constraints
adequately, but the thing is you periodically update your constraints
in the hot, warm and last cold, moving data among them appropiately,
then do a really good backup of warm and colds and you can forget
about them in daily backups, and also if you want to drop "stale" in
the future, or un-inherit them to speed up queries, it is easier to
do.

...
> Then one thing came to mind: Why not to use the "parent" table as the hot
> one, without doing any redirection at all? That way we could:
> 1)      Keep the referential integrity of the current model untouched;
> 2)      Dismiss the trigger redirection along with the hybernate issue;
> 3)      Have a much smaller dataset to use in most of our queries;
> 4)      Have all the historic data when needed

You can do it, but remember parent normally does not have constraints,
so it is always scanned ( fastly as it is known empty ). Also select
from only parent is useful to detect when you are missing partitions,
won't work in this case. But you can test it.

...
> I have run some basic tests and all seemed to work as expected, but since I
> couldn't find any use of  the parent table besides  being the head of the
> hierarchy, I am affraid of doing something that could stop  because it wasnt
> designed to work like that to begin with...

Seems fine to me. Never used that because y normally use special
insertion programs for my partitiones tables ( my usage allows thats
), so I insert directly in the appropiate partition always ( so I just
use inheritance, no triggers or rules ).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

RES: [GENERAL] Table partition - parent table use

От
"Luiz Hugo Ronqui"
Дата:
Thanks for your tips!


Our usage allows us to insert all rows into the hot partition, since its a rare event to receive data that otherwise
wouldhave to be redirected to a "colder" partition. 

This way, its not a problem that the parent table would always be searched. In fact it would guarantee that these bits,
received"out of time", would get accounted.  

The number of partitions, especially the "cold" ones, is not a hard limit... we can expand it with time.

The idea includes schemas and tablespaces, along with its management benefits,  specifically for these partitioned
data.One of our current problems is exactly the time it takes for backup and restore operations. I did not mentioned it
beforebecause of the size of the original message. 


Luiz Hugo Ronqui


-----Mensagem original-----
De: pgsql-general-owner+m233282=lronqui=tce.sp.gov.br@postgresql.org
[mailto:pgsql-general-owner+m233282=lronqui=tce.sp.gov.br@postgresql.org]Em nome de Francisco Olarte 
Enviada em: sexta-feira, 15 de setembro de 2017 08:37
Para: Luiz Hugo Ronqui
Cc: pgsql-general@postgresql.org
Assunto: Re: [GENERAL] Table partition - parent table use

Hi Luiz:

On Thu, Sep 14, 2017 at 11:06 PM, Luiz Hugo Ronqui
<lronqui@tce.sp.gov.br> wrote:
...
> We have a database with data being inserted for almost 10 years and no
> policy defined to get rid of old records, even though we mostly use only the
> current and last year's data. Some etl processes run on older data from time
> to time.
> After this time, some tables have grown to a point where even their indexes
> are bigger than the server's available RAM. Because some queries were
> getting slower, despite optimizations, we started experimenting with table
> partitioning.
> The idea was creating 3 partitions for each table of interest: the "hot",
> the "warm" and the "cold". The first would have the last 2 years. The
> second, data from 3 to 5 years and the third, all the rest.

I would consider using more than one cold partition, and maybe moving
them AND warm to a different schema. Maybe 5 years in each, something
like cold-2000-2009, cold-2010-2019. You can update the constraints
adequately, but the thing is you periodically update your constraints
in the hot, warm and last cold, moving data among them appropiately,
then do a really good backup of warm and colds and you can forget
about them in daily backups, and also if you want to drop "stale" in
the future, or un-inherit them to speed up queries, it is easier to
do.

...
> Then one thing came to mind: Why not to use the "parent" table as the hot
> one, without doing any redirection at all? That way we could:
> 1)      Keep the referential integrity of the current model untouched;
> 2)      Dismiss the trigger redirection along with the hybernate issue;
> 3)      Have a much smaller dataset to use in most of our queries;
> 4)      Have all the historic data when needed

You can do it, but remember parent normally does not have constraints,
so it is always scanned ( fastly as it is known empty ). Also select
from only parent is useful to detect when you are missing partitions,
won't work in this case. But you can test it.

...
> I have run some basic tests and all seemed to work as expected, but since I
> couldn't find any use of  the parent table besides  being the head of the
> hierarchy, I am affraid of doing something that could stop  because it wasnt
> designed to work like that to begin with...

Seems fine to me. Never used that because y normally use special
insertion programs for my partitiones tables ( my usage allows thats
), so I insert directly in the appropiate partition always ( so I just
use inheritance, no triggers or rules ).

Francisco Olarte.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Table partition - parent table use

От
Francisco Olarte
Дата:
Luiz:

1st thing, do not top-quote. It's hard to read and I, personally,
consider it insulting ( not the first time it's done, and for obvious
reasons ).

On Fri, Sep 15, 2017 at 4:24 PM, Luiz Hugo Ronqui <lronqui@tce.sp.gov.br> wrote:
> Our usage allows us to insert all rows into the hot partition, since its a rare event to receive data that otherwise
wouldhave to be redirected to a "colder" partition. 
> This way, its not a problem that the parent table would always be searched. In fact it would guarantee that these
bits,received "out of time", would get accounted. 

The problem of always being searched is not for recent rows, but for
historic. Imagine hot=2016-7, warm=2013-5 and cold=rest

If hot=parent and you make a query for 2014 data it's going to search
hot and warm, not just warm. If hot!=parent it is going to search
parent and warm ( and use a seq-scan in parent in the normal case, as
stats show it as empty , and it will be if things are going well ).

> The number of partitions, especially the "cold" ones, is not a hard limit... we can expand it with time.

I know, my recomendation was to made them in such a way that once a
row lands in an historic partition it never moves if you use more than
one ( i.e., use things as cold-200x, cold-201x, not cold-prev-decade,
cold-two-decades-ago )

> The idea includes schemas and tablespaces, along with its management benefits,  specifically for these partitioned
data.One of our current problems is exactly the time it takes for backup and restore operations. I did not mentioned it
beforebecause of the size of the original message. 

We normally do the schema trick, and as 90% of data is in historic
schema, we skip most of it.

Francisco Olarte.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general