Обсуждение: [GENERAL] Table partition - parent table use
Hello everybody! I have searched quite a bit, but haven't found a setup like the ours, so I decided to ask if I'm missing something: We have a database with data being inserted for almost 10 years and no policy defined to get rid of old records, even though we mostly use only the current and last year's data. Some etl processes run on older data from time to time. After this time, some tables have grown to a point where even their indexes are bigger than the server's available RAM. Because some queries were getting slower, despite optimizations, we started experimenting with table partitioning. The idea was creating 3 partitions for each table of interest: the "hot", the "warm" and the "cold". The first would have the last 2 years. The second, data from 3 to 5 years and the third, all the rest. Our current production server is 9.4 and the main application uses hybernate as persistence mechanism to access the database. Some of the main points of our environment are: 1) There are a lot of inserts (millions, some days) but no updates or deletes; 2) There are a lot of foreign keys between most tables; 3) There are some very long select statements; 4) We can't touch any point of the application. Taking into account those, the "usual" partitioning path would bring more problems than solutions because: 1) The issue between trigger redirection and hybernate; The foreign keys dont propagate down the hierarchy Then one thing came to mind: Why not to use the "parent" table as the hot one, without doing any redirection at all? That way we could: 1) Keep the referential integrity of the current model untouched; 2) Dismiss the trigger redirection along with the hybernate issue; 3) Have a much smaller dataset to use in most of our queries; 4) Have all the historic data when needed Sure, we would have to move data between partitions and update the constraints of the tables every year but, it doesn't look like that much of a burden, since there will be only a few tables. The question is: is there something wrong that I am not seeing here? I have run some basic tests and all seemed to work as expected, but since I couldn't find any use of the parent table besides being the head of the hierarchy, I am affraid of doing something that could stop because it wasnt designed to work like that to begin with... Thanks for your time and sorry for such a big bunch of words Luiz Hugo Ronqui -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Hi Luiz: On Thu, Sep 14, 2017 at 11:06 PM, Luiz Hugo Ronqui <lronqui@tce.sp.gov.br> wrote: ... > We have a database with data being inserted for almost 10 years and no > policy defined to get rid of old records, even though we mostly use only the > current and last year's data. Some etl processes run on older data from time > to time. > After this time, some tables have grown to a point where even their indexes > are bigger than the server's available RAM. Because some queries were > getting slower, despite optimizations, we started experimenting with table > partitioning. > The idea was creating 3 partitions for each table of interest: the "hot", > the "warm" and the "cold". The first would have the last 2 years. The > second, data from 3 to 5 years and the third, all the rest. I would consider using more than one cold partition, and maybe moving them AND warm to a different schema. Maybe 5 years in each, something like cold-2000-2009, cold-2010-2019. You can update the constraints adequately, but the thing is you periodically update your constraints in the hot, warm and last cold, moving data among them appropiately, then do a really good backup of warm and colds and you can forget about them in daily backups, and also if you want to drop "stale" in the future, or un-inherit them to speed up queries, it is easier to do. ... > Then one thing came to mind: Why not to use the "parent" table as the hot > one, without doing any redirection at all? That way we could: > 1) Keep the referential integrity of the current model untouched; > 2) Dismiss the trigger redirection along with the hybernate issue; > 3) Have a much smaller dataset to use in most of our queries; > 4) Have all the historic data when needed You can do it, but remember parent normally does not have constraints, so it is always scanned ( fastly as it is known empty ). Also select from only parent is useful to detect when you are missing partitions, won't work in this case. But you can test it. ... > I have run some basic tests and all seemed to work as expected, but since I > couldn't find any use of the parent table besides being the head of the > hierarchy, I am affraid of doing something that could stop because it wasnt > designed to work like that to begin with... Seems fine to me. Never used that because y normally use special insertion programs for my partitiones tables ( my usage allows thats ), so I insert directly in the appropiate partition always ( so I just use inheritance, no triggers or rules ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Thanks for your tips! Our usage allows us to insert all rows into the hot partition, since its a rare event to receive data that otherwise wouldhave to be redirected to a "colder" partition. This way, its not a problem that the parent table would always be searched. In fact it would guarantee that these bits, received"out of time", would get accounted. The number of partitions, especially the "cold" ones, is not a hard limit... we can expand it with time. The idea includes schemas and tablespaces, along with its management benefits, specifically for these partitioned data.One of our current problems is exactly the time it takes for backup and restore operations. I did not mentioned it beforebecause of the size of the original message. Luiz Hugo Ronqui -----Mensagem original----- De: pgsql-general-owner+m233282=lronqui=tce.sp.gov.br@postgresql.org [mailto:pgsql-general-owner+m233282=lronqui=tce.sp.gov.br@postgresql.org]Em nome de Francisco Olarte Enviada em: sexta-feira, 15 de setembro de 2017 08:37 Para: Luiz Hugo Ronqui Cc: pgsql-general@postgresql.org Assunto: Re: [GENERAL] Table partition - parent table use Hi Luiz: On Thu, Sep 14, 2017 at 11:06 PM, Luiz Hugo Ronqui <lronqui@tce.sp.gov.br> wrote: ... > We have a database with data being inserted for almost 10 years and no > policy defined to get rid of old records, even though we mostly use only the > current and last year's data. Some etl processes run on older data from time > to time. > After this time, some tables have grown to a point where even their indexes > are bigger than the server's available RAM. Because some queries were > getting slower, despite optimizations, we started experimenting with table > partitioning. > The idea was creating 3 partitions for each table of interest: the "hot", > the "warm" and the "cold". The first would have the last 2 years. The > second, data from 3 to 5 years and the third, all the rest. I would consider using more than one cold partition, and maybe moving them AND warm to a different schema. Maybe 5 years in each, something like cold-2000-2009, cold-2010-2019. You can update the constraints adequately, but the thing is you periodically update your constraints in the hot, warm and last cold, moving data among them appropiately, then do a really good backup of warm and colds and you can forget about them in daily backups, and also if you want to drop "stale" in the future, or un-inherit them to speed up queries, it is easier to do. ... > Then one thing came to mind: Why not to use the "parent" table as the hot > one, without doing any redirection at all? That way we could: > 1) Keep the referential integrity of the current model untouched; > 2) Dismiss the trigger redirection along with the hybernate issue; > 3) Have a much smaller dataset to use in most of our queries; > 4) Have all the historic data when needed You can do it, but remember parent normally does not have constraints, so it is always scanned ( fastly as it is known empty ). Also select from only parent is useful to detect when you are missing partitions, won't work in this case. But you can test it. ... > I have run some basic tests and all seemed to work as expected, but since I > couldn't find any use of the parent table besides being the head of the > hierarchy, I am affraid of doing something that could stop because it wasnt > designed to work like that to begin with... Seems fine to me. Never used that because y normally use special insertion programs for my partitiones tables ( my usage allows thats ), so I insert directly in the appropiate partition always ( so I just use inheritance, no triggers or rules ). Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Luiz: 1st thing, do not top-quote. It's hard to read and I, personally, consider it insulting ( not the first time it's done, and for obvious reasons ). On Fri, Sep 15, 2017 at 4:24 PM, Luiz Hugo Ronqui <lronqui@tce.sp.gov.br> wrote: > Our usage allows us to insert all rows into the hot partition, since its a rare event to receive data that otherwise wouldhave to be redirected to a "colder" partition. > This way, its not a problem that the parent table would always be searched. In fact it would guarantee that these bits,received "out of time", would get accounted. The problem of always being searched is not for recent rows, but for historic. Imagine hot=2016-7, warm=2013-5 and cold=rest If hot=parent and you make a query for 2014 data it's going to search hot and warm, not just warm. If hot!=parent it is going to search parent and warm ( and use a seq-scan in parent in the normal case, as stats show it as empty , and it will be if things are going well ). > The number of partitions, especially the "cold" ones, is not a hard limit... we can expand it with time. I know, my recomendation was to made them in such a way that once a row lands in an historic partition it never moves if you use more than one ( i.e., use things as cold-200x, cold-201x, not cold-prev-decade, cold-two-decades-ago ) > The idea includes schemas and tablespaces, along with its management benefits, specifically for these partitioned data.One of our current problems is exactly the time it takes for backup and restore operations. I did not mentioned it beforebecause of the size of the original message. We normally do the schema trick, and as 90% of data is in historic schema, we skip most of it. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general