Обсуждение: Table partitioning

Поиск
Список
Период
Сортировка

Table partitioning

От
Herouth Maoz
Дата:
I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like the
ideaof creating and dropping tables all the time. 

I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'',
time_arrived)= 1 (or 2 for February, 3 for March etc.). 

I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's
worthevery time. 

So, I have two questions.

First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says something
like"time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be able to tell that
date_part("month",time_arrived)for all the records is 4, and therefore avoid selecting from any partitions other than
theapril one? 

Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or do
Ineed to create rules/triggers for that? 


TIA,
Herouth

Re: Table partitioning

От
Elliot
Дата:
On 2013-10-28 12:27, Herouth Maoz wrote:
> I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like
theidea of creating and dropping tables all the time. 
>
> I'm thinking of simply creating 12 child tables, in which the check condition will be, for example,
date_part('month'',time_arrived) = 1 (or 2 for February, 3 for March etc.). 
>
> I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's
worthevery time. 
>
> So, I have two questions.
>
> First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says something
like"time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be able to tell that
date_part("month",time_arrived)for all the records is 4, and therefore avoid selecting from any partitions other than
theapril one? 
>
> Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or
doI need to create rules/triggers for that? 
>
>
> TIA,
> Herouth
>
1. No - you'd need a condition like "where date_part("month",
time_arrived) = 1" in your select statements in order for the constraint
exclusion to kick in
2. Yes - there is no need to create rules or triggers for deletes on the
parent table (check out the syntax for "delete from <table>" versus
"delete from only <table>)



Re: Table partitioning

От
Herouth Maoz
Дата:
Thanks. Assuming there is an index on the time_arrived column, and that there are about 10.5 million records in each
childtable, how bad will performance be if the query actually accesses all the 12 tables? Will it be as bad as using
thefull table? 

On 28/10/2013, at 18:31, Elliot wrote:

> On 2013-10-28 12:27, Herouth Maoz wrote:
>> I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like
theidea of creating and dropping tables all the time. 
>>
>> I'm thinking of simply creating 12 child tables, in which the check condition will be, for example,
date_part('month'',time_arrived) = 1 (or 2 for February, 3 for March etc.). 
>>
>> I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's
worthevery time. 
>>
>> So, I have two questions.
>>
>> First, is constraint exclusion going to work with that kind of condition? I mean, if my WHERE clause says something
like"time_arrived >= '2013-04-05' and time_arrived < '2013-04-17'", will it be able to tell that
date_part("month",time_arrived)for all the records is 4, and therefore avoid selecting from any partitions other than
theapril one? 
>>
>> Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or
doI need to create rules/triggers for that? 
>>
>>
>> TIA,
>> Herouth
>>
> 1. No - you'd need a condition like "where date_part("month", time_arrived) = 1" in your select statements in order
forthe constraint exclusion to kick in 
> 2. Yes - there is no need to create rules or triggers for deletes on the parent table (check out the syntax for
"deletefrom <table>" versus "delete from only <table>) 
>




Re: Table partitioning

От
Elliot
Дата:
On 2013-10-28 12:47, Herouth Maoz wrote:
> Thanks. Assuming there is an index on the time_arrived column, and that there are about 10.5 million records in each
childtable, how bad will performance be if the query actually accesses all the 12 tables? Will it be as bad as using
thefull table? 
>
>
In the case where you don't add a predicate to your select statement
that matches your constraint checks, meaning your statements are
visiting every child table, you're not going to experience any
performance gains over leaving everything in one table (plus you're
adding on extra maintenance overhead and likely degrading query
performance, even if only by a small amount).

What are your concerns with creating/dropping tables? You may have even
better success with creating a child table for each week. Fifty-two
extra tables is extra maintenance but the performance boost you could
get by dividing out your data is probably significant, and that kind of
maintenance should be automated anyway.



Re: Table partitioning

От
Steve Crawford
Дата:
On 10/28/2013 09:27 AM, Herouth Maoz wrote:
> I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like
theidea of creating and dropping tables all the time. 
What is slow about it? Inserting? Selecting? Deleting? Partitioning can
assist with some issues but does no good if what you really need is an
index or better query. Partitioning shines as an option to manage
archiving/purging of time-series data but only if you work with it, not
against it.

What don't you like about creating and dropping tables? You can easily
automate it: https://github.com/keithf4/pg_partman

>
> I'm thinking of simply creating 12 child tables, in which the check condition will be, for example,
date_part('month'',time_arrived) = 1 (or 2 for February, 3 for March etc.). 
>
> I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's
worthevery time. 
You are missing out on one of the best aspects of partitioning. Compared
to dropping or truncating a child table, deleting is far slower and
causes table bloat which may impact future queries.
>
> Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or
doI need to create rules/triggers for that? 
>

Yes unless you use the keyword "ONLY": "If specified, deletes rows from
the named table only. When not specified, any tables inheriting from the
named table are also processed."

Cheers,
Steve



Re: Table partitioning

От
"Herouth Maoz"
Дата:

Everything is slow about it - selects, deletes and inserts, that is. I don't do updates on that table.

The inserts and deletes are less of an issue because they are done once a week. Of course it would be nicer if they were faster, but that's less of an issue. The real issue is with self-joins, which are a common query. But I have indexes on the relevant fields: the connecting field (the one used for the self-join) as well as the date field. The queries are mostly of the format SELECT ... FROM tablename t1 left join tablename t2 on t1.indexed_field = t2.indexed_field and t2.boolean_field where t1.date_field >= 'date1' and t1.date_field < 'date2' and not t1.boolean; This can take about 15 minutes to run, depending on the date range. And it doesn't matter much if I put a date range within the left join, either.

As I mentioned in another message, I don't like the idea that my script may fail, and then inserts into the table might start to fail as well. I'm not always available to run it manually within a set time. And writing an automation that is different than all the other tables I maintain in that database makes for maintenance spaghetti. I also don't like running automated DDL commands. They don't play well with backups.


-----הודעה מקורית-----
מאת: Steve Crawford [mailto:scrawford@pinpointresearch.com]
נשלח: ב 28/10/2013 22:31
אל: Herouth Maoz; pgsql-general@postgresql.org
נושא: Re: [GENERAL] Table partitioning

On 10/28/2013 09:27 AM, Herouth Maoz wrote:
> I have a rather large and slow table in Postgresql 9.1. I'm thinking of partitioning it by months, but I don't like the idea of creating and dropping tables all the time.
What is slow about it? Inserting? Selecting? Deleting? Partitioning can
assist with some issues but does no good if what you really need is an
index or better query. Partitioning shines as an option to manage
archiving/purging of time-series data but only if you work with it, not
against it.

What don't you like about creating and dropping tables? You can easily
automate it: https://github.com/keithf4/pg_partman

>
> I'm thinking of simply creating 12 child tables, in which the check condition will be, for example, date_part('month'', time_arrived) = 1 (or 2 for February, 3 for March etc.).
>
> I'll just be deleting records rather than dropping tables, the same way I do in my current setup. I delete a week's worth every time.
You are missing out on one of the best aspects of partitioning. Compared
to dropping or truncating a child table, deleting is far slower and
causes table bloat which may impact future queries.
>
> Second, when I delete (not drop!) from the mother table, are records deleted automatically from the child tables or do I need to create rules/triggers for that?
>

Yes unless you use the keyword "ONLY": "If specified, deletes rows from
the named table only. When not specified, any tables inheriting from the
named table are also processed."

Cheers,
Steve