Обсуждение: nested partitioning

Поиск
Список
Период
Сортировка

nested partitioning

От
"Gabriel E. Sánchez Martínez"
Дата:
Suppose I have a lot of data (several GB worth) to store each day, going
back a few years.  I am considering a nested partition schema, in which
the data for each day is stored in a separate child table, which
inherits from an empty table for each month, which inherits from an
empty table for each year.  For example, if I had data going back three
years, I'd have one master table with the column definitions.  Then I'd
have three year tables inherit from that master table.  Then I'd have
twelve month tables inherit from each year table.  Then I'd have the
appropriate number of day tables inherit from each month table.

MASTER
     2011
         01
             01
             ...
             31
         ...
         12
             ...
     2012
         ...
     2013
         01
         ...
         12
             01
             ...
             31

Check constraints would be used on each day table to ensure that only
data for that year-month-day is stored there (performing a range check
on an integer or timestamp type, for example).  Check constraints would
also be used on each month table to ensure that only data for that
year-month is stored in that month's child tables.  And check
constraints would also be used on each year table to ensure that only
data for that year is stored in that year's child tables.  Each day
table would have its day check constraint, plus inherited year and month
check constraints.

Now to the questions.  Would constraint exclusion work in a tree fashion
to prune tables from a query plan?  In other words, if I query data for
a specific day, would constraint exclusion first exclude all the years
but the relevant one, then exclude all the months of that year but the
relevant one, then exclude all the days of that month but the relevant
one, and finally perform a sequential or index scan of the right
partition?  The idea is that by nesting the partitions in this manner
the number of exclusion checks is reduced.  One can see that with a
decade of data stored, there would be approximately 3652 partitions.
The documentation discourages having this many partitions.

Thanks in advance for advice.

Regards,

Gabriel Sánchez-Martínez
PhD Candidate
Massachusetts Institute of Technology





Re: nested partitioning

От
Shaun Thomas
Дата:
On 09/18/2013 07:45 AM, "Gabriel E. Sánchez Martínez" wrote:

> Now to the questions.  Would constraint exclusion work in a tree
> fashion to prune tables from a query plan?

Constraint exclusion works by checking the constraints themselves. So if
your constraints are based on date, they will always work as expected.
Assuming they're generated, you could easily check down to the day level
by >= desired day, and < next day.

With that in mind, if you queried a specific day, the query would remove
every partition but the relevant day (or range of days).

> One can see that with a decade of data stored, there would be
> approximately 3652 partitions. The documentation discourages having
> this many partitions.

And you shouldn't. In our partitions, we try to keep it around one
partition per month. I've recently written a library that eschews
triggers in favor of a nightly movement job to redistribute data, and it
can handle partition granularity on a sliding scale. The default is one
month, but my tests show it works properly down to 1 day. Still, the
lowest I'd use it is probably 1 week. That would still drastically
reduce your amount of estimated partitions.

However you decide to do it, don't optimize prematurely. Monthly
partitions are enough for us and our 130GB table with 250M rows, and
that's for a single year.


--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: nested partitioning

От
"Gabriel E. Sánchez Martínez"
Дата:
On 09/18/2013 11:34 AM, Shaun Thomas wrote:
> On 09/18/2013 07:45 AM, "Gabriel E. Sánchez Martínez" wrote:
>
>> Now to the questions.  Would constraint exclusion work in a tree
>> fashion to prune tables from a query plan?
>
> Constraint exclusion works by checking the constraints themselves. So
> if your constraints are based on date, they will always work as
> expected. Assuming they're generated, you could easily check down to
> the day level by >= desired day, and < next day.
>
> With that in mind, if you queried a specific day, the query would
> remove every partition but the relevant day (or range of days).
I understand that, but I want to know whether all the child day tables
would be checked for exclusion, or if instead entire groups of tables
can be excluded by checking the year and month first, and then checking
the days only for the queried year-month.  For 10 years of data, that is
a difference between checking 10 years + 12 months + 30 days = 52
partitions vs. checking all 3652 partitions.

Example: Querying for data of 2012-01-01.

Would a query plan prune tables in a tree fashion, as follows?

#
Check partitions 2000, 2001, 2002, ... , 2013.  14 checks.  All years
but 2012 are excluded.  Approximately 4748 partitions excluded with only
14 constraint exclusion checks.

Check partitions 2012-01, 2012-02, ... , 2012-12.  12 checks.  All
months but 2012-01 are excluded.  325 partitions excluded with only 12
constraint exclusion checks.

Check partitions 2012-01-01, 2012-01-02, ... , 2012-01-31. 31 checks.
All days but 2012-01-01 are excluded.  30 partitions excluded with 31
constraint exclusion checks.

Sequential scan on partition 2012-01-01.
#

Or would it have to check every child partition, as follows?

#
Check partition 2000-01-01, 2000-01-02, ... , 2013-09-18.  5010 checks.
All days but 2012-01-01 are excluded.  5009 partitions excluded with
5010 constraint exclusion checks.

Sequential scan on partition 2012-01-01.
#

The former would be much more efficient.

>
>> One can see that with a decade of data stored, there would be
>> approximately 3652 partitions. The documentation discourages having
>> this many partitions.
>
> And you shouldn't. In our partitions, we try to keep it around one
> partition per month. I've recently written a library that eschews
> triggers in favor of a nightly movement job to redistribute data, and
> it can handle partition granularity on a sliding scale. The default is
> one month, but my tests show it works properly down to 1 day. Still,
> the lowest I'd use it is probably 1 week. That would still drastically
> reduce your amount of estimated partitions.
>
> However you decide to do it, don't optimize prematurely. Monthly
> partitions are enough for us and our 130GB table with 250M rows, and
> that's for a single year.
>
>



Re: nested partitioning

От
Shaun Thomas
Дата:
On 09/18/2013 11:05 AM, "Gabriel E. Sánchez Martínez" wrote:

> Or would it have to check every child partition, as follows?
>
> #
> Check partition 2000-01-01, 2000-01-02, ... , 2013-09-18.  5010 checks.
> All days but 2012-01-01 are excluded.  5009 partitions excluded with
> 5010 constraint exclusion checks.

My guess based on the fact the planner has no concept of ranges aside
from their width, is that this is the approach it'll likely take. You
can't really teach it that your particular inheritance tree is range
constrained per level, so it has to check every range.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: nested partitioning

От
"Gabriel E. Sánchez Martínez"
Дата:
On 09/18/2013 12:20 PM, Shaun Thomas wrote:
> On 09/18/2013 11:05 AM, "Gabriel E. Sánchez Martínez" wrote:
>
>> Or would it have to check every child partition, as follows?
>>
>> #
>> Check partition 2000-01-01, 2000-01-02, ... , 2013-09-18.  5010 checks.
>> All days but 2012-01-01 are excluded.  5009 partitions excluded with
>> 5010 constraint exclusion checks.
>
> My guess based on the fact the planner has no concept of ranges aside
> from their width, is that this is the approach it'll likely take. You
> can't really teach it that your particular inheritance tree is range
> constrained per level, so it has to check every range.
>

In that case maybe this could be considered as a performance improvement
for future versions of PG?  If I am not mistaken check constraints are
inherited from parent table to child table, making it feasible for the
query planner to prune entire branches at once by checking the
inheritance structure.  I know that query performance should be very
good with partitions for each month and indexes on the timestamp column,
but I can think of many applications where data is extracted in daily
batches and it is more natural and convenient for DB maintenance to have
a partition per day.  In big data applications, it may be feasible to
fit a day's worth of data in RAM, but not a whole month; in such cases
there could be performance gains from tree constraint exclusion.

Thanks for your insight.