Re: On partitioning

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: On partitioning
Дата
Msg-id CA+Tgmob5DEtO4SbD15q0OQJjyc05cTk8043Utwu_=XDtvyGNSw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: On partitioning  (Martijn van Oosterhout <kleptog@svana.org>)
Список pgsql-hackers
On Tue, Sep 2, 2014 at 4:18 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
> On Tue, Sep 02, 2014 at 09:44:17AM -0400, Bruce Momjian wrote:
>> On Sun, Aug 31, 2014 at 10:45:29PM +0200, Martijn van Oosterhout wrote:
>> > There is one situation where you need to be more flexible, and that is
>> > if you ever want to support online repartitioning. To do that you have
>> > to distinguish between "I want to insert tuple X, which partition
>> > should it go into" and "I want to know which partitions I need to look
>> > for partition_key=Y".
>>
>> I am unclear why having information per-partition rather than on the
>> parent table helps with online reparitioning.
>
> An example:
>
> We have three partitions, one for X<0 (A), one for 0<=X<5 (B) and one
> for X>=5 (C).  These are in three different tables.
>
> Now we give the command to merge the last two partitions B&C. You now
> have the choice to lock the table while you move all the tuples from C
> to B.
>
> Or you can make some adjustments such that new tuples that would have gone
> to C now go to B. And if there is a query for X=10 that you look in
> *both* B & C. Then the existing tuples can be moved from C to B at any
> time without blocking any other operations.
>
> Is this clearer? If you up front decide that which partition to query
> will be determined by a function that can only return one table, then
> the above becomes impossible.
>
>> Robert's idea of using normal table inheritance means we can access/move
>> the data independently of the partitioning system.  My guess is that we
>> will need to do repartitioning with some tool, rather than as part of
>> normal database operation.
>
> Doing it as some tool seems like a hack to me. And since the idea was (I
> thought) that partitions would not be directly accessable from SQL, it
> has to be in the database itself.

I agree.  My main point about reusing the inheritance stuff is that
we've done over the years is that we shouldn't reinvent the wheel, but
rather build on what we've already got.

If the proposed design somehow involved treating all of the partitions
as belonging to the same TID space (which doesn't really seem
possible, but let's suspend disbelief) so that you could have a single
index that covers all the partitions, and the system would somehow
work out which TIDs live in which physical files, then it would be
reasonable to view the storage layer as an accident that higher levels
of the system don't need to know anything about.

But the actual proposal involves having multiple relations that have
to get planned just like real tables, and that means all the
optimizations that we've done on gathering statistics for inheritance
hierarchies, and MergeAppend, and every other bit of planner smarts
that we have will be applicable to this new method, too.  Let's not do
anything that forces us to reinvent all of those things.

Now, to be fair, one could certainly argue (and I would agree) that
the existing optimizations are insufficient.  In particular, the fact
that SELECT * FROM partitioned_table WHERE not_the_partitioning_key =
1 has to be planned separately for every partition is horrible, and
the fact that SELECT * FROM partitioned_table WHERE partitioning_key =
1 has to use an algorithm that is both O(n) in the partition count and
has a relatively high constant factor to exclude all of the
non-matching partitions also sucks.  But I think we're better off
trying to view those as further optimizations that we can apply to
certain special cases of partitioning - e.g. when the partitioning
syntax is used, constrain all the tables to have identical tuple
descriptors and matching indexes (and maybe constraints) so that when
you plan, you can do it once and then used the transposed plan for all
partitions.  Figuring out how to do run-time partition pruning would
be awesome, too.

But I don't see that any of this stuff gets easier by ignoring what's
already been built; then you're likely to spend all your time
reinventing the crap we've already done, and any cases where the new
system misses an optimization that's been achieved in the current
system become unpleasant dilemmas for our users.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Marko Tiikkaja
Дата:
Сообщение: Re: pgcrypto: PGP signatures
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: delta relations in AFTER triggers