Re: Modest proposal to extend TableAM API for controlling cluster commands

Поиск
Список
Период
Сортировка
От Mark Dilger
Тема Re: Modest proposal to extend TableAM API for controlling cluster commands
Дата
Msg-id 63B02BB6-22AF-486C-AC47-18705CBDC918@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Modest proposal to extend TableAM API for controlling cluster commands  (Andres Freund <andres@anarazel.de>)
Ответы Re: Modest proposal to extend TableAM API for controlling cluster commands  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers

> On Jun 15, 2022, at 6:01 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-06-15 17:21:56 -0700, Mark Dilger wrote:
>> While developing various Table Access Methods, I have wanted a callback for
>> determining if CLUSTER (and VACUUM FULL) should be run against a table
>> backed by a given TAM.  The current API contains a callback for doing the
>> guts of the cluster, but by that time, it's a bit too late to cleanly back
>> out.  For single relation cluster commands, raising an error from that
>> callback is probably not too bad.  For multi-relation cluster commands, that
>> aborts the clustering of other yet to be processed relations, which doesn't
>> seem acceptable.
>
> Why not? What else do you want to do in that case? Silently ignoring
> non-clusterable tables doesn't seem right either. What's the use-case for
> swallowing the error?

Imagine you develop a TAM for which the concept of "clustering" doesn't have any defined meaning.  Perhaps you've
arrangedthe data in a way that has no similarity to heap, and now somebody runs a CLUSTER command (with no arguments.)
It'sreasonable that they want all their heap tables to get the usual cluster_rel treatment, and that the existence of a
tableusing your exotic TAM shouldn't interfere with that.  Then what?  You are forced to copy all the data from your
OldHeap(badly named) to the NewHeap (also badly named), or to raise an error.  That doesn't seem ok. 

>> I tried fixing this with a ProcessUtility_hook, but that
>> fires before the multi-relation cluster command has compiled the list of
>> relations to cluster, meaning the ProcessUtility_hook doesn't have access to
>> the necessary information.  (It can be hacked to compile the list of
>> relations itself, but that duplicates both code and effort, with the usual
>> risks that the code will get out of sync.)
>>
>> For my personal development, I have declared a new hook, bool
>> (*relation_supports_cluster) (Relation rel).  It gets called from
>> commands/cluster.c in both the single-relation and multi-relation code
>> paths, with warning or debug log messages output for relations that decline
>> to be clustered, respectively.
>
> Do you actually need to dynamically decide whether CLUSTER is supported?
> Otherwise we could just make the existing cluster callback optional and error
> out if a table is clustered that doesn't have the callback.

Same as above, I don't know why erroring would be the right thing to do.  As a comparison, consider that we don't
attemptto cluster a partitioned table, but rather just silently skip it.  Imagine if, when we introduced the concept of
partitionedtables, we made unqualified CLUSTER commands always fail when they encountered a partitioned table. 

>> Before posting a patch, do people think this sounds useful?  Would you like
>> the hook function signature to differ in some way?  Is a simple "yes this
>> relation may be clustered" vs. "no this relation may not be clustered"
>> interface overly simplistic?
>
> It seems overly complicated, if anything ;)

For the TAMs I've developed thus far, I don't need the (Relation rel) parameter, and could just have easily used
(void). But that seems to fence in what other TAM authors could do in future. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mark Dilger
Дата:
Сообщение: Extending USING [heap | mytam | yourtam] grammar and behavior
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Using PQexecQuery in pipeline mode produces unexpected Close messages