Re: Sequence Access Method WIP

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Sequence Access Method WIP
Дата
Msg-id 54C917DB.2090307@vmware.com
обсуждение исходный текст
Ответ на Re: Sequence Access Method WIP  (Petr Jelinek <petr@2ndquadrant.com>)
Ответы Re: Sequence Access Method WIP  (Petr Jelinek <petr@2ndquadrant.com>)
Список pgsql-hackers
On 01/23/2015 02:34 AM, Petr Jelinek wrote:
> On 22/01/15 17:02, Petr Jelinek wrote:
>>
>> The new version (the one that is not submitted yet) of gapless sequence
>> is way more ugly and probably not best example either but does guarantee
>> gaplessness (it stores the last value in it's own value table). So I am
>> not sure if it should be included either...
>
> Here it is as promised.

I generally like the division of labour between the seqam 
implementations and the API now.

I don't like the default_sequenceam GUC. That seems likely to create 
confusion. And it's not really enough for something like a remote 
sequence AM anyway that definitely needs some extra reloptions, like the 
hostname of the remote server. The default should be 'local', unless you 
specify something else with CREATE SEQUENCE USING - no GUCs.

Some comments on pg_dump:

* In pg_dump's dumpSequence function, check the server version number 
instead of checking whether pg_sequence_dump_state() function exists. 
That's what we usually do when new features are introduced. And there's 
actually a bug there: you have the check backwards. (try dumping a 
database with any sequences in it; it fails)

* pg_dump should not output a USING clause when a sequence uses the 
'local' sequence. No point in adding such noise - making the SQL command 
not standard-compatible - for the 99% of databases that don't use other 
sequence AMs.

* Be careful to escape all strings correctly in pg_dump. I think you're 
missing escaping for the 'state' variable at least.

In sequence_save_tuple:
>     else
>     {
>         /*
>          * New tuple was not sent, so the original tuple was probably just
>          * changed inline, all we need to do is mark the buffer dirty and
>          * optionally log the update tuple.
>          */
>         START_CRIT_SECTION();
>
>         MarkBufferDirty(seqh->buf);
>
>         if (do_wal)
>             log_sequence_tuple(seqh->rel, &seqh->tup, seqh->buf, page);
>
>         END_CRIT_SECTION();
>     }

This is wrong when checksums are enabled and !do_wal. I believe this 
should use MarkBufferDirtyHint().

> Notable changes:
> - The gapless sequence rewritten to use the transactional storage as
> that's the only way to guarantee gaplessness between dump and restore.

Can you elaborate?

Using the auxiliary seqam_gapless_values is a bit problematic. First of 
all, the event trigger on DROP SEQUENCE doesn't fire if you drop the 
whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly, 
updating a row in a table for every nextval() call is pretty darn 
expensive. But I don't actually see a problem with dump and restore.

Can you rewrite it without using the values table? AFAICS, there are a 
two of problems to solve:

1. If the top-level transaction aborts, you need to restore the old 
value. I suggest keeping two values in the sequence tuple: the old, 
definitely committed one, and the last value. The last value is only 
considered current if the associated XID committed; otherwise the old 
value is current. When a transaction is about to commit, it stores its 
top-level XID and the new value in the "last" field, and copies the 
previously current value to the "old" field.

2. You need to track the last value on a per-subtransaction basis, until 
the transaction commits/rolls back, in order to have "rollback to 
savepoint" to retreat the sequence's value. That can be done in 
backend-private memory, maintaining a stack of subtransactions and the 
last value of each. Use RegisterSubXactCallback to hook into subxact 
commit/abort. Just before top-level commit (in pre-commit callback), 
update the sequence tuple with the latest value, so that it gets 
WAL-logged before the commit record.

- Heikki




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: PQgetssl() and alternative SSL implementations
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: jsonb, unicode escapes and escaped backslashes