Re: Sequence Access Method WIP

Поиск
Список
Период
Сортировка
От Petr Jelinek
Тема Re: Sequence Access Method WIP
Дата
Msg-id 54C922C5.7030107@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Sequence Access Method WIP  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: Sequence Access Method WIP  (Petr Jelinek <petr@2ndquadrant.com>)
Список pgsql-hackers
On 28/01/15 18:09, Heikki Linnakangas wrote:
> On 01/23/2015 02:34 AM, Petr Jelinek wrote:
>> On 22/01/15 17:02, Petr Jelinek wrote:
>>>
>>> The new version (the one that is not submitted yet) of gapless sequence
>>> is way more ugly and probably not best example either but does guarantee
>>> gaplessness (it stores the last value in it's own value table). So I am
>>> not sure if it should be included either...
>>
>> Here it is as promised.
>
> I generally like the division of labour between the seqam
> implementations and the API now.
>
> I don't like the default_sequenceam GUC. That seems likely to create
> confusion. And it's not really enough for something like a remote
> sequence AM anyway that definitely needs some extra reloptions, like the
> hostname of the remote server. The default should be 'local', unless you
> specify something else with CREATE SEQUENCE USING - no GUCs.
>

Hmm, I am not too happy about this, I want SERIAL to work with custom 
sequences (as long as it's possible for the sequence to work that way at 
least). That's actually important feature for me. Your argument about 
this being potential problem for some sequenceAMs is valid though.

> Some comments on pg_dump:
>
> * In pg_dump's dumpSequence function, check the server version number
> instead of checking whether pg_sequence_dump_state() function exists.
> That's what we usually do when new features are introduced. And there's
> actually a bug there: you have the check backwards. (try dumping a
> database with any sequences in it; it fails)

Right.

> * pg_dump should not output a USING clause when a sequence uses the
> 'local' sequence. No point in adding such noise - making the SQL command
> not standard-compatible - for the 99% of databases that don't use other
> sequence AMs.

Well this only works if we remove the GUC. Because otherwise if GUC is 
there then you always need to either add USING clause or set the GUC in 
advance (like we do with search_path for example).

> * Be careful to escape all strings correctly in pg_dump. I think you're
> missing escaping for the 'state' variable at least.

Ouch.

> In sequence_save_tuple:
>>     else
>>     {
>>         /*
>>          * New tuple was not sent, so the original tuple was probably
>> just
>>          * changed inline, all we need to do is mark the buffer dirty and
>>          * optionally log the update tuple.
>>          */
>>         START_CRIT_SECTION();
>>
>>         MarkBufferDirty(seqh->buf);
>>
>>         if (do_wal)
>>             log_sequence_tuple(seqh->rel, &seqh->tup, seqh->buf, page);
>>
>>         END_CRIT_SECTION();
>>     }
>
> This is wrong when checksums are enabled and !do_wal. I believe this
> should use MarkBufferDirtyHint().
>

Oh ok, didn't realize.

>> Notable changes:
>> - The gapless sequence rewritten to use the transactional storage as
>> that's the only way to guarantee gaplessness between dump and restore.
>
> Can you elaborate?
>
> Using the auxiliary seqam_gapless_values is a bit problematic. First of
> all, the event trigger on DROP SEQUENCE doesn't fire if you drop the
> whole schema containing the sequence with DROP SCHEMA CASCADE. Secondly,

Yeah but at worst there are some unused rows there, it's not too bad. I 
could also create relation per sequence so that dependency code would 
handle everything correctly, but seems bit too expensive to create not 
one but two relations per sequence...

> updating a row in a table for every nextval() call is pretty darn
> expensive.

Yes it's expensive, but the gapless sequence also serializes access so 
it will never be speed daemon.

> But I don't actually see a problem with dump and restore.

The problem is that the tuple stored in the sequence relation is always 
the one with latest state (and with frozen xid), so pg_dump has no way 
of getting the value as it was at the time it took the snapshot. This 
means that after dump/restore cycle, the sequence can be further away 
than the table it's attached to and you end up with a gap there.


> Can you rewrite it without using the values table? AFAICS, there are a
> two of problems to solve:
>
> 1. If the top-level transaction aborts, you need to restore the old
> value. I suggest keeping two values in the sequence tuple: the old,
> definitely committed one, and the last value. The last value is only
> considered current if the associated XID committed; otherwise the old
> value is current. When a transaction is about to commit, it stores its
> top-level XID and the new value in the "last" field, and copies the
> previously current value to the "old" field.
>

Yes, this is how the previous implementation worked.

> 2. You need to track the last value on a per-subtransaction basis, until
> the transaction commits/rolls back, in order to have "rollback to
> savepoint" to retreat the sequence's value. That can be done in
> backend-private memory, maintaining a stack of subtransactions and the
> last value of each. Use RegisterSubXactCallback to hook into subxact
> commit/abort. Just before top-level commit (in pre-commit callback),
> update the sequence tuple with the latest value, so that it gets
> WAL-logged before the commit record.
>

I started writing something like what you described here but then I 
realized the dump/restore problem and went with the values table which 
solves both of these issues at the same time.

BTW I am happy to discuss this at FOSDEM if you are interested and will 
have time.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: jsonb, unicode escapes and escaped backslashes
Следующее
От: Tom Lane
Дата:
Сообщение: Re: PQgetssl() and alternative SSL implementations