Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Alexey Kondratov
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Msg-id: b25ce80e-f536-78c8-d5c8-a5df3e230785@postgrespro.ru
In reply to: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
>>
>> FWIW my understanding is that the speedup comes mostly from 
>> elimination of
>> the serialization to a file. That however requires savepoints to handle
>> aborts of subtransactions - I'm pretty sure it'd be trivial to create a
>> workload where this will be much slower (with many aborts of large
>> subtransactions).
>>

Yes, and it was my main motivation to eliminate that extra serialization 
to file. I've experimented a bit with large transactions + savepoints + 
aborts and ended up with the following query (the same schema as before 
with 600k rows):

BEGIN;
SAVEPOINT s1;
UPDATE large_test SET num1 = num1 + 1, num2 = num2 + 1, num3 = num3 + 1;
SAVEPOINT s2;
UPDATE large_test SET num1 = num1 + 1, num2 = num2 + 1, num3 = num3 + 1;
SAVEPOINT s3;
UPDATE large_test SET num1 = num1 + 1, num2 = num2 + 1, num3 = num3 + 1;
ROLLBACK TO SAVEPOINT s3;
ROLLBACK TO SAVEPOINT s2;
ROLLBACK TO SAVEPOINT s1;
END;

It looks like the worst case scenario, as we do a lot of work and then 
abort all subxacts one by one. As expected, it takes much longer (up to 
30x) to process using the background worker instead of spilling to file. 
Surely, it is much easier to truncate a file than to apply all the 
changes and then abort them. However, I guess that this kind of load 
pattern is not the most typical for real-life applications.

Also, this test helped me to find a bug in my current savepoints routine, 
so a new patch is attached.

On 30.08.2019 18:59, Konstantin Knizhnik wrote:
>
> I think that instead of defining savepoints it is simpler and more 
> efficient to use
>
> BeginInternalSubTransaction + 
> ReleaseCurrentSubTransaction/RollbackAndReleaseCurrentSubTransaction
>
> as it is done in PL/pgSQL (pl_exec.c).
> Not sure if it can pr
>

Both BeginInternalSubTransaction and DefineSavepoint use 
PushTransaction() internally for a normal subtransaction start, so they 
seem to be identical from a performance perspective, which is also 
stated in the comment for BeginInternalSubTransaction:

/*
 * BeginInternalSubTransaction
 *        This is the same as DefineSavepoint except it allows TBLOCK_STARTED,
 *        TBLOCK_IMPLICIT_INPROGRESS, TBLOCK_END, and TBLOCK_PREPARE states,
 *        and therefore it can safely be used in functions that might be called
 *        when not inside a BEGIN block or when running deferred triggers at
 *        COMMIT/PREPARE time.  Also, it automatically does
 *        CommitTransactionCommand/StartTransactionCommand instead of expecting
 *        the caller to do it.
 */

Please, correct me if I'm wrong.
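
For reference, a minimal sketch of that pl_exec.c-style pattern could 
look roughly as below. The xact.c calls are real; apply_streamed_changes() 
and subxact_committed are hypothetical placeholders for the apply worker 
logic:

/* Sketch only: start an internal subxact instead of DefineSavepoint() */
MemoryContext oldcontext = CurrentMemoryContext;
ResourceOwner oldowner = CurrentResourceOwner;

BeginInternalSubTransaction(NULL);
MemoryContextSwitchTo(oldcontext);

apply_streamed_changes();   /* hypothetical: replay this subxact's changes */

if (subxact_committed)      /* hypothetical flag decoded from the stream */
    ReleaseCurrentSubTransaction();             /* merge into parent xact */
else
    RollbackAndReleaseCurrentSubTransaction();  /* discard subxact's work */

MemoryContextSwitchTo(oldcontext);
CurrentResourceOwner = oldowner;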

Anyway, I've performed a profiling of my apply worker (flamegraph is 
attached) and it spends the vast majority of its time (>90%) applying 
changes. So the problem is not in the savepoints themselves, but in the 
fact that we first apply all the changes and then abort all that work. 
Not sure that it is possible to do anything about this case.


Regards

-- 
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company


