Re: [WIP] Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Msg-id: 001901cd7620$887a9b60$996fd220$@kapila@huawei.com
In reply to: Re: [WIP] Performance Improvement by reducing WAL for Update Operation  (Simon Riggs <simon@2ndQuadrant.com>)
List: pgsql-hackers
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Simon Riggs
Sent: Thursday, August 09, 2012 2:49 PM
On 9 August 2012 09:49, Amit Kapila <amit.kapila@huawei.com> wrote:

>>> I'd suggest we do this only when the saving is large enough for
>>> benefit, rather than do this every time.
>>   Do you mean that this should be done only when the length of the
>>   updated values is less than some threshold (1/3, 2/3, etc.) of the
>>   total tuple length?

> Some heuristic, yes, similar to TOAST's minimum threshold. To attempt
> removal of rows in all cases would not be worth it, so we need a fast
> path way of saying lets just take all of the columns.
Yes, it has to be done. Currently I have two ideas to take care of this:
a. Based on the number of updated columns
b. Based on the length of the updated values
If you have any other idea, or favor one of the above, let me know your
opinion.
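Just to make the two candidate heuristics concrete, here is a rough sketch.
The function names and the 1/3 cutoff are illustrative only (chosen in the
spirit of the TOAST-like minimum threshold mentioned above), not anything
from the patch:

```python
# Hypothetical sketch of the two candidate heuristics for deciding
# whether emitting a reduced (delta) WAL record is worth the effort.
# The 1/3 threshold and all names are illustrative assumptions.

def worth_delta_by_count(updated_cols, total_cols, max_fraction=1/3):
    """Heuristic (a): take the delta path only if few columns changed."""
    return updated_cols <= total_cols * max_fraction

def worth_delta_by_length(updated_len, tuple_len, max_fraction=1/3):
    """Heuristic (b): take the delta path only if the changed bytes are
    a small fraction of the whole tuple."""
    return updated_len <= tuple_len * max_fraction
```

For example, under heuristic (b) an update touching 300 bytes of an
1800-byte row would qualify for the reduced-WAL path, while one touching
1200 bytes would fall through to the fast path that logs all columns.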

>>> You don't mention whether or not the old and the new tuple are on the
>>> same data block.
>>   WAL reduction is done even for the case when the old and new tuples
>>   are on different data blocks.

> That makes me feel nervous. I doubt the marginal gain is worth it.
> Most updates don't cross blocks.

How can it be proved whether the gain is marginal or substantial for this
case?

One way is to test after the modification. I have updated the pgbench
tpc_b case as follows:
1. The schema is such that it contains rows of length 1800.
2. tpc_b has only updates.
3. The length of the updated column values is 300.
4. All tables have a 100% fillfactor.
5. Vacuum is off.

With this setup I think many of the updates should be across blocks, but
I am not sure, and I have not verified it in any way. The above run has
given a good performance improvement.



>>> Please also bear in mind that Andres will be looking to include the PK
>>> columns in every WAL record for BDR. That could be an option, but I
>>> doubt there is much value in excluding PK columns.
>
>>   Agreed. However, once the implementation by Andres is done, I can
>>   merge both codes and take the performance data again, based on which
>>   we can take a decision.

> It won't happen like that, because there won't be a single point where
> Andres is done. If you agree, then it's worth doing it that way to
> begin with, rather than requiring us to revisit the same section of
> code twice.

This optimization is to reduce the amount of WAL, and adding anything
extra will definitely have some impact.
However, if there is no better way than including the PK in WAL, then I
don't have any problem with it.

> One huge point that needs to be thought through is how we prove this
> code actually works on WAL/recovery side. A normal regression test
> won't prove that and we don't have a framework in place for that.

My initial ideas to validate recovery:
1. Manual testing:
   a. Generate enough scenarios for the update operation.
   b. For each scenario, make sure replay happens properly.
2. Community review.
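The property each replay scenario must check is that applying the logged
delta to the old tuple reconstructs the new tuple exactly. As a toy
illustration only (the real WAL record is byte-oriented and the names here
are hypothetical), a column-level version of that round trip looks like:

```python
# Toy column-level illustration of the delta round trip that WAL replay
# must get right; the actual on-disk format is far more involved.

def encode_delta(old, new):
    """Record only the columns that differ, as {column index: new value}."""
    assert len(old) == len(new)
    return {i: v for i, (o, v) in enumerate(zip(old, new)) if o != v}

def apply_delta(old, delta):
    """Replay: rebuild the new tuple from the old tuple plus the delta."""
    return [delta.get(i, v) for i, v in enumerate(old)]

old = ["a", "b", "c", "d"]
new = ["a", "x", "c", "y"]
delta = encode_delta(old, new)      # only columns 1 and 3 are logged
assert apply_delta(old, delta) == new
```

A recovery test would exercise this for each update scenario (in-block and
cross-block, with and without the heuristic firing) by crashing after the
update and comparing the replayed tuple against the expected one.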



With Regards,
Amit Kapila.



In the pgsql-hackers list, by date:

Previous message
From: Simon Riggs
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Next message
From: Heikki Linnakangas
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation