Thread: WAL log only necessary part of 2PC GID

WAL log only necessary part of 2PC GID

From
Pavan Deolasee
Date:
Hello Hackers,

The maximum size of the GID, used as a 2PC identifier, is currently defined as 200 bytes (see src/backend/access/transam/twophase.c). The actual GID used by applications, though, may be much smaller than that. So IMO instead of WAL logging the entire 200 bytes during PREPARE TRANSACTION, we should just WAL log strlen(gid) bytes.
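
To make the proposal concrete, here is a minimal C sketch; it is not the patch itself. It assumes a hypothetical, much simplified record layout (DemoHeader, DEMO_GIDSIZE and the demo_* functions are illustrative names, not identifiers from twophase.c), and it assumes that the terminating NUL is stored along with the GID. It compares the bytes written when the full 200-byte GID buffer is logged with the bytes written when only the occupied part is logged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_GIDSIZE 200    /* mirrors the 200-byte limit mentioned above */

/* Hypothetical, simplified header; the real struct has more fields. */
typedef struct DemoHeader
{
    uint32_t magic;         /* format identifier */
    uint32_t total_len;     /* length of the whole record */
    uint16_t gidlen;        /* GID bytes that follow, including the NUL */
} DemoHeader;

/* Current scheme: the GID always occupies the full fixed-size buffer. */
static size_t
demo_record_size_fixed(void)
{
    return sizeof(DemoHeader) + DEMO_GIDSIZE;
}

/* Proposed scheme: only strlen(gid) + 1 bytes follow the header. */
static size_t
demo_record_size_trimmed(const char *gid)
{
    size_t len = strlen(gid) + 1;   /* assumption: keep the NUL */

    assert(len <= DEMO_GIDSIZE);
    return sizeof(DemoHeader) + len;
}

/* Serialize header plus trimmed GID into buf; returns bytes written. */
static size_t
demo_write_record(char *buf, size_t bufsize, const char *gid)
{
    DemoHeader hdr;
    size_t total = demo_record_size_trimmed(gid);

    assert(total <= bufsize);
    memset(&hdr, 0, sizeof(hdr));   /* zero any padding bytes */
    hdr.magic = 0x44454d4fu;        /* arbitrary demo value */
    hdr.total_len = (uint32_t) total;
    hdr.gidlen = (uint16_t) (strlen(gid) + 1);
    memcpy(buf, &hdr, sizeof(hdr));
    memcpy(buf + sizeof(hdr), gid, hdr.gidlen);
    return total;
}
```

For a short GID such as '1.10', the trimmed record in this sketch carries 5 GID bytes after the header instead of 200.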

The attached patch does that. The changes are limited to twophase.c, and some simple crash recovery tests seem to work OK. In terms of performance, a quick test shows a marginal improvement in tps using the script that Stas Kelvich used for his work on speeding up two-phase transactions. The only change I made is to keep :scale unchanged, because increasing :scale in every iteration results in only a handful of updates (not sure why Stas had that in his original script).

\set naccounts 100000 * :scale
\setrandom from_aid 1 :naccounts
\setrandom to_aid 1 :naccounts
\setrandom delta 1 100
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid = :from_aid;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :to_aid;
PREPARE TRANSACTION ':client_id.:scale';
COMMIT PREPARED ':client_id.:scale';

The amount of WAL generated during a 60s run shows a decline of about 25% with default settings except full_page_writes which is turned off.

HEAD: 861 WAL bytes / transaction
PATCH: 670 WAL bytes / transaction

Actually, the above numbers probably include a lot of WAL generated because of HOT pruning and page defragmentation. If we just look at the WAL overhead caused by 2PC, the decline is somewhere close to 50%. I took numbers using simple 1PC for reference and to understand the overhead of 2PC.

HEAD (1PC): 382 bytes / transaction
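
As a rough cross-check of the figures above, the short C sketch below recomputes the percentages from the three numbers quoted in this message. The helper names are made up for illustration, and the calculation assumes that the 1PC baseline of 382 bytes applies unchanged to both HEAD and the patched build.

```c
#include <assert.h>

/* WAL bytes per transaction, as quoted in the message. */
static const double head_2pc = 861.0;
static const double patch_2pc = 670.0;
static const double head_1pc = 382.0;

/* Percentage drop from 'before' to 'after'. */
static double
percent_decline(double before, double after)
{
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}

/* Overall decline in WAL per transaction. */
static double
overall_decline(void)
{
    return percent_decline(head_2pc, patch_2pc);
}

/* Decline in the 2PC-only overhead, after subtracting the 1PC baseline. */
static double
twophase_overhead_decline(void)
{
    return percent_decline(head_2pc - head_1pc, patch_2pc - head_1pc);
}
```

With these inputs the overall decline works out to roughly 22%, and the decline in the 2PC-specific overhead to roughly 40%, somewhat below the rounded figures given in the text. The exact share depends on how much of the remaining WAL is attributed to 2PC.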

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachments

Re: WAL log only necessary part of 2PC GID

From
Jesper Pedersen
Date:
On 02/29/2016 08:45 AM, Pavan Deolasee wrote:
> Hello Hackers,
>
> The maximum size of the GID, used as a 2PC identifier, is currently defined
> as 200 bytes (see src/backend/access/transam/twophase.c). The actual GID
> used by applications, though, may be much smaller than that. So IMO
> instead of WAL logging the entire 200 bytes during PREPARE TRANSACTION, we
> should just WAL log strlen(gid) bytes.
>
> The attached patch does that. The changes are limited to twophase.c, and
> some simple crash recovery tests seem to work OK. In terms of
> performance, a quick test shows a marginal improvement in tps using the
> script that Stas Kelvich used for his work on speeding up two-phase
> transactions. The only change I made is to keep :scale unchanged,
> because increasing :scale in every iteration results in only a
> handful of updates (not sure why Stas had that in his original script).
>
> \set naccounts 100000 * :scale
> \setrandom from_aid 1 :naccounts
> \setrandom to_aid 1 :naccounts
> \setrandom delta 1 100
> BEGIN;
> UPDATE pgbench_accounts SET abalance = abalance - :delta WHERE aid =
> :from_aid;
> UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid =
> :to_aid;
> PREPARE TRANSACTION ':client_id.:scale';
> COMMIT PREPARED ':client_id.:scale';
>
> The amount of WAL generated during a 60s run shows a decline of about 25%
> with default settings except full_page_writes which is turned off.
>
> HEAD: 861 WAL bytes / transaction
> PATCH: 670 WAL bytes / transaction
>
> Actually, the above numbers probably include a lot of WAL generated because
> of HOT pruning and page defragmentation. If we just look at the WAL
> overhead caused by 2PC, the decline is somewhere close to 50%. I took
> numbers using simple 1PC for reference and to understand the overhead of
> 2PC.
>
> HEAD (1PC): 382 bytes / transaction
>

I can confirm the marginal speed up in tps due to the new WAL size.

The TWOPHASE_MAGIC constant should be changed, as the definition of the
file header has changed, right?

Thanks for working on this!

Best regards, Jesper




Re: WAL log only necessary part of 2PC GID

From
Pavan Deolasee
Date:
On Fri, Mar 4, 2016 at 9:16 PM, Jesper Pedersen <jesper.pedersen@redhat.com> wrote:
> I can confirm the marginal speed up in tps due to the new WAL size.
>
> The TWOPHASE_MAGIC constant should be changed, as the definition of the file header has changed, right?

Thanks for looking at it. I've revised the patch by incrementing the TWOPHASE_MAGIC identifier.
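
The reason for bumping the identifier can be illustrated with a small C sketch. It is a simplified model, not code from twophase.c: the constants DEMO_MAGIC_OLD and DEMO_MAGIC_NEW, their values, and the demo_* functions are all hypothetical. The idea is that a reader compares the stored magic with the one it was built for and refuses a file written with a different header layout, rather than misinterpreting its bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAGIC_OLD 0x00000001u  /* hypothetical: fixed-size GID layout */
#define DEMO_MAGIC_NEW 0x00000002u  /* hypothetical: layout with gidlen */

/* Demo helper: write a magic value at the start of buf. */
static void
demo_stamp_magic(char *buf, uint32_t magic)
{
    assert(buf != NULL);
    memcpy(buf, &magic, sizeof(magic));
}

/* Returns true only if buf starts with the magic this reader expects. */
static bool
demo_header_is_readable(const char *buf, size_t len, uint32_t expected_magic)
{
    uint32_t magic;

    assert(buf != NULL);
    if (len < sizeof(magic))
        return false;                   /* too short to hold a header */
    memcpy(&magic, buf, sizeof(magic)); /* copy to avoid unaligned reads */
    return magic == expected_magic;
}
```

In this model, a file stamped with DEMO_MAGIC_OLD is rejected by a reader expecting DEMO_MAGIC_NEW, which is the behavior one would want after the header definition changes.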

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachments

Re: WAL log only necessary part of 2PC GID

From
Jesper Pedersen
Date:
On 03/08/2016 11:54 PM, Pavan Deolasee wrote:
> On Fri, Mar 4, 2016 at 9:16 PM, Jesper Pedersen <jesper.pedersen@redhat.com>
> wrote:
>> I can confirm the marginal speed up in tps due to the new WAL size.
>>
>> The TWOPHASE_MAGIC constant should be changed, as the definition of the
>> file header has changed, right?
>>
>
> Thanks for looking at it. I've revised the patch by incrementing the
> TWOPHASE_MAGIC identifier.
>

Great, thanks.

Marked "Ready for Committer".

Best regards, Jesper





Re: WAL log only necessary part of 2PC GID

From
Petr Jelinek
Date:
Hi,

I wonder why you define gidlen as uint32 when it would fit into
uint8, which in the current TwoPhaseFileHeader struct should be a win of
8 bytes on padding (on 64-bit). I think that's something worth considering,
given that this patch aims to lower the size of the data.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: WAL log only necessary part of 2PC GID

From
Pavan Deolasee
Date:
On Wed, Mar 9, 2016 at 7:56 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
> Hi,
>
> I wonder why you define gidlen as uint32 when it would fit into uint8, which in the current TwoPhaseFileHeader struct should be a win of 8 bytes on padding (on 64-bit). I think that's something worth considering, given that this patch aims to lower the size of the data.


Hi Petr,

That sounds like a good idea; I didn't think about that. I would like to make it uint16 though, just in case we decide to increase GIDSIZE from 200 to something more than 256 (Postgres-XL does that already). That still fits in the same aligned width on both 32-bit and 64-bit machines. New version attached.
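
The padding argument can be checked with a small C sketch. The structs below are hypothetical stand-ins, not the real TwoPhaseFileHeader: the field list is an assumption chosen so that a one-byte flag sits directly before the length field, and all Demo* and demo_* names are invented for illustration. Only the width of gidlen differs between the three variants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fields shared by all three header variants. */
#define DEMO_COMMON_FIELDS \
    uint32_t magic;        \
    uint32_t total_len;    \
    uint32_t xid;          \
    uint32_t database;     \
    int64_t  prepared_at;  \
    uint32_t owner;        \
    int32_t  nsubxacts;    \
    int32_t  ncommitrels;  \
    int32_t  nabortrels;   \
    int32_t  ninvalmsgs;   \
    uint8_t  initfileinval;

typedef struct { DEMO_COMMON_FIELDS uint32_t gidlen; } DemoHdr32;
typedef struct { DEMO_COMMON_FIELDS uint16_t gidlen; } DemoHdr16;
typedef struct { DEMO_COMMON_FIELDS uint8_t  gidlen; } DemoHdr8;

/* Bytes saved by narrowing gidlen from uint32_t to uint16_t. */
static size_t
demo_saving_16(void)
{
    return sizeof(DemoHdr32) - sizeof(DemoHdr16);
}

/* Bytes saved by narrowing gidlen from uint32_t to uint8_t. */
static size_t
demo_saving_8(void)
{
    return sizeof(DemoHdr32) - sizeof(DemoHdr8);
}

/* Can a length field with the given maximum hold a GID of 'gidsize' bytes? */
static int
demo_len_fits(uint32_t field_max, uint32_t gidsize)
{
    return gidsize <= field_max;
}
```

On a typical x86-64 ABI the three structs should come out at 56, 48 and 48 bytes, so both narrower types save the same 8 bytes; other ABIs may give different absolute sizes. The demo_len_fits helper shows the other half of the trade-off: a limit above 255 no longer fits in uint8_t (UINT8_MAX) but fits easily in uint16_t (UINT16_MAX).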

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachments

Re: WAL log only necessary part of 2PC GID

From
Petr Jelinek
Date:
On 10/03/16 13:43, Pavan Deolasee wrote:
> On Wed, Mar 9, 2016 at 7:56 PM, Petr Jelinek <petr@2ndquadrant.com> wrote:
>
>     Hi,
>
>     I wonder why you define gidlen as uint32 when it would fit into
>     uint8, which in the current TwoPhaseFileHeader struct should be a win
>     of 8 bytes on padding (on 64-bit). I think that's something worth
>     considering, given that this patch aims to lower the size of the data.
>
>
> Hi Petr,
>
> That sounds like a good idea; I didn't think about that. I would like to
> make it uint16 though, just in case we decide to increase GIDSIZE from
> 200 to something more than 256 (Postgres-XL does that already). That
> still fits in the same aligned width on both 32-bit and 64-bit
> machines. New version attached.

Correct, and I see Simon committed it like this in the meantime, thanks.

--
 Petr Jelinek                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services