Re: pg_combinebackup --copy-file-range

From: Tomas Vondra
Subject: Re: pg_combinebackup --copy-file-range
Date:
Msg-id: c82137eb-460e-41ca-be78-1bb32829bf1f@enterprisedb.com
In reply to: Re: pg_combinebackup --copy-file-range  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: pg_combinebackup --copy-file-range  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers

On 3/31/24 06:46, Thomas Munro wrote:
> On Sun, Mar 31, 2024 at 5:33 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> I'm on 2.2.2 (on Linux). But there's something wrong, because the
>> pg_combinebackup that took ~150s on xfs/btrfs, takes ~900s on ZFS.
>>
>> I'm not sure it's a ZFS config issue, though, because it's not CPU or
>> I/O bound, and I see this on both machines. And some simple dd tests
>> show the zpool can do 10x the throughput. Could this be due to the file
>> header / pool alignment?
> 
> Could ZFS recordsize > 8kB be making it worse, repeatedly dealing with
> the same 128kB record as you copy_file_range 16 x 8kB blocks?
> (Guessing you might be using the default recordsize?)
> 

No, I reduced the recordsize to 8kB. And the pgbench init takes about
the same time as on the other filesystems on this hardware, I think -
roughly 10 minutes for scale 5000.

>> I admit I'm not very familiar with the format, but you're probably right
>> there's a header, and header_length does not seem to consider alignment.
>> make_incremental_rfile simply does this:
>>
>>     /* Remember length of header. */
>>     rf->header_length = sizeof(magic) + sizeof(rf->num_blocks) +
>>         sizeof(rf->truncation_block_length) +
>>         sizeof(BlockNumber) * rf->num_blocks;
>>
>> and sendFile() does the same thing when creating incremental basebackup.
>> I guess it wouldn't be too difficult to make sure to align this to
>> BLCKSZ or something like this. I wonder if the file format is documented
>> somewhere ... It'd certainly be nicer to tweak before v18, if necessary.
>>
>> Anyway, is that really a problem? I mean, in my tests the CoW stuff
>> seemed to work quite fine - at least on the XFS/BTRFS. Although, maybe
>> that's why it took longer on XFS ...
> 
> Yeah I'm not sure, I assume it did more allocating and copying because
> of that.  It doesn't matter and it would be fine if a first version
> weren't as good as possible, and fine if we tune the format later once
> we know more, ie leaving improvements on the table.  I just wanted to
> share the observation.  I wouldn't be surprised if the block-at-a-time
> coding makes it slower and maybe makes the on disk data structures
> worse, but I dunno I'm just guessing.
> 
> It's also interesting but not required to figure out how to tune ZFS
> well for this purpose right now...

No idea. Do you know of any good ZFS statistics to check?
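
And to spell out the alignment idea quoted above - just a hypothetical
sketch with made-up values, not the actual reconstruct.c / sendFile()
code - rounding the header up to the next BLCKSZ boundary would look
something like this:

    /*
     * Hypothetical illustration: pad the incremental-file header to the
     * next BLCKSZ boundary so the block data that follows starts
     * block-aligned.  Values are made up; the real header layout lives
     * in pg_combinebackup / basebackup.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define BLCKSZ 8192
    typedef uint32_t BlockNumber;

    int
    main(void)
    {
        uint32_t    magic = 0;              /* placeholder for the magic */
        uint32_t    num_blocks = 1000;      /* example block count */
        uint32_t    truncation_block_length = 0;

        size_t      raw = sizeof(magic) + sizeof(num_blocks) +
            sizeof(truncation_block_length) +
            sizeof(BlockNumber) * num_blocks;

        /* round up to the next multiple of BLCKSZ */
        size_t      aligned = ((raw + BLCKSZ - 1) / BLCKSZ) * BLCKSZ;

        printf("raw header %zu bytes -> aligned header %zu bytes\n",
               raw, aligned);
        return 0;
    }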


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


