Re: Initdb-time block size specification

From Tomas Vondra
Subject Re: Initdb-time block size specification
Msg-id 61a088c5-197b-7364-a8a8-f7f22c3b071f@enterprisedb.com
In response to Re: Initdb-time block size specification  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 7/1/23 00:05, Andres Freund wrote:
> Hi,
> 
> On 2023-06-30 23:27:45 +0200, Tomas Vondra wrote:
>> On 6/30/23 23:11, Andres Freund wrote:
>>> ...
>>>
>>> If we really wanted to do this - but I don't think we do - I'd argue for
>>> working on the buildsystem support to build the postgres binary multiple
>>> times, for 4, 8, 16 kB BLCKSZ and having a wrapper postgres binary that just
>>> exec's the relevant "real" binary based on the pg_control value.  I really
>>> don't see us ever wanting to make BLCKSZ runtime configurable within one
>>> postgres binary. There's just too much intrinsic overhead associated with
>>> that.
>>
>> I don't quite understand why we shouldn't do this (or at least try to).
>>
>> IMO the benefits of using smaller blocks were substantial (especially
>> for 4kB, most likely due to matching the internal SSD page size). The other
>> benefits (reducing WAL volume) seem rather interesting too.
> 
> Mostly because I think there are bigger gains to be had elsewhere.
> 

I think that decision is up to whoever chooses to work on it, especially
if performance is not their primary motivation (IIRC this was discussed
as part of the TDE session).

> IME not a whole lot of storage ships by default with externally visible 4k
> sectors, but needs to be manually reformatted [1], which loses all data, so it
> has to be done initially.

I don't see why "you have to configure stuff" would be a reason against
improvements in this area. I don't know how prevalent storage with 4k
sectors is now, but AFAIK it's not hard to get and it's likely to get
yet more common in the future.

FWIW I don't think the benefits of different (lower) page sizes hinge on
4k sectors - it's just that not having to do FPIs would make it even
more interesting.

> Then postgres would also need OS specific trickery
> to figure out that indeed the IO stack is entirely 4k (checking sector size is
> not enough).

I haven't suggested we should be doing that automatically (would be
nice, but I'd be happy with knowing when it's safe to disable FPW using
the GUC in config). But knowing when it's safe would make it yet more
interesting to be able to use a different block size at initdb.

> And you run into the issue that suddenly the #column and
> index-tuple-size limits are lower, which won't make it easier.
> 

True. This limit is annoying, but no one is proposing to change the
default page size. initdb would just provide a more convenient way to do
that, but the user would have to check. (I rather doubt many people
actually index such large values).
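
For anyone who wants to check how the btree index-tuple limit moves with
the page size, here is a rough sketch. It is modeled on the shape of the
BTMaxItemSize calculation (about a third of the usable page), but every
size constant below is an assumption for a typical 64-bit build and is
not read from the PostgreSQL headers, and approx_btree_max_item_size() is
a made-up name, so treat the numbers as estimates only.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for a typical 64-bit build; NOT taken from PostgreSQL headers. */
#define ASSUMED_MAXIMUM_ALIGNOF   8u
#define ASSUMED_PAGE_HEADER_SIZE  24u
#define ASSUMED_ITEM_ID_SIZE      4u
#define ASSUMED_BT_OPAQUE_SIZE    16u
#define ASSUMED_ITEM_POINTER_SIZE 6u

/* Round up to the assumed maximum alignment. */
static uint32_t
maxalign(uint32_t x)
{
    return (x + ASSUMED_MAXIMUM_ALIGNOF - 1) & ~(ASSUMED_MAXIMUM_ALIGNOF - 1);
}

/* Round down to the assumed maximum alignment. */
static uint32_t
maxalign_down(uint32_t x)
{
    return x & ~(ASSUMED_MAXIMUM_ALIGNOF - 1);
}

/*
 * Hypothetical helper: estimate the largest btree index tuple for a given
 * block size, requiring that three tuples fit on one page.
 */
static uint32_t
approx_btree_max_item_size(uint32_t blcksz)
{
    uint32_t usable;

    /* Block size must be a power of two and reasonably large. */
    assert(blcksz >= 512 && (blcksz & (blcksz - 1)) == 0);

    usable = blcksz
        - maxalign(ASSUMED_PAGE_HEADER_SIZE + 3 * ASSUMED_ITEM_ID_SIZE)
        - maxalign(ASSUMED_BT_OPAQUE_SIZE);

    return maxalign_down(usable / 3) - maxalign(ASSUMED_ITEM_POINTER_SIZE);
}
```

Under these assumptions an 8kB page gives 2704 bytes and a 4kB page gives
1336 bytes, so the limit roughly halves along with the page size.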

> 
> I think we should change the default of the WAL blocksize to 4k
> though. There's practically no downsides, and it drastically reduces
> postgres-side write amplification in many transactional workloads, by only
> writing out partially filled 4k pages instead of partially filled 8k pages.
> 

+1 (although in my tests the benefits were much smaller than for BLCKSZ)
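
To make the write-amplification argument concrete, here is a deliberately
simplified model, not how the WAL writer is actually implemented. It
assumes every commit appends a fixed-size record and flushes right away,
that each flush rewrites every WAL block it touches in full, and it
ignores WAL page headers. The function names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes physically written when flushing the WAL byte range [start, end),
 * assuming each touched block is written out in full (simplified model).
 */
static uint64_t
flush_bytes(uint64_t start, uint64_t end, uint32_t wal_blcksz)
{
    uint64_t first_block;
    uint64_t last_block;

    assert(wal_blcksz > 0 && end > start);

    first_block = start / wal_blcksz;
    last_block = (end - 1) / wal_blcksz;

    return (last_block - first_block + 1) * wal_blcksz;
}

/*
 * Total bytes written for n_commits commits, each appending rec_len bytes
 * of WAL and flushing immediately afterwards.
 */
static uint64_t
total_flush_bytes(uint32_t n_commits, uint32_t rec_len, uint32_t wal_blcksz)
{
    uint64_t pos = 0;
    uint64_t total = 0;

    for (uint32_t i = 0; i < n_commits; i++)
    {
        total += flush_bytes(pos, pos + rec_len, wal_blcksz);
        pos += rec_len;
    }

    return total;
}
```

With 16 commits of 512 bytes each (8192 bytes of WAL in total), this model
writes 131072 bytes with 8kB WAL blocks and 65536 bytes with 4kB WAL
blocks. Real workloads with group commit or larger records will see a
smaller difference.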

> 
>> Sure, there are challenges (e.g. the overhead due to making it dynamic).
>> No doubt about that.
> 
> I don't think the runtime-dynamic overhead is avoidable with reasonable effort
> (leaving aside compiling code multiple times and switching between).
> 
> If we were to start building postgres for multiple compile-time settings, I
> think there are uses other than switching between BLCKSZ, potentially more
> interesting. E.g. you can see substantially improved performance by being able
> to use SSE4.2 without CPU dispatch (partially because it allows compiler
> autovectorization, partially because it allows the compiler to use newer
> non-vectorized math instructions (when targeting AVX IIRC), partially because
> the dispatch overhead is not insubstantial).  Another example: ARMv8
> performance is substantially better if you target ARMv8.1-A instead of
> ARMv8.0, due to having atomic instructions instead of LL/SC (it still baffles
> me that they didn't do this earlier, ll/sc is just inherently inefficient).
> 

Maybe, although I think it depends on what parts of the code this would
affect. If it's sufficiently small/isolated, it'd be possible to have
multiple paths, specialized to a particular page size (pretty common
technique for GPU/SIMD, I believe).
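
A minimal sketch of what "multiple paths specialized to a page size"
could look like in C. This is only an illustration, not code from the
tree: the per-page work (counting pages whose first byte is nonzero) is a
toy stand-in, and all function names are hypothetical. The idea is that
the generic loop is inlined into wrappers that pass a constant block size,
so the compiler can optimize each copy for that size, and the choice
between the copies is made once rather than per page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Generic loop. When called with a runtime blcksz the compiler cannot
 * assume anything about the stride; when inlined with a constant it can.
 * "Nonempty" here just means the first byte of the page is nonzero.
 */
static inline size_t
count_nonempty_pages_generic(const uint8_t *buf, size_t nbytes, size_t blcksz)
{
    size_t count = 0;

    assert(blcksz > 0 && nbytes % blcksz == 0);

    for (size_t off = 0; off < nbytes; off += blcksz)
    {
        if (buf[off] != 0)
            count++;
    }

    return count;
}

/* Specialized copies: the block size is a compile-time constant in each. */
static size_t
count_nonempty_pages_4k(const uint8_t *buf, size_t nbytes)
{
    return count_nonempty_pages_generic(buf, nbytes, 4096);
}

static size_t
count_nonempty_pages_8k(const uint8_t *buf, size_t nbytes)
{
    return count_nonempty_pages_generic(buf, nbytes, 8192);
}

static size_t
count_nonempty_pages_16k(const uint8_t *buf, size_t nbytes)
{
    return count_nonempty_pages_generic(buf, nbytes, 16384);
}

typedef size_t (*count_pages_fn) (const uint8_t *buf, size_t nbytes);

/* Pick the specialized path once, e.g. at startup; NULL if unsupported. */
static count_pages_fn
choose_count_pages(size_t blcksz)
{
    switch (blcksz)
    {
        case 4096:
            return count_nonempty_pages_4k;
        case 8192:
            return count_nonempty_pages_8k;
        case 16384:
            return count_nonempty_pages_16k;
        default:
            return NULL;
    }
}
```

The indirect call has a cost of its own, so this only pays off if the
specialized unit is coarse enough (a whole loop, not a single page access)
for the dispatch to be amortized.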


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


