Re: Initdb-time block size specification
От | David Christensen |
---|---|
Тема | Re: Initdb-time block size specification |
Дата | |
Msg-id | CAOxo6XL5vkKb0oc+AEuMJ9A3j1W=DwSAKJwzOA78zTHcO_QzEQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Initdb-time block size specification (Andres Freund <andres@anarazel.de>) |
Ответы |
Re: Initdb-time block size specification
(Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: Initdb-time block size specification (Andres Freund <andres@anarazel.de>) |
Список | pgsql-hackers |
On Fri, Jun 30, 2023 at 2:39 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2023-06-30 12:35:09 -0500, David Christensen wrote: > > As discussed somewhat at PgCon, enclosed is v1 of a patch to provide > > variable block sizes; basically instead of BLCKSZ being a compile-time > > constant, a single set of binaries can support all of the block sizes > > Pg can support, using the value stored in pg_control as the basis. > > (Possible future plans would be to make this something even more > > dynamic, such as configured per tablespace, but this is out of scope; > > this just sets up the infrastructure for this.) > > I am extremely doubtful this is a good idea. For one it causes a lot of churn, > but more importantly it turns currently cheap code paths into more expensive > ones. > > Changes like > > > -#define BufHdrGetBlock(bufHdr) ((Block) (BufferBlocks + ((Size) (bufHdr)->buf_id) * BLCKSZ)) > > +#define BufHdrGetBlock(bufHdr) ((Block) (BufferBlocks + ((Size) (bufHdr)->buf_id) * CLUSTER_BLOCK_SIZE)) > > Note that CLUSTER_BLOCK_SIZE, despite looking like a macro that's constant, is > actually variable. Correct; that is mainly a notational device which would be easy enough to change (and presumably would follow along the lines of the commit Tomas pointed out above). > I am fairly certain this is going to be causing substantial performance > regressions. I think we should reject this even if we don't immediately find > them, because it's almost guaranteed to cause some. What would be considered substantial? Some overhead would be expected, but I think having an actual patch to evaluate lets us see what potential there is. Seems like this will likely be optimized as an offset stored in a register, so wouldn't expect huge changes here. (There may be other approaches I haven't thought of in terms of getting this.) > Besides this, I've not really heard any convincing justification for needing > this in the first place. Doing this would open up experiments in larger block sizes, so we would be able to have larger indexable tuples, say, or be able to store data types that are larger than currently supported for tuple row limits without dropping to toast (native vector data types come to mind as a candidate here). We've had 8k blocks for a long time while hardware has improved over 20+ years, and it would be interesting to see how tuning things would open up additional avenues for performance without requiring packagers to make a single choice on this regardless of use-case. (The fact that we allow compiling this at a different value suggests there is thought to be some utility having this be something other than the default value.) I just think it's one of those things that is hard to evaluate without actually having something specific, which is why we have this patch now. Thanks, David
В списке pgsql-hackers по дате отправления:
Следующее
От: Robert HaasДата:
Сообщение: Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?