Discussion: Fix rounding method used to compute huge pages
Hi,

When computing the dynamic value of shared_memory_size_in_huge_pages,
(1 + size_b / hp_size) is currently used. This works when size_b is not
divisible by hp_size. However, it will yield an additional huge page
when size_b is divisible by hp_size.

On CreateAnonymousSegment's side, the allocation size is rounded up to
the next required huge page when necessary. However, there's no
overflow check when doing this round-up.

0001: This patch replicates CreateAnonymousSegment's rounding method
to InitializeShmemGUCs, only rounding up when the value is not
divisible by hp_size.

0002: This patch uses add_size in CreateAnonymousSegment when the
allocation size is rounded up, to check for possible overflow.

Regards,
Anthonin Bonnefoy
Attachments
On Thu, Jan 22, 2026 at 10:13 PM Anthonin Bonnefoy
<anthonin.bonnefoy@datadoghq.com> wrote:
>
> Hi,
>
> When computing the dynamic value of shared_memory_size_in_huge_pages,
> (1+size_b/hp_size) is currently used. This works when size_b is not
> divisible by hp_size. However, it will yield an additional huge page
> when size_b is divisible by hp_size.
>
> On CreateAnonymousSegment's side, the allocation size is rounded up to
> the next required huge pages when necessary. However, there's no
> overflow check when doing this round up.
>
> 0001: This patch replicates CreateAnonymousSegment's rounding method
> to InitializeShmemGUCs, only rounding up when the value is not
> divisible by hp_size.
>
> 0002: This patch uses add_size in CreateAnonymousSegment when the
> allocation size is rounded up, to check for possible overflow.
We have a similar incantation in CalculateShmemSize():

    size = add_size(size, 8192 - (size % 8192));

I think we should just introduce a helper, ceil_size(), place it
near add_size() and mul_size(), and use it wherever we round up
sizes:

Size
ceil_size(Size size, Size base)
{
    return add_size(size, base - (size % base));
}
InitializeShmemGUCs() also has the following line, which could be
replaced with ceil_size(size_b, 1024 * 1024) / (1024 * 1024):
size_mb = add_size(size_b, (1024 * 1024) - 1) / (1024 * 1024);
--
Best Wishes,
Ashutosh Bapat
On Thu, Jan 22, 2026 at 05:42:44PM +0100, Anthonin Bonnefoy wrote:
> When computing the dynamic value of shared_memory_size_in_huge_pages,
> (1+size_b/hp_size) is currently used. This works when size_b is not
> divisible by hp_size. However, it will yield an additional huge page
> when size_b is divisible by hp_size.
Oops, it looks like this is my fault. I doubt this causes any practical
problems, but we might as well fix it.
+ if (size_b % hp_size != 0)
+ size_b = add_size(size_b, hp_size - (size_b % hp_size));
+ hp_required = size_b / hp_size;
I think we could simplify this a tad:
hp_required = size_b / hp_size;
if (size_b % hp_size != 0)
hp_required = add_size(hp_required, 1);
> 0002: This patch uses add_size in CreateAnonymousSegment when the
> allocation size is rounded up, to check for possible overflow.
Seems reasonable.
--
nathan
On Thu, Jan 22, 2026 at 03:24:13PM -0600, Nathan Bossart wrote:
> Oops, it looks like this is my fault. I doubt this causes any practical
> problems, but we might as well fix it.

And it's something I have committed.

> + if (size_b % hp_size != 0)
> + size_b = add_size(size_b, hp_size - (size_b % hp_size));
> + hp_required = size_b / hp_size;
>
> I think we could simplify this a tad:
>
> hp_required = size_b / hp_size;
> if (size_b % hp_size != 0)
> hp_required = add_size(hp_required, 1);

FWIW, we have always been kind of sloppy about slightly overestimating
the shmem size required in the backend code, and here it's just one
page. I don't see a strong need for a backpatch. Of course, no
objections to adjusting that on HEAD. Nathan, are you planning to take
care of that as the original author? This should fall under my bucket
as the original committer, but as you were the author, feel free to
take priority here, of course.
--
Michael
Attachments
On Fri, Jan 23, 2026 at 08:04:29AM +0900, Michael Paquier wrote:
> FWIW, we have always been kind of sloppy about slightly overestimating
> the shmem size required in the backend code, and here it's just one
> page. I don't see a strong need for a backpatch. Of course, no
> objections to adjusting that on HEAD.

Agreed.

> Nathan, are you planning to take care of that as the original author?
> This should fall under my bucket as the original committer, but as you
> were the author, feel free to take priority here, of course.

I'll take care of it (likely tomorrow).
--
nathan
On Thu, Jan 22, 2026 at 05:10:08PM -0600, Nathan Bossart wrote:
> I'll take care of it (likely tomorrow).

Thanks!
--
Michael
Attachments
On Thu, Jan 22, 2026 at 10:24 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Oops, it looks like this is my fault. I doubt this causes any practical
> problems, but we might as well fix it.

Yeah, the chance of this being a problem is pretty low.

> + if (size_b % hp_size != 0)
> + size_b = add_size(size_b, hp_size - (size_b % hp_size));
> + hp_required = size_b / hp_size;
>
> I think we could simplify this a tad:
>
> hp_required = size_b / hp_size;
> if (size_b % hp_size != 0)
> hp_required = add_size(hp_required, 1);

From my understanding, 'add_size(hp_required, 1)' will never overflow
since size_b was checked for overflow, and hp_size should always be > 1
(except if huge pages of 1 byte exist somewhere). For consistency with
CreateAnonymousSegment, using 'add_size(size_b, hp_size - (size_b %
hp_size))' will also check that the final requested allocation doesn't
overflow.
Committed.

On Fri, Jan 23, 2026 at 09:21:53AM +0100, Anthonin Bonnefoy wrote:
> From my understanding, 'add_size(hp_required, 1)' will never overflow
> since size_b was checked for overflow, and hp_size should always be > 1
> (except if huge pages of 1 byte exist somewhere).

That's true, but for this sort of thing, I usually prefer to avoid
relying on those kinds of assumptions to reason about the correctness
of the code. The overflow check costs little, and IIUC this function is
run exactly once for the lifetime of the server.

> For consistency with CreateAnonymousSegment, using 'add_size(size_b,
> hp_size - (size_b % hp_size))' will also check that the final
> requested allocation doesn't overflow.

*shrug* I don't see a strong reason for consistency here. AFAICT you'd
have to be trying to allocate something like 18 exabytes on most
systems for there to be a problem, at which point there are probably
bigger issues to sort out.

Thanks for the patch!
--
nathan