Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes

From Andres Freund
Subject Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Msg-id 20230121232922.juo7t3fhaso7qh3s@awork3.anarazel.de
In reply to Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
Re: BUG #17757: Not honoring huge_pages setting during initdb causes DB crash in Kubernetes
List pgsql-bugs
Hi,

On 2023-01-22 00:10:29 +0100, Tomas Vondra wrote:
> On 1/20/23 23:48, PG Bug reporting form wrote:
> > In these cases, the initdb phase will attempt to allocate huge pages that
> > are available in the OS, but it will be denied access by Kubernetes and
> > fail.
> 
> Well, so how exactly this fails? Does that mean Kubernetes broke mmap()
> with MAP_HUGETLB so that it doesn't return MAP_FAILED when hugepages are
> not available, or what? Because that's the only explanation I can see,
> looking at the code.

Yea, that's what I was wondering about as well.


> Or it just does not realize there are no hugepages, returns something
> and then crashes with SIGBUS later when trying to access it?

I assume that's the case. There are references to bus errors in a bunch of
the linked issues, e.g.
https://github.com/CrunchyData/postgres-operator/issues/413

selecting default max_connections ... sh: line 1:    60 Bus error               (core dumped)
"/usr/pgsql-10/bin/postgres" --boot -x0 -F -c max_connections=100 -c shared_buffers=1000 -c
dynamic_shared_memory_type=none < "/dev/null" > "/dev/null" 2>&1
 

It's possible that the problem would go away if we used MAP_POPULATE for the
allocation.
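
A minimal sketch of the MAP_POPULATE idea, not PostgreSQL's actual
allocation code. The function name alloc_shared and its signature are
hypothetical. It asks for huge pages with MAP_POPULATE so that the pages are
faulted in at mmap() time, and falls back to regular pages if that mmap()
returns MAP_FAILED. Whether the kernel actually reports a failure at this
point when a cgroup denies the huge pages is an open question here, so this
only illustrates the shape of the "try" logic.

```c
#define _GNU_SOURCE             /* for MAP_HUGETLB / MAP_POPULATE on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: try a huge-page mapping first, then fall back.
 * Returns NULL if both attempts fail; *used_huge reports which one worked.
 * The caller is assumed to pass a size that is a multiple of the huge
 * page size.
 */
void *
alloc_shared(size_t size, int *used_huge)
{
    void *p;

    assert(size > 0);
    *used_huge = 0;

#if defined(MAP_HUGETLB) && defined(MAP_POPULATE)
    /* MAP_POPULATE requests that the pages be prefaulted during mmap(). */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB | MAP_POPULATE,
             -1, 0);
    if (p != MAP_FAILED)
    {
        *used_huge = 1;
        return p;
    }
#endif

    /* Fallback: ordinary pages, as with huge_pages=try. */
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```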

I'd guess that this is annoying cgroups stuff :(


> I doubt we want to just go straight to changing the default value for
> everyone. IMHO if the "try" logic is somehow broken, we should fix the
> try logic, not mess with the defaults.

Agreed. But we could disable huge pages explicitly inside initdb - there's
really no point in using it there...
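
As a rough sketch of what disabling it inside initdb could look like: the
bootstrap command from the log above, with "-c huge_pages=off" added. The
helper name build_boot_cmd is hypothetical and this is not initdb's actual
code; huge_pages itself is an existing setting that accepts off.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the bootstrap-mode test command into buf,
 * forcing huge pages off. Returns 0 on success, -1 if buf is too small.
 */
int
build_boot_cmd(char *buf, size_t len, const char *backend_path,
               int max_connections, int shared_buffers)
{
    int n;

    assert(buf != NULL && backend_path != NULL);

    n = snprintf(buf, len,
                 "\"%s\" --boot -x0 -F "
                 "-c max_connections=%d -c shared_buffers=%d "
                 "-c dynamic_shared_memory_type=none "
                 "-c huge_pages=off "
                 "< \"/dev/null\" > \"/dev/null\" 2>&1",
                 backend_path, max_connections, shared_buffers);

    /* snprintf returns the length it wanted to write; detect truncation. */
    return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```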

Greetings,

Andres Freund


