Re: recent failures on lorikeet

From Tom Lane
Subject Re: recent failures on lorikeet
Date
Msg-id 241120.1623691123@sss.pgh.pa.us
In response to Re: recent failures on lorikeet  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: recent failures on lorikeet  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Andrew Dunstan <andrew@dunslane.net> writes:
> The line in lmgr.c is where the process title gets changed to "waiting".
> I recently stopped setting process title on this animal on REL_13_STABLE
> and its similar errors have largely gone away.

Oooh, that certainly seems like a smoking gun.

> I can do the same on
> HEAD. But it does make me wonder what the heck has changed to make this
> code fragile.

So what we've got there is

        old_status = get_ps_display(&len);
        new_status = (char *) palloc(len + 8 + 1);
        memcpy(new_status, old_status, len);
        strcpy(new_status + len, " waiting");
        set_ps_display(new_status);
        new_status[len] = '\0'; /* truncate off " waiting" */

Line 1831 is the strcpy, but it seems entirely impossible that that
could fail, unless palloc has shirked its job.  I'm thinking that
the crash is really in the memcpy --- looking at the other lines
in your trace, fingering the line after the call seems common.

What that'd have to imply is that get_ps_display() messed up,
returning a bad pointer or a bad length.
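
To make that hypothesis concrete, here is a minimal standalone sketch of the same copy-and-append pattern with the pointer and length checked before the memcpy. It is not PostgreSQL code: stub_get_ps_display(), build_waiting_status(), and the max_len bound are hypothetical names invented for illustration, and malloc stands in for palloc.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the process-title buffer; not PostgreSQL's. */
static const char *fake_ps_buffer = "postgres: user db [local] UPDATE";

/* Hypothetical stub mimicking the shape of get_ps_display(&len). */
static const char *
stub_get_ps_display(int *displen)
{
    *displen = (int) strlen(fake_ps_buffer);
    return fake_ps_buffer;
}

/*
 * Build "<old_status> waiting" in a fresh buffer.
 * Returns NULL instead of copying if the pointer or length looks
 * implausible, which is the failure mode suspected above: a bad
 * pointer or length would otherwise fault inside memcpy.
 * max_len is an assumed sanity bound, not a PostgreSQL constant.
 */
static char *
build_waiting_status(const char *old_status, int len, int max_len)
{
    char *new_status;

    if (old_status == NULL || len < 0 || len > max_len)
        return NULL;

    /* 8 bytes for " waiting" plus 1 for the terminating NUL */
    new_status = malloc((size_t) len + 8 + 1);
    if (new_status == NULL)
        return NULL;

    memcpy(new_status, old_status, (size_t) len);
    strcpy(new_status + len, " waiting");
    return new_status;
}
```

With a sane pointer and length the result is just the old title with " waiting" appended; a negative or oversized length is rejected before any memory is touched, which is where the unguarded code would crash.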

A platform-specific problem in get_ps_display() seems plausible
enough.  The apparent connection to a concurrent VACUUM FULL seems
pretty hard to explain that way ... but maybe that's a mirage.

            regards, tom lane


