Re: 64-bit wait_event and introduction of 32-bit wait_event_arg
| From | Jakub Wartak |
|---|---|
| Subject | Re: 64-bit wait_event and introduction of 32-bit wait_event_arg |
| Date | |
| Msg-id | CAKZiRmyZzmOODYS6n8mns9zN4RcS3o9kfrdQDyeRupqaGp9PmQ@mail.gmail.com |
| In response to | Re: 64-bit wait_event and introduction of 32-bit wait_event_arg (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
| List | pgsql-hackers |
On Tue, Dec 9, 2025 at 10:11 AM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
>
> Hi Heikki, thanks for having a look!
>
> On Mon, Dec 8, 2025 at 11:12 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> >
> > On 08/12/2025 11:54, Jakub Wartak wrote:
> > > While thinking about cons, the only cons that I could think of is that
> > > when we would be exposing something as 32-bits , then if the following
> > > major release changes some internal structure/data type to be a bit
> > > more heavy, it couldn't be exposed anymore like that (think of e.g.
> > > 64-bit OIDs?)
> > >
> > > Any help, opinions, ideas and code/co-authors are more than welcome.
>
> > Expanding it to 64 bit seems fine as far as performance is concerned. I
> > think the difficult and laborious part is to design the facilities to
> > make use of it.
>
> Right, I'm very interested in hearing what could be added there/what
> people want (bonus points if that is causing some performance issues
> today and we do not have the area covered and exposing that would fit
> in 32-bits ;) )
>
OK, so v3 is attached. Changes in v3:
- added a proper RelFileNumber as wait_event_arg for DataFileRead/Write/etc.
waits, instead of simply using "filedescriptor" as wait_event_arg
- cfbot complained hard on win32 due to the lack of uint64 support for enums
("warning C4309: 'initializing': truncation of constant value"), so I
tried two ways of forcing the enum into 64-bit ints instead of the
default (32-bit int). Neither trick helps in the MSVC case:
a) `typedef enum : uint64_t` causes "error C2332: 'enum': missing tag name"
b) putting `PG_WAIT_ACTIVITY_MAX = 0xFFFFFFFFFFFFFFFFULL` at the end of
the enum doesn't work either
so I had to get rid of enum{} and stick to #defines to make cfbot happy there
- pass RelFileNumber/tablespaceId as wait_event_arg for recovery conflict waits
(earlier you would get that information only from the log, but here we
pinpoint the exact RelFileNumber that startup is waiting on). Use case demo:
we run a long analytical query on a standby (while a read/write pgbench is
hitting the primary hard and we run without hot_standby_feedback):
s1) "SELECT count(*) FROM pgbench_accounts a CROSS JOIN pgbench_accounts b;"
s2) we can immediately query wait_event_arg, and it shows recovery being
stuck on the specific relationId:
  pid  | backend_type | type |        wait_event        | wait_event_arg
-------+--------------+------+--------------------------+----------------
 68824 | startup      | IPC  | RecoveryConflictSnapshot |          16427
postgres=# select relname from pg_class where relfilenode = 16427;
     relname
------------------
 pgbench_branches
s1) after some time (max_standby_streaming_delay) we get:
ERROR: canceling statement due to conflict with recovery
- added a description of wait_event_arg to the wait event infrastructure
(pg_wait_events view and docs)
- if there's high I/O on SLRUs we can get data from pg_stat_slru, but
previously one couldn't pinpoint which SLRU type affects which backend,
so I thought I'd add the SLRU type as wait_event_arg to IO/Slru{Read,Write}
to make this easier on multitenant DBs, e.g. it shows:
  pid  |              query               | type | wait_event | wait_event_arg
-------+----------------------------------+------+------------+----------------
 57400 | update locations set loc_name .. | IO   | SlruRead   |              5
 57605 | INSERT INTO users (loc_id, fna.. | IO   | SlruRead   |              6
(2 rows)
postgres=# select waiteventarg_description from pg_wait_events where name='SlruRead';
 waiteventarg_description
---------------------------------------------------------------------------------------
 SlruType: unknown(0), [..] multixactoffset (5), multixactmembers(6), serialializable(7)
-- \d will show the FK (so we connect the dots with less ambiguity about
FK <-> multixacts):
postgres=# \d+ users
[..]
Foreign-key constraints:
    "fk1" FOREIGN KEY (loc_id) REFERENCES locations(loc_id)
postgres=# \d+ locations
[..]
Referenced by:
    TABLE "users" CONSTRAINT "fk1" FOREIGN KEY (loc_id) REFERENCES locations(loc_id)
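
To illustrate the #define-based workaround from the change list above, here is a minimal sketch of how a 64-bit wait event word could carry a 32-bit wait_event_arg without relying on 64-bit enums. It is not taken from the v3 patch: every DEMO_* name, the constant values, and the choice of putting the argument in the upper 32 bits are assumptions made only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: the low 32 bits keep the
 * wait class and event id, the high 32 bits carry wait_event_arg.
 * All DEMO_* names and values are made up. Constants built with
 * UINT64_C() are at least 64 bits wide, so no enum is needed.
 */
#define DEMO_WAIT_CLASS_IO             UINT64_C(0x0A000000)
#define DEMO_WAIT_EVENT_DATA_FILE_READ (DEMO_WAIT_CLASS_IO | UINT64_C(0x0001))
#define DEMO_WAIT_EVENT_MASK           UINT64_C(0x00000000FFFFFFFF)
#define DEMO_WAIT_ARG_SHIFT            32

/* Combine an event constant with a 32-bit argument into one 64-bit word. */
static inline uint64_t
demo_wait_event_pack(uint64_t event, uint32_t arg)
{
    /* The event part must not spill into the argument half. */
    assert((event & ~DEMO_WAIT_EVENT_MASK) == 0);
    return event | ((uint64_t) arg << DEMO_WAIT_ARG_SHIFT);
}

/* Extract the class + event id (low half). */
static inline uint32_t
demo_wait_event_id(uint64_t packed)
{
    return (uint32_t) (packed & DEMO_WAIT_EVENT_MASK);
}

/* Extract wait_event_arg (high half). */
static inline uint32_t
demo_wait_event_arg(uint64_t packed)
{
    return (uint32_t) (packed >> DEMO_WAIT_ARG_SHIFT);
}
```

With this layout, packing DEMO_WAIT_EVENT_DATA_FILE_READ together with a relfilenode such as 16427 yields one 64-bit value from which both halves can be recovered independently.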
> > For example, if you encode an table OID in it, how do
> > you interpret that when you're looking at pg_stat_activity? A new
> > pg_explain_wait_event(bigint waitevent) that returns a text
> > representation of the event perhaps?
>
> Well I was thinking initially[..irrelevant, so snipped out]
Right, so v3 has a built-in self-description of wait_event_arg in
pg_wait_events (and the docs contain these details too).
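
As a rough illustration of what such a self-description amounts to, the sketch below maps a numeric wait_event_arg back to a name for the SlruRead case. It is not the patch's code: the DemoArgName type and the demo_* names are hypothetical, and the table contains only unknown(0) and the two values that appear in the sample rows earlier in this mail; the values elided there are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical (value, name) pair describing one wait_event_arg value. */
typedef struct DemoArgName
{
    uint32_t    value;
    const char *name;
} DemoArgName;

/* Only the SlruType values visible in the sample output; not a full list. */
static const DemoArgName demo_slru_types[] = {
    {0, "unknown"},
    {5, "multixactoffset"},
    {6, "multixactmembers"},
};

/* Return the name for an argument value, or NULL if it is not in the table. */
static const char *
demo_slru_type_name(uint32_t arg)
{
    size_t      n = sizeof(demo_slru_types) / sizeof(demo_slru_types[0]);

    for (size_t i = 0; i < n; i++)
    {
        /* Sanity check on the table itself. */
        assert(demo_slru_types[i].name != NULL);
        if (demo_slru_types[i].value == arg)
            return demo_slru_types[i].name;
    }
    return NULL;
}
```

A monitoring tool could do the same lookup client-side by parsing the description text from pg_wait_events instead of hard-coding a table.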
[..]
> > Inevitably, the extra 32 bits won't be enough to expose everything that
> > you might want to expose. Should we already think about what to do then?
>
> Well I wanted to stick to exposing only stuff that will _always_ fit
> 32-bits. If additional/more detailed instrumentation would be
> necessary then separate monitoring/observability/variables/subsystem
> probably should be built for that specific use case. So if that
> information can become over 32-bit, it should not be encoded into
> wait_event_arg, just to avoid debating performance regressions for any
> other additional wait-event infrastructure. I simply do not want to
> open a can of worms: see Bertrand tried that in [1], but I don't want
> this $thread to follow that route where Andres and Robert expressed
> their concerns earlier. E.g. one of the key questions is that I'm
> somehow lost if we would like to continue the earlier 56-bit [2] /
> 64-bit OID/RelFileNode attempt(s). If the project wants to continue
> with that, then probably we couldn't express ::relation id as 32-bit
> wait_event_arg or maybe I am missing something. (ofc, we could hash
> potential 64-bit OID back into 32-bit OID one day, but it sounds like
> a hack, doesn't it?)
>
Questions:
1. Question about the 56-bit relfilenode idea [1] (05d4cbf9b6ba, reverted by
a448e49bcbe): can I assume that it is dead in the water, and can I assume
that >> 33-bits RelFileNode is not going to happen?
(if my 64-bit wait_events with 32 bits for wait_event_arg carry a
RelFileNode, that would make the two incompatible)
2. Please ignore the quality of 0002 (multixact), but I would be grateful for
feedback on whether extending the MultiXact routines like this (to carry
RelFileNumber) is OK or not. And if not, what would be a better way to pass
such information through?
-J.
[1] - https://www.postgresql.org/message-id/CA+TgmobM5FN5x0u3tSpoNvk_TZPFCdbcHxsXCoY1ytn1dXROvg@mail.gmail.com