Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections

Поиск
Список
Период
Сортировка
От Yura Sokolov
Тема Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections
Дата
Msg-id 997da6a8e1ee5a2e0c85606fde0c19870b72f6d4.camel@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: [PATCH] fix wait_event of pg_stat_activity in case of high amount of connections  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
В Пт, 08/07/2022 в 09:44 -0400, Robert Haas пишет:
> On Thu, Jul 7, 2022 at 10:39 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > At Thu, 7 Jul 2022 13:58:06 -0500, Justin Pryzby <pryzby@telsasoft.com> wrote in
> > > I agree that this is a bug, since it can (and did) cause false positives in a
> > > monitoring system.
> > 
> > I'm not this is undoubtfully a bug but agree about the rest.
> 
> I don't agree that this is a bug, and even if it were, I don't think
> this patch can fix it.
> 
> Let's start with the second point first: pgstat_report_wait_start()
> and pgstat_report_wait_end() change the advertised wait event for a
> process, while the backend state is changed by
> pgstat_report_activity(). Since those function calls are in different
> places, those changes are bound to happen at different times, and
> therefore you can observe drift between the two values. Now perhaps
> there are some one-directional guarantees: I think we probably always
> set the state to idle before we start reading from the client, and
> always finish reading from the client before the state ceases to be
> idle. But I don't really see how that helps anything, because when you
> read those values, you must read one and then the other. If you read
> the activity before the wait event, you might see the state before it
> goes idle and then the wait event after it's reached ClientRead. If
> you read the wait event before the activity, you might see the wait
> event as ClientRead, and then by the time you check the activity the
> backend might have gotten some data from the client and no longer be
> idle. The very best a patch like this can hope to do is narrow the
> race condition enough that the discrepancies are observed less
> frequently in practice.
> 
> And that's why I think this is not a bug fix, or even a good idea.
> It's just encouraging people to rely on something which can never be
> fully reliable in the way that the original poster is hoping. There
> was never any intention of having wait events synchronized with the
> pgstat_report_activity() stuff, and I think that's perfectly fine.
> Both systems are trying to provide visibility into states that can
> change very quickly, and therefore they need to be low-overhead, and
> therefore they use very lightweight synchronization, which means that
> ephemeral discrepancies are possible by nature. There are plenty of
> other examples of that as well. You can't for example query pg_locks
> and pg_stat_activity in the same query and expect that all and only
> those backends that are apparently waiting for a lock in
> pg_stat_activity will have an ungranted lock in pg_locks. It just
> doesn't work like that, and there's a very good reason for that:
> trying to make all of these introspection facilities behave in
> MVCC-like ways would be painful to code and probably end up slowing
> the system down substantially.
> 
> I think the right fix here is to change nothing in the code, and stop
> expecting these things to be perfectly consistent with each other.

I see analogy with Bus Stop:
- there is bus stop
- there is a schedule of bus arriving this top
- there are passengers, who every day travel with this bus

Bus occasionally comes later... Well, it comes later quite often...

Which way Major (or other responsible person) should act?

First possibility: do all the best Bus comes at the schedule. And
although there will no be 100% guarantee, it will raise from 90%
to 99%.

Second possibility: tell the passengers "you should not rely on bus
schedule, and we will not do anything to make it more reliable".

If I were passenger, I'd prefer first choice.


regards

Yura




В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Steele
Дата:
Сообщение: Re: Add function to return backup_label and tablespace_map
Следующее
От: Maxim Orlov
Дата:
Сообщение: Re: Add 64-bit XIDs into PostgreSQL 15