Re: Background Processes and reporting

From    Vladimir Borodin
Subject Re: Background Processes and reporting
Date
Msg-id  9FE8342A-5A38-4F75-98F6-D1754FFE6CA1@simply.name
Whole thread / Raw message
In response to  Re: Background Processes and reporting  (Robert Haas <robertmhaas@gmail.com>)
Responses  Re: Background Processes and reporting
List  pgsql-hackers

On 14 March 2016, at 22:21, Robert Haas <robertmhaas@gmail.com> wrote:

> On Sat, Mar 12, 2016 at 6:05 AM, Oleg Bartunov <obartunov@gmail.com> wrote:
>>> So?
>>
>> So, Robert already has experience with the subject; probably he has had a bad
>> experience with the EDB implementation, and he'd like to see something better
>> in the community version. That's fair, and I accept his position.
>
> Bingo - though maybe "bad" experience is not quite as accurate as
> "could be better".

>> Wait monitoring is one of the most popular requirements of Russian companies
>> that have migrated from Oracle. The overwhelming majority of them use Linux,
>> so I suggest adding a configure flag to include wait monitoring at compile
>> time (no wait monitoring by default), or a GUC variable, also off by default,
>> so that monitoring has zero to minimal overhead. That way we'll satisfy many
>> enterprises and help them choose Postgres, get feedback from production use,
>> and have time to improve the feature.

> So, right now we can only display the wait information in
> pg_stat_activity.  There are a couple of other things that somebody
> might want to do:
>
> 1. Sample the wait state information across all backends in the
> system.  On a large, busy system, this figures to be quite cheap, and
> the sampling interval could be configurable.
>
> 2. Count every instance of every wait event in every backend, and roll
> that up either via shared memory or additional stats messages.
>
> 3. Like #2, but with timing information.
>
> 4. Like #2, but on a per-query basis, somehow integrated with
> pg_stat_statements.

5. Show extra information about a wait event (e.g. exclusive or shared mode for LWLocks, relation/forknum/blknum for I/O operations, etc.).
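
For reference, #1 can already be approximated from userspace by polling the new pg_stat_activity columns. A minimal sketch, assuming the wait_event columns committed for 9.6 (a real sampler would of course live in core or in a background worker):

    -- One sample of wait states across all backends; run it
    -- periodically (say, once a second) to build a wait profile.
    SELECT wait_event_type,
           wait_event,
           count(*) AS backends
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL
    GROUP BY wait_event_type, wait_event
    ORDER BY backends DESC;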


> The challenge with any of these except #1 is that they are going to
> produce a huge volume of data, and, whether you believe it or not, #3
> is going to sometimes be crushingly slow.  Really.  I tend to think
> that #1 might be better than #2 or #3, but I'm not unwilling to listen
> to contrary arguments, especially if backed up by careful benchmarking
> showing that the performance hit is negligible.

I have already shown [0, 1] the overhead of measuring timings on Linux under a representative workload. AFAIK, those tests were the only ones that showed any numbers; all other statements about terrible performance have been, and remain, unconfirmed.
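
Anyone who doubts those numbers can gauge the cost of timing calls on their own hardware: pg_test_timing ships with the server for exactly that purpose, and a crude upper bound is visible from SQL alone (a sketch, not a rigorous benchmark):

    -- The runtime difference between these two runs is mostly the cost
    -- of the per-row timing calls (gettimeofday/clock_gettime).
    EXPLAIN (ANALYZE, TIMING ON)  SELECT count(*) FROM generate_series(1, 10000000);
    EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM generate_series(1, 10000000);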

As for the size of such information, it should of course be configurable. In Oracle, for example, there is a parameter for the size of the ring buffer that stores the sampling history together with extra information about each wait event.
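
Until something like that exists in core, the same idea can be prototyped with a plain table. A simplified, hypothetical sketch of such an ASH-style history (the table name and retention policy here are made up for illustration):

    -- Hypothetical sampling history, capped by age rather than by a
    -- fixed-size ring buffer.
    CREATE TABLE wait_history (
        sample_time     timestamptz NOT NULL DEFAULT now(),
        pid             integer,
        wait_event_type text,
        wait_event      text,
        query           text
    );

    -- Run once per sampling interval (cron, a bgworker, etc.):
    INSERT INTO wait_history (pid, wait_event_type, wait_event, query)
    SELECT pid, wait_event_type, wait_event, query
    FROM pg_stat_activity
    WHERE wait_event IS NOT NULL;

    -- Emulate the bounded buffer by discarding old samples:
    DELETE FROM wait_history WHERE sample_time < now() - interval '1 hour';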


> My reason for wanting
> to get the stuff we already had committed first is because I have
> found that it is best to proceed with these kinds of problems
> incrementally, not trying to solve too much in a single commit.  Now
> that we have the basics, we can build on it, adding more wait events
> and possibly more recordkeeping for the ones we have already - but
> anything that regresses performance for people not using the feature
> is a dead end in my book, as is anything that introduces overall
> stability risks.

OK, doing it in small steps seems to be a good plan. Any objections to giving people the ability to turn on a feature (e.g. the notorious timing measurements) even if it causes some performance degradation? Of course, it should be turned off by default.
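
There is already a precedent for exactly this pattern in core: track_io_timing is off by default because of its potential overhead, and those who need the data enable it and knowingly pay the price:

    -- I/O timing is opt-in for the same reason: off by default,
    -- enabled by those willing to accept the overhead.
    SHOW track_io_timing;       -- off by default
    SET track_io_timing = on;   -- superuser-settable per session
    SELECT blk_read_time, blk_write_time
    FROM pg_stat_database
    WHERE datname = current_database();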


> I think the way forward from here is that Postgres Pro should (a)
> rework their implementation to work with what has already been
> committed, (b) consider carefully whether they've done everything
> possible to contain the performance loss, (c) benchmark it on several
> different machines and workloads to see how much performance loss
> there is, and (d) stop accusing me of acting in bad faith.

For the record, I'm not from Postgres Pro, and I'm not "accusing you" of anything. But to be honest, the currently committed implementation has been tested on exactly one machine with two workloads, and I think it is somewhat unfair to demand more from others. That doesn't mean that testing on exactly one machine with only one OS is enough, of course. I suppose you should ask the authors to test it on representative hardware and workloads, but if the authors don't have such hardware, it would be nice to help them with that.

It would also be really interesting to hear your opinion on Andres's initial question. Any thoughts on changing the currently committed implementation?


> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company




--
May the force be with you…
