Discussion: Tracking wait event for latches


Tracking wait event for latches

From
Michael Paquier
Date:
Hi all,

As I mentioned $subject a couple of months back after looking at the
wait event facility here:
http://www.postgresql.org/message-id/CAB7nPqTJpgAvOK4qSC96Fpm5W+aCtJ9D=3Vn9AfiEYsur=-juw@mail.gmail.com
I have actually taken some time to implement this idea.

The particular case that I had in mind was to be able to track in
pg_stat_activity processes that are waiting on a latch for synchronous
replication, and here is what this patch gives in this case:
=# alter system set synchronous_standby_names = 'foo';
ALTER SYSTEM
=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)
=# -- Do something
[...]

And from another session:
=# select wait_event_type, wait_event from pg_stat_activity where pid = 83316;
 wait_event_type | wait_event
-----------------+------------
 Latch           | SyncRep
(1 row)

This is a boring patch, and it relies on the wait event facility that
has been added recently in 9.6. Note a couple of things though:
1) There are something like 30 code paths calling WaitLatch in the
backend code; all those events are classified and documented similarly
to LWLock events.
2) After discussing this stuff while at PGCon, it does not seem
worthwhile to have any kind of API for adding custom latch names in
shared memory that extensions could load through _PG_init(). Instead,
there is a latch type flag called "Extension" that can be used for this
purpose, as in the sketch below.
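
For instance, an extension could report its wait points like this
(untested sketch; my_extension_wait is a made-up example, and the extra
WaitLatch argument is the wait event identifier this patch adds):

    /* Sketch: an extension wait point reported as wait_event = "Extension". */
    #include "postgres.h"
    #include "miscadmin.h"
    #include "storage/ipc.h"
    #include "storage/latch.h"

    static void
    my_extension_wait(long timeout_ms)
    {
        int rc;

        rc = WaitLatch(MyLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       timeout_ms,
                       EVENT_EXTENSION);
        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
        if (rc & WL_LATCH_SET)
            ResetLatch(MyLatch);
    }
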
Comments are welcome, I am adding that in the 9.7 queue.

Regards,
--
Michael

Attachments

Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Thu, May 19, 2016 at 4:14 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Comments are welcome, I am adding that in the 9.7 queue.

Take that as 10.0 as things are going.
-- 
Michael



Re: Tracking wait event for latches

From
Thomas Munro
Date:
On Fri, May 20, 2016 at 8:14 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Hi all,
>
> As I mentioned $subject a couple of months back after looking at the
> wait event facility here:
> http://www.postgresql.org/message-id/CAB7nPqTJpgAvOK4qSC96Fpm5W+aCtJ9D=3Vn9AfiEYsur=-juw@mail.gmail.com
> I have actually taken some time to implement this idea.
>
> The particular case that I had in mind was to be able to track in
> pg_stat_activity processes that are waiting on a latch for synchronous
> replication, and here is what this patch gives in this case:
> =# alter system set synchronous_standby_names = 'foo';
> ALTER SYSTEM
> =# select pg_reload_conf();
>  pg_reload_conf
> ----------------
>  t
> (1 row)
> =# -- Do something
> [...]
>
> And from another session:
> =# select wait_event_type, wait_event from pg_stat_activity where pid = 83316;
>  wait_event_type | wait_event
> -----------------+------------
>  Latch           | SyncRep
> (1 row)
>
> This is a boring patch, and it relies on the wait event facility that
> has been added recently in 9.6. Note a couple of things though:
> 1) There are something like 30 code paths calling WaitLatch in the
> backend code; all those events are classified and documented similarly
> to LWLock events.
> 2) After discussing this stuff while at PGCon, it does not seem
> worthwhile to have any kind of API for adding custom latch names in
> shared memory that extensions could load through _PG_init(). Instead,
> there is a latch type flag called "Extension" that can be used for this
> purpose.
> Comments are welcome, I am adding that in the 9.7 queue.

This is very cool, and I can't wait to have this feature!  It'll be
useful for all kinds of developers and users.  I wanted this today to
help debug something I am working on, and remembered this patch.  I
have signed up as a reviewer for the September commitfest.  But here
is some initial feedback based on a quick glance at it:

This patch allows identifiers to be specified by the WaitLatch and
WaitLatchOrSocket calls, but not for WaitEventSetWait, which is the
more general waiting primitive.  I think it should be done by
WaitEventSetWait, and merely passed down by those wrapper functions.

These are actually names for *wait points*, not names for latches.
Some of the language in this patch makes it sound like they are latch
names/latch identifiers, which is inaccurate (the latches themselves
wouldn't be very interesting eg MyLatch).  In some cases the main
thing of interest is actually a socket or timer anyway, not a latch,
so maybe it should appear as wait_event_type = WaitEventSet?

Is there a reason you chose names like 'WALWriterMain'?  That
particular wait point is in the function WalWriterMain (note different
case).  It seems odd to use the containing function names to guide
naming, but not use them exactly.  Since this namespace needs to be
able to name wait points anywhere in the postgres source tree (and
maybe outside it too, for extensions), maybe it should be made
hierarchical, like 'walwriter.WalWriterMain' (or some other
organisational scheme).

I personally think it would be very useful for extensions to be able
to name their wait points.  For example, I'd rather see
'postgres_fdw.pgfdw_get_result' or similar than a vague 'Extension'
string which appears not only for all wait points in an extension but
also for all extensions.  I hope we can figure out a good way to do
that.  Clearly that would involve some shmem registry machinery to
make the names visible across backends (a similar problem exists with
lock tranche names).

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Thu, Jun 2, 2016 at 12:25 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> This patch allows identifiers to be specified by the WaitLatch and
> WaitLatchOrSocket calls, but not for WaitEventSetWait, which is the
> more general waiting primitive.  I think it should be done by
> WaitEventSetWait, and merely passed down by those wrapper functions.

The advantage of having WaitEventSetWait do the tracking is that we
could track the events of secure_read and secure_write. One potential
problem with doing so is if those routines are called during
authentication. I don't recall that being the case, but it needs a
double-check.

> These are actually names for *wait points*, not names for latches.

OK.

> Some of the language in this patch makes it sound like they are latch
> names/latch identifiers, which is inaccurate (the latches themselves
> wouldn't be very interesting eg MyLatch).  In some cases the main
> thing of interest is actually a socket or timer anyway, not a latch,
> so maybe it should appear as wait_event_type = WaitEventSet?

Hm. A latch could be waiting on multiple types of things at once, so
don't you think we'd need to add more detail to what is reported to
pg_stat_activity? There are two fields now, and in the context of this
patch:
- wait_event_type, which I'd like to think is associated with a latch,
so I named it so.
- wait_event, which is the code path that we are waiting at, like
SyncRep, the WAL writer main routine, etc.

Now if you would like to get a list of all the things that are being
waited for, shouldn't we add a third column to the set, with text[] as
return type? Let's name it wait_event_details; for a latch we would
have the following:
- WL_LATCH_SET
- WL_POSTMASTER_DEATH
- WL_SOCKET_READABLE
- WL_SOCKET_WRITEABLE
Compared to all the other existing wait types that's a bit new, and it
is exclusive to latches because we need a higher level of detail.
Don't you think so? But actually I don't think it is necessary to go
into that level of detail. We already know what a latch is waiting
for by looking at the code...
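
To illustrate, with such a hypothetical column the earlier example
would come out as something like this (the details shown here are made
up for the sake of the example):

=# select wait_event_type, wait_event, wait_event_details
     from pg_stat_activity where pid = 83316;
 wait_event_type | wait_event | wait_event_details
-----------------+------------+------------------------------------
 Latch           | SyncRep    | {WL_LATCH_SET,WL_POSTMASTER_DEATH}
(1 row)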

> Is there a reason you chose names like 'WALWriterMain'?  That
> particular wait point is in the function WalWriterMain (note different
> case).  It seems odd to use the containing function names to guide
> naming, but not use them exactly.  Since this namespace needs to be
> able to name wait points anywhere in the postgres source tree (and
> maybe outside it too, for extensions), maybe it should be made
> hierarchical, like 'walwriter.WalWriterMain' (or some other
> organisational scheme).

Yeah, possibly this could be improved. I have put some thought into
having clear names for each one of them, so I'd rather keep them
simple.

> I personally think it would be very useful for extensions to be able
> to name their wait points.  For example, I'd rather see
> 'postgres_fdw.pgfdw_get_result' or similar than a vague 'Extension'
> string which appears not only for all wait points in an extension but
> also for all extensions.  I hope we can figure out a good way to do
> that.  Clearly that would involve some shmem registry machinery to
> make the names visible across backends (a similar problem exists with
> lock tranche names).

This patch is shaped this way intentionally based on the feedback I
received at PGCon (Robert and others). We could provide a routine that
extensions call in _PG_init to register a new latch event name in
shared memory, but I didn't see much use in doing so; take for example
the case of a background worker: it is possible to register a custom
string for pg_stat_activity via pgstat_report_activity, as in the
sketch below. One could take advantage of having custom latch wait
names in shared memory if an extension has wait points with latches,
though... But I am still not sure that's worth the complexity.
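
For reference, the existing workaround looks like this in a background
worker's main loop (a fragmentary sketch using the current
three-argument WaitLatch; the custom string is just an example):

    /* Advertise what we are doing via pg_stat_activity.query,
     * then sleep on our latch (error handling elided). */
    pgstat_report_activity(STATE_RUNNING, "mybgworker: waiting for work");
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                     10 * 1000L /* ms */);
    ResetLatch(MyLatch);
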
-- 
Michael



Re: Tracking wait event for latches

From
Robert Haas
Date:
On Thu, Jun 2, 2016 at 1:34 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> This patch is shaped this way intentionally based on the feedback I
> received at PGCon (Robert and others). We could provide a routine that
> extensions call in _PG_init to register a new latch event name in
> shared memory, but I didn't see much use in doing so; take for example
> the case of a background worker: it is possible to register a custom
> string for pg_stat_activity via pgstat_report_activity. One could take
> advantage of having custom latch wait names in shared memory if an
> extension has wait points with latches, though... But I am still not
> sure that's worth the complexity.

I can't see how you could ever guarantee that it wouldn't just fail.
We allocate a certain amount of "slop" in the main shared memory
segment, but it's not infinite and can certainly be exhausted.  It
seems like it would suck if you tried to load your extension and it
failed because there was no room left for more wait-point names.
Maybe it would suck less than not having wait-point names, but I'm not
really sure.  I think we'd do better to get something that handles the
core stuff well and then consider extensions later or not at all.  It
only matters if you're running multiple extensions that all use LWLock
tranches and you need to distinguish between waits on their various
LWLocks.  But since LWLock contention is something we hope will be
infrequent I'm just not sure that case is common enough to justify
building a lot of mechanism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Sat, Jun 4, 2016 at 2:41 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 2, 2016 at 1:34 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> This patch is shaped this way intentionally based on the feedback I
>> received at PGCon (Robert and others). We could provide a routine that
>> extensions call in _PG_init to register a new latch event name in
>> shared memory, but I didn't see much use in doing so; take for example
>> the case of a background worker: it is possible to register a custom
>> string for pg_stat_activity via pgstat_report_activity. One could take
>> advantage of having custom latch wait names in shared memory if an
>> extension has wait points with latches, though... But I am still not
>> sure that's worth the complexity.
>
> I can't see how you could ever guarantee that it wouldn't just fail.
> We allocate a certain amount of "slop" in the main shared memory
> segment, but it's not infinite and can certainly be exhausted.  It
> seems like it would suck if you tried to load your extension and it
> failed because there was no room left for more wait-point names.
> Maybe it would suck less than not having wait-point names, but I'm not
> really sure.  I think we'd do better to get something that handles the
> core stuff well and then consider extensions later or not at all.

Yeah, that's been my line of thought on the matter since the beginning
as well: keep it simple and be done. What is written just after those
words is purely hand-waving and I have no way to prove it, but my
instinctive guess is that more than 90% of the real use cases where we
need to track the latch waits in pgstat would be covered without the
need for this extra shared-memory infrastructure for extensions.
-- 
Michael



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Wed, Jun 8, 2016 at 10:11 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Yeah, that's been my line of thought on the matter since the beginning
> as well: keep it simple and be done. What is written just after those
> words is purely hand-waving and I have no way to prove it, but my
> instinctive guess is that more than 90% of the real use cases where we
> need to track the latch waits in pgstat would be covered without the
> need for this extra shared-memory infrastructure for extensions.

So, I have done some extra tests with my patch, moving the event
reporting down to WaitEventSetWait to see what the effect is on
secure_read and secure_write. And the conclusion is that I am seeing
no difference, so I changed the patch to the way suggested by Thomas.
In this test, what I did was use the following one-line pgbench script
with -C via an SSL connection:
\set id random(1,1000)
As the script file is not empty, a connection to the server is opened,
so this goes through secure_read at minimal cost on the backend.
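
For reference, such a run would look something like this (a sketch;
the client count, duration and SSL setup are arbitrary choices):

    # save the one-line script above as latch.sql, then reconnect
    # for every transaction (-C) over an SSL connection
    export PGSSLMODE=require
    pgbench -C -c 1 -T 60 -f latch.sql postgres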

Also, I have changed the event ID notation as follows to be consistent
with the routine names:
WAL -> Wal
PG -> Pg
I also found that LATCH_RECOVERY_WAL_ALL was reporting
"XLogAfterAvailable" as its name, which was incorrect.

Finally, I have changed the patch so that it does not track "Latch"
events, but "EventSet" events instead, per the suggestion of Thomas.
"WaitEventSet" is too close to wait_event for my taste, so I shortened
the suggestion. There were also some conflicts caused by the recent
commit 887feefe, which are fixed.

Attached is an updated patch.
--
Michael

Attachments

Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Tue, Aug 2, 2016 at 10:31 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Attached is an updated patch.

Updated version for 2 minor issues:
1) s/stram/stream/
2) Docs used incorrect number
--
Michael

Attachments

Re: Tracking wait event for latches

From
Alexander Korotkov
Date:
Hi, Michael!

On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Aug 2, 2016 at 10:31 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Attached is an updated patch.

Updated version for 2 minor issues:
1) s/stram/stream/
2) Docs used incorrect number

I took a look at your patch. Couple of notes from me.

const char *
GetEventIdentifier(uint16 eventId)
{
    const char *res;

    switch (eventId)
    {
        case EVENT_ARCHIVER_MAIN:
            res = "ArchiverMain";
            break;
        ... long long list of events ...
        case EVENT_WAL_SENDER_WRITE_DATA:
            res = "WalSenderWriteData";
            break;
        default:
            res = "???";
    }
    return res;
}

Would it be better to use an array here?

typedef enum EventIdentifier
{

EventIdentifier seems too general a name to me, doesn't it?  Could we name it WaitEventIdentifier? Or WaitEventId for short?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Tracking wait event for latches

From
Alexander Korotkov
Date:
On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
I took a look at your patch. Couple of notes from me.

const char *
GetEventIdentifier(uint16 eventId)
{
    const char *res;

    switch (eventId)
    {
        case EVENT_ARCHIVER_MAIN:
            res = "ArchiverMain";
            break;
        ... long long list of events ...
        case EVENT_WAL_SENDER_WRITE_DATA:
            res = "WalSenderWriteData";
            break;
        default:
            res = "???";
    }
    return res;
}

Would it be better to use an array here?

typedef enum EventIdentifier
{

EventIdentifier seems too general a name to me, doesn't it?  Could we name it WaitEventIdentifier? Or WaitEventId for short?

I'm also not sure about the handling of the secure_read() and secure_write() functions.
In the current patch we're tracking the latch event wait inside them.  But we set up the latch only for blocking sockets, and can do it multiple times in a loop.
Would it be better to make separate wait events NETWORK_READ and NETWORK_WRITE and set them up for the whole time spent in secure_read() and secure_write()?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Mon, Aug 22, 2016 at 6:09 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> Hi, Michael!
>
> On Thu, Aug 4, 2016 at 8:26 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
> I took a look at your patch. Couple of notes from me.

Thanks!

>> const char *
>> GetEventIdentifier(uint16 eventId)
>> {
>> const char *res;
>> switch (eventId)
>> {
>> case EVENT_ARCHIVER_MAIN:
>> res = "ArchiverMain";
>> break;
>> ... long long list of events ...
>> case EVENT_WAL_SENDER_WRITE_DATA:
>> res = "WalSenderWriteData";
>> break;
>> default:
>> res = "???";
>> }
>> return res;
>> }
>
>
> Would it be better to use an array here?

The reason why I chose this way is that there are a lot of them. It is
painful to maintain the order of the array elements in perfect mapping
with the list of IDs...

>> typedef enum EventIdentifier
>> {
>
>
> EventIdentifier seems too general a name to me, doesn't it?  Could we name
> it WaitEventIdentifier? Or WaitEventId for short?

OK. So WaitEventIdentifier? The reason to include Identifier is for
consistency with lwlock structure notation.
-- 
Michael



Re: Tracking wait event for latches

From
Robert Haas
Date:
On Mon, Aug 22, 2016 at 9:49 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> The reason why I chose this way is that there are a lot of them. It is
> painful to maintain the order of the array elements in perfect mapping
> with the list of IDs...

You can use stupid macro tricks to help with that problem...
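
For example, something along these lines (a sketch of the usual "X
macro" idiom, not code from the patch; only a few events shown):

    /* Define the list once, expand it twice, so the enum and the
     * name array cannot drift apart. */
    #define WAIT_EVENT_LIST(X) \
        X(EVENT_ARCHIVER_MAIN, "ArchiverMain") \
        X(EVENT_SYNC_REP, "SyncRep") \
        X(EVENT_WAL_WRITER_MAIN, "WalWriterMain")

    #define WE_AS_ENUM(id, name) id,
    typedef enum WaitEventIdentifier
    {
        WAIT_EVENT_LIST(WE_AS_ENUM)
    } WaitEventIdentifier;
    #undef WE_AS_ENUM

    #define WE_AS_NAME(id, name) name,
    static const char *const WaitEventNames[] =
    {
        WAIT_EVENT_LIST(WE_AS_NAME)
    };
    #undef WE_AS_NAME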

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Tue, Aug 23, 2016 at 12:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Aug 22, 2016 at 9:49 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> The reason why I chose this way is that there are a lot of them. It is
>> painful to maintain the order of the array elements in perfect mapping
>> with the list of IDs...
>
> You can use stupid macro tricks to help with that problem...

Yeah, still, after thinking about it I think I would just go with an
array like the lock types and be done with it. With a comment
mentioning that the order should be respected, that would be enough...
-- 
Michael



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Mon, Aug 22, 2016 at 6:46 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>>
>> I took a look at your patch. Couple of notes from me.
>>
>>> const char *
>>> GetEventIdentifier(uint16 eventId)
>>> {
>>> const char *res;
>>> switch (eventId)
>>> {
>>> case EVENT_ARCHIVER_MAIN:
>>> res = "ArchiverMain";
>>> break;
>>> ... long long list of events ...
>>> case EVENT_WAL_SENDER_WRITE_DATA:
>>> res = "WalSenderWriteData";
>>> break;
>>> default:
>>> res = "???";
>>> }
>>> return res;
>>> }
>>
>>
>> Would it be better to use an array here?

Okay, I have switched to an array....

>>> typedef enum EventIdentifier
>>> {
>>
>>
>> EventIdentifier seems too general name for me, isn't it?  Could we name it
>> WaitEventIdentifier? Or WaitEventId for shortcut?

... And WaitEventIdentifier.

> I'm also not sure about the handling of the secure_read() and secure_write()
> functions.
> In the current patch we're tracking the latch event wait inside them.  But we
> set up the latch only for blocking sockets, and can do it multiple times in a
> loop.
> Would it be better to make separate wait events NETWORK_READ and
> NETWORK_WRITE and set them up for the whole time spent in secure_read() and
> secure_write()?

The whole point is to track a waiting event when we are sure that it
is going to wait, which is why the patch depends on WaitEventSetWait.
If we were to set those flags at the beginning of secure_read and
secure_write and reset them at the end, we might actually track an
event that is not blocking.
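
In other words, the reporting wraps only the call that actually
blocks, roughly like this (a simplified sketch; the
pgstat_report_wait_start/end calls stand in for whatever reporting
hooks the patch ends up using, and the real function loops to
recompute the timeout):

    int
    WaitEventSetWait(WaitEventSet *set, long timeout,
                     WaitEvent *occurred_events, int nevents,
                     uint16 eventId)
    {
        int rc;

        /* flag the wait only around the blocking primitive, so a
         * non-blocking secure_read()/secure_write() never reports one */
        pgstat_report_wait_start(WAIT_EVENT_SET, eventId);
        rc = WaitEventSetWaitBlock(set, (int) timeout,
                                   occurred_events, nevents);
        pgstat_report_wait_end();

        return rc;
    }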
--
Michael

Attachments

Re: Tracking wait event for latches

From
Thomas Munro
Date:
+         <para>
+          <literal>EventSet</>: The server process is waiting on a socket
+          or a timer. This wait happens in a latch, an inter-process facility
+          using boolean variables letting a process sleep until it is set.
+          Latches support several type of operations, like postmaster death
+          handling, timeout and socket activity lookup, and are a useful
+          replacement for <function>sleep</> where signal handling matters.
+         </para>

This paragraph seems a bit confused.  I suggest something more like
this:  "The server process is waiting for one or more sockets, a timer
or an interprocess latch.  The wait happens in a WaitEventSet,
<productname>PostgreSQL</>'s portable IO multiplexing abstraction."

On Tue, Aug 23, 2016 at 7:01 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
>> On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov
>> <a.korotkov@postgrespro.ru> wrote:
>>> Would it be better to use an array here?
>
> Okay, I have switched to an array....

I looked at various macro tricks[1] but they're all pretty unpleasant!
+1 for the simple array with carefully managed order.  About that
order...

+const char *const WaitEventNames[] = {
+ "ArchiverMain",
+ "AutoVacuumMain",
+ "BaseBackupThrottle",
+ "BgWorkerStartup",
+ "BgWorkerShutdown",
+ "BgWriterMain",
+ "BgWriterHibernate",
+ "CheckpointerMain",
+ "ExecuteGather",
+ "Extension",
+ "MessageQueuePutMessage",
+ "MessageQueueSend",
+ "MessageQueueReceive",
+ "MessageQueueInternal",
+ "ParallelFinish",
+ "PgStatMain",
+ "ProcSleep",
+ "ProcSignal",
+ "PgSleep",
+ "SecureRead",
+ "SecureWrite",
+ "SSLOpenServer",
+ "SyncRep",
+ "SysLoggerMain",
+ "RecoveryApplyDelay",
+ "RecoveryWalAll",
+ "RecoveryWalStream",
+ "WalReceiverWaitStart",
+ "WalReceiverMain",
+ "WalSenderMain",
+ "WalSenderWaitForWAL",
+ "WalSenderWriteData"
+ "WalWriterMain",
+};

It looks like this array wants to be in alphabetical order, but it
isn't quite.  Also, perhaps a compile time assertion about the size of
the array matching EVENT_LAST_TYPE could be useful?

+typedef enum WaitEventIdentifier
+{
+ EVENT_ARCHIVER_MAIN,
+ EVENT_AUTOVACUUM_MAIN,
+ EVENT_BASEBACKUP_THROTTLE,
+ EVENT_BGWORKER_STARTUP,
+ EVENT_BGWORKER_SHUTDOWN,
+ EVENT_BGWRITER_MAIN,
+ EVENT_BGWRITER_HIBERNATE,
+ EVENT_CHECKPOINTER_MAIN,
+ EVENT_EXECUTE_GATHER,
+ EVENT_EXTENSION,
+ EVENT_MQ_PUT_MESSAGE,
+ EVENT_MQ_SEND_BYTES,
+ EVENT_MQ_RECEIVE_BYTES,
+ EVENT_MQ_WAIT_INTERNAL,
+ EVENT_PARALLEL_WAIT_FINISH,
+ EVENT_PGSTAT_MAIN,
+ EVENT_PROC_SLEEP,
+ EVENT_PROC_SIGNAL,
+ EVENT_PG_SLEEP,
+ EVENT_SECURE_READ,
+ EVENT_SECURE_WRITE,
+ EVENT_SSL_OPEN_SERVER,
+ EVENT_SYNC_REP,
+ EVENT_SYSLOGGER_MAIN,
+ EVENT_RECOVERY_APPLY_DELAY,
+ EVENT_RECOVERY_WAL_ALL,
+ EVENT_RECOVERY_WAL_STREAM,
+ EVENT_WAL_RECEIVER_WAIT_START,
+ EVENT_WAL_RECEIVER_MAIN,
+ EVENT_WAL_SENDER_WRITE_DATA,
+ EVENT_WAL_SENDER_MAIN,
+ EVENT_WAL_SENDER_WAIT_WAL,
+ EVENT_WALWRITER_MAIN
+} WaitEventIdentifier;

This is also nearly but not exactly in alphabetical order
(EVENT_PROC_SIGNAL comes after EVENT_PROC_SLEEP for example).  But
it's not currently possible to have the strings and the enumerators
both in alphabetical order because they're not the same, even
accounting for camel-case to upper-case transliteration.  I think at
least one of them should be sorted.  Shouldn't they match fully and
then *both* be sorted alphabetically?  For example
"MessageQueueInternal" doesn't match EVENT_MQ_WAIT_INTERNAL.  Then
there are some cases where I'd expect underscores for consistency with
other enumerators and with the corresponding camel-case strings: you
have EVENT_WAL_SENDER_MAIN, but EVENT_WALWRITER_MAIN.

The documentation is in a slightly different order again but also not
exactly alphabetical: for example ProcSleep is listed before
ProcSignal.

Sorry if this all sounds terribly picky but I think we should try to
be strictly systematic here.

>>> EventIdentifier seems too general name for me, isn't it?  Could we name it
>>> WaitEventIdentifier? Or WaitEventId for shortcut?
>
> ... And WaitEventIdentifier.

+1 from me too for avoiding the overly general term 'event'.  It does
seem a little odd to leave the enumerators names as EVENT_...  though;
shouldn't these be WAIT_EVENT_... or WE_...?  Or perhaps you could
consider WaitPointIdentifier and WP_SECURE_READ or
WaitEventPointIdentifier and WEP_SECURE_READ, if you buy my earlier
argument that what we are really naming here is point in the code
where we wait, not the events we're waiting for.  Contrast with
LWLocks where we report the lock that you're waiting for, not the
place in the code where you're waiting for that lock.

On Wed, Aug 3, 2016 at 1:31 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Finally, I have changed the patch so that it does not track "Latch"
> events, but "EventSet" events instead, per the suggestion of Thomas.
> "WaitEventSet" is too close to wait_event for my taste, so I shortened
> the suggestion.

This is good, because showing "Latch" when we were really waiting for
a socket was misleading.

On the other hand, if we could *accurately* report whether it's a
"Latch", "Socket" or "Latch|Socket" that we're waiting for, it might
be cool to do that instead.  One way to do that would be to use up
several class IDs:  WAIT_EVENT_LATCH, WAIT_EVENT_LATCH_OR_SOCKET,
WAIT_EVENT_SOCKET (or perhaps WAIT_EVENT_LATCH | WAIT_EVENT_SOCKET,
reserving 2 or 3 upper bits from the 8 bit class ID for this).  Then
we could figure out the right class ID by looking at set->latch and
set->nevents (perhaps an extra boolean would be needed to record
whether postmaster death is in there so we could deduce whether there
are any sockets).  It would be interesting specifically for the case
of FDWs where it would be nice to be able to see clearly that it's
waiting for a remote server ("Socket").  It may also be interesting to
know if there is a timeout.  Postmaster death doesn't seem newsworthy,
we're nearly always also waiting for that exceptional event so it'd
just be clutter to report it.

[1] http://stackoverflow.com/questions/147267/easy-way-to-use-variables-of-enum-types-as-string-in-c

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Wed, Sep 21, 2016 at 8:13 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> +         <para>
> +          <literal>EventSet</>: The server process is waiting on a socket
> +          or a timer. This wait happens in a latch, an inter-process facility
> +          using boolean variables letting a process sleep until it is set.
> +          Latches support several type of operations, like postmaster death
> +          handling, timeout and socket activity lookup, and are a useful
> +          replacement for <function>sleep</> where signal handling matters.
> +         </para>
>
> This paragraph seems a bit confused.  I suggest something more like
> this:  "The server process is waiting for one or more sockets, a timer
> or an interprocess latch.  The wait happens in a WaitEventSet,
> <productname>PostgreSQL</>'s portable IO multiplexing abstraction."

OK, I have tweaked the paragraph as follows using your suggestion:
+        <listitem>
+         <para>
+          <literal>EventSet</>: The server process is waiting on one or more
+          sockets, a timer or an inter-process latch.  The wait happens in a
+          <function>WaitEventSet</>, <productname>PostgreSQL</>'s portable
+          I/O multiplexing abstraction using boolean variables letting a
+          process sleep until it is set.  It supports several types of
+          operations, like postmaster death handling, timeout and socket
+          activity lookup, and is a useful replacement for <function>sleep</>
+          where signal handling matters.
+         </para>
+        </listitem>

> On Tue, Aug 23, 2016 at 7:01 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>> On Mon, Aug 22, 2016 at 12:09 PM, Alexander Korotkov
>>> <a.korotkov@postgrespro.ru> wrote:
>>>> Would it be better to use an array here?
>>
>> Okay, I have switched to an array....
>
> I looked at various macro tricks[1] but they're all pretty unpleasant!
>  +1 for the simple array with carefully managed order.  About that
> order...
> [1] http://stackoverflow.com/questions/147267/easy-way-to-use-variables-of-enum-types-as-string-in-c

Yes, I recall bumping into this one, or something really similar to that...

> +const char *const WaitEventNames[] = {
> [...]
> +};
>
> It looks like this array wants to be in alphabetical order, but it
> isn't quite.  Also, perhaps a compile time assertion about the size of
> the array matching EVENT_LAST_TYPE could be useful?

In GetWaitEventIdentifier()? I'd think that just returning ??? would
have been fine if there is a non-matching call.

> +typedef enum WaitEventIdentifier
> +{
> [...]
> +} WaitEventIdentifier;
>
> This is also nearly but not exactly in alphabetical order
> (EVENT_PROC_SIGNAL comes after EVENT_PROC_SLEEP for example).  But
> it's not currently possible to have the strings and the enumerators
> both in alphabetical order because they're not the same, even
> accounting for camel-case to upper-case transliteration.  I think at
> least one of them should be sorted.  Shouldn't they match fully and
> then *both* be sorted alphabetically?  For example
> "MessageQueueInternal" doesn't match EVENT_MQ_WAIT_INTERNAL.  Then
> there are some cases where I'd expect underscores for consistency with
> other enumerators and with the corresponding camel-case strings: you
> have EVENT_WAL_SENDER_MAIN, but EVENT_WALWRITER_MAIN.

Not wrong..

> The documentation is in a slightly different order again but also not
> exactly alphabetical: for example ProcSleep is listed before
> ProcSignal.

Right.

> Sorry if this all sounds terribly picky but I think we should try to
> be strictly systematic here.

No worries about that, it matters a lot for this patch. The user-facing
documentation is what should drive the decision-making, I think. So
let's order the names, and adapt the enum depending on that. I have
done so after double-checking both lists, and added a comment for
anybody updating that in the future.

>>>> EventIdentifier seems too general a name to me, doesn't it?  Could we name it
>>>> WaitEventIdentifier? Or WaitEventId for short?
>>
>> ... And WaitEventIdentifier.
>
> +1 from me too for avoiding the overly general term 'event'.  It does
> seem a little odd to leave the enumerators names as EVENT_...  though;
> shouldn't these be WAIT_EVENT_... or WE_...?  Or perhaps you could
> consider WaitPointIdentifier and WP_SECURE_READ or
> WaitEventPointIdentifier and WEP_SECURE_READ, if you buy my earlier
> argument that what we are really naming here is point in the code
> where we wait, not the events we're waiting for.  Contrast with
> LWLocks where we report the lock that you're waiting for, not the
> place in the code where you're waiting for that lock.

Well, WE_ if I need to make a choice for something other than EVENT_.

> On the other hand, if we could *accurately* report whether it's a
> "Latch", "Socket" or "Latch|Socket" that we're waiting for, it might
> be cool to do that instead.  One way to do that would be to use up
> several class IDs:  WAIT_EVENT_LATCH, WAIT_EVENT_LATCH_OR_SOCKET,
> WAIT_EVENT_SOCKET (or perhaps WAIT_EVENT_LATCH | WAIT_EVENT_SOCKET,
> reserving 2 or 3 upper bits from the 8 bit class ID for this).  Then
> we could figure out the right class ID by looking at set->latch and
> set->nevents (perhaps an extra boolean would be needed to record
> whether postmaster death is in there so we could deduce whether there
> are any sockets).  It would be interesting specifically for the case
> of FDWs where it would be nice to be able to see clearly that it's
> waiting for a remote server ("Socket").  It may also be interesting to
> know if there is a timeout.  Postmaster death doesn't seem newsworthy,
> we're nearly always also waiting for that exceptional event so it'd
> just be clutter to report it.

That's actually pretty close to what I mentioned upthread here:
https://www.postgresql.org/message-id/CAB7nPqQx4OEym9cf22CY%3D5eWqqiAMjij6EBCoNReezt9-NvGkw%40mail.gmail.com
In order to support that, adding a column wait_event_details with
text[] makes the most sense, I guess. Still, I think that's another
discussion; this patch already does a lot.

So I have adjusted the patch in many ways, tweaked the order of the
items, and adjusted some of their names as suggested by Thomas.
Updated version attached.
--
Michael

Attachments

Re: Tracking wait event for latches

From
Thomas Munro
Date:
On Wed, Sep 21, 2016 at 3:40 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 8:13 AM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> It looks like this array wants to be in alphabetical order, but it
>> isn't quite.  Also, perhaps a compile time assertion about the size of
>> the array matching EVENT_LAST_TYPE could be useful?
>
> In GetWaitEventIdentifier()? I'd think that just returning ??? would
> have been fine if there is a non-matching call.

Yeah but that's at run time.  I meant you could help developers
discover ASAP if they add a new item to one place but not the other
with a compile time assertion:
    const char *
    GetWaitEventIdentifier(uint16 eventId)
    {
        StaticAssertStmt(lengthof(WaitEventNames) == WE_LAST_TYPE + 1,
                         "WaitEventNames must match WaitEventIdentifiers");
        if (eventId > WE_LAST_TYPE)
            return "???";
        return WaitEventNames[eventId];
    }

>> +1 from me too for avoiding the overly general term 'event'.  It does
>> seem a little odd to leave the enumerators names as EVENT_...  though;
>> shouldn't these be WAIT_EVENT_... or WE_...?  Or perhaps you could
>> consider WaitPointIdentifier and WP_SECURE_READ or
>> WaitEventPointIdentifier and WEP_SECURE_READ, if you buy my earlier
>> argument that what we are really naming here is point in the code
>> where we wait, not the events we're waiting for.  Contrast with
>> LWLocks where we report the lock that you're waiting for, not the
>> place in the code where you're waiting for that lock.
>
> Well, WE_ if I need make a choice for something else than EVENT_.

You missed a couple that are hiding inside #ifdef WIN32:

From pgstat.c:
+   EVENT_PGSTAT_MAIN);

From syslogger.c:
+ EVENT_SYSLOGGER_MAIN);

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Wed, Sep 21, 2016 at 1:03 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Yeah but that's at run time.  I meant you could help developers
> discover ASAP if they add a new item to one place but not the other
> with a compile time assertion:
>     const char *
>     GetWaitEventIdentifier(uint16 eventId)
>     {
>         StaticAssertStmt(lengthof(WaitEventNames) == WE_LAST_TYPE + 1,
>                          "WaitEventNames must match WaitEventIdentifiers");
>         if (eventId > WE_LAST_TYPE)
>             return "???";
>         return WaitEventNames[eventId];
>     }

Ah, OK, good idea. I had AssertStmt in mind, not StaticAssertStmt.

> You missed a couple that are hiding inside #ifdef WIN32:
>
> From pgstat.c:
> +   EVENT_PGSTAT_MAIN);
>
> From syslogger.c:
> + EVENT_SYSLOGGER_MAIN);

Oops. Fixed those ones and checked the builds on WIN32.
--
Michael

Attachments

Re: Tracking wait event for latches

From
Robert Haas
Date:
On Tue, Sep 20, 2016 at 7:13 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> This paragraph seems a bit confused.  I suggest something more like
> this:  "The server process is waiting for one or more sockets, a timer
> or an interprocess latch.  The wait happens in a WaitEventSet,
> <productname>PostgreSQL</>'s portable IO multiplexing abstraction."

I'm worried we're exposing an awful lot of internal detail here.
Moreover, it's pretty confusing that we have this general concept of
wait events in pg_stat_activity, and then here the specific type of
wait event we're waiting for is the ... wait event kind.  Uh, what?

I have to admit that I like the individual event names quite a bit,
and I think the detail will be useful to users.  But I wonder if
there's a better way to describe the class of events that we're
talking about that's not so dependent on internal data structures.
Maybe we could divide these waits into a couple of categories - e.g.
"Socket", "Timeout", "Process" - and then divide these detailed wait
events among those classes.

The "SecureRead" and "SecureWrite" wait events are going to be
confusing, because the connection isn't necessarily secure.  I think
those should be called "ClientRead" and "ClientWrite".
Comprehensibility is more important than absolute consistency with the
C function names.

Another thing to think about is that there's no way to actually see
wait event information for a number of the processes which this patch
instruments, because they don't appear in pg_stat_activity.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Wed, Sep 21, 2016 at 10:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I have to admit that I like the individual event names quite a bit,
> and I think the detail will be useful to users.  But I wonder if
> there's a better way to describe the class of events that we're
> talking about that's not so dependent on internal data structures.
> Maybe we could divide these waits into a couple of categories - e.g.
> "Socket", "Timeout", "Process" - and then divide these detailed wait
> events among those classes.

pgstat.h is mentioning that there is 1 byte still free. I did not
notice that until a couple of minutes ago. There are 2 bytes used for
the event ID, and 1 byte for the class ID, but there are 4 bytes
available. Perhaps we could use this extra byte to store this extra
status information, then use it for WaitEventSet to build up a string
that will be stored in the classId field? For example if a process is
waiting on a socket and a timeout, we'd write "Socket,Timeout" as a
text field.

> The "SecureRead" and "SecureWrite" wait events are going to be
> confusing, because the connection isn't necessarily secure.  I think
> those should be called "ClientRead" and "ClientWrite".
> Comprehensibility is more important than absolute consistency with the
> C function names.

Noted.

> Another thing to think about is that there's no way to actually see
> wait event information for a number of the processes which this patch
> instruments, because they don't appear in pg_stat_activity.

We could create a new system to track the activity of system-related
processes, for example pg_stat_system_activity, or pg_system_activity,
and list all the processes that are not counted in max_connections...
-- 
Michael



Re: Tracking wait event for latches

From
Robert Haas
Date:
On Wed, Sep 21, 2016 at 10:02 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 10:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I have to admit that I like the individual event names quite a bit,
>> and I think the detail will be useful to users.  But I wonder if
>> there's a better way to describe the class of events that we're
>> talking about that's not so dependent on internal data structures.
>> Maybe we could divide these waits into a couple of categories - e.g.
>> "Socket", "Timeout", "Process" - and then divide these detailed wait
>> events among those classes.
>
> pgstat.h is mentioning that there is 1 byte still free. I did not
> notice that until a couple of minutes ago. There are 2 bytes used for
> the event ID, and 1 byte for the class ID, but there are 4 bytes
> available. Perhaps we could use this extra byte to store this extra
> status information, then use it for WaitEventSet to build up a string
> that will be stored in the classId field? For example if a process is
> waiting on a socket and a timeout, we'd write "Socket,Timeout" as a
> text field.

No, that's not what I want to do.  I think we should categorize the
events administratively by their main purpose, rather than
technologically by what we're waiting for.

>> Another thing to think about is that there's no way to actually see
>> wait event information for a number of the processes which this patch
>> instruments, because they don't appear in pg_stat_activity.
>
> We could create a new system to track the activity of system-related
> processes, for example pg_stat_system_activity, or pg_system_activity,
> and list all the processes that are not counted in max_connections...

Yes.  Or we could decide to include everything in pg_stat_activity.  I
think those are the two reasonable options.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Wed, Sep 21, 2016 at 11:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> No, that's not what I want to do.  I think we should categorize the
> events administratively by their main purpose, rather than
> technologically by what we're waiting for.

So we'd just have three class IDs instead of one? Well why not.
-- 
Michael



Re: Tracking wait event for latches

From
Robert Haas
Date:
On Wed, Sep 21, 2016 at 10:23 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 11:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> No, that's not what I want to do.  I think we should categorize the
>> events administratively by their main purpose, rather than
>> technologically by what we're waiting for.
>
> So we'd just have three class IDs instead of one? Well why not.

Yeah, or, I mean, it doesn't have to be three precisely, but I'd like
to try to avoid exposing the users to the fact that we have an
internal data structure called a WaitEventSet and instead classify by
function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From
Thomas Munro
Date:
On Thu, Sep 22, 2016 at 1:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 20, 2016 at 7:13 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> This paragraph seems a bit confused.  I suggest something more like
>> this:  "The server process is waiting for one or more sockets, a timer
>> or an interprocess latch.  The wait happens in a WaitEventSet,
>> <productname>PostgreSQL</>'s portable IO multiplexing abstraction."
>
> I'm worried we're exposing an awful lot of internal detail here.

Ok, maybe it's not a good idea to mention WaitEventSet in the manual.
I was trying to keep the same basic structure of text as Michael
proposed, but fix the bit that implied that waiting happens "in a
latch", even when no latch is involved.  Perhaps only the first
sentence is necessary.  This will all become much clearer if there is
a follow-up patch that shows strings like "Socket", "Socket|Latch",
"Latch" instead of a general catch-all wait_event_type of "EventSet",
as discussed.

> Moreover, it's pretty confusing that we have this general concept of
> wait events in pg_stat_activity, and then here the specific type of
> wait event we're waiting for is the ... wait event kind.  Uh, what?

Yeah, that is confusing.  It comes about because of the coincidence
that pg_stat_activity finished up with a wait_event column, and our IO
multiplexing abstraction finished up with the name WaitEventSet.
<stuck-record-mode>We could instead call these new things "wait
points", because, well, erm, they name points in the code at which we
wait.  They don't name events (they just happen to use the
WaitEventSet mechanism, which itself does involve events: but those
events are things like "hey, this socket is now ready for
writing").</>

> I have to admit that I like the individual event names quite a bit,
> and I think the detail will be useful to users.  But I wonder if
> there's a better way to describe the class of events that we're
> talking about that's not so dependent on internal data structures.
> Maybe we could divide these waits into a couple of categories - e.g.
> "Socket", "Timeout", "Process" - and then divide these detailed wait
> events among those classes.

Well we already discussed a couple of different ways to get "Socket",
"Latch", "Socket|Latch", ... or something like that into the
wait_event_type column or new columns.  Wouldn't that be better, and
automatically fall out of the code rather than needing human curated
categories?  Although Michael suggested that that should be done as a
separate patch on top of the basic feature.

> The "SecureRead" and "SecureWrite" wait events are going to be
> confusing, because the connection isn't necessarily secure.  I think
> those should be called "ClientRead" and "ClientWrite".
> Comprehensibility is more important than absolute consistency with the
> C function names.

Devil's advocate mode:  Then why not improve those function names?
Keeping the wait point names systematically in sync with the actual
code makes things super simple and avoids a whole decision process and
discussion to create new user-friendly obfuscation every time anyone
introduces a new wait point.  This is fundamentally an introspection
mechanism allowing expert users to shine a light on the engine and see
what's going on inside it, so I don't see what's wrong with being
straight up about what is actually going on and using the names for
parts of our code that we already have.  If that leads us to improve
some function names I'm not sure why that's a bad thing.

Obviously this gets complicated by the existence of static functions
whose names are ambiguous and lack context, and multiple wait points
in a single function.  Previously I've suggested a hierarchical
arrangement for these names which might help with that.  Imagine names
like: executor.Hash.<function> (reported by a background worker
executing a hypothetical parallel hash join),
executor.Hash.<function>.<something> to disambiguate multiple wait
points in one function, walsender.<function> etc.  That way we could
have a tidy curated meaningful naming scheme based on modules, but a
no-brainer systematic way to name the most numerous leaf bits of that
hierarchy.  Just an idea...

> Another thing to think about is that there's no way to actually see
> wait event information for a number of the processes which this patch
> instruments, because they don't appear in pg_stat_activity.

Good point.  Perhaps there could be another extended view, or system
process view, or some other mechanism.  That could be material for a
separate patch?

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Thu, Sep 22, 2016 at 6:49 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Sep 22, 2016 at 1:23 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Moreover, it's pretty confusing that we have this general concept of
>> wait events in pg_stat_activity, and then here the specific type of
>> wait event we're waiting for is the ... wait event kind.  Uh, what?
>
> Yeah, that is confusing.  It comes about because of the coincidence
> that pg_stat_activity finished up with a wait_event column, and our IO
> multiplexing abstraction finished up with the name WaitEventSet.
> <stuck-record-mode>We could instead call these new things "wait
> points", because, well, erm, they name points in the code at which we
> wait.  They don't name events (they just happen to use the
> WaitEventSet mechanism, which itself does involve events: but those
> events are things like "hey, this socket is now ready for
> writing").</>

What about first trying to come up with a class name generic enough
to cover the set of events we are trying to cover? One idea is
to simply call the class "Process" ("Point", "Poll"), and mention in
the docs that this can wait for a socket, a timeout, postmaster death
or another process to tell it to move on. The point is to come up with
a name generic enough. In the first patch we wouldn't detail much what
a process is waiting for at a given point, the way what has been
submitted does. This would still be helpful for the user, because it
would be possible to look back at the code and guess where this is
waiting.

Then comes the interesting part: I don't think that it is a good idea
to associate only one event with a waiting point in the system views.
Say that at a waiting point WaitLatch is called with
WL_POSTMASTER_DEATH and WL_TIMEOUT, I'd rather let the user know that
both of them have been set up and not choose just one of them using
some hierarchy. Rather, I think that we should list all of them
depending on the WL_* flags set up by the caller. Here comes a second
patch: let's use the last byte of the wait events and add this new
text[] column in pg_stat_activity, say wait_details. So for a waiting
point it would be possible to tell the user that it is waiting on
'{socket,timeout,postmaster_death}', for example.

Then comes a third patch: addition of a system view to list all the
activity happening for processes that are not dependent on
max_connections. pg_stat_activity has the advantage, as Tom mentioned
a couple of days ago, that one can just run count(*) on it to estimate
the number of connections left. I think this is quite helpful. Or we
could just add a column in pg_stat_activity to mark a process type: is
it a system process or not? This deserves its own discussion for sure.

Patches 2 and 3 are independent things. Patch 1 is a requirement for 2 and 3.

>> Another thing to think about is that there's no way to actually see
>> wait event information for a number of the processes which this patch
>> instruments, because they don't appear in pg_stat_activity.
>
> Good point.  Perhaps there could be another extended view, or system
> process view, or some other mechanism.  That could be material for a
> separate patch?

Definitely a separate patch..
-- 
Michael



Re: Tracking wait event for latches

From
Robert Haas
Date:
On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
>> Moreover, it's pretty confusing that we have this general concept of
>> wait events in pg_stat_activity, and then here the specific type of
>> wait event we're waiting for is the ... wait event kind.  Uh, what?
>
> Yeah, that is confusing.  It comes about because of the coincidence
> that pg_stat_activity finished up with a wait_event column, and our IO
> multiplexing abstraction finished up with the name WaitEventSet.
> <stuck-record-mode>We could instead call these new things "wait
> points", because, well, erm, they name points in the code at which we
> wait.  They don't name events (they just happen to use the
> WaitEventSet mechanism, which itself does involve events: but those
> events are things like "hey, this socket is now ready for
> writing").</>

Sure, we could do that, but it means breaking backward compatibility
for pg_stat_activity *again*.  I'd be willing to do it but I don't
think I'd win any popularity contests.

>> I have to admit that I like the individual event names quite a bit,
>> and I think the detail will be useful to users.  But I wonder if
>> there's a better way to describe the class of events that we're
>> talking about that's not so dependent on internal data structures.
>> Maybe we could divide these waits into a couple of categories - e.g.
>> "Socket", "Timeout", "Process" - and then divide these detailed wait
>> events among those classes.
>
> Well we already discussed a couple of different ways to get "Socket",
> "Latch", "Socket|Latch", ... or something like that into the
> wait_event_type column or new columns.  Wouldn't that be better, and
> automatically fall out of the code rather than needing human curated
> categories?  Although Michael suggested that that should be done as a
> separate patch on top of the basic feature.

I think making that a separate patch is just punting the decision down
the field to a day that may never come.  Let's try to agree on
something that we can all live with and implement it now.  In terms of
avoiding human-curated categories, I basically see two options:

1. Find a name that is sufficiently generic that it covers everything
that might reach WaitEventSetWait either now or in the future when it
might wait for even more kinds of things than it does now.  For
example, we could call it "Stuff" or "Thing".  Less tongue-in-cheek
suggestions are welcome, but it's hard to come up with something that
is sufficiently-generic without being tautological.  "Event" is an
example of a name which is general enough to encompass everything but
also stupid: the column is called "wait_event" so everything that
appears in it is an event by definition.

2. Classify the events that fall into this category by some rigid
principle based on the types of things being awaited.  For example, we
could decide that if any sockets are awaited, the event class will be
"Client" if it is connected to a user and "IPC" for auxiliary
processes.

For myself, I don't see any real problem with using humans to classify
things; that's pretty much the one thing humans are much better at
than computers, so we might as well take advantage of it.  I think
that it's useful to try to group together types of waits which the
user will see as logically related to each other, even if that
involves applying some human judgement that might someday lead to some
discussion about what the best categorization for some new thing is.
PostgreSQL is intended to be used by humans, and avoiding discussions
(or even arguments) on pgsql-hackers shouldn't outrank usability on
the list of concerns.

So, I tried to classify these.  Here's what I came up with.

Activity: ArchiverMain, AutoVacuumMain, BgWriterMain,
BgWriterHibernate, CheckpointerMain, PgStatMain, RecoveryWalAll,
RecoveryWalStream, SysLoggerMain, WalReceiverMain, WalWriterMain

Client: SecureRead, SecureWrite, SSLOpenServer, WalSenderMain,
WalSenderWaitForWAL, WalSenderWriteData, WalReceiverWaitStart

Timeout: BaseBackupThrottle, PgSleep, RecoveryApplyDelay

IPC: BgWorkerShutdown, BgWorkerStartup, ExecuteGather,
MessageQueueInternal, MessageQueuePutMessage, MessageQueueReceive,
MessageQueueSend, ParallelFinish, ProcSignal, ProcSleep, SyncRep

Extension: Extension

I classified all of the main loop waits as waiting for activity; all
of those are background processes that are waiting for something to
happen and are more or less happy to sleep forever until it does.  I
also included the RecoveryWalAll and RecoveryWalStream events in
there; those don't have the sort of "main loop" flavor of the others
but they are happy to wait more or less indefinitely for something to
occur.  Likewise, it was pretty easy to find all of the events that
were waiting for client I/O, and I grouped those all under "Client".
A few of the remaining wait events seemed like they were clearly
waiting for a particular timeout to expire, so I gave those their own
"Timeout" category.

I believe these categorizations are actually useful for users.  For
example, you might want to see all of the waits in the system but
exclude the "Client", "Activity", and "Timeout" categories because
those are things that aren't signs of a problem.  A "Timeout" wait is
one that you explicitly requested, a "Client" wait isn't the server's
fault, and an "Activity" wait just means nothing is happening.  In
contrast, a "Lock" or "LWLock" or "IPC" wait shows that something is
actually delaying work that we'd ideally prefer to have get done
sooner.

I grouped the rest of this stuff as "IPC" because all of these events
are cases where one server process is waiting for another server
process.  That could be further subdivided, of course: most of
those events are only going to occur in relation to parallel query,
but I didn't want to group it that way explicitly because both
background workers and shm_mq have other potential uses.  ProcSignal
and ProcSleep are related to heavyweight locks and SyncRep is of
course related to synchronous replication.  But they're all related
in that one server process is waiting for another server process to
tell it that a certain state has been reached, so IPC seems like a
good categorization.

Finally, extensions got their own category in this taxonomy, though I
wonder if it would be better to instead have
Activity/ExtensionActivity, Client/ExtensionClient,
Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a
separate toplevel category.

To me, this seems like a pretty solid toplevel categorization and a
lot more useful than just throwing all of these in one bucket and
saying "good luck".  I'm not super-attached to the details but, again,
I think it's worth trying to bin things in a way that will be useful.

>> The "SecureRead" and "SecureWrite" wait events are going to be
>> confusing, because the connection isn't necessarily secure.  I think
>> those should be called "ClientRead" and "ClientWrite".
>> Comprehensibility is more important than absolute consistency with the
>> C function names.
>
> Devil's advocate mode:  Then why not improve those function names?

That'd be fine.

> Obviously this gets complicated by the existence of static functions
> whose names are ambiguous and lack context, and multiple wait points
> in a single function.  Previously I've suggested a hierarchical
> arrangement for these names which might help with that.  Imagine names
> like: executor.Hash.<function> (reported by a background worker
> executing a hypothetical parallel hash join),
> executor.Hash.<function>.<something> to disambiguate multiple wait
> points in one function, walsender.<function> etc.  That way we could
> have a tidy curated meaningful naming scheme based on modules, but a
> no-brainer systematic way to name the most numerous leaf bits of that
> hierarchy.  Just an idea...

Considering we only have a few dozen of those, this feels like it
might be overkill to me, and I suspect we'll end up finding that it's
a bit harder to make it consistent and useful than one might hope.  I
am basically happy with the way Michael named them, but that's not to
say I could never be happy with anything else.

>> Another thing to think about is that there's no way to actually see
>> wait event information for a number of the processes which this patch
>> instruments, because they don't appear in pg_stat_activity.
>
> Good point.  Perhaps there could be another extended view, or system
> process view, or some other mechanism.  That could be material for a
> separate patch?

I agree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From:
Robert Haas
Date:
On Thu, Sep 22, 2016 at 1:52 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Then comes the interesting bits: I don't think that it is a good idea
> to associate only one event with a waiting point in the system views.
> Say that at a waiting point WaitLatch is called with
> WL_POSTMASTER_DEATH and WL_TIMEOUT, I'd rather let the user know that
> both of them have been set up and not choose just one of them using
> some hierarchy. But I rather think that we should list all of them
> depending on the WL_* flags set up by the caller. Here comes a second
> patch: let's use the last byte of the wait events and add this new
> text[] column in pg_stat_activity, say wait_details. So for a waiting
> point it would be possible to tell the user that it is waiting on
> '{socket,timeout,postmaster_death}' for example.

I think this is focusing on the wrong thing.  What we need here is not
infinite detail on exactly what is going on under the hood but useful
classification rules.  For example, as I said in my email to Thomas,
being able to exclude all cases of waiting for client I/O is useful
because those aren't signs of a problem in the way that LWLock or Lock
waits are.  It's better for us to provide that classification using
the existing system than for users to have to work out exactly which
things ought to be excluded on the basis of specific event names.

On the other hand, I submit that knowing which waits will be
interrupted by the death of the postmaster and which will not doesn't
add much.  In fact, I think that's almost an anti-feature, because it
will encourage users to care about details that are very rarely
relevant.

> Then comes a third patch: addition of a system view to list all the
> activity happening for processes that are not dependent on
> max_connections. pg_stat_activity has the advantage, as Tom mentioned
> a couple of days ago, that one can just run count(*) on it to estimate
> the number of connections left. I think this is quite helpful. Or we
> could just add a column in pg_stat_activity to mark a process type: is
> it a system process or not? This deserves its own discussion for sure.

Sure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From:
Thomas Munro
Date:
On Fri, Sep 23, 2016 at 1:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>>> Moreover, it's pretty confusing that we have this general concept of
>>> wait events in pg_stat_activity, and then here the specific type of
>>> wait event we're waiting for is the ... wait event kind.  Uh, what?
>>
>> Yeah, that is confusing.  It comes about because of the coincidence
>> that pg_stat_activity finished up with a wait_event column, and our IO
>> multiplexing abstraction finished up with the name WaitEventSet.
>> <stuck-record-mode>We could instead call these new things "wait
>> points", because, well, erm, they name points in the code at which we
>> wait.  They don't name events (they just happen to use the
>> WaitEventSet mechanism, which itself does involve events: but those
>> events are things like "hey, this socket is now ready for
>> writing").</>
>
> Sure, we could do that, but it means breaking backward compatibility
> for pg_stat_activity *again*.  I'd be willing to do it but I don't
> think I'd win any popularity contests.

I didn't mean changing the column headings in pg_stat_activity.  I
meant that the enum called WaitEventIdentifier in Michael's v5 patch
should be called WaitPointIdentifier, and if we go with a single name
to appear in the wait_event_type column then it could be "WaitPoint".
But I would also prefer to see something more informative in that
column, as discussed below (and upthread).

>> Well we already discussed a couple of different ways to get "Socket",
>> "Latch", "Socket|Latch", ... or something like that into the
>> wait_event_type column or new columns.  Wouldn't that be better, and
>> automatically fall out of the code rather than needing human curated
>> categories?  Although Michael suggested that that should be done as a
>> separate patch on top of the basic feature.
>
> I think making that a separate patch is just punting the decision down
> the field to a day that may never come.  Let's try to agree on
> something that we can all live with and implement it now.  In terms of
> avoiding human-curated categories, I basically see two options:
>
> 1. Find a name that is sufficiently generic that it covers everything
> that might reach WaitEventSetWait either now or in the future when it
> might wait for even more kinds of things than it does now.  For
> example, we could call it "Stuff" or "Thing".  Less tongue-in-cheek
> suggestions are welcome, but it's hard to come up with something that
> is sufficiently-generic without being tautological.  "Event" is an
> example of a name which is general enough to encompass everything but
> also stupid: the column is called "wait_event" so everything that
> appears in it is an event by definition.
>
> 2. Classify the events that fall into this category by some rigid
> principle based on the types of things being awaited.  For example, we
> could decide that if any sockets are awaited, the event class will be
> "Client" if it is connected to a user and "IPC" for auxiliary
> processes.
>
> [...]
> occur.  Likewise, it was pretty easy to find all of the events that
> were waiting for client I/O, and I grouped those all under "Client".
> A few of the remaining wait events seemed like they were clearly
> waiting for a particular timeout to expire, so I gave those their own
> "Timeout" category.

Interesting.  OK, I agree that it'd be useful to show that we're
waiting because there's nothing happening, or waiting because the user
asked us to sleep, or waiting on IO, or waiting for an IPC response
because something is happening, and that higher level information is
difficult/impossible to extract automatically from the WaitEventSet.

I understand that "Activity" is the category of wait points that are
waiting for activity, but I wonder if it might be clearer to users if
that were called "Idle", because it's the category of idle waits
caused by non-activity.

Why is WalSenderMain not in that category alongside WalReceiverMain?
Hmm, actually it's kind of a tricky one: whether it's really idle or
waiting for IO depends.  It's always ready to wait for clients to send
messages, but I'd say that's part of its "idle" behaviour.  But it's
sometimes waiting for the socket to be writable: if
(pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITABLE, and that's
when it's definitely not idle, it's actively trying to feed WAL down
the pipe.  Do we want to get into dynamic categories depending on
conditions like that?

I was thinking about suggesting a category "Replication" to cover the
waits for client IO relating to replication, as opposed to client IO
waits relating to regular user connections.  Then you could put sync
rep into that category instead of IPC, even though technically it is
waiting for IPC from walsender process(es), on the basis that it's
more newsworthy to a DBA that it's really waiting for a remote replica
to respond.  But it's probably pretty clear what's going on from the
wait point names, so maybe it's not worth a category.  Thoughts?

WalSenderWaitForWAL is waiting for both IPC and Client, but if you
have to pick one isn't IPC the main thing it's waiting for?  It'll
stop waiting when it gets IPC telling it about flushed location
reaching some point.  It's multiplexing client IO processing sort of
"incidentally" while it does so.

WalReceiverWaitStart is waiting for IPC, not Client.

>> Obviously this gets complicated by the existence of static functions
>> whose names are ambiguous and lack context, and multiple wait points
>> in a single function.  Previously I've suggested a hierarchical
>> arrangement for these names which might help with that.  Imagine names
>> like: executor.Hash.<function> (reported by a background worker
>> executing a hypothetical parallel hash join),
>> executor.Hash.<function>.<something> to disambiguate multiple wait
>> points in one function, walsender.<function> etc.  That way we could
>> have a tidy curated meaningful naming scheme based on modules, but a
>> no-brainer systematic way to name the most numerous leaf bits of that
>> hierarchy.  Just an idea...
>
> Considering we only have a few dozen of those, this feels like it
> might be overkill to me, and I suspect we'll end up finding that it's
> a bit harder to make it consistent and useful than one might hope.  I
> am basically happy with the way Michael named them, but that's not to
> say I could never be happy with anything else.

Yeah, it's fine as it is.

I do suspect that the set of wait points will grow quite a bit as we
develop more parallel stuff though.  For example, I have been working
on a patch that adds several more wait points, indirectly via
condition variables (using your patch).  Actually in my case it's
BarrierWait -> ConditionVariableWait -> WaitEventSetWait.  I propose
that these higher level wait primitives should support passing a wait
point identifier through to WaitEventSetWait.  Some of my patch's wait
points are in the same function and some are in static functions, so
I'll finish up making up various affixes to disambiguate, and then I
might be tempted to include some separator characters...  I am also
aware of three other hackers who are working on patches that also make
use of condition variables, so (if you agree that condition variable
waits are wait points) they'll be adding more too.  That said, ad-hoc
name construction is fine and there is no reason we couldn't revise
things if names start to look messy when concrete patches that add
more names emerge.
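
To make that concrete, here is a minimal sketch (hypothetical
signatures, not an actual patch) of the kind of pass-through I have in
mind, assuming the wait point travels as a single integer:

    void
    ConditionVariableWait(ConditionVariable *cv, uint32 wait_point)
    {
        /* (registration on cv's wakeup list elided) */
        (void) WaitLatch(MyLatch, WL_LATCH_SET, -1, wait_point);
        ResetLatch(MyLatch);
    }

    /*
     * BarrierWait() and friends would then simply forward their own
     * caller's identifier, e.g.
     * ConditionVariableWait(&barrier->cv, wait_point).
     */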

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From:
Robert Haas
Date:
On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Interesting.  OK, I agree that it'd be useful to show that we're
> waiting because there's nothing happening, or waiting because the user
> asked us to sleep, or waiting on IO, or waiting for an IPC response
> because something is happening, and that higher level information is
> difficult/impossible to extract automatically from the WaitEventSet.

Cool.  :-)

> I understand that "Activity" is the category of wait points that are
> waiting for activity, but I wonder if it might be clearer to users if
> that were called "Idle", because it's the category of idle waits
> caused by non-activity.

I thought about that but figured it would be better to consistently
state the thing *for which* we were waiting.  We wait FOR a client or
a timeout or activity.  We do not wait FOR idle; we wait to be NOT
idle.

> Why is WalSenderMain not in that category alongside WalReceiverMain?
> Hmm, actually it's kind of a tricky one: whether it's really idle or
> waiting for IO depends.  It's always ready to wait for clients to send
> messages, but I'd say that's part of its "idle" behaviour.  But it's
> sometimes waiting for the socket to be writable: if
> (pq_is_send_pending()) wakeEvents |= WL_SOCKET_WRITABLE, and that's
> when it's definitely not idle, it's actively trying to feed WAL down
> the pipe.  Do we want to get into dynamic categories depending on
> conditions like that?

I suspect that's overkill.  I don't want wait-point-naming to make
programming the system noticeably more difficult, so I think it's fine
to pick a categorization of what we think the typical case will be and
call it good.  If we try that and people find it's a nuisance, we can
fix it then.  In the case of WAL sender, I assume it will normally be
waiting for more WAL to be generated; whereas in the case of WAL
receiver, I assume it will normally be waiting for more WAL to be
received from the remote side.  The reverse cases are possible: the
sender could be waiting for the socket buffer to drain so it can push
more WAL onto the wire, and the receiver could likewise be waiting for
buffer space to push out feedback messages.  But probably mostly not.
At least for a first cut, I'd be inclined to handle this fuzziness by
putting weasel-words in the documentation rather than by trying to
make the reporting 100% perfectly accurate.

> I was thinking about suggesting a category "Replication" to cover the
> waits for client IO relating to replication, as opposed to client IO
> waits relating to regular user connections.  Then you could put sync
> rep into that category instead of IPC, even though technically it is
> waiting for IPC from walsender process(es), on the basis that it's
> more newsworthy to a DBA that it's really waiting for a remote replica
> to respond.  But it's probably pretty clear what's going on from the
> wait point names, so maybe it's not worth a category.  Thoughts?

I thought about a replication category but either it will only have
SyncRep in it, which is odd, or it will pull in other things that
otherwise fit nicely into the Activity category, and then that
boundaries of all the categories become mushy: is the subsystem that
causes the wait that we are trying to document, or the kind of thing
for which we are waiting?

> I do suspect that the set of wait points will grow quite a bit as we
> develop more parallel stuff though.  For example, I have been working
> on a patch that adds several more wait points, indirectly via
> condition variables (using your patch).  Actually in my case it's
> BarrierWait -> ConditionVariableWait -> WaitEventSetWait.  I propose
> that these higher level wait primitives should support passing a wait
> point identifier through to WaitEventSetWait.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From:
Amit Kapila
Date:
On Thu, Sep 22, 2016 at 7:19 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 21, 2016 at 5:49 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>
> So, I tried to classify these.  Here's what I came up with.
>
> Activity: ArchiverMain, AutoVacuumMain, BgWriterMain,
> BgWriterHibernate, CheckpointerMain, PgStatMain, RecoveryWalAll,
> RecoveryWalStream, SysLoggerMain, WalReceiverMain, WalWriterMain
>
> Client: SecureRead, SecureWrite, SSLOpenServer, WalSenderMain,
> WalSenderWaitForWAL, WalSenderWriteData, WalReceiverWaitStart
>
> Timeout: BaseBackupThrottle, PgSleep, RecoveryApplyDelay
>
> IPC: BgWorkerShutdown, BgWorkerStartup, ExecuteGather,
> MessageQueueInternal, MessageQueuePutMessage, MessageQueueReceive,
> MessageQueueSend, ParallelFinish, ProcSignal, ProcSleep, SyncRep
>
> Extension: Extension
>

We already label lwlock waits coming from an extension as "extension",
so I think just naming this Extension might create some confusion.

> I classified all of the main loop waits as waiting for activity; all
> of those are background processes that are waiting for something to
> happen and are more or less happy to sleep forever until it does.  I
> also included the RecoveryWalAll and RecoveryWalStream events in
> there; those don't have the sort of "main loop" flavor of the others
> but they are happy to wait more or less indefinitely for something to
> occur.  Likewise, it was pretty easy to find all of the events that
> were waiting for client I/O, and I grouped those all under "Client".
> A few of the remaining wait events seemed like they were clearly
> waiting for a particular timeout to expire, so I gave those their own
> "Timeout" category.
>
> I believe these categorizations are actually useful for users.  For
> example, you might want to see all of the waits in the system but
> exclude the "Client", "Activity", and "Timeout" categories because
> those are things that aren't signs of a problem.  A "Timeout" wait is
> one that you explicitly requested, a "Client" wait isn't the server's
> fault, and an "Activity" wait just means nothing is happening.  In
> contrast, a "Lock" or "LWLock" or "IPC" wait shows that something is
> actually delaying work that we'd ideally prefer to have get done
> sooner.
>
> I grouped the rest of this stuff as "IPC" because all of these events
> are cases where one server process is waiting for another server
> process.  That could be further subdivided, of course: most of
> those events are only going to occur in relation to parallel query,
> but I didn't want to group it that way explicitly because both
> background workers and shm_mq have other potential uses.  ProcSignal
> and ProcSleep are related to heavyweight locks and SyncRep is of
> course related to synchronous replication.  But they're all related
> in that one server process is waiting for another server process to
> tell it that a certain state has been reached, so IPC seems like a
> good categorization.
>
> Finally, extensions got their own category in this taxonomy, though I
> wonder if it would be better to instead have
> Activity/ExtensionActivity, Client/ExtensionClient,
> Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a
> separate toplevel category.
>

+1. It can avoid confusion.

> To me, this seems like a pretty solid toplevel categorization and a
> lot more useful than just throwing all of these in one bucket and
> saying "good luck".

Agreed.  This categorisation is very good and can help the patch author proceed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Tracking wait event for latches

From:
Amit Kapila
Date:
On Fri, Sep 23, 2016 at 7:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>
>> I was thinking about suggesting a category "Replication" to cover the
>> waits for client IO relating to replication, as opposed to client IO
>> waits relating to regular user connections.  Then you could put sync
>> rep into that category instead of IPC, even though technically it is
>> waiting for IPC from walsender process(es), on the basis that it's
>> more newsworthy to a DBA that it's really waiting for a remote replica
>> to respond.  But it's probably pretty clear what's going on from the
>> wait point names, so maybe it's not worth a category.  Thoughts?
>
> I thought about a replication category but either it will only have
> SyncRep in it, which is odd, or it will pull in other things that
> otherwise fit nicely into the Activity category, and then the
> boundaries of all the categories become mushy: is it the subsystem that
> causes the wait that we are trying to document, or the kind of thing
> for which we are waiting?
>

I also think that it can add some confusion in defining boundaries,
and it might not be consistent with the current way we have defined
the waits.  However, there is some value in using the subsystem, which
is that users can identify bottlenecks with ease.  I think this
applies to both Replication and Parallel Query.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Tracking wait event for latches

From:
Michael Paquier
Date:
On Fri, Sep 23, 2016 at 10:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Sep 22, 2016 at 7:10 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> I was thinking about suggesting a category "Replication" to cover the
>> waits for client IO relating to replication, as opposed to client IO
>> waits relating to regular user connections.  Then you could put sync
>> rep into that category instead of IPC, even though technically it is
>> waiting for IPC from walsender process(es), on the basis that it's
>> more newsworthy to a DBA that it's really waiting for a remote replica
>> to respond.  But it's probably pretty clear what's going on from the
>> wait point names, so maybe it's not worth a category.  Thoughts?
>
> I thought about a replication category but either it will only have
> SyncRep in it, which is odd, or it will pull in other things that
> otherwise fit nicely into the Activity category, and then the
> boundaries of all the categories become mushy: is it the subsystem that
> causes the wait that we are trying to document, or the kind of thing
> for which we are waiting?

Using category IPC for SyncRep looks fine to me.

>> I do suspect that the set of wait points will grow quite a bit as we
>> develop more parallel stuff though.  For example, I have been working
>> on a patch that adds several more wait points, indirectly via
>> condition variables (using your patch).  Actually in my case it's
>> BarrierWait -> ConditionVariableWait -> WaitEventSetWait.  I propose
>> that these higher level wait primitives should support passing a wait
>> point identifier through to WaitEventSetWait.
>
> +1.

And I suspect that the inclusion of pgstat.h will become more and
more usual, to allow more code paths to access a given WaitClass.

After digesting all the comments given, I have produced the patch
attached that uses the following categories:
+const char *const WaitEventNames[] = {
+   /* activity */
+   "ArchiverMain",
+   "AutoVacuumMain",
+   "BgWriterHibernate",
+   "BgWriterMain",
+   "CheckpointerMain",
+   "PgStatMain",
+   "RecoveryWalAll",
+   "RecoveryWalStream",
+   "SysLoggerMain",
+   "WalReceiverMain",
+   "WalSenderMain",
+   "WalWriterMain",
+   /* client */
+   "SecureRead",
+   "SecureWrite",
+   "SSLOpenServer",
+   "WalReceiverWaitStart",
+   "WalSenderWaitForWAL",
+   "WalSenderWriteData",
+   /* Extension */
+   "Extension",
+   /* IPC */
+   "BgWorkerShutdown",
+   "BgWorkerStartup",
+   "ExecuteGather",
+   "MessageQueueInternal",
+   "MessageQueuePutMessage",
+   "MessageQueueReceive",
+   "MessageQueueSend",
+   "ParallelFinish",
+   "ProcSignal",
+   "ProcSleep",
+   "SyncRep",
+   /* timeout */
+   "BaseBackupThrottle",
+   "PgSleep",
+   "RecoveryApplyDelay",
+};
I have moved WalSenderMain as it tracks a main loop, so it was strange
to not have it in Activity. I also kept SecureRead and SecureWrite
because they refer to the functions of the same name. For the rest, I
got convinced by what has been discussed, and it makes things clearer.
My head is not spinning anymore when I try to keep the list of names
and its associated enum in sync, which is good. I have also rewritten
the docs to follow that.
--
Michael

Attachments

Re: Tracking wait event for latches

From:
Michael Paquier
Date:
On Thu, Sep 22, 2016 at 10:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Finally, extensions got their own category in this taxonomy, though I
> wonder if it would be better to instead have
> Activity/ExtensionActivity, Client/ExtensionClient,
> Timeout/ExtensionTimeout, and IPC/ExtensionIPC instead of making it a
> separate toplevel category.

I don't think that it is necessary to go up to that level. If you look
at the latest patch, WaitLatch & friends have been extended with two
arguments: classId and eventId, so extensions could just use
WAIT_ACTIVITY as classId and WE_EXTENSION as eventId to do the
equivalent of ExtensionActivity.
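
For instance, a background worker could advertise a custom wait point
along these lines (a sketch only; the flags, timeout and loop body are
made up for illustration):

    for (;;)
    {
        int rc;

        /* Reported as wait_event_type "Activity", wait_event "Extension". */
        rc = WaitLatch(MyLatch,
                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       1000L,
                       WAIT_ACTIVITY, WE_EXTENSION);
        if (rc & WL_POSTMASTER_DEATH)
            proc_exit(1);
        ResetLatch(MyLatch);

        /* ... do the worker's periodic work ... */
    }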
-- 
Michael



Re: Tracking wait event for latches

From:
Thomas Munro
Date:
On Sat, Sep 24, 2016 at 1:44 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> After digesting all the comments given, I have produced the patch
> attached that uses the following categories:
> +const char *const WaitEventNames[] = {
> +   /* activity */
> +   "ArchiverMain",
> +   "AutoVacuumMain",
> +   "BgWriterHibernate",
> +   "BgWriterMain",
> +   "CheckpointerMain",
> +   "PgStatMain",
> +   "RecoveryWalAll",
> +   "RecoveryWalStream",
> +   "SysLoggerMain",
> +   "WalReceiverMain",
> +   "WalSenderMain",
> +   "WalWriterMain",
> +   /* client */
> +   "SecureRead",
> +   "SecureWrite",
> +   "SSLOpenServer",
> +   "WalReceiverWaitStart",
> +   "WalSenderWaitForWAL",
> +   "WalSenderWriteData",
> +   /* Extension */
> +   "Extension",
> +   /* IPC */
> +   "BgWorkerShutdown",
> +   "BgWorkerStartup",
> +   "ExecuteGather",
> +   "MessageQueueInternal",
> +   "MessageQueuePutMessage",
> +   "MessageQueueReceive",
> +   "MessageQueueSend",
> +   "ParallelFinish",
> +   "ProcSignal",
> +   "ProcSleep",
> +   "SyncRep",
> +   /* timeout */
> +   "BaseBackupThrottle",
> +   "PgSleep",
> +   "RecoveryApplyDelay",
> +};
> I have moved WalSenderMain as it tracks a main loop, so it was strange
> to not have it in Activity. I also kept SecureRead and SecureWrite
> because they refer to the functions of the same name. For the rest, I
> got convinced by what has been discussed, and it makes things clearer.
> My head is not spinning anymore when I try to keep the list of names
> and its associated enum in sync, which is good. I have also rewritten
> the docs to follow that.

-extern int WaitEventSetWait(WaitEventSet *set, long timeout,
-  WaitEvent *occurred_events, int nevents);
-extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout);
+extern int WaitEventSetWait(WaitEventSet *set, long timeout,
+  WaitEvent *occurred_events, int nevents,
+  uint8 classId, uint16 eventId);
+extern int WaitLatch(volatile Latch *latch, int wakeEvents, long timeout,
+  uint8 classId, uint16 eventId);
 extern int WaitLatchOrSocket(volatile Latch *latch, int wakeEvents,
-  pgsocket sock, long timeout);
+  pgsocket sock, long timeout, uint8 classId,
+  uint16 eventId);

+ WaitLatch(MyLatch, WL_LATCH_SET, 0, WAIT_IPC, WE_MQ_RECEIVE);

If the class really is strictly implied by the WaitEventIdentifier,
then do we really need to supply it everywhere when calling the
various wait functions?  That's going to be quite a few functions:
WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some
more patches out there have legs then also ConditionVariableWait,
BarrierWait, and possibly further kinds of wait points.  And then all
their call sites will have to supply the wait class ID, even
though it is implied by the other ID.  Perhaps that array that
currently holds the names should instead hold { classId, displayName }
so we could just look it up?
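
For instance (a sketch, with hypothetical type and macro names):

    typedef struct WaitEventEntry
    {
        uint8       classId;        /* e.g. WAIT_ACTIVITY, WAIT_IPC */
        const char *displayName;    /* shown in pg_stat_activity */
    } WaitEventEntry;

    /* Indexed by the WaitEventIdentifier enum, in the same order. */
    static const WaitEventEntry WaitEventEntries[] = {
        {WAIT_ACTIVITY, "ArchiverMain"},
        {WAIT_ACTIVITY, "AutoVacuumMain"},
        /* ... */
        {WAIT_IPC, "SyncRep"},
    };

    #define GetWaitEventClass(eventId) \
        (WaitEventEntries[(eventId)].classId)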

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From:
Michael Paquier
Date:
On Mon, Sep 26, 2016 at 1:46 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> If the class really is strictly implied by the WaitEventIdentifier,
> then do we really need to supply it everywhere when calling the
> various wait functions?  That's going to be quite a few functions:
> WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some
> more patches out there have legs then also ConditionVariableWait,
> BarrierWait, and possibly further kinds of wait points.  And then all
> their call sites will have to supply the wait class ID, even
> though it is implied by the other ID.  Perhaps that array that
> currently holds the names should instead hold { classId, displayName }
> so we could just look it up?

I considered this reverse-engineering, but arrived at the conclusion
that having a flexible API mattered more to give more flexibility to
module developers. In short this avoids having extra class IDs like
ActivityExtension, TimeoutExtension, etc. But perhaps that's just a
matter of taste..
-- 
Michael



Re: Tracking wait event for latches

From:
Thomas Munro
Date:
On Mon, Sep 26, 2016 at 7:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Mon, Sep 26, 2016 at 1:46 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> If the class really is strictly implied by the WaitEventIdentifier,
>> then do we really need to supply it everywhere when calling the
>> various wait functions?  That's going to be quite a few functions:
>> WaitLatch, WaitLatchOrSocket, WaitEventSetWait for now, and if some
>> more patches out there have legs then also ConditionVariableWait,
>> BarrierWait, and possibly further kinds of wait points.  And then all
>> their call sites will have to supply the wait class ID, even
>> though it is implied by the other ID.  Perhaps that array that
>> currently holds the names should instead hold { classId, displayName }
>> so we could just look it up?
>
> I considered this reverse-engineering, but arrived at the conclusion
> that having a flexible API mattered more to give more flexibility to
> module developers. In short this avoids having extra class IDs like
> ActivityExtension, TimeoutExtension, etc. But perhaps that's just a
> matter of taste..

Ok, if they really are independent then shouldn't we take advantage of
that at call sites where we might be idle but we might also be waiting
for the network?  For example we could do this:
            /* Sleep until something happens or we time out */
            WaitLatchOrSocket(MyLatch, wakeEvents,
                              MyProcPort->sock, sleeptime,
                              pq_is_send_pending() ? WAIT_CLIENT : WAIT_ACTIVITY,
                              WE_WAL_SENDER_MAIN);

Isn't WE_WAL_SENDER_WAIT_WAL primarily WAIT_IPC, not WAIT_CLIENT?  Or
perhaps "pq_is_send_pending() ? WAIT_CLIENT : WAIT_IPC".

Actually, I'm still not sold on "Activity" and "Client".  I think
"Idle" and "Network" would be better.  Everybody knows intuitively
what "Idle" means.  "Network" is better than "Client" because it
avoids confusion about user applications vs replication connections or
clients vs servers.

+        <listitem>
+         <para>
+          <literal>Activity</>: The server process is waiting for some
+          activity to happen on a socket.  This is mainly used system processes
+          in their main processing loop.  <literal>wait_event</> will identify
+          the type of activity waited for.
+         </para>
+        </listitem>

"The server process is waiting for some activity to happen on a
socket."  Not true: could be a latch, or both.

I think what this category is really getting at is that the server
process is idle.  Everything in it could otherwise go in the IPC or
Client categories on the basis that it's mainly waiting for a socket
or a latch, but we're deciding to separate the wait points
representing "idleness" and put them here.

How about:  "The server process is idle.  This is used by system
processes waiting for activity in their main processing loop.
<literal>wait_event</> will identify the specific wait point."

+        <listitem>
+         <para>
+          <literal>Client</>: The server process is waiting for some activity
+          on a socket from a client process, and that the server expects
+          something to happen that is independent from its internal processes.
+          <literal>wait_event</> will identify the type of client activity
+          waited for.
+         </para>
+        </listitem>

Is it worth spelling out that "client process" here doesn't just mean
user applications, it's also remote PostgreSQL servers doing
replication?  "wait_event" doesn't really identify the type of client
activity waited for, it identifies the code that is waiting.

+        <listitem>
+         <para>
+          <literal>Extension</>: The server process is waiting for activity
+          in a plugin or an extension.  This category is useful for plugin
+          and module developers to track custom waiting points.
+         </para>
+        </listitem>

Plugin, extension, module?  How about just "extension"?  Why developers?

+        <listitem>
+         <para>
+          <literal>IPC</>: The server process is waiting for some activity
+          from an auxilliary process.  <literal>wait_event</> will identify
+          the type of activity waited for.
+         </para>
+        </listitem>

s/auxilliary/auxiliary/, but wouldn't it be better to say something
more general like "from another process in the cluster"?  Background
workers are not generally called auxiliary processes, and some of
these wait points are waiting for those.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From:
Michael Paquier
Date:
On Wed, Sep 28, 2016 at 9:39 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Ok, if they really are independent then shouldn't we take advantage of
> that at call sites where we might be idle but we might also be waiting
> for the network?  For example we could do this:
>
>             /* Sleep until something happens or we time out */
>             WaitLatchOrSocket(MyLatch, wakeEvents,
>                               MyProcPort->sock, sleeptime,
>                               pq_is_send_pending() ? WAIT_CLIENT :
> WAIT_ACTIVITY,
>                               WE_WAL_SENDER_MAIN);

Yes, we can do fancy things with this API. Extensions could do the
same, the important thing being to have generic enough event
categories (take class).

> Isn't WE_WAL_SENDER_WAIT_WAL primarily WAIT_IPC, not WAIT_CLIENT?  Or
> perhaps "pq_is_send_pending() ? WAIT_CLIENT : WAIT_IPC".
>
> Actually, I'm still not sold on "Activity" and "Client".  I think
> "Idle" and "Network" would be better.  Everybody knows intuitively
> what "Idle" means.  "Network" is better than "Client" because it
> avoids confusion about user applications vs replication connections or
> clients vs servers.

Personally I am buying Robert's argument instead. The class of an
event means what it is waiting for, not the state in which it is
waiting. "Idle" means that the process *is* idle, not that it is
waiting for something.

> +        <listitem>
> +         <para>
> +          <literal>Activity</>: The server process is waiting for some
> +          activity to happen on a socket.  This is mainly used by system processes
> +          in their main processing loop.  <literal>wait_event</> will identify
> +          the type of activity waited for.
> +         </para>
> +        </listitem>
>
> "The server process is waiting for some activity to happen on a
> socket."  Not true: could be a latch, or both.
>
> I think what this category is really getting at is that the server
> process is idle.  Everything in it could otherwise go in the IPC or
> Client categories on the basis that it's mainly waiting for a socket
> or a latch, but we're deciding to separate the wait points
> representing "idleness" and put them here.
>
> How about:  "The server process is idle.  This is used by system
> processes waiting for activity in their main processing loop.
> <literal>wait_event</> will identify the specific wait point."

You have a better way to word things than I do.

> +        <listitem>
> +         <para>
> +          <literal>Client</>: The server process is waiting for some activity
> +          on a socket from a client process, and that the server expects
> +          something to happen that is independent from its internal processes.
> +          <literal>wait_event</> will identify the type of client activity
> +          waited for.
> +         </para>
> +        </listitem>
>
> Is it worth spelling out that "client process" here doesn't just mean
> user applications, it's also remote PostgreSQL servers doing
> replication?  "wait_event" doesn't really identify the type of client
> activity waited for, it identifies the code that is waiting.

Okay, switched to "user applications", and clarified that this is a
wait point that wait_event tracks.

> +        <listitem>
> +         <para>
> +          <literal>Extension</>: The server process is waiting for activity
> +          in a plugin or an extension.  This category is useful for plugin
> +          and module developers to track custom waiting points.
> +         </para>
> +        </listitem>
>
> Plugin, extension, module?  How about just "extension"?  Why developers?

Let's say "extension module"; this term is used in a couple of places in the docs.


> +        <listitem>
> +         <para>
> +          <literal>IPC</>: The server process is waiting for some activity
> +          from an auxilliary process.  <literal>wait_event</> will identify
> +          the type of activity waited for.
> +         </para>
> +        </listitem>
>
> s/auxilliary/auxiliary/, but wouldn't it be better to say something
> more general like "from another process in the cluster"?  Background
> workers are not generally called auxiliary processes, and some of
> these wait points are waiting for those.

There was the same typo in latch.h; still, "from another process" looks better.
--
Michael

Attachments

Re: Tracking wait event for latches

From:
Thomas Munro
Date:
On Wed, Sep 28, 2016 at 6:25 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> wait-event-set-v8.patch

Ok, I'm just about ready to mark this as 'Ready for Committer'.  Just
a couple of things:

+ pgstat_report_wait_start((uint8) classId, (uint16) eventId);

Unnecessary casts.

+        <row>
+         <entry morerows="5"><literal>Client</></entry>
+         <entry><literal>SecureRead</></entry>
+         <entry>Waiting to read data from a secure connection.</entry>
+        </row>
+        <row>
+         <entry><literal>SecureWrite</></entry>
+         <entry>Waiting to write data to a secure connection.</entry>
+        </row>

I think we want to drop the word 'secure' from the description lines
in this patch.  Then I think we plan to make a separate patch that
will rename the functions themselves and the corresponding wait points
to something more generic?

I'm assuming that my suggestions for making WE_WAL_SENDER_WAIT_WAL and
WE_WAL_SENDER_MAIN have a dynamically chosen class ID would also be
material for another patch, but it doesn't matter much because those
processes won't show up yet anyway.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From:
Michael Paquier
Date:
On Wed, Sep 28, 2016 at 3:40 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Sep 28, 2016 at 6:25 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> wait-event-set-v8.patch
>
> Ok, I'm just about ready to mark this as 'Ready for Committer'.

Thanks.

> Just a couple of things:
> + pgstat_report_wait_start((uint8) classId, (uint16) eventId);
> Unnecessary casts.

Right....

> +        <row>
> +         <entry morerows="5"><literal>Client</></entry>
> +         <entry><literal>SecureRead</></entry>
> +         <entry>Waiting to read data from a secure connection.</entry>
> +        </row>
> +        <row>
> +         <entry><literal>SecureWrite</></entry>
> +         <entry>Waiting to write data to a secure connection.</entry>
> +        </row>
>
> I think we want to drop the word 'secure' from the description lines
> in this patch.  Then I think we plan to make a separate patch that
> will rename the functions themselves and the corresponding wait points
> to something more generic?

Robert mentioned ClientRead/ClientWrite upthread. I still think that
SecureRead/SecureWrite is better as it respects the routine name where
the wait point is, and that's consistent with the rest.

> I'm assuming that my suggestions for making WE_WAL_SENDER_WAIT_WAL and
> WE_WAL_SENDER_MAIN have a dynamically chosen class ID would also be
> material for another patch, but it doesn't matter much because those
> processes won't show up yet anyway.

WAL senders do show up since 8299471, because they are counted in
max_connections. That's pretty cool combined with this patch.

I am sending a new patch to save 30s for the committer potentially
looking at this patch.
--
Michael

Attachments

Re: Tracking wait event for latches

From:
Robert Haas
Date:
On Tue, Sep 27, 2016 at 8:39 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Ok, if they really are independent then shouldn't we take advantage of
> that at call sites where we might be idle but we might also be waiting
> for the network?

I certainly didn't intend for them to be independent, and I don't
think they should be.  I think it should be a hierarchy - as it is
currently.  I think it's a bad idea to introduce the notational
overhead of having to pass through two integers rather than one
everywhere, and a worse idea to encourage people to think of the
wait_event_type and wait_event are related any way other than
hierarchically.

> Actually, I'm still not sold on "Activity" and "Client".  I think
> "Idle" and "Network" would be better.  Everybody knows intuitively
> what "Idle" means.  "Network" is better than "Client" because it
> avoids confusion about user applications vs replication connections or
> clients vs servers.

Hmm, I could live with that, if other people like it.

> s/auxilliary/auxiliary/, but I wouldn't it be better to say something
> more general like "from another process in the cluster"?  Background
> workers are not generally called auxiliary processes, and some of
> these wait points are waiting for those.

Agreed; or perhaps it could even be waiting for another regular backend.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From:
Michael Paquier
Date:
On Wed, Sep 28, 2016 at 9:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 27, 2016 at 8:39 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>> Ok, if they really are independent then shouldn't we take advantage of
>> that at call sites where we might be idle but we might also be waiting
>> for the network?
>
> I certainly didn't intend for them to be independent, and I don't
> think they should be.  I think it should be a hierarchy - as it is
> currently.  I think it's a bad idea to introduce the notational
> overhead of having to pass through two integers rather than one
> everywhere, and a worse idea to encourage people to think of the
> wait_event_type and wait_event are related any way other than
> hierarchically.

So should I change back the patch to have only one argument for the
eventId, and guess classId from it?
-- 
Michael



Re: Tracking wait event for latches

From:
Robert Haas
Date:
On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Sep 28, 2016 at 9:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Tue, Sep 27, 2016 at 8:39 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> Ok, if they really are independent then shouldn't we take advantage of
>>> that at call sites where we might be idle but we might also be waiting
>>> for the network?
>>
>> I certainly didn't intend for them to be independent, and I don't
>> think they should be.  I think it should be a hierarchy - as it is
>> currently.  I think it's a bad idea to introduce the notational
>> overhead of having to pass through two integers rather than one
>> everywhere, and a worse idea to encourage people to think of the
>> wait_event_type and wait_event are related any way other than
>> hierarchically.
>
> So should I change back the patch to have only one argument for the
> eventId, and guess classId from it?

Why would you need to guess?  But, yes, I think one argument is much preferable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Python3.4 detection on 9.6 configuration

From:
Lou Picciano
Date:
PostgreSQL Friends:

Trying to build 9.6RC1, with Python3.4, on OpenIndiana (Illumos). It seems the detection of shared library status of the .so has changed. This appears to be related to a different(?) elucidation of python configuration.

A 'hardwired' change to the configure script to trap platform 'solaris' will work, but this seems an inappropriate approach.

Would be happy to work through this here - I'd like to make it a small 'contribution'.

Clipped from configure script:

{ $as_echo "$as_me:${as_lineno-$LINENO}: checking how to link an embedded Python application" >&5
$as_echo_n "checking how to link an embedded Python application... " >&6; }

python_libdir=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LIBDIR'))))"`
python_ldlibrary=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('LDLIBRARY'))))"`
python_so=`${PYTHON} -c "import distutils.sysconfig; print(' '.join(filter(None,distutils.sysconfig.get_config_vars('SO'))))"`
echo "-----"
echo "-----  LOU MOD: python_so: $python_so"
echo "-----"

configure finds '.so' on Python2.7
configure finds '.cpython-34m.so' on Python3.4

------------- LATER in the config script, the following 'hardwired' change will 'fix' this, but is likely not the right approach:
(our Python _is_ built as a shared lib. So, this is wonky, but it will work: )

  # We need libpython as a shared library.  With Python >=2.5, we
  # check the Py_ENABLE_SHARED setting.  On Debian, the setting is not
  # correct before the jessie release (http://bugs.debian.org/695979).
  # We also want to support older Python versions.  So as a fallback
  # we see if there is a file that is named like a shared library.

  if test "$python_enable_shared" != 1; then
    if test "$PORTNAME" = darwin; then
      # macOS does supply a .dylib even though Py_ENABLE_SHARED does
      # not get set.  The file detection logic below doesn't succeed
      # on older macOS versions, so make it explicit.
      python_enable_shared=1
    elif test "$PORTNAME" = win32; then
      # Windows also needs an explicit override.
      python_enable_shared=1
    #  ----- MOD BY LOU: ----------------------------------------
    elif test "$PORTNAME" = solaris; then
      # Solaris explicit override.
      python_enable_shared=1
    else
      # We don't know the platform shared library extension here yet,
      # so we try some candidates.
      for dlsuffix in .so .sl; do
        if ls "$python_libdir"/libpython*${dlsuffix}* >/dev/null 2>&1; then
          python_enable_shared=1
          break
        fi
      done
    fi
  fi

Re: Python3.4 detection on 9.6 configuration

From:
Tom Lane
Date:
Lou Picciano <loupicciano@comcast.net> writes:
> Trying to build 9.6RC1, with Python3.4, on OpenIndiana (Illumos). It
> seems the detection of shared library status of the .so has changed.

Changed from what?  I don't recall that we've touched that code in quite
some time.  What was the last version that worked for you?  What
exactly is failing?

I'm having a hard time following your not-really-a-patch, but it looks
like you're proposing forcing python_enable_shared=1 on Solaris, which
sounds like a pretty bad idea.  AFAIK the shared-ness of libpython is
up to whoever built it.
        regards, tom lane



Re: Python3.4 detection on 9.6 configuration

From:
Lou Picciano
Date:



> From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Lou Picciano" <loupicciano@comcast.net>
Cc: pgsql-hackers@postgresql.org
Sent: Wednesday, September 28, 2016 9:33:06 AM
Subject: Re: [HACKERS] Python3.4 detection on 9.6 configuration

> Lou Picciano <loupicciano@comcast.net> writes:
>> Trying to build 9.6RC1, with Python3.4, on OpenIndiana (Illumos). It
>> seems the detection of shared library status of the .so has changed.

Specifically, configure is not finding the .so. It's not that the .so isn't built 'shared'; it is.

> Changed from what?  I don't recall that we've touched that code in quite
> some time.  What was the last version that worked for you?  What
> exactly is failing?

Core bit seems to be the python3.4-config behavior:

/usr/bin/python3.4-config --extension-suffix

.cpython-34m.so

I don't know if this is designed behavior for Python3.4 - or if it's a bug there? I'm working this with the Python gang as well.

Of course, this option doesn't exist under Python2.7.

> I'm having a hard time following your not-really-a-patch, but it looks
> like you're proposing forcing python_enable_shared=1 on Solaris,

Certainly not! I was simply offering this as evidence that PostgreSQL will build just fine, against Python3.4, using this hack. (It's useful in getting us a working build in situ to continue other testing - even before the more elegant fix - whatever that turns out to be!)

> which sounds like a pretty bad idea.  AFAIK the shared-ness of libpython is
> up to whoever built it.

Indeed. As I mentioned, our Python3.4 is built shared.

>                         regards, tom lane

Re: Tracking wait event for latches

From:
Michael Paquier
Date:
On Wed, Sep 28, 2016 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> So should I change back the patch to have only one argument for the
>> eventId, and guess classId from it?
>
> Why would you need to guess?

Incorrect wording from me perhaps? I just meant that processing needs
to know which classId goes with a specific eventId.

> But, yes, I think one argument is much preferable.

OK. Here is the wanted patch. I have reduced the routines of WaitLatch
& friends to use only one argument, and recorded the classId
associated with a given eventId in an array with multiple fields,
giving something like this:
+ const struct wait_event_entry WaitEventEntries[] = {
+   /* Activity */
+   {WAIT_ACTIVITY, "ArchiverMain"},
[...]

I have also cleaned up the inclusions of pgstat.h that I added
previously. Patch is attached.
--
Michael

Attachments

Re: Tracking wait event for latches

From:
Robert Haas
Date:
On Wed, Sep 28, 2016 at 9:40 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Sep 28, 2016 at 9:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Wed, Sep 28, 2016 at 8:38 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> So should I change back the patch to have only one argument for the
>>> eventId, and guess classId from it?
>>
>> Why would you need to guess?
>
> Incorrect wording from me perhaps? I just meant that processing needs
> to know which classId goes with a specific eventId.
>
>> But, yes, I think one argument is much preferable.
>
> OK. Here is the wanted patch. I have reduced the routines of WaitLatch
> & friends to use only one argument, and recorded the classId
> associated with a given eventId in an array with multiple fields,
> giving something like this:
> + const struct wait_event_entry WaitEventEntries[] = {
> +   /* Activity */
> +   {WAIT_ACTIVITY, "ArchiverMain"},
> [...]
>
> I have cleaned up as well the inclusions of pgstat.h that I added
> previously. Patch is attached.

It seems to me that you haven't tested this patch very carefully,
because as far as I can see it breaks wait event reporting for both
heavyweight locks and buffer pins - or in other words two out of the
three existing cases covered by the mechanism you are patching.

The heavyweight lock portion is broken because WaitOnLock() reports a
Lock wait before calling ProcSleep(), which now clobbers it. Instead
of reporting that we're waiting for Lock/relation, a LOCK TABLE
statement that hangs now reports IPC/ProcSleep.  Similarly, a conflict
over a tuple lock is now also reported as IPC/ProcSleep, and ditto for
any other kind of lock, which were all reported separately before.
Obviously, that's no good.  I think it would be just fine to push down
setting the wait event into ProcSleep(), doing it when we actually
WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to
report that we're waiting for the heavyweight lock, not ProcSleep().

As for buffer pins, note that LockBufferForCleanup() calls
ProcWaitForSignal(), which now overwrites the wait event set in by its
caller.  I think we could just make ProcWaitForSignal() take a wait
event as an argument; then LockBufferForCleanup() could pass an
appropriate constant and other callers of ProcWaitForSignal() could
pass their own wait events.
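
Something like this sketch, say (the names here are hypothetical):

    void
    ProcWaitForSignal(uint32 wait_event)
    {
        (void) WaitLatch(MyLatch, WL_LATCH_SET, 0, wait_event);
        ResetLatch(MyLatch);
        CHECK_FOR_INTERRUPTS();
    }

    /* ...allowing LockBufferForCleanup() to do something like: */
    ProcWaitForSignal(WE_BUFFER_PIN);    /* hypothetical constant */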

The way that we're constructing the wait event ID that ultimately gets
advertised in pg_stat_activity is a bit silly in this patch.  We start
with an event ID (which is an integer) and we then do an array lookup
(via GetWaitEventClass) to get a second integer which is then XOR'd
back into the first integer (via pgstat_report_wait_start) before
storing it in the PGPROC.  New idea: let's change
pgstat_report_wait_start() to take a single integer argument which it
stores without interpretation.  Let's separate the construction of the
wait event into a separate macro, say make_wait_event().  Then
existing callers of pgstat_report_wait_start(a, b) will instead do
pgstat_report_wait_start(make_wait_event(a, b)), but callers that want
to construct the wait event and push it through a few levels of the
call stack before advertising it only need to pass one integer.  If
done correctly, this should be cleaner and faster than what you've got
right now.
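
As a rough sketch of what I mean, assuming 8 bits of class and 16 bits
of event packed into one uint32 (the exact layout doesn't matter much):

    #define make_wait_event(classId, eventId) \
        (((uint32) (classId) << 24) | (uint32) (eventId))

    static inline void
    pgstat_report_wait_start(uint32 wait_event_info)
    {
        volatile PGPROC *proc = MyProc;

        if (!pgstat_track_activities || !proc)
            return;

        /* Store without interpretation; readers decode the class. */
        proc->wait_event_info = wait_event_info;
    }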

I am not a huge fan of the "WE_" prefix you've used.  It's not
terrible, but "we" doesn't obviously stand for "wait event" and it's
also a word itself.  I think a little more verbosity would add
clarity.  Maybe we could go with WAIT_EVENT_ instead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From:
Michael Paquier
Date:
On Fri, Sep 30, 2016 at 1:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> It seems to me that you haven't tested this patch very carefully,
> because as far as I can see it breaks wait event reporting for both
> heavyweight locks and buffer pins - or in other words two out of the
> three existing cases covered by the mechanism you are patching.

Oops. The WaitLatch calls clobber the other reports if another event is already set.

> The heavyweight lock portion is broken because WaitOnLock() reports a
> Lock wait before calling ProcSleep(), which now clobbers it. Instead
> of reporting that we're waiting for Lock/relation, a LOCK TABLE
> statement that hangs now reports IPC/ProcSleep.  Similarly, a conflict
> over a tuple lock is now also reported as IPC/ProcSleep, and ditto for
> any other kind of lock, which were all reported separately before.
> Obviously, that's no good.  I think it would be just fine to push down
> setting the wait event into ProcSleep(), doing it when we actually
> WaitLatch() or ResolveRecoveryConflictWithLock(), but we ought to
> report that we're waiting for the heavyweight lock, not ProcSleep().

There is a somewhat similar problem with
ResolveRecoveryConflictWithBufferPin(); by the same reasoning, what we
should report there is a buffer pin wait, not IPC/WaitForSignal.

> As for buffer pins, note that LockBufferForCleanup() calls
> ProcWaitForSignal(), which now overwrites the wait event set in by its
> caller.  I think we could just make ProcWaitForSignal() take a wait
> event as an argument; then LockBufferForCleanup() could pass an
> appropriate constant and other callers of ProcWaitForSignal() could
> pass their own wait events.

Agreed. So changed the patch this way.

> The way that we're constructing the wait event ID that ultimately gets
> advertised in pg_stat_activity is a bit silly in this patch.  We start
> with an event ID (which is an integer) and we then do an array lookup
> (via GetWaitEventClass) to get a second integer which is then XOR'd
> back into the first integer (via pgstat_report_wait_start) before
> storing it in the PGPROC.  New idea: let's change
> pgstat_report_wait_start() to take a single integer argument which it
> stores without interpretation.  Let's separate the construction of the
> wait event into a separate macro, say make_wait_event().  Then
> existing callers of pgstat_report_wait_start(a, b) will instead do
> pgstat_report_wait_start(make_wait_event(a, b)), but callers that want
> to construct the wait event and push it through a few levels of the
> call stack before advertising it only need to pass one integer.  If
> done correctly, this should be cleaner and faster than what you've got
> right now.

Hm, OK... So ProcSleep() and ProcWaitForSignal() need an additional
argument in the shape of a uint32 wait_event, and WaitLatch() and
friends need the same extra uint32 argument. As a result, all callers
of WaitLatch() are forced to use the new routine
pgstat_make_wait_event() (I renamed it), so pgstat.h needs to be
included wherever WaitLatch() is called.
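
Call sites then end up looking like this (sketch; the pg_sleep names
are just for illustration):

    extern int WaitLatch(volatile Latch *latch, int wakeEvents,
                         long timeout, uint32 wait_event);

    /* e.g. in pg_sleep(): */
    WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT, delay_ms,
              pgstat_make_wait_event(WAIT_TIMEOUT, WAIT_EVENT_PG_SLEEP));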

And as a consequence, adding the classId to WaitEventEntries becomes
completely useless: the point of including it there was to reduce the
number of places where pgstat.h is included, but since
pgstat_make_wait_event() is needed to build the wait_event value
anyway, those callers must include pgstat.h regardless...

> I am not a huge fan of the "WE_" prefix you've used.  It's not
> terrible, but "we" doesn't obviously stand for "wait event" and it's
> also a word itself.  I think a little more verbosity would add
> clarity.  Maybe we could go with WAIT_EVENT_ instead.

OK. Switched that back. Hopefully you find the result cleaner.
--
Michael

Attachments

Re: Tracking wait event for latches

From
Thomas Munro
Date:
On Fri, Sep 30, 2016 at 5:47 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Sep 30, 2016 at 1:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> The way that we're constructing the wait event ID that ultimately gets
>> advertised in pg_stat_activity is a bit silly in this patch.  We start
>> with an event ID (which is an integer) and we then do an array lookup
>> (via GetWaitEventClass) to get a second integer which is then XOR'd
>> back into the first integer (via pgstat_report_wait_start) before
>> storing it in the PGPROC.  New idea: let's change
>> pgstat_report_wait_start() to take a single integer argument which it
>> stores without interpretation.  Let's separate the construction of the
>> wait event into a separate macro, say make_wait_event().  Then
>> existing callers of pgstat_report_wait_start(a, b) will instead do
>> pgstat_report_wait_start(make_wait_event(a, b)), but callers that want
>> to construct the wait event and push it through a few levels of the
>> call stack before advertising it only need to pass one integer.  If
>> done correctly, this should be cleaner and faster than what you've got
>> right now.
>
> Hm, OK... So ProcSleep() and ProcWaitForSignal() need an additional
> argument in the shape of a uint32 wait_event, and WaitLatch() and
> friends need the same extra uint32 argument. As a result, all callers
> of WaitLatch() are forced to use the new routine
> pgstat_make_wait_event() (I renamed it), so pgstat.h needs to be
> included wherever WaitLatch() is called.

Hmm.  I like the use of pgstat in that name.  It helps with the
confusion created by the overloading of the term 'wait event' in the
pg_stat_activity view and the WaitEventSet API, by putting it into a
different pseudo-namespace.

+ uint32 wait_event;

How about a typedef for that instead of using raw uint32 everywhere?
Something like pgstat_wait_descriptor?  Then a variable name like
pgstat_desc, since this is most definitely not just a wait_event
anymore.

+ /* Define event to wait for */

It's not defining the event to wait for at all, it's building a
description for pgstat.

+ wait_event = pgstat_make_wait_event(WAIT_EXTENSION,
+ WAIT_EVENT_EXTENSION);

It's not making a wait event, it's combining a class and an event.
How about something like this:

pgstat_desc = pgstat_make_wait_descriptor(WAIT_CLASS_EXTENSION,
WAIT_EVENT_EXTENSION)?

  /* Sleep until there's something to do */
  wc = WaitLatchOrSocket(MyLatch,
    WL_LATCH_SET | WL_SOCKET_READABLE,
    PQsocket(conn),
-   -1L);
+   -1L,
+   wait_event);

... then use 'pgstat_desc' here.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Mon, Oct 3, 2016 at 12:35 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hmm.  I like the use of pgstat in that name.  It helps with the
> confusion created by the overloading of the term 'wait event' in the
> pg_stat_activity view and the WaitEventSet API, by putting it into a
> different pseudo-namespace.
>
> + uint32 wait_event;
>
> How about a typedef for that instead of using raw uint32 everywhere?
> Something like pgstat_wait_descriptor?  Then a variable name like
> pgstat_desc, since this is most definitely not just a wait_event
> anymore.

We cannot do that because of latch.h, which now includes only
<signal.h>; it would be a bad idea to add more dependencies on
PG-specific headers there.

> + /* Define event to wait for */
>
> It's not defining the event to wait for at all, it's building a
> description for pgstat.
>
> + wait_event = pgstat_make_wait_event(WAIT_EXTENSION,
> + WAIT_EVENT_EXTENSION);
>
> It's not making a wait event, it's combining a class and an event.
> How about something like this:
>
> pgstat_desc = pgstat_make_wait_descriptor(WAIT_CLASS_EXTENSION,
> WAIT_EVENT_EXTENSION)?

Maybe I sound stupid here, but I am trying to keep the name of this
macro short, so I'd go for pgstat_make_wait_desc().

>   /* Sleep until there's something to do */
>   wc = WaitLatchOrSocket(MyLatch,
>     WL_LATCH_SET | WL_SOCKET_READABLE,
>     PQsocket(conn),
> -   -1L);
> +   -1L,
> +   wait_event);
>
> ... then use 'pgstat_desc' here.

For this one I agree, your naming is better. It is good to let callers
of WaitLatch() know where this is actually used. I have also added
comments on top of WaitLatch() and friends to mention what pgstat_desc
does; that's useful for the user.
--
Michael

Attachments

Re: Tracking wait event for latches

From
Robert Haas
Date:
On Mon, Oct 3, 2016 at 3:30 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> [ new patch ]

I think this is unnecessarily awkward for callers; the attached
version takes a different approach which I think will be more
convenient.  The attached version also (1) moves a lot more of the
logic from latch.c/h to pgstat.c/h, which I think is more appropriate;
(2) more thoroughly separates the wait events by class; (3) renames
SecureRead/SecureWrite to ClientRead/ClientWrite (whether to also
rename the C functions is an interesting question, but not the most
pressing one IMHO), (4) creates a real wait event for GetSafeSnapshot
and removes the unnecessary and overly generic ProcSleep and
ProcSignal wait events, and (5) incorporates a bit of copy editing.

I've tested that this seems to work in basic cases, but more testing
is surely welcome.  If there are no major objections, I will commit
this version.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Tue, Oct 4, 2016 at 1:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 3, 2016 at 3:30 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> [ new patch ]
>
> I think this is unnecessarily awkward for callers; the attached
> version takes a different approach which I think will be more
> convenient.  The attached version also (1) moves a lot more of the
> logic from latch.c/h to pgstat.c/h, which I think is more appropriate;
> (2) more thoroughly separates the wait events by class; (3) renames
> SecureRead/SecureWrite to ClientRead/ClientWrite (whether to also
> rename the C functions is an interesting question, but not the most
> pressing one IMHO), (4) creates a real wait event for GetSafeSnapshot
> and removes the unnecessary and overly generic ProcSleep and
> ProcSignal wait events, and (5) incorporates a bit of copy editing.

OK with that.

> I've tested that this seems to work in basic cases, but more testing
> is surely welcome.  If there are no major objections, I will commit
> this version.

In pgstat_get_wait_event_type you are forgetting WAIT_IPC.
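I.e. the switch there needs the equivalent of this (sketch):

    case WAIT_IPC:
        event_type = "IPC";
        break;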

+        <row>
+         <entry morerows="10"><literal>IPC</></entry>
+         <entry><literal>BgWorkerShutdown</></entry>
+         <entry>Waiting for background worker to shut down.</entry>
+        </row>
Here this should be morerows="9": you removed two entries and added
one for SafeSnapshot.

The rest looks good to me. Thanks for the feedback and the time!
--
Michael

Attachments

Re: Tracking wait event for latches

From
Robert Haas
Date:
On Mon, Oct 3, 2016 at 8:43 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> The rest looks good to me. Thanks for the feedback and the time!

Thanks for the fixes.  I committed this with an additional compile
fix, but the buildfarm turned up a few more problems that my 'make
check-world' didn't find.  Hopefully those are fixed now, but we'll
see.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From
Thomas Munro
Date:
On Wed, Oct 5, 2016 at 4:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Oct 3, 2016 at 8:43 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> The rest looks good to me. Thanks for the feedback and the time!
>
> Thanks for the fixes.  I committed this with an additional compile
> fix, but the buildfarm turned up a few more problems that my 'make
> check-world' didn't find.  Hopefully those are fixed now, but we'll
> see.

Nitpicking: the includes in bgworker.c weren't sorted properly, and
then this patch added "pgstat.h" in the wrong position.  See attached
suggestion.

--
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Wed, Oct 5, 2016 at 4:28 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Oct 5, 2016 at 4:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Mon, Oct 3, 2016 at 8:43 PM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> The rest looks good to me. Thanks for the feedback and the time!
>>
>> Thanks for the fixes.  I committed this...

Yeh!

>> ... with an additional compile
>> fix, but the buildfarm turned up a few more problems that my 'make
>> check-world' didn't find.  Hopefully those are fixed now, but we'll
>> see.

I saw that after waking up... As usual, the buildfarm is catching many
of the things I missed...

> Nitpicking: the includes in bgworker.c weren't sorted properly, and
> then this patch added "pgstat.h" in the wrong position.  See attached
> suggestion.

Yes, that should be fixed.

And for the rest, sorry for the delay. Timezones...

More seriously, the Windows animals have been complaining about
pg_sleep() crashing the server:
  SELECT pg_sleep(0.1);
! server closed the connection unexpectedly
! This probably means the server terminated abnormally
! before or while processing the request.
! connection to server was lost

And I think that the answer to this crash lies with
WaitForSingleObject(), whose interface already defines the macro
WAIT_TIMEOUT, so there is an overlap with the new declarations in
pgstat.h:
https://msdn.microsoft.com/en-us/library/aa450988.aspx
This is also generating a bunch of build warnings now that I compile
HEAD on Windows. Regression tests are not crashing here, but I am
getting a failure in stats.sql, and pg_sleep is broken. I swear I
tested that at some point and did not see a crash or those warnings...
But well, what's done is done.
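
To make the clash concrete (simplified; the SDK value is quoted from
memory):

    /* the Windows SDK headers already define this: */
    #define WAIT_TIMEOUT 258L           /* WaitForSingleObject() result */

    /* ...so a pgstat.h doing the equivalent of the following silently
     * redefines it, and win32 code comparing WaitForSingleObject()'s
     * return value against WAIT_TIMEOUT picks up the wrong constant: */
    #define WAIT_TIMEOUT 0x09000000U    /* wait-event class ID */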

It seems to me that the correct answer is to rename this class ID.
But rather than renaming just that one, I'd suggest adding the prefix
PG_ to all the class events, as in the attached, which passes
make-check, contrib-check, and modules-check, and builds without
warnings on Windows. A simpler fix would be just to rename
WAIT_TIMEOUT to something else, but adding PG_ looks better in the
long term.
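
I.e. something along these lines (sketch; exact values illustrative):

    #define PG_WAIT_LWLOCK      0x01000000U
    #define PG_WAIT_LOCK        0x03000000U
    #define PG_WAIT_BUFFER_PIN  0x04000000U
    #define PG_WAIT_ACTIVITY    0x05000000U
    #define PG_WAIT_TIMEOUT     0x09000000U    /* no SDK clash anymore */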
--
Michael

Attachments

Re: Tracking wait event for latches

From
Robert Haas
Date:
On Tue, Oct 4, 2016 at 3:28 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Nitpicking: the includes in bgworker.c weren't sorted properly, and
> then this patch added "pgstat.h" in the wrong position.  See attached
> suggestion.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From
Robert Haas
Date:
On Tue, Oct 4, 2016 at 5:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> More seriously, the Windows animals have been complaining about
> pg_sleep() crashing the server:
>   SELECT pg_sleep(0.1);
> ! server closed the connection unexpectedly
> ! This probably means the server terminated abnormally
> ! before or while processing the request.
> ! connection to server was lost
>
> And I think that the answer to this crash lies with
> WaitForSingleObject(), whose interface already defines the macro
> WAIT_TIMEOUT, so there is an overlap with the new declarations in
> pgstat.h:
> https://msdn.microsoft.com/en-us/library/aa450988.aspx
> This is also generating a bunch of build warnings now that I compile
> HEAD on Windows. Regression tests are not crashing here, but I am
> getting a failure in stats.sql, and pg_sleep is broken. I swear I
> tested that at some point and did not see a crash or those warnings...
> But well, what's done is done.
>
> It seems to me that the correct answer is to rename this class ID.
> But rather than renaming just that one, I'd suggest adding the prefix
> PG_ to all the class events, as in the attached, which passes
> make-check, contrib-check, and modules-check, and builds without
> warnings on Windows. A simpler fix would be just to rename
> WAIT_TIMEOUT to something else, but adding PG_ looks better in the
> long term.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking wait event for latches

From
Michael Paquier
Date:
On Wed, Oct 5, 2016 at 9:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 4, 2016 at 5:07 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> More seriously, the Windows animals have been complaining about
>> pg_sleep() crashing the server:
>>   SELECT pg_sleep(0.1);
>> ! server closed the connection unexpectedly
>> ! This probably means the server terminated abnormally
>> ! before or while processing the request.
>> ! connection to server was lost
>>
>> And I think that the answer to this crash lies with
>> WaitForSingleObject(), whose interface already defines the macro
>> WAIT_TIMEOUT, so there is an overlap with the new declarations in
>> pgstat.h:
>> https://msdn.microsoft.com/en-us/library/aa450988.aspx
>> This is also generating a bunch of build warnings now that I compile
>> HEAD on Windows. Regression tests are not crashing here, but I am
>> getting a failure in stats.sql, and pg_sleep is broken. I swear I
>> tested that at some point and did not see a crash or those warnings...
>> But well, what's done is done.
>>
>> It seems to me that the correct answer is to rename this class ID.
>> But rather than renaming just that one, I'd suggest adding the prefix
>> PG_ to all the class events, as in the attached, which passes
>> make-check, contrib-check, and modules-check, and builds without
>> warnings on Windows. A simpler fix would be just to rename
>> WAIT_TIMEOUT to something else, but adding PG_ looks better in the
>> long term.
>
> Committed.

Thanks.
-- 
Michael