Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0
От | Michael Paquier |
---|---|
Тема | Re: [HACKERS] Hang in pldebugger after git commit : 98a64d0 |
Дата | |
Msg-id | CAB7nPqSs-1FVUG7he-Lp8M0j8HdkxD-W4CzxzuPemprKZZRh8g@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Hang in pldebugger after git commit : 98a64d0 (Ashutosh Sharma <ashu.coek88@gmail.com>) |
Список | pgsql-hackers |
On Fri, Dec 9, 2016 at 2:38 AM, Ashutosh Sharma <ashu.coek88@gmail.com> wrote: > Well, we are currently passing INFINITE timeout value to > WaitForMultipleObjects() API which is hanging. I thought of changing this > INIFINTE timeout interval to some finite value like say 1 sec and could not > reproduce the issue. But, having said that i checked the older code where > this issue does not exists and found here as well we are passing INFINTE > timeout value to WaitForMultipleObjects(). So not sure if this passing > INFINITE timeout is really an issue. Attached is a small patch that has my > changes. That's obviously incorrect to me. This should not wait for a timeout if the caller does not want to. So you are likely breaking a bunch of code by doing so, including many extensions on Windows. The pre-WaitEventSet code uses that: - if (wakeEvents & WL_TIMEOUT) - { - INSTR_TIME_SET_CURRENT(start_time); - Assert(timeout >= 0 && timeout <= INT_MAX); - cur_timeout = timeout; - } - else - cur_timeout = INFINITE; INFINITE maps to -1 by looking at the MS docs, and that's as well what the new code does so the inconsistency is not there. And the new code does not bother about setting INFINITE and just uses -1. I have tried to compile pldebugger with MSVC but gave up at the end, so I am falling back to some code review for the time being. Andres mentioned me that this Windows code was in need of an extra lookup. And from what I can see, the logic that we have before 98a64d0 is a set of handles using WaitForMultipleObjects with a pre-allocated position in the handle array. The new code made things more generic by allocating the events depending on what the user has set up. pgwin32_signal_event is correctly using the first handle, and other events registered correctly adapt to this position. I have spent some time reviewing the code, and I think that I have spotted one problem. The event set that the code you are triggering is waiting for is FeBeWaitSet, which gets initialized in pq_init()@pqcomm.c. 3 events are being set there. This goes through CreateWaitEventSet, that sets up pgwin32_signal_event as first handle. So far so good. Then a bunch of events are added with AddWaitEventToSet. Which is fine as well as the handling of handles with WSA_INVALID_EVENT looks correct, because its value is NULL and there is a static assertion to check things. One place in the code has spotted my attention: +#elif defined(WAIT_USE_WIN32) + set->handles = (HANDLE) data; + data += sizeof(HANDLE) * nevents; +#endif This should be actually (nevents + 1) to take into account pgwin32_signal_event. That's harmless now but it would if new fields are added to WaitEventSet in the future. A second thing that I am noticing is that in the new code: + * Note: we assume that the kernel calls involved in latch management + * will provide adequate synchronization on machines with weak memory + * ordering, so that we cannot miss seeing is_set if a notification + * has already been queued. + */ + if (set->latch && set->latch->is_set) + { + occurred_events->fd = PGINVALID_SOCKET; + occurred_events->pos = set->latch_pos; + occurred_events->user_data = + set->events[set->latch_pos].user_data; + occurred_events->events = WL_LATCH_SET; + occurred_events++; + returned_events++; + + break; + } This basically means that if the latch is set, we don't wait at all and drop the ball. I am wondering: isn't that a problem even if WL_LATCH_SET is *not* set? If I read this code correctly, even if caller has not set WL_LATCH_SET and the latch is set, then the wait will stop. Still reviewing the code... -- Michael
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Venkata B NagothiДата:
Сообщение: Re: [HACKERS] Declarative partitioning - another take
Следующее
От: Masahiko SawadaДата:
Сообщение: Re: [HACKERS] Transactions involving multiple postgres foreign servers