Re: BUG #17619: AllocSizeIsValid violation in parallel hash join

From: Thomas Munro
Subject: Re: BUG #17619: AllocSizeIsValid violation in parallel hash join
Msg-id: CA+hUKGJV54w8jVqdBcpP7LaCL8PhcEhT97-nfrTcD2rdKCcteA@mail.gmail.com
In reply to: Re: BUG #17619: AllocSizeIsValid violation in parallel hash join  (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: BUG #17619: AllocSizeIsValid violation in parallel hash join  (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-bugs
On Wed, Sep 28, 2022 at 7:33 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Sep 27, 2022 at 9:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Right, the missing piece is the intentional clobber.
>
> That does seem like the best place to start. The attached patch adds
> clobbering that works exactly as you'd expect. This approach is
> obviously correct. It also doesn't require any reasoning about
> Valgrind's treatment of memory mappings for shared memory, which is
> quite complicated given the inconsistent rules about who initializes
> what memory (whether it's the leader or the workers).
>
> I find that the tests pass with this patch -- so it probably won't
> catch the bug that Thomas mentioned via running the tests (at least
> not reliably). However, if I revert parallel VACUUM bugfix commit
> 662ba729 and then run the tests, they fail very reliably, in several
> places. That seems like a big improvement.
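
To illustrate why an intentional clobber helps, here is a minimal sketch in C under stated assumptions: freshly mapped memory is modeled with `calloc()` (zero-filled, as fresh anonymous mappings typically are), and the `ToySharedState` struct and its helpers are hypothetical, invented for illustration rather than taken from the patch. A forgotten initialization reads as 0 in zeroed memory and can go unnoticed; after filling the memory with an arbitrary poison byte (0x7F here), the same field holds 0x7F7F7F7F, which is far more likely to make a test fail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CLOBBER_BYTE 0x7F  /* arbitrary poison byte chosen for this sketch */

/* Hypothetical shared-state struct; field names are illustrative only. */
typedef struct ToySharedState {
    uint32_t npages;
    uint32_t nparticipants;
} ToySharedState;

/* Allocate the state, optionally clobbering it before anyone initializes it. */
static ToySharedState *toy_alloc_state(int clobber)
{
    /* Zero-filled, standing in for freshly mapped shared memory. */
    ToySharedState *st = calloc(1, sizeof(ToySharedState));
    if (st == NULL)
        abort();
    if (clobber)
        memset(st, CLOBBER_BYTE, sizeof(ToySharedState));
    return st;
}

/* Deliberately buggy initializer: it forgets to set npages. */
static void toy_init_state_buggy(ToySharedState *st, uint32_t nparticipants)
{
    assert(st != NULL);
    st->nparticipants = nparticipants;
    /* st->npages = 0;   <- the missing initialization */
}
```

Without the clobber, `toy_init_state_buggy()` leaves `npages == 0`, which may look perfectly valid; with it, `npages` is 0x7F7F7F7F regardless of byte order, because every byte is the same.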

The reason it doesn't catch that bug on master is that the npages
shmem variable is only consulted when a scan hits the end of a shared
tuplestore chunk and needs to decide whether to read a new one.  If a
chunk is only partially filled, the scan ends sooner, because there's a
number-of-items counter in the chunk header.  I noticed because the test
module I wrote to study Dmitry's report fills chunks exactly to the end,
so I assume the clobber patch plus that test module patch would reveal
the problem.
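
The effect described above can be reproduced with a toy model. This is a hedged sketch, not the actual sharedtuplestore.c logic: `ToyStore`, its helpers, and the chunk capacity of 4 are hypothetical. Each chunk carries an item counter; the shared page count is consulted only after the reader finishes a completely full chunk. With a clobbered (uninitialized) page count, the reader overruns the written data only when the last chunk is filled exactly to the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_CHUNK_CAPACITY 4   /* items per chunk; arbitrary for this sketch */
#define TOY_MAX_CHUNKS 16

/* Hypothetical model of a chunked shared store; not PostgreSQL code. */
typedef struct ToyStore {
    int chunk_nitems[TOY_MAX_CHUNKS];  /* per-chunk item counter ("header") */
    int nwritten;                      /* chunks actually written */
    int npages;                        /* shared page count the reader trusts */
} ToyStore;

/* Write nitems items; if !set_npages, leave npages clobbered instead. */
static void toy_store_write(ToyStore *s, int nitems, bool set_npages)
{
    assert(nitems > 0 && nitems <= TOY_CHUNK_CAPACITY * TOY_MAX_CHUNKS);
    s->nwritten = (nitems + TOY_CHUNK_CAPACITY - 1) / TOY_CHUNK_CAPACITY;
    for (int i = 0; i < s->nwritten; i++) {
        int left = nitems - i * TOY_CHUNK_CAPACITY;
        s->chunk_nitems[i] = left < TOY_CHUNK_CAPACITY ? left : TOY_CHUNK_CAPACITY;
    }
    if (set_npages)
        s->npages = s->nwritten;
    else
        memset(&s->npages, 0x7F, sizeof s->npages);  /* simulated clobber */
}

/* Scan the store; returns true if the reader tried to read an unwritten chunk. */
static bool toy_scan_overruns(const ToyStore *s)
{
    int page = 0;
    for (;;) {
        if (page >= s->nwritten)
            return true;                 /* past the data anyone wrote */
        if (s->chunk_nitems[page] < TOY_CHUNK_CAPACITY)
            return false;                /* partial chunk: item counter ends scan */
        page++;                          /* full chunk: consult npages */
        if (page >= s->npages)
            return false;
    }
}
```

With 6 items (chunks of 4 and 2) and a clobbered `npages`, the partial last chunk stops the scan and nothing goes wrong; with 8 items (two full chunks), the reader trusts the bogus `npages` and runs past the end.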

I was assuming it didn't break the case you mentioned because that's
just stats counters (maybe those finish up wrong, but that's probably
not a failure), but now it sounds like you've seen another reason.

> I believe that Thomas was going to do something like this anyway. I'm
> happy to leave it up to him, but I can pursue this separately if that
> makes sense.

Why not clobber "lower down" in dsm_create(), as I showed?  You don't
have to use the table-of-contents mechanism to use DSM memory.
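
The layering argument can be sketched as follows, under loudly stated assumptions: `ToySegment`, `toy_segment_create()`, `toy_toc_allocate()` and `toy_raw_tail()` are hypothetical stand-ins (for a segment created by something like dsm_create() and for table-of-contents-based allocation), with `malloc()` in place of real shared memory. Clobbering once in the lowest layer means every consumer sees poisoned memory, including one that carves space out of the segment directly without going through the TOC layer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CLOBBER_BYTE 0x7F  /* arbitrary poison byte for this sketch */

/* Hypothetical stand-in for a shared memory segment; not dsm_segment. */
typedef struct ToySegment {
    unsigned char *base;
    size_t size;
    size_t toc_used;   /* bytes handed out through the toy TOC layer */
} ToySegment;

/* Lowest layer: clobber here, so all users of the segment see poison. */
static ToySegment toy_segment_create(size_t size)
{
    ToySegment seg = { malloc(size), size, 0 };
    if (seg.base == NULL)
        abort();
    memset(seg.base, CLOBBER_BYTE, size);
    return seg;
}

/* Higher layer: a bump allocator standing in for TOC-based allocation. */
static void *toy_toc_allocate(ToySegment *seg, size_t nbytes)
{
    assert(seg->toc_used + nbytes <= seg->size);
    void *p = seg->base + seg->toc_used;
    seg->toc_used += nbytes;
    return p;
}

/* A consumer that bypasses the TOC and uses the tail of the segment directly. */
static unsigned char *toy_raw_tail(ToySegment *seg, size_t nbytes)
{
    assert(nbytes <= seg->size);
    return seg->base + (seg->size - nbytes);
}
```

If the clobber lived only in `toy_toc_allocate()`, memory reached through `toy_raw_tail()` would escape it; putting it in `toy_segment_create()` covers both paths with one change.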


