Re: Support Parallel Query Execution in Executor

From Tom Lane
Subject Re: Support Parallel Query Execution in Executor
Date
Msg-id 9234.1144600050@sss.pgh.pa.us
In reply to Re: Support Parallel Query Execution in Executor  (Myron Scott <lister@sacadia.com>)
Responses Re: Support Parallel Query Execution in Executor  ("Jonah H. Harris" <jonah.harris@gmail.com>)
Re: Support Parallel Query Execution in Executor  ("Luke Lonergan" <llonergan@greenplum.com>)
Re: Support Parallel Query Execution in Executor  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Myron Scott <lister@sacadia.com> writes:
> Gregory Maxwell wrote:
>> There are other cases where it is useful to perform parallel I/O
>> without parallel processing.

> I have done some testing more along these lines with an old fork of
> postgres code (2001).  In my tests, I used a thread to delegate out
> the actual heap scan of the SeqScan.  The job of the "slave" thread
> was to fault in buffer pages and determine the time validity of
> the tuples.  ItemPointers are passed back to the "master" thread via a
> common memory area guarded by mutex locking.

I was considering a variant idea in the shower this morning: suppose
that we invent one or more "background reader" processes that have
basically the same infrastructure as the background writer, but have
the responsibility of causing buffer reads to happen at useful times
(whereas the writer causes writes to happen at useful times).  The
idea would be for backends to signal the readers when they know they
will need a given block soon, and then hopefully when they need it
it'll already be in shared buffers.  For instance, in a seqscan it'd be
pretty trivial to request block N+1 just after reading block N, and then
do our actual processing on block N while (we hope) some reader
process is faulting in N+1.  Bitmap indexscans could use this structure
too; I'm less sure about whether plain indexscans could do much with it
though.
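
To illustrate the seqscan case described above, here is a minimal
single-process sketch in C.  It is not PostgreSQL code: the names
request_read_ahead, read_block, and NBLOCKS are hypothetical, and the
"reader" finishes instantly instead of running concurrently.  It only
shows the ordering: read block N, request N+1, then process N.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS 8            /* hypothetical relation size, in blocks */

typedef unsigned int BlockNum;

/* Stand-in for shared buffers: true once a block has been faulted in. */
static bool block_cached[NBLOCKS];
static int  sync_reads;      /* reads the scanning backend did itself */
static int  prefetched_hits; /* blocks already cached when needed */

/* Hypothetical: ask a background reader to fault in a block.  Here the
 * "reader" completes instantly; a real one would run concurrently. */
static void request_read_ahead(BlockNum blk)
{
    if (blk < NBLOCKS)
        block_cached[blk] = true;
}

/* Hypothetical: fetch a block, reading it synchronously on a miss. */
static void read_block(BlockNum blk)
{
    assert(blk < NBLOCKS);
    if (block_cached[blk]) {
        prefetched_hits++;
    } else {
        sync_reads++;
        block_cached[blk] = true;
    }
}

/* Seqscan loop: read N, request N+1, then process N. */
static int seqscan(void)
{
    int processed = 0;

    for (BlockNum blk = 0; blk < NBLOCKS; blk++) {
        read_block(blk);
        request_read_ahead(blk + 1);   /* no-op past the last block */
        processed++;                   /* "process" block N here */
    }
    return processed;
}
```

In this toy model only the first block is read synchronously; every
later block was requested one iteration earlier.  Whether a real reader
process would finish in time is exactly the "(we hope)" above.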

The major issues I can see are:

1. We'd need a shared-memory queue of read requests, probably much like
the queue of fsync requests.  We've already seen problems with
contention for the fsync queue, IIRC, and that's used much less heavily
than the read request queue would be.  So there might be some
performance issues with getting the block requests sent over to the
readers.

2. There are some low-level assumptions that no one reads in pages of
a relation without having some kind of lock on the relation (consider
e.g. the case where the relation is being dropped).  A bgwriter-like
process wouldn't be able to hold lmgr locks, and we wouldn't really want
it to be thrashing the lmgr shared data structures for each read anyway.
So you'd have to design some interlock to guarantee that no backend
abandons a query (and releases its own lmgr locks) while an async read
request it made is still pending.  Ugh.
        regards, tom lane
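
The two issues listed in the message can be pictured with a small C
sketch.  Everything in it is an assumption made for illustration:
ReadRequest, enqueue_read, service_one, can_release_locks, the fixed
sizes, and the choice to refuse requests when the queue is full.  A real
version would live in shared memory under a lock, which is where the
contention concern in issue 1 comes from; this single-process version
has no locking at all.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_SIZE   4   /* hypothetical queue capacity */
#define MAX_BACKENDS 2   /* hypothetical number of backend slots */

typedef struct {
    int      backend;    /* slot of the requesting backend */
    unsigned rel;        /* relation identifier */
    unsigned blk;        /* block number to fault in */
} ReadRequest;

static ReadRequest queue[QUEUE_SIZE];
static int head, count;             /* ring buffer state */
static int pending[MAX_BACKENDS];   /* unfinished requests per backend */

/* Backend side.  Returns false if the queue is full; the caller then
 * simply reads the block itself when it gets there. */
static bool enqueue_read(int backend, unsigned rel, unsigned blk)
{
    assert(backend >= 0 && backend < MAX_BACKENDS);
    if (count == QUEUE_SIZE)
        return false;
    queue[(head + count) % QUEUE_SIZE] = (ReadRequest){backend, rel, blk};
    count++;
    pending[backend]++;
    return true;
}

/* Reader side: take the oldest request (the read itself is omitted).
 * Returns false if the queue is empty. */
static bool service_one(ReadRequest *out)
{
    if (count == 0)
        return false;
    *out = queue[head];
    head = (head + 1) % QUEUE_SIZE;
    count--;
    pending[out->backend]--;        /* this request is now finished */
    return true;
}

/* Interlock for issue 2: a backend may release its relation locks only
 * once none of its read requests are still outstanding. */
static bool can_release_locks(int backend)
{
    return pending[backend] == 0;
}
```

A backend abandoning a query would have to wait (or cancel its requests)
until can_release_locks() returns true.  The sketch leaves open how to
do that cheaply.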

