Re: Parallel Foreign Scans - need advice

From: Korry Douglas
Subject: Re: Parallel Foreign Scans - need advice
Date:
Msg-id: 598BD6D6-B5CF-450D-A0D1-5886602FF0AA@me.com
In response to: Re: Parallel Foreign Scans - need advice  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: Parallel Foreign Scans - need advice  (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers
> That's only a superficial problem.  You don't even know if or when the
> workers that are launched will all finish up running your particular
> node, because (for example) they might be sent to different children
> of a Parallel Append node above you (AFAICS there is no way for a
> participant to indicate "I've finished all the work allocated to me,
> but I happen to know that some other worker #3 is needed here" -- as
> soon as any participant reports that it has executed the plan to
> completion, pa_finished[] will prevent new workers from picking that
> node to execute).  Suppose we made a rule that *every* worker must
> visit *every* partial child of a Parallel Append and run it to
> completion (and any similar node in the future must do the same)...
> then I think there is still a higher level design problem: if you do
> allocate work up front rather than on demand, then work could be
> unevenly distributed, and parallel query would be weakened.

What I really need (for the scheme I’m using at the moment) is to know how many workers will be used to execute my particular Plan.  I understand that some workers will naturally end up idle while the last (busy) worker finishes up.  I’m dividing the workload (the number of row groups to scan) by the number of workers to get an even distribution.

I’m willing to pay that price (at least, I haven’t seen a problem so far… famous last words).

I do plan to switch over to a get-next-chunk allocator as you mentioned below, but I’d like to get the minimized-seek mechanism working first.

It sounds like there is no reliable way to get the information that I’m looking for, is that right?

> So I think you ideally need a simple get-next-chunk work allocator
> (like Parallel Seq Scan and like the file_fdw patch I posted[1]), or a
> pass-the-baton work allocator when there is a dependency between
> chunks (like Parallel Index Scan for btrees), or a more complicated
> multi-phase system that counts participants arriving and joining in
> (like Parallel Hash) so that participants can coordinate and wait for
> each other in controlled circumstances.

I haven’t looked at Parallel Hash - will try to understand that next.

> If this compressed data doesn't have natural chunks designed for this
> purpose (like, say, ORC stripes), perhaps you could have a dedicated
> workers streaming data (compressed? decompressed?) into shared memory,
> and parallel query participants coordinating to consume chunks of
> that?


I’ll give that some thought.  Thanks for the ideas.

                    — Korry



