Re: Parallel Foreign Scans - need advice

From Korry Douglas
Subject Re: Parallel Foreign Scans - need advice
Msg-id 17F38B78-CF4D-426E-BAC8-41626ED150AC@me.com
In reply to Re: Parallel Foreign Scans - need advice  (Andres Freund <andres@anarazel.de>)
Responses Re: Parallel Foreign Scans - need advice  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Thanks for the quick answer, Andres.  You’re right - it was parallel_tuple_cost that was getting in my way; my query
returns about 6 million rows, so I guess that can add up.

If I change parallel_tuple_cost from 0.1 to 0.0001, I get a parallel foreign scan.

With 4 workers, that reduces my execution time by about half.
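
A rough sketch of why the row count matters so much here. It assumes a simplified model in which a Gather node adds a one-time parallel_setup_cost plus parallel_tuple_cost for every row handed from a worker to the leader, and it assumes parallel_setup_cost is left at its default of 1000. The function name gather_overhead is made up for illustration and is not planner code.

```c
#include <assert.h>

/*
 * Simplified model (an assumption, not the planner's actual code path):
 * extra cost of gathering 'rows' tuples from workers into the leader.
 * One setup charge, plus a per-tuple transfer charge.
 */
static double
gather_overhead(double parallel_setup_cost,
                double parallel_tuple_cost,
                double rows)
{
    assert(rows >= 0);
    return parallel_setup_cost + parallel_tuple_cost * rows;
}
```

Under that model, gather_overhead(1000.0, 0.1, 6000000.0) comes to roughly 601,000 cost units, while gather_overhead(1000.0, 0.0001, 6000000.0) comes to roughly 1,600. A count(*) query sends only one partial result per worker to the leader, so its per-tuple charge is negligible, which would explain why the aggregate query went parallel without any tuning.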

But, nworkers_launched is always set to 0 in InitializeDSMForeignScan(), so that won’t work.  Any other ideas?

                — Korry

> On May 15, 2019, at 1:08 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-05-15 12:55:33 -0400, Korry Douglas wrote:
>> Hi all, I’m working on an FDW that would benefit greatly from parallel foreign scan.  I have implemented the
>> callbacks described here: https://www.postgresql.org/docs/devel/fdw-callbacks.html#FDW-CALLBACKS-PARALLEL and I see a
>> big improvement in certain plans.
>>
>> My problem is that I can’t seem to get a parallel foreign scan in a query that does not contain an aggregate.
>>
>> For example:
>>   SELECT count(*) FROM foreign_table;
>> Gives me a parallel scan, but
>>   SELECT * FROM foreign_table;
>> Does not.
>
> Well, that'd be bound by the cost of transferring tuples between workers
> and leader. You don't get, unless you fiddle heavily with the cost, a
> parallel scan for the equivalent local table scan either. You can
> probably force the planner's hand by setting parallel_setup_cost,
> parallel_tuple_cost very low - but it's unlikely to be beneficial.
>
> If you added a where clause that needs to be evaluated outside the FDW,
> you'd probably see parallel scans without fiddling with the costs.
>
>
>> A second related question - how can I find the actual number of
>> workers chosen for my ForeignScan?  At the moment, I’m looking at
>> ParallelContext->nworkers (inside of the InitializeDSMForeignScan()
>> callback) because that seems to be the first callback function that
>> might provide the worker count.  I need the *actual* worker count in
>> order to evenly distribute my workload.  I can’t use the usual trick
>> of having each worker grab the next available chunk (because I have to
>> avoid seek operations on compressed data). In other words, it is of
>> great advantage for each worker to read contiguous chunks of data -
>> seeking to another part of the file is prohibitively expensive.
>
> Don't think - but am not sure - that there's a nicer way
> currently. Although I'd use nworkers_launched, rather than nworkers.
>
> Greetings,
>
> Andres Freund
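
For reference, the even, contiguous split described in the quoted question could be sketched as below. This is only an illustration under stated assumptions: ChunkRange and contiguous_range are hypothetical names, the data is assumed to be addressable as a fixed number of equally sized chunks, and the participant count is assumed to be known before any participant starts reading, which is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a run of consecutive chunks owned by one participant. */
typedef struct ChunkRange
{
    uint64_t first;   /* index of the first chunk in the run */
    uint64_t count;   /* number of chunks in the run */
} ChunkRange;

/*
 * Divide total_chunks into nparticipants contiguous runs whose sizes
 * differ by at most one, and return the run for 'participant'
 * (numbered from 0). Each participant reads strictly forward through
 * its own run, so no seeking between distant parts of the file is needed.
 */
static ChunkRange
contiguous_range(uint64_t total_chunks, int nparticipants, int participant)
{
    ChunkRange r;
    uint64_t   n, p, base, extra;

    assert(nparticipants > 0);
    assert(participant >= 0 && participant < nparticipants);

    n = (uint64_t) nparticipants;
    p = (uint64_t) participant;
    base = total_chunks / n;    /* chunks every participant gets */
    extra = total_chunks % n;   /* leftovers, one each to the first 'extra' */

    r.count = base + (p < extra ? 1 : 0);
    r.first = p * base + (p < extra ? p : extra);
    return r;
}
```

With 10 chunks and 4 participants this yields the runs [0,3), [3,6), [6,8) and [8,10). If the leader also scans, it would need to be counted as one of the participants.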



