Re: fix cost subqueryscan wrong parallel cost

From Robert Haas
Subject Re: fix cost subqueryscan wrong parallel cost
Msg-id CA+Tgmoab-_Ci_qJdQj-0Cy2B2Ht-e0fkVL0AWbajmu_H_kqU1g@mail.gmail.com
In reply to Re: fix cost subqueryscan wrong parallel cost  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: fix cost subqueryscan wrong parallel cost  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Apr 29, 2022 at 3:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > So perhaps we should do it more like the attached, which produces
> > this plan for the UNION case:
>
> sigh ... actually attached this time.

I am not sure whether this is actually correct, but it seems a lot
more believable than the previous proposals. The problem might be more
general, though. I think when I developed this parallel query stuff I
modeled a lot of it on what you did for parameterized paths. Both
parameterized paths and parallelism can create situations where
executing a path to completion produces fewer rows than you would
otherwise get. In the case of parameterized paths, this happens
because we enforce the parameterization we've chosen on top of the
user-supplied quals. In the case of parallelism, it happens because
the rows are split up across the different workers. I think I intended
that the "rows" field of RelOptInfo should be the row count for the
relation in total, and that the "rows" field of the Path should be the
number of rows we expect to get for one execution of the path. But it
seems like this problem is good evidence that I didn't find all the
places that need to be adjusted for parallelism, and I wouldn't be
very surprised if there are a bunch of others that I overlooked.
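
To make the two row counts concrete, here is a rough Python sketch, not PostgreSQL source. The function names are hypothetical, and the divisor formula is an assumption modeled on how I understand the planner to split rows among workers when the leader also participates. It only illustrates that a path's "rows" can be smaller than the relation's "rows" for two different reasons.

```python
def parallel_divisor(workers: int, leader_participates: bool = True) -> float:
    """Approximate number of processes sharing the rows of a partial path.

    Assumption: the leader contributes a shrinking share as the number of
    workers grows, and contributes nothing once that share would go negative.
    """
    divisor = float(workers)
    if leader_participates:
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor


def partial_path_rows(rel_rows: float, workers: int) -> float:
    """Rows expected from ONE execution of a partial path (one process)."""
    return rel_rows / parallel_divisor(workers)


def parameterized_path_rows(rel_rows: float, param_selectivity: float) -> float:
    """Rows expected from ONE execution of a parameterized path.

    The enforced parameterization acts as an extra filter on top of the
    user-supplied quals, which rel_rows already accounts for.
    """
    return rel_rows * param_selectivity


rel_rows = 2400.0                              # the relation's total row count
per_process = partial_path_rows(rel_rows, 2)   # the rows split across processes
per_lookup = parameterized_path_rows(rel_rows, 0.01)
```

In both cases the relation-level estimate stays at 2400; only the per-execution path estimate shrinks.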

It's not actually very nice that we end up having to call
clauselist_selectivity() here. We've already called
set_baserel_size_estimates() to figure out how many rows we expect to
have been filtered out by the quals, and it sucks to have to do it
again. Brainstorming wildly and maybe stupidly, I wonder if the whole
model is wrong here. Maybe a path shouldn't have a row count; instead,
maybe it should have a multiplier that it applies to the relation's
row count. Then, if X is parameterized in the same way as its subpath
Y, we can just copy the multiplier up, but now it will be applied to
the new rel's "rows" value, which will have already been adjusted
appropriately by set_baserel_size_estimates().
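
A minimal Python sketch of the multiplier idea, purely illustrative: the classes `Rel` and `Path`, the field `row_multiplier`, and the helper `subquery_scan_path` are all hypothetical names and do not correspond to existing planner structures. The point is only that the upper path copies the multiplier and applies it to its own rel's already-filtered row count, so no second selectivity calculation is needed.

```python
from dataclasses import dataclass


@dataclass
class Rel:
    rows: float  # total rows for the relation, after its own quals


@dataclass
class Path:
    rel: Rel
    row_multiplier: float = 1.0  # hypothetical replacement for a stored row count

    @property
    def rows(self) -> float:
        # Rows for one execution of this path, derived on demand.
        return self.rel.rows * self.row_multiplier


def subquery_scan_path(outer_rel: Rel, subpath: Path) -> Path:
    """Build path X over subpath Y, assuming the same parameterization
    and parallelism, by copying Y's multiplier up unchanged."""
    return Path(rel=outer_rel, row_multiplier=subpath.row_multiplier)


subquery_rel = Rel(rows=1000.0)   # rows produced by the subquery in total
outer_rel = Rel(rows=100.0)       # rows left after the outer quals are applied

partial_subpath = Path(rel=subquery_rel, row_multiplier=0.25)
scan_path = subquery_scan_path(outer_rel, partial_subpath)
```

Here `partial_subpath.rows` is 250.0 and `scan_path.rows` is 25.0: the same 0.25 share, applied to a rel whose row count already reflects the quals.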

And having thrown out that wild and crazy idea, I will now run away
quickly and hide someplace.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


