Re: FETCH FIRST clause WITH TIES option

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: FETCH FIRST clause WITH TIES option
Дата
Msg-id e5c5129c-2518-4a5d-fc28-0a080a08526f@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: FETCH FIRST clause WITH TIES option  (Surafel Temesgen <surafel3000@gmail.com>)
Ответы Re: FETCH FIRST clause WITH TIES option  (Surafel Temesgen <surafel3000@gmail.com>)
Список pgsql-hackers

On 1/2/19 11:51 AM, Surafel Temesgen wrote:
> 
> 
> On Tue, Jan 1, 2019 at 8:38 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     >
>     >     The attached patch include all the comment given by Tomas and i
>     >     check sql standard about LIMIT and this feature
>     >
> 
>     Unfortunately, it seems the "limit" regression tests fail for some
>     reason - the output mismatches the expected results for some reason. It
>     seems as if the WITH TIES code affects ordering of the results within
>     the group. See the attached file.
> 
> 
> Yes the reason is the order of returned row is not always the same. I
> remove other columns from the result set to get constant result
> 

Thanks, that seems reasonable.

After looking at the "FETCH FIRST ... PERCENT" patch, I wonder if this
patch should tweak estimates in some way. Currently, the cardinality
estimate is the same as for plain LIMIT, using the requested number of
rows. But let's say there are very few large groups - that will
naturally increase the number of rows produced.

As an example, let's say the subplan produces 1M rows, and there are
1000 groups (when split according to the ORDER BY clause). Consider a
query with "FETCH FIRST 10 ROWS WITH TIES". AFAICS the current estimate
will be 10, but in fact we know that it's likely to produce ~1000 rows
(because that's the average group size).

So I think the patch should tweak the estimate to be

  limitCount + (avgGroupSize/2);

or perhaps

  Max(avgGroupSize, limitCount + (avgGroupSize/2))

The 1/2 is there because we don't know where the group starts.

Of course, using average group size like this is rather crude, but it's
about the best thing we can do. In principle, increasing the cardinality
estimate is the right thing to do.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: chained transactions
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: explain plans with information about (modified) gucs