Re: FETCH FIRST clause PERCENT option

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: FETCH FIRST clause PERCENT option
Дата
Msg-id 6a7304ee-a145-9ca8-9834-45466ced840b@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: FETCH FIRST clause PERCENT option  (Surafel Temesgen <surafel3000@gmail.com>)
Список pgsql-hackers
On 1/4/19 8:08 AM, Surafel Temesgen wrote:
> 
> 
> 
> On Tue, Jan 1, 2019 at 10:08 PM Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> wrote:
> 
>     The execution part of the patch seems to be working correctly, but I
>     think there's an improvement - we don't need to execute the outer plan
>     to completion before emitting the first row. For example, let's say the
>     outer plan produces 10000 rows in total and we're supposed to return the
>     first 1% of those rows. We can emit the first row after fetching the
>     first 100 rows, we don't have to wait for fetching all 10k rows.
> 
> 
> my other concern is in this case we are going to determine limit and
> safe row to return on the fly
> which have additional computation and i fair that it will not perform
> better on most of the case
> 

I very much doubt recomputing the number of rows to return will be
measurable at all, considering those are int64 values. It's likely
negligible compared to all the other overhead (reading rows from the
subplan, etc.).

And even if it turns out expensive, there are ways to make it less
expensive (e.g. not computing it every time, but only when it actually
can increment the value - for example with 1% rows to return, it's
enough to recompute it every 100 rows).

It also very much depends on which use cases you consider important.
Clearly, you're only thinking about the case that actually requires
fetching everything. But there's also the use case where you have
another limit node somewhere above.

So let's do a simple "cost" analysis of these two use cases:

1) case #1 (no LIMIT above, requires fetching everything)

Let's assume the "cost" of the current approach (fetching everything in
advance, computing node->count once) is the baseline here.

With the incremental approach, there likely is some extra overhead.
Let's overshoot it a bit - with 5% overhead, it's 1.05 of the baseline.

2) case #2 (LIMIT/EXISTS above, only needs to fetch the first few rows,
perhaps just a single one)

Here, we take the incremental approach as the baseline. With the "fetch
everything first" approach we need to evaluate the whole subplan and
fetch all the results, instead of just some small fraction. But the
exact ratio is pretty much arbitrary - I can easily construct queries
that need to fetch 100x or 100000x more data from the subplan.

So essentially using the "incremental fetch" approach means we may pay
5% overhead in cases where we need to fetch everything, while the "fetch
everything first" approach means we'll add arbitrary amount of overhead
to cases where we don't need to fetch everything.

Of course, you one way to deal with this would be to argue the cases
benefiting from incremental approach are very rare / non-existent.
Perhaps that's the case, although I don't see why it would be.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: START/END line number for COPY FROM
Следующее
От: Tom Lane
Дата:
Сообщение: Re: commit fest app: Authors