Re: FETCH FIRST clause PERCENT option

Поиск
Список
Период
Сортировка
От Mark Dilger
Тема Re: FETCH FIRST clause PERCENT option
Дата
Msg-id C33C2805-2664-4687-B554-3372157F0A59@gmail.com
обсуждение исходный текст
Ответ на Re: FETCH FIRST clause PERCENT option  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers

> On Sep 20, 2018, at 5:29 PM, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2018-09-20 17:06:36 -0700, Mark Dilger wrote:
>> I should think that spilling anything to a tuplestore would only be needed
>> if the query contains an ORDER BY expression.  If you query
>>
>>     FETCH FIRST 50 PERCENT * FROM foo;
>>
>> you should just return every other row, discarding the rest, right?  It's
>> only when an explicit ordering is given that the need to store the results
>> arises.  Even with
>>
>>     FETCH FIRST 50 PERCENT name FROM foo ORDER BY name;
>>
>> you can return one row for every two rows that you get back from the
>> sort node, reducing the maximum number you need to store at any time to
>> no more than 25% of all rows.
>
> I'm doubtful about the validity of these optimizations, particularly
> around being surprising. But I think more importantly, we should focus
> on the basic implementation that's needed anyway.

You may be right that getting the basic implementation finished first
is better than optimizing at this stage.  So the rest of what I'm going
to say is just in defense of the optimization, and not an argument for
needing to optimize right away.

As for reducing the surprise factor, I think that it would be surprising
if I ask for a smallish percentage of rows and it takes significantly longer
and significantly more memory or disk than asking for all the rows takes.
If I'm including an explicit ORDER BY, then that explains it, but otherwise,
I'd be surprised.  Note that I'm not saying I'd be surprised by it taking
roughly the same length of time / memory / disk.  I'd only be surprised if
it took a lot more.

There are plenty of SQL generation engines that people put in their software.
I'd expect something like

    sprintf("FETCH FIRST %d PERCENT %s FROM %s", percentage, columns, tablename)

to show up in such engines, and percentage to sometimes be 100.  At least
in that case you should just return all rows rather than dumping them into
a tuplestore.  Likewise, if the percentage is 0, you'll want to finish quickly.
Actually, I don't know if the SQL spec would require side effects to still
happen, in which case you'd still have to generate all rows for their side
effects to happen, and then just not return them.  But still, no tuplestore.
So the implementation of FETCH FIRST would at least need to think about what
percentage is being requested, rather than just mindlessly adding a node to
the tree for storing everything, then computing the LIMIT based on the number
of rows stored, and then returning that number of rows.

mark

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Hubert Zhang
Дата:
Сообщение: Re: Proposal for disk quota feature
Следующее
От: Gilles Darold
Дата:
Сообщение: [patch] Bug in pg_dump/pg_restore using --no-publication