Обсуждение: Can simplify 'limit 1' with slow function?

Поиск
Список
Период
Сортировка

Can simplify 'limit 1' with slow function?

От
"gotoschool6g"
Дата:
The simplified scene: 
select slowfunction(s) from a order by b limit 1;
is slow than
select slowfunction(s) from (select s from a order by b limit 1) as z;
if there are many records in table 'a'.


The real scene. Function  ST_Distance_Sphere is slow, the query:
SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
is slow than:
select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
There are about 7000 records in 'road'.  

Re: Can simplify 'limit 1' with slow function?

От
Martijn van Oosterhout
Дата:
On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:
> The simplified scene:
> select slowfunction(s) from a order by b limit 1;
> is slow than
> select slowfunction(s) from (select s from a order by b limit 1) as z;
> if there are many records in table 'a'.
>
>
> The real scene. Function  ST_Distance_Sphere is slow, the query:
> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
> is slow than:
> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
> There are about 7000 records in 'road'.

I think to help here I think we need the EXPLAIN ANALYSE output for
both queries.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: Can simplify 'limit 1' with slow function?

От
Merlin Moncure
Дата:
On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
<kleptog@svana.org> wrote:
> On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:
>> The simplified scene:
>> select slowfunction(s) from a order by b limit 1;
>> is slow than
>> select slowfunction(s) from (select s from a order by b limit 1) as z;
>> if there are many records in table 'a'.
>>
>>
>> The real scene. Function  ST_Distance_Sphere is slow, the query:
>> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
>> is slow than:
>> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
>> There are about 7000 records in 'road'.
>
> I think to help here I think we need the EXPLAIN ANALYSE output for
> both queries.

Well, I think the problem is a well understood one: there is no
guarantee that functions-in-select-list are called exactly once per
output row.  This is documented -- for example see here:
http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
In short, if you want very precise control of function evaluation use
a subquery, or, if you're really paranoid, a CTE.

merlin



Re: Can simplify 'limit 1' with slow function?

От
David G Johnston
Дата:
Merlin Moncure-2 wrote
> On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
> <

> kleptog@

> > wrote:
>> On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:
>>> The simplified scene:
>>> select slowfunction(s) from a order by b limit 1;
>>> is slow than
>>> select slowfunction(s) from (select s from a order by b limit 1) as z;
>>> if there are many records in table 'a'.
>>>
>>>
>>> The real scene. Function  ST_Distance_Sphere is slow, the query:
>>> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road
>>> order by c limit 1;
>>> is slow than:
>>> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT
>>> s from road order by c limit 1) as a;
>>> There are about 7000 records in 'road'.
>>
>> I think to help here I think we need the EXPLAIN ANALYSE output for
>> both queries.
> 
> Well, I think the problem is a well understood one: there is no
> guarantee that functions-in-select-list are called exactly once per
> output row.  This is documented -- for example see here:
> http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
> In short, if you want very precise control of function evaluation use
> a subquery, or, if you're really paranoid, a CTE.
> 
> merlin

I would have to disagree on the "this is documented" comment - the linked
section on advisory locks does not constitute documentation of the fact that
limit can be applied after expressions in the select-list are evaluated.

http://www.postgresql.org/docs/9.3/static/sql-select.html

In the select command documentation item 5 covers select-list evaluation
while item 9 covers limit thus implying what we are saying - though keep in
mind each select statement gets processed independently and possibly in a
correlated fashion (i.e. potentially multiple times).

David J.




--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Can-simplify-limit-1-with-slow-function-tp5809997p5810061.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Can simplify 'limit 1' with slow function?

От
Merlin Moncure
Дата:
On Tue, Jul 1, 2014 at 3:06 PM, David G Johnston
<david.g.johnston@gmail.com> wrote:
> Merlin Moncure-2 wrote
>> On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
>> <
>
>> kleptog@
>
>> > wrote:
>>> On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:
>>>> The simplified scene:
>>>> select slowfunction(s) from a order by b limit 1;
>>>> is slow than
>>>> select slowfunction(s) from (select s from a order by b limit 1) as z;
>>>> if there are many records in table 'a'.
>>>>
>>>>
>>>> The real scene. Function  ST_Distance_Sphere is slow, the query:
>>>> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road
>>>> order by c limit 1;
>>>> is slow than:
>>>> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT
>>>> s from road order by c limit 1) as a;
>>>> There are about 7000 records in 'road'.
>>>
>>> I think to help here I think we need the EXPLAIN ANALYSE output for
>>> both queries.
>>
>> Well, I think the problem is a well understood one: there is no
>> guarantee that functions-in-select-list are called exactly once per
>> output row.  This is documented -- for example see here:
>> http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
>> In short, if you want very precise control of function evaluation use
>> a subquery, or, if you're really paranoid, a CTE.
>>
>> merlin
>
> I would have to disagree on the "this is documented" comment - the linked
> section on advisory locks does not constitute documentation of the fact that
> limit can be applied after expressions in the select-list are evaluated.
>
> http://www.postgresql.org/docs/9.3/static/sql-select.html
>
> In the select command documentation item 5 covers select-list evaluation
> while item 9 covers limit thus implying what we are saying - though keep in
> mind each select statement gets processed independently and possibly in a
> correlated fashion (i.e. potentially multiple times).

Sure, although I did not claim that..the select documentation *does*
cover this behavior but I find the syntax driven doc pages to be
fairly arcane and unhelpful -- they don't say (for the most part)
"avoid this" or "do that".  I pointed out this particular section
because it proved an example that matched the OP's problem case.

merlin



Re: Can simplify 'limit 1' with slow function?

От
Martijn van Oosterhout
Дата:
On Tue, Jul 01, 2014 at 02:36:55PM -0500, Merlin Moncure wrote:
> On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
> <kleptog@svana.org> wrote:
> > On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:
> >> The simplified scene:
> >> select slowfunction(s) from a order by b limit 1;
> >> is slow than
> >> select slowfunction(s) from (select s from a order by b limit 1) as z;
> >> if there are many records in table 'a'.
> >>
> >>
> >> The real scene. Function  ST_Distance_Sphere is slow, the query:
> >> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1;
> >> is slow than:
> >> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a;
> >> There are about 7000 records in 'road'.
> >
> > I think to help here I think we need the EXPLAIN ANALYSE output for
> > both queries.
>
> Well, I think the problem is a well understood one: there is no
> guarantee that functions-in-select-list are called exactly once per
> output row.  This is documented -- for example see here:
> http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
> In short, if you want very precise control of function evaluation use
> a subquery, or, if you're really paranoid, a CTE.

I'm probably dense, but I'm not sure I understand. Or it is that the
slowfunction() is called prior to the sort? That seems insane.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: Can simplify 'limit 1' with slow function?

От
David G Johnston
Дата:
Martijn van Oosterhout wrote
> On Tue, Jul 01, 2014 at 02:36:55PM -0500, Merlin Moncure wrote:
>> On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout
>> <

> kleptog@

> > wrote:
>> > On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote:
>> >> The simplified scene:
>> >> select slowfunction(s) from a order by b limit 1;
>> >> is slow than
>> >> select slowfunction(s) from (select s from a order by b limit 1) as z;
>> >> if there are many records in table 'a'.
>> >>
>> >>
>> >> The real scene. Function  ST_Distance_Sphere is slow, the query:
>> >> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road
>> order by c limit 1;
>> >> is slow than:
>> >> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from
>> (SELECT s from road order by c limit 1) as a;
>> >> There are about 7000 records in 'road'.
>> >
>> > I think to help here I think we need the EXPLAIN ANALYSE output for
>> > both queries.
>> 
>> Well, I think the problem is a well understood one: there is no
>> guarantee that functions-in-select-list are called exactly once per
>> output row.  This is documented -- for example see here:
>> http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS.
>> In short, if you want very precise control of function evaluation use
>> a subquery, or, if you're really paranoid, a CTE.
> 
> I'm probably dense, but I'm not sure I understand. Or it is that the
> slowfunction() is called prior to the sort? That seems insane.

The basic reality is that limit applies to the final set of rows that could
be output.  Since stuff like group by and distinct require knowledge of the
exact values of every output column all expressions must necessarily be
evaluated before limit.

If you want to pick just 10 rows and then process them you need a subquery.

David J.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Can-simplify-limit-1-with-slow-function-tp5809997p5810297.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.



Re: Can simplify 'limit 1' with slow function?

От
Tom Lane
Дата:
David G Johnston <david.g.johnston@gmail.com> writes:
> Martijn van Oosterhout wrote
>> I'm probably dense, but I'm not sure I understand. Or it is that the
>> slowfunction() is called prior to the sort? That seems insane.

> The basic reality is that limit applies to the final set of rows that could
> be output.

It's not so much the limit as that the sort has to happen before the
limit, and yes, evaluation of the targetlist happens before the sort.

This is fundamental to the SQL conceptual model; remember that SQL92 had
"SELECT slowfunction(), ... ORDER BY 1", which certainly requires the
function to be evaluated before the sort happens.  And there's nothing in
the conceptual model suggesting that different targetlist entries should
be evaluated at different times, so just ordering by something other than
the slowfunction() entry doesn't get you out of that.

I'm not sure how much of this there is chapter and verse for in the
SQL standard, but ISTM the stage sequencing we lay out in our SELECT
reference page is pretty much forced by the standard.
        regards, tom lane



Re: Can simplify 'limit 1' with slow function?

От
Martijn van Oosterhout
Дата:
On Wed, Jul 02, 2014 at 04:17:13PM -0400, Tom Lane wrote:
> David G Johnston <david.g.johnston@gmail.com> writes:
> > Martijn van Oosterhout wrote
> >> I'm probably dense, but I'm not sure I understand. Or it is that the
> >> slowfunction() is called prior to the sort? That seems insane.
>
> > The basic reality is that limit applies to the final set of rows that could
> > be output.
>
> It's not so much the limit as that the sort has to happen before the
> limit, and yes, evaluation of the targetlist happens before the sort.

I guess I assumed the column c was indexable, and it that case I
beleive the slowfunction() would indeed only be called once.

> This is fundamental to the SQL conceptual model; remember that SQL92 had
> "SELECT slowfunction(), ... ORDER BY 1", which certainly requires the
> function to be evaluated before the sort happens.  And there's nothing in
> the conceptual model suggesting that different targetlist entries should
> be evaluated at different times, so just ordering by something other than
> the slowfunction() entry doesn't get you out of that.
>
> I'm not sure how much of this there is chapter and verse for in the
> SQL standard, but ISTM the stage sequencing we lay out in our SELECT
> reference page is pretty much forced by the standard.

In the conceptual model the limit must happen after the select. But as
an optimisation moving the evaluation above the limit node (when
possible) should always be a win.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: Can simplify 'limit 1' with slow function?

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Wed, Jul 02, 2014 at 04:17:13PM -0400, Tom Lane wrote:
>> It's not so much the limit as that the sort has to happen before the
>> limit, and yes, evaluation of the targetlist happens before the sort.

> I guess I assumed the column c was indexable, and it that case I
> beleive the slowfunction() would indeed only be called once.

There are cases where we can avoid an explicit sort step by relying on
some earlier phase of the processing pipeline to generate the rows in the
right order to begin with.  Evidently this wasn't one of them though :-(.
        regards, tom lane



Re: Can simplify 'limit 1' with slow function?

От
"gotoschool6g"
Дата:
slow query(8531 ms):
SELECT ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) FROM road order by id
LIMIT1;
 

explain output:
"Limit  (cost=4653.48..4653.48 rows=1 width=3612)"
"  ->  Sort  (cost=4653.48..4683.06 rows=11832 width=3612)"
"        Sort Key: id"
"        ->  Seq Scan on road  (cost=0.00..4594.32 rows=11832 width=3612)"

fast query(16ms):
select ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) from (SELECT shape FROM
roadorder by id  LIMIT 1) a
 

explain output:
"Subquery Scan on a  (cost=1695.48..1695.74 rows=1 width=3608)"
"  ->  Limit  (cost=1695.48..1695.48 rows=1 width=3612)"
"        ->  Sort  (cost=1695.48..1725.06 rows=11832 width=3612)"
"              Sort Key: road.id"
"              ->  Seq Scan on road  (cost=0.00..1636.32 rows=11832 width=3612)"

CREATE TABLE road
( shape geometry, id integer
)
WITH ( OIDS=FALSE
);

There are redundant call when sorting?


> On Tue, Jul 1, 2014 at 2:16 PM, Martijn van Oosterhout 
> <kleptog@svana.org> wrote: 
> > On Sun, Jun 29, 2014 at 10:05:50PM +0800, gotoschool6g wrote: 
> >> The simplified scene: 
> >> select slowfunction(s) from a order by b limit 1; 
> >> is slow than 
> >> select slowfunction(s) from (select s from a order by b limit 1) as z; 
> >> if there are many records in table 'a'. 
> >> 
> >> 
> >> The real scene. Function  ST_Distance_Sphere is slow, the query: 
> >> SELECT ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from road order by c limit 1; 
> >> is slow than: 
> >> select ST_Distance_Sphere(s, ST_GeomFromText('POINT(1 1)')) from (SELECT s from road order by c limit 1) as a; 
> >> There are about 7000 records in 'road'. 
> > 
> > I think to help here I think we need the EXPLAIN ANALYSE output for 
> > both queries. 
>  
> Well, I think the problem is a well understood one: there is no 
> guarantee that functions-in-select-list are called exactly once per 
> output row.  This is documented -- for example see here: 
> http://www.postgresql.org/docs/9.1/static/explicit-locking.html#ADVISORY-LOCKS. 
> In short, if you want very precise control of function evaluation use 
> a subquery, or, if you're really paranoid, a CTE. 

I'm probably dense, but I'm not sure I understand. Or it is that the 
slowfunction() is called prior to the sort? That seems insane. 

Have a nice day, 
--  
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/ 
> He who writes carelessly confesses thereby at the very outset that he does 
> not attach much importance to his own thoughts.   -- Arthur Schopenhauer

Re: Can simplify 'limit 1' with slow function?

От
Martijn van Oosterhout
Дата:
Fascinating.

On Fri, Jul 04, 2014 at 10:47:06AM +0800, gotoschool6g wrote:
> slow query(8531 ms):
> SELECT ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) FROM road order by id
LIMIT1; 
>
> explain output:
> "Limit  (cost=4653.48..4653.48 rows=1 width=3612)"
> "  ->  Sort  (cost=4653.48..4683.06 rows=11832 width=3612)"
> "        Sort Key: id"
> "        ->  Seq Scan on road  (cost=0.00..4594.32 rows=11832 width=3612)"
>
> fast query(16ms):
> select ST_Distance_Sphere(shape,ST_GeomFromText('POINT(116.41386186784513 40.12211338311868)')) from (SELECT shape
FROMroad order by id  LIMIT 1) a 
>
> explain output:
> "Subquery Scan on a  (cost=1695.48..1695.74 rows=1 width=3608)"
> "  ->  Limit  (cost=1695.48..1695.48 rows=1 width=3612)"
> "        ->  Sort  (cost=1695.48..1725.06 rows=11832 width=3612)"
> "              Sort Key: road.id"
> "              ->  Seq Scan on road  (cost=0.00..1636.32 rows=11832 width=3612)"

So Postgres knows perfectly well that it's expensive, it just doesn't
appear to understand it has the option of moving the calculation above
the limit.

In this case though, it seems an index on road(id) would make it
instant in any case.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer