Обсуждение: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

Поиск
Список
Период
Сортировка

SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

От
Grzegorz Jaskiewicz
Дата:
so I wonder, quite few times ppl asked me about my-word database TOP %  
PERCENT (and I guess ms-word db has it too).
Now that postgres has limit(subselect) - postgresql can do the same  
thing. But, using a count(*) in subselects isn't very efficient. The  
backend gets information from limit X, so I figure it shouldn't be a  
very big deal to add LIMIT %d PERCENT , and pass it on as percentage,  
instead of just number of rows.
I can give it a go and try to implement it myself, but wanted to ask  
here first for opinion on syntax (I have no idea what would be right  
syntax by sql standards), and how keen would you guys be on accepting  
such patch.

thanks.


Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

От
"Hitoshi Harada"
Дата:
2008/10/12 Grzegorz Jaskiewicz <gj@pointblue.com.pl>:
> so I wonder, quite few times ppl asked me about my-word database TOP %
> PERCENT (and I guess ms-word db has it too).
> Now that postgres has limit(subselect) - postgresql can do the same thing.
> But, using a count(*) in subselects isn't very efficient. The backend gets
> information from limit X, so I figure it shouldn't be a very big deal to add
> LIMIT %d PERCENT , and pass it on as percentage, instead of just number of
> rows.
> I can give it a go and try to implement it myself, but wanted to ask here
> first for opinion on syntax (I have no idea what would be right syntax by
> sql standards), and how keen would you guys be on accepting such patch.
>
> thanks.
>
>

I'm afraid you will not succeed it because PostgreSQL doesn't know how
many rows are totally returned until it returns the last row, which is
as ineficient as count(*) as you point out.

Regards,


-- 
Hitoshi Harada


Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

От
Grzegorz Jaskiewicz
Дата:
that's a shame.
I figure, with the "WITH ... AS .." you can kind of speed up counts,  
just by doing it once - now. But still, it would be great to be able  
to use PERCENT, against estimate.
After all, imagine a table FOO with 5 rows, and something like this:

WITH c AS (select count(*)/4 as n from FOO) select * from FOO limit  
(select n from c) offset (select n*X from c);

now X is for every 25%, but you wan't get to see last row if you use  
offset 75 percent (X=3). Which kind of puts me off the idea of using  
PERCENT anyway, because it isn't quite accurate.

So thanks for answer, I'll pass on this one. Hopefully there would be  
another chance for me to write some patch in future.

ta.


Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

От
"Robert Haas"
Дата:
I wonder if this could be implemented using the window-function
infrastructure...

...Robert

On Sun, Oct 12, 2008 at 9:07 AM, Grzegorz Jaskiewicz
<gj@pointblue.com.pl> wrote:
> that's a shame.
> I figure, with the "WITH ... AS .." you can kind of speed up counts, just by
> doing it once - now. But still, it would be great to be able to use PERCENT,
> against estimate.
> After all, imagine a table FOO with 5 rows, and something like this:
>
> WITH c AS (select count(*)/4 as n from FOO) select * from FOO limit (select
> n from c) offset (select n*X from c);
>
> now X is for every 25%, but you wan't get to see last row if you use offset
> 75 percent (X=3). Which kind of puts me off the idea of using PERCENT
> anyway, because it isn't quite accurate.
>
> So thanks for answer, I'll pass on this one. Hopefully there would be
> another chance for me to write some patch in future.
>
> ta.
>
>


Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

От
"Hitoshi Harada"
Дата:
2008/10/12 Robert Haas <robertmhaas@gmail.com>:
> I wonder if this could be implemented using the window-function
> infrastructure...
>

Yeah, actually window functions I am working now has percent_rank() or
something he wants. That is better than WITH clauses but still we must
see the last row before determine the percent. Better than count(*) of
course, but I'm not sure it meets his needs.

Regards,

-- 
Hitoshi Harada


Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

От
Grzegorz Jaskiewicz
Дата:
On 2008-10-12, at 16:22, Hitoshi Harada wrote:

> 2008/10/12 Robert Haas <robertmhaas@gmail.com>:
>> I wonder if this could be implemented using the window-function
>> infrastructure...
>>
>
> Yeah, actually window functions I am working now has percent_rank() or
> something he wants. That is better than WITH clauses but still we must
> see the last row before determine the percent. Better than count(*) of
> course, but I'm not sure it meets his needs.
>
excuse my ignorance, apparently I don't recall that patches discussion  
in here. If it pops out, and gets into cvs, can someone remind me of  
that - please ?
Also, will that patch also be able to speed up count(*) , or is it for  
different purpose ? Sorry if I am repeating the subject, if so -  
please just respond in private.

thanks folks.



Re: SELECT TOP %d PERCENT, or SELECT ... LIMIT %d PERCENT ?

От
"Hitoshi Harada"
Дата:
2008/10/13 Grzegorz Jaskiewicz <gj@pointblue.com.pl>:
>
> On 2008-10-12, at 16:22, Hitoshi Harada wrote:
>
>> 2008/10/12 Robert Haas <robertmhaas@gmail.com>:
>>>
>>> I wonder if this could be implemented using the window-function
>>> infrastructure...
>>>
>>
>> Yeah, actually window functions I am working now has percent_rank() or
>> something he wants. That is better than WITH clauses but still we must
>> see the last row before determine the percent. Better than count(*) of
>> course, but I'm not sure it meets his needs.
>>
> excuse my ignorance, apparently I don't recall that patches discussion in
> here. If it pops out, and gets into cvs, can someone remind me of that -
> please ?

You can track pgsql news on David Fetter's blog:
http://people.planetpostgresql.org/dfetter/

> Also, will that patch also be able to speed up count(*) , or is it for
> different purpose ? Sorry if I am repeating the subject, if so - please just
> respond in private.

I guess cume_dist() will do for you:

SELECT id, cume_dist
FROM( SELECT id, cume_dist() OVER (ORDER BY id) FROM foo
)s
WHERE cume_dist < 0.3

will produce a result with id of 30% ascending order. I am working on
this window functions infrastructure to get into 8.4 but we're still
not sure if it will.

You can google "window functions" or my design docs on
http://umitanuki.net/pgsql/wfv06/design.html to see what it is for.


Regards,


-- 
Hitoshi Harada