Обсуждение: Can I get the number of results plus the results with a single query?

Поиск
Список
Период
Сортировка

Can I get the number of results plus the results with a single query?

От
Perry Smith
Дата:
I like to have what I call “baby sitting” messages such as “Completed 15 out of 1023”.  To do this, I need the number
ofresults a query returns but I also need the results. 

Currently I’m doing this with two queries such as:

    SELECT COUNT(*) FROM table WHERE …. expression …
    SELECT * FROM table WHERE …. expression …

But this requires two queries.  Is there a way to do the same thing with just one query somehow?

I’ve been toying with row_number() and then sort by row_number descending and pick off the first row as the total
number. The problem is that mucks with the order of the original query because of the sort on row_number.  I’ve even
thoughtabout reversing the order and then reversing the order a second time but that seems to be getting more than a
littleabsurd. 

Thank you for your time,
Perry


Вложения

Re: Can I get the number of results plus the results with a single query?

От
"David G. Johnston"
Дата:
On Monday, August 15, 2022, Perry Smith <pedz@easesoftware.com> wrote:
I’ve been toying with row_number() and then sort by row_number descending and pick off the first row as the total number. 

Use count as a window function.

David J.
 

Re: Can I get the number of results plus the results with a single query?

От
"Peter J. Holzer"
Дата:
On 2022-08-15 08:49:33 -0500, Perry Smith wrote:
> I like to have what I call “baby sitting” messages such as “Completed
> 15 out of 1023”.  To do this, I need the number of results a query
> returns but I also need the results.
>
> Currently I’m doing this with two queries such as:
>
>     SELECT COUNT(*) FROM table WHERE …. expression …
>     SELECT * FROM table WHERE …. expression …
>
> But this requires two queries.

And unless evaluating/transmitting the select expressions is very
expensive the first one is unlikely to be much faster than the second.
So you've just halved your performance.

> Is there a way to do the same thing with just one query somehow?

I don't think there can be. You don't know how many results there will
be until you have evaluated the query, but when you have done that you
already have all the results so displaying the baby sitting messages or
a progress bar becomes pointless.


> I’ve been toying with row_number() and then sort by row_number
> descending and pick off the first row as the total number.

I think that this will usually result in a long pause at the start
(while the query is evaluated) followed by a very quick count up (while
the result is transmitted to the client). Probably not what you want.

        hp


--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Can I get the number of results plus the results with a single query?

От
Bryn Llewellyn
Дата:
pedz@easesoftware.com wrote:

Currently I’m doing this with two queries such as:

    SELECT COUNT(*) FROM table WHERE …. expression …
    SELECT * FROM table WHERE …. expression …

But this requires two queries. Is there a way to do the same thing with just one quer?


Use count as a window function.


I don't think there can be [a way to do the same thing with just one query].

How about this:

create table t(k serial primary key, v int not null);
insert into t(v) values (7), (19), (42), (57), (100), (200), (300);

create function f()
  returns table(z text)
  language plpgsql
  stable
as $body$
declare
  r int not null := 0;
  results constant int[] :=
    (
      select array_agg(v order by v) from t where v < 100
    );
begin
  z := 'Count(*): '||cardinality(results);    return next;

  foreach r in array results loop
    z := r::int;                              return next;
  end loop;
end;
$body$;

select f();

It suns without error and gets this result:

 Count(*): 4
 7
 19
 42
 57

Is this what you want, Perry? B.t.w., your second "count(*)" might give a misleading answer unless you use "repeatable read" or "serializable"—and it's generally a good plan to avoid those isolation levels unless your use case forces no other choice.

I've never heard anybody say that this "bulk collect" approach (using Oracle Database's vocabulary) harms performance. Does anybody think that it might?


Re: ***SPAM*** Re: Can I get the number of results plus the results with a single query?

От
Perry Smith
Дата:


On Aug 15, 2022, at 08:55, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Monday, August 15, 2022, Perry Smith <pedz@easesoftware.com> wrote:
I’ve been toying with row_number() and then sort by row_number descending and pick off the first row as the total number. 

Use count as a window function.

I see others are commenting after David’s update so:

Thank you David.

This seems to work for me:

SELECT count(*) OVER (), id, basename, sha1 FROM dateien WHERE (lower(ext) in ( 'pxd' ) and ftype = 'file') ORDER BY sha1;

This has, e.g. 73, in the first column for all of the rows.

Вложения

Re: ***SPAM*** Re: Can I get the number of results plus the results with a single query?

От
Rob Sargent
Дата:
On 8/15/22 14:37, Perry Smith wrote:


On Aug 15, 2022, at 08:55, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Monday, August 15, 2022, Perry Smith <pedz@easesoftware.com> wrote:
I’ve been toying with row_number() and then sort by row_number descending and pick off the first row as the total number. 

Use count as a window function.

I see others are commenting after David’s update so:

Thank you David.

This seems to work for me:

SELECT count(*) OVER (), id, basename, sha1 FROM dateien WHERE (lower(ext) in ( 'pxd' ) and ftype = 'file') ORDER BY sha1;

This has, e.g. 73, in the first column for all of the rows.

Any comparative timing statistics on that?  Especially on more than 73 records returned, because with that few just grab them all and get size() or length of what ever collection mechanism you're playing with.

Re: ***SPAM*** Re: Can I get the number of results plus the results with a single query?

От
"Walter Dörwald"
Дата:

On 16 Aug 2022, at 0:13, Rob Sargent wrote:

On 8/15/22 14:37, Perry Smith wrote:


On Aug 15, 2022, at 08:55, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Monday, August 15, 2022, Perry Smith <pedz@easesoftware.com> wrote:
I’ve been toying with row_number() and then sort by row_number descending and pick off the first row as the total number. 

Use count as a window function.

I see others are commenting after David’s update so:

Thank you David.

This seems to work for me:

SELECT count(*) OVER (), id, basename, sha1 FROM dateien WHERE (lower(ext) in ( 'pxd' ) and ftype = 'file') ORDER BY sha1;

This has, e.g. 73, in the first column for all of the rows.

Any comparative timing statistics on that?  Especially on more than 73 records returned, because with that few just grab them all and get size() or length of what ever collection mechanism you're playing with.

I tried with a larger table (739951 records):

select e.* from email.email e;

takes 50 seconds (as displayed by TablePlus).

select count(*) over (), e.* from email.email e;

takes 58 seconds.

And doing select count(*) from email.email e; takes 2-3 seconds.

Note that in this example the records where fetched over the internet (i.e. not from a local Postgres installation) and there is no where condition that must be evaluated repeatedly, so other variants might give better numbers.

Servus,
Walter

Re: ***SPAM*** Re: Can I get the number of results plus the results with a single query?

От
David Rowley
Дата:
On Tue, 16 Aug 2022 at 21:15, Walter Dörwald <walter@livinglogic.de> wrote:
> select count(*) over (), e.* from email.email e;

Depending on the complexity of the query, putting the count(*) as a
subquery in the SELECT clause might execute more quickly. i.e. select
(select count(*) from email.email) c, * from email.emails; A
non-correlated subquery will become an "initplan", which will execute
only once.

The reason the window aggregate might be slower is due to the fact
that internally PostgreSQL will store tuples in the window frame in a
tuplestore.  In this example, all rows will be put in that tuple store
at once. These tuple stores will spill to disk when they exceed
work_mem.  On the other hand, the window aggregate version could be
faster if fetching each tuple from the outer query was sufficiently
more expensive than storing it in a tuple store. That could easily
happen if the query contained expensive joins or many rows were
filtered out. That could be more expensive as the subquery version
would have to do that twice whereas the window agg version would only
have to do that once.

David



Re: Can I get the number of results plus the results with a single query?

От
"Peter J. Holzer"
Дата:
On 2022-08-15 12:20:44 -0700, Bryn Llewellyn wrote:
>
>             pedz@easesoftware.com wrote:
>
>             Currently I’m doing this with two queries such as:
>
>                 SELECT COUNT(*) FROM table WHERE …. expression …
>                 SELECT * FROM table WHERE …. expression …
>
>             But this requires two queries. Is there a way to do the same thing
>             with just one quer?
>
>
>         david.g.johnston@gmail.com wrote:
>
>         Use count as a window function.
>
>
>     hjp-pgsql@hjp.at wrote:
>
>     I don't think there can be [a way to do the same thing with just one
>     query].

That's not quite what I meant. I meant "I don't think there can be what
you want with just one query",


> How about this:
>
> create table t(k serial primary key, v int not null);
> insert into t(v) values (7), (19), (42), (57), (100), (200), (300);
>
> create function f()
>   returns table(z text)
>   language plpgsql
>   stable
> as $body$
> declare
>   r int not null := 0;
>   results constant int[] :=
>     (
>       select array_agg(v order by v) from t where v < 100
>     );
> begin
>   z := 'Count(*): '||cardinality(results);    return next;

I may be missing something but I don't see how this solves the problem.

The OP wants some kind of progress indicator. To be useful, such an
indicator should be approximately linear in time. I.e. if your query
returns 10000 rows in 5 minutes (yes, that's slow, but you don't need a
progress bar for fast queries), it should display "0/10000" after 0
seconds, "33/10000" after 1 second, "2000/10000" after 1 minute, etc.
That ideal is of course unrealistic, it's quite ok if it displays
"0/unknown" fpr a few seconds and speeds up and slows down during
execution. But if it displays "0/unknown" for 4 minutes and 55 seconds
and then counts up to 10000 during the last 5 seconds, the progress
indicator is useless.

You are stuffing the whole result into an array and THEN counting the
number of elements. So when you get to the count all of the work (except
sending the result to the client) is already done, so there is little
point in displaying a progress indicator.

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Can I get the number of results plus the results with a single query?

От
Bryn Llewellyn
Дата:
hjp-pgsql@hjp.at wrote:

That's not quite what I meant. I meant "I don't think there can be what you want with just one query",

The OP wants some kind of progress indicator. To be useful, such an indicator should be approximately linear in time. I.e. if your query returns 10000 rows in 5 minutes (yes, that's slow, but you don't need a progress bar for fast queries), it should display "0/10000" after 0 seconds, "33/10000" after 1 second, "2000/10000" after 1 minute, etc. That ideal is of course unrealistic, it's quite ok if it displays "0/unknown" fpr a few seconds and speeds up and slows down during execution. But if it displays "0/unknown" for 4 minutes and 55 seconds and then counts up to 10000 during the last 5 seconds, the progress
indicator is useless.

You are stuffing the whole result into an array and THEN counting the number of elements. So when you get to the count all of the work (except sending the result to the client) is already done, so there is little point in displaying a progress indicator.

I see, Peter. You’d read the OP’s mind. But I’d failed to. I saw the subject, you I assumed that the OP wanted the entire result set together with the count of the results. (After all, there’s no inflexions of “page” in the OP’s question.)

It seems, then, that the requirement is to page through results like a Google search for « SQL injection » does. The first page shows this

« About 29,400,000 results (0.45 seconds) »

And then it lets you choose other pages to see. That result of about 30 million is famously approximate. After all, nobody, would able be to disprove its correctness. And paging through the results from a SQL query in a stateless browser client has its own (in)famous characteristics.

It sounds like the OP wants a fast approximate count for a query whose restriction isn’t known until runtime. (His example uses just a single table). The planner makes estimates like this. Might something along these lines provide what’s being sought:

create table t(k int primary key);
insert into t(k) select generate_series(1, 1000000);
explain (costs true) select * from t where k between 20000 and 20050;

This is what it produced for me:

 Index Only Scan using t_pkey on t  (cost=0.42..5.39 rows=48 width=4)
   Index Cond: ((k >= 20000) AND (k <= 20050))

Seems to me that "rows=48” is a perfectly reasonable estimate…

Re: Can I get the number of results plus the results with a single query?

От
"Peter J. Holzer"
Дата:
On 2022-08-16 14:42:48 -0700, Bryn Llewellyn wrote:
>     hjp-pgsql@hjp.at wrote:
>     The OP wants some kind of progress indicator. To be useful, such
>     an indicator should be approximately linear in time. I.e. if your
[...]
>
>
> I see, Peter. You’d read the OP’s mind.

Not much mind-reading involved, I hope. The first sentence in the
message was:

| I like to have what I call “baby sitting” messages such as “Completed 15 out of 1023”.

That's what I would call a "progress indicator". Such things were
already common when I first got involved with computers almost 40 years
ago, so there isn't much to guess or to invent.

> But I’d failed to. I saw the subject, you I assumed that the OP wanted
> the entire result set together with the count of the results. (After
> all, there’s no inflexions of “page” in the OP’s question.)

I don't think his question was about paging. That's a different although
related topic.

> It sounds like the OP wants a fast approximate count for a query whose
> restriction isn’t known until runtime.

Maybe an approximate count would be enough, but he didn't say so.

(He did later clarify that he's fetching a lot of data for each row, so
«select count(*) ,,,» is indeed much faster than «select * ...» due to
the sheer amount of data to be transferred. That wasn't obvious from his
first message, but I hedged against the possibility in my answer.)

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Can I get the number of results plus the results with a single query?

От
Wim Bertels
Дата:
Perry Smith schreef op ma 15-08-2022 om 08:49 [-0500]:
> I like to have what I call “baby sitting” messages such as “Completed
> 15 out of 1023”.  To do this, I need the number of results a query
> returns but I also need the results.
> 
> Currently I’m doing this with two queries such as:
> 
>     SELECT COUNT(*) FROM table WHERE …. expression …
>     SELECT * FROM table WHERE …. expression …
> 
> But this requires two queries.  Is there a way to do the same thing
> with just one query somehow?
> 
> I’ve been toying with row_number() and then sort by row_number
> descending and pick off the first row as the total number.  

if i understand your question:

count(*) over ()

(u can use aggregate functions as well, not only window functions)



mvg,
Wim