Обсуждение: Returning the total number of rows as a separate column when using limit

Поиск
Список
Период
Сортировка

Returning the total number of rows as a separate column when using limit

От
Andreas Joseph Krogh
Дата:
This is a re-post as I didn't get any response last time
========================================================
Hi.

AFAICS the information about the *total* number of rows is in the "result"
somehow. When I execute a "limit 1" query with EXPLAIN ANALYZE, I se the
total number of columns in "rows=200819", so the information is there.

andreak=# EXPLAIN ANALYZE select p.id from onp_crm_person p order by p.created
DESC limit 1;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------Limit
(cost=0.00..0.04 rows=1 width=12) (actual time=0.046..0.048 rows=1  
loops=1)  ->  Index Scan Backward using origo_person_created_idx on onp_crm_person p
(cost=0.00..8396.45 rows=200819 width=12) (actual time=0.041..0.041 rows=1
loops=1)Total runtime: 0.104 ms
(3 rows)

Is it possible to use some sort of "magic" function to get this number out as
a separate column? And is this number accurate? It has to be the same as
running a separate "count(*)"-query to count the totals, which is exactly
what I'm trying to avoid.

Oracle has a special rownum and over() which can be used to accomplish this:

SELECT tmp.*, max(rownum) over() as total_countFROM (subquery) tmp

Does PG have any equivalent way?

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Karenslyst Allé 11      | know how to do a thing and to watch         |
PO. Box 529 Skøyen      | somebody else doing it wrong, without       |
0214 Oslo               | comment.                                    |
NORWAY                  |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+


Re: Returning the total number of rows as a separate column when using limit

От
Tom Lane
Дата:
Andreas Joseph Krogh <andreak@officenet.no> writes:
> AFAICS the information about the *total* number of rows is in the "result" 
> somehow. When I execute a "limit 1" query with EXPLAIN ANALYZE, I se the 
> total number of columns in "rows=200819", so the information is there.

That's only an estimate.  Since the query doesn't get executed to
completion thanks to the LIMIT, Postgres really has no idea whether
the estimate is accurate.
        regards, tom lane


Re: Returning the total number of rows as a separate column when using limit

От
Andreas Joseph Krogh
Дата:
On Monday 05 November 2007 15:18:22 Tom Lane wrote:
> Andreas Joseph Krogh <andreak@officenet.no> writes:
> > AFAICS the information about the *total* number of rows is in the
> > "result" somehow. When I execute a "limit 1" query with EXPLAIN ANALYZE,
> > I se the total number of columns in "rows=200819", so the information is
> > there.
>
> That's only an estimate.  Since the query doesn't get executed to
> completion thanks to the LIMIT, Postgres really has no idea whether
> the estimate is accurate.

Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG
still doesn't have to know the total numbers even if it has to sort the
result?

Is there a way to perform the LIMIT-query only once and still be able to
extract the total number of rows from some "magic function", like
Oracle's "over()" analytic function? I will accept a simple "no" if that is
the case, but if it is possible (like it is in Oracle) I would appreciate to
know how to do it.

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Karenslyst Allé 11      | know how to do a thing and to watch         |
PO. Box 529 Skøyen      | somebody else doing it wrong, without       |
0214 Oslo               | comment.                                    |
NORWAY                  |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+


Re: Returning the total number of rows as a separate column when using limit

От
Tom Lane
Дата:
Andreas Joseph Krogh <andreak@officenet.no> writes:
> On Monday 05 November 2007 15:18:22 Tom Lane wrote:
>> That's only an estimate.  Since the query doesn't get executed to
>> completion thanks to the LIMIT, Postgres really has no idea whether
>> the estimate is accurate.

> Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG 
> still doesn't have to know the total numbers even if it has to sort the 
> result?

If there were a sort then the sort node would know how many rows it had
sorted, but if you've got a small limit that's certainly not the plan
type you'd prefer.

The bottom line is that there is no free lunch.  If you want an exact
row count you have to execute the whole query, and it's gonna cost you.
If you're willing to settle for an approximation, the usual thing is
to EXPLAIN the query and dredge the row estimate out of that.

create function estimate_rows(qry text) returns float8 as $$
declare r text;
begin for r in execute 'explain ' || qry loop   if substring(r from 'rows=[0-9]') is not null then     return
substring(rfrom 'rows=([0-9]+)');   end if; end loop; return null;
 
end$$ language plpgsql strict;
        regards, tom lane


Re: Returning the total number of rows as a separate column when using limit

От
Gregory Stark
Дата:
> Andreas Joseph Krogh <andreak@officenet.no> writes:
>> On Monday 05 November 2007 15:18:22 Tom Lane wrote:
>>> That's only an estimate.  Since the query doesn't get executed to
>>> completion thanks to the LIMIT, Postgres really has no idea whether
>>> the estimate is accurate.
>
>> Ok. The query is ORDER-ed, but you're saying that it doesn't matter and PG 
>> still doesn't have to know the total numbers even if it has to sort the 
>> result?

At a guess you're displaying pages of information and want to display
something like "displaying 1-10 of 150" ?

Postgres is kind of lacking a solution for this problem. The last time I
needed to do this I bit the bullet and ran the query twice, once with a
"select count(*) from (...)" around it and once with "select * from (...)
order by x offset n limit m" around it. The second time runs relatively
quickly since all the raw data is in cache.

The "right" way to do this would probably be to have a temporary table which
you populate in one step, perform the count(*) on in a second query, then
select the page of data with the ordering in a third query. Then you can keep
the data around for some limited amount of time in case the user accesses a
second page. But this requires some infrastructure to keep track of the cached
data and what query it corresponded to and determine when to replace it with
new data or drop it.

However Postgres temporary tables are fairly expensive and if you're creating
them for every web access you're going to have to vacuum the system catalogs
quite frequently. They're not really well suited for this task.

Alternatively you could create a cursor and play with that. But I don't think
that's a great solution either. (yet? I think cursors are getting more useful
in Postgres, perhaps it will be eventually.)

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!


Re: Returning the total number of rows as a separate column when using limit

От
Andreas Joseph Krogh
Дата:
On Monday 05 November 2007 16:27:03 Gregory Stark wrote:
> > Andreas Joseph Krogh <andreak@officenet.no> writes:
> >> On Monday 05 November 2007 15:18:22 Tom Lane wrote:
> >>> That's only an estimate.  Since the query doesn't get executed to
> >>> completion thanks to the LIMIT, Postgres really has no idea whether
> >>> the estimate is accurate.
> >>
> >> Ok. The query is ORDER-ed, but you're saying that it doesn't matter and
> >> PG still doesn't have to know the total numbers even if it has to sort
> >> the result?
>
> At a guess you're displaying pages of information and want to display
> something like "displaying 1-10 of 150" ?

Exactly:-)

> Postgres is kind of lacking a solution for this problem. The last time I
> needed to do this I bit the bullet and ran the query twice, once with a
> "select count(*) from (...)" around it and once with "select * from (...)
> order by x offset n limit m" around it. The second time runs relatively
> quickly since all the raw data is in cache.

That's what I'm doing now. I run the query with "limit+1" as limit and if it
results in more than limit, I know there is more data and I run count(*) to
count them all. But count(*) cannot use indices in PG so it's limited in
speed anyway AFAICS.

> The "right" way to do this would probably be to have a temporary table
> which you populate in one step, perform the count(*) on in a second query,
> then select the page of data with the ordering in a third query. Then you
> can keep the data around for some limited amount of time in case the user
> accesses a second page. But this requires some infrastructure to keep track
> of the cached data and what query it corresponded to and determine when to
> replace it with new data or drop it.
>
> However Postgres temporary tables are fairly expensive and if you're
> creating them for every web access you're going to have to vacuum the
> system catalogs quite frequently. They're not really well suited for this
> task.
>
> Alternatively you could create a cursor and play with that. But I don't
> think that's a great solution either. (yet? I think cursors are getting
> more useful in Postgres, perhaps it will be eventually.)

I really hoped there was an "Oracle over()" equivalent way in PG. I understand
that Oracle's LIMIT-hack with "3 subselects and rownum between 1 AND 20" is
rather expensive compared to PG's implementation of LIMIT. Oralce keeps
snapshot-info in the index, so counting only involves the index AFAIK.

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
------------------------+---------------------------------------------+
OfficeNet AS            | The most difficult thing in the world is to |
Karenslyst Allé 11      | know how to do a thing and to watch         |
PO. Box 529 Skøyen      | somebody else doing it wrong, without       |
0214 Oslo               | comment.                                    |
NORWAY                  |                                             |
Tlf:    +47 24 15 38 90 |                                             |
Fax:    +47 24 15 38 91 |                                             |
Mobile: +47 909  56 963 |                                             |
------------------------+---------------------------------------------+


Re: Returning the total number of rows as a separate column when using limit

От
Gregory Stark
Дата:
"Andreas Joseph Krogh" <andreak@officenet.no> writes:

> That's what I'm doing now. I run the query with "limit+1" as limit and if it 
> results in more than limit, I know there is more data and I run count(*) to 
> count them all. But count(*) cannot use indices in PG so it's limited in 
> speed anyway AFAICS.

Well count(*) can use indexes the same as the query can.

> I really hoped there was an "Oracle over()" equivalent way in PG. I understand 
> that Oracle's LIMIT-hack with "3 subselects and rownum between 1 AND 20" is 
> rather expensive compared to PG's implementation of LIMIT. Oralce keeps 
> snapshot-info in the index, so counting only involves the index AFAIK.

Well that's only going to be true if the index satisfies the whole query which
is not going to be true for the simplest cases.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!


Re: Returning the total number of rows as a separate column when using limit

От
Gregory Stark
Дата:
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Andreas Joseph Krogh" <andreak@officenet.no> writes:
>
>> That's what I'm doing now. I run the query with "limit+1" as limit and if it 
>> results in more than limit, I know there is more data and I run count(*) to 
>> count them all. But count(*) cannot use indices in PG so it's limited in 
>> speed anyway AFAICS.
>
> Well count(*) can use indexes the same as the query can.
>
>> I really hoped there was an "Oracle over()" equivalent way in PG. I understand 
>> that Oracle's LIMIT-hack with "3 subselects and rownum between 1 AND 20" is 
>> rather expensive compared to PG's implementation of LIMIT. Oralce keeps 
>> snapshot-info in the index, so counting only involves the index AFAIK.
>
> Well that's only going to be true if the index satisfies the whole query which
> is not going to be true for the simplest cases.

er, *except* for the simplest cases.


--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!