Обсуждение: How to get the total number of rows returned by query when using a cursor?

Поиск
Список
Период
Сортировка

How to get the total number of rows returned by query when using a cursor?

От
"Egor Shipovalov"
Дата:
I'm implementing paging through search results using cursors. Is there a
better way to know total number of rows under a cursor than running a
separate COUNT(*) query? I think PostgreSQL is bound to know this number
after the first FETCH, isn't it?

On a side note, why queries using LIMIT are SO terribly slow, compared to
cursors and sometimes even ones without LIMIT? Shouldn't LIMIT be internally
implemented using cursor mechanism then?

Best regards,
Egor Shipovalov.


Re: How to get the total number of rows returned by query

От
Shridhar Daithankar
Дата:
Egor Shipovalov wrote:

> I'm implementing paging through search results using cursors. Is there a
> better way to know total number of rows under a cursor than running a
> separate COUNT(*) query? I think PostgreSQL is bound to know this number
> after the first FETCH, isn't it?

Using libpq, yes. You can use PQntuples. See
http://developer.postgresql.org/docs/postgres/libpq-exec.html#LIBPQ-EXEC-SELECT-INFO

Unfortunately this will retrieve the entire resultselt before returning the
control. When libpq will have chuncking of data, this will be much more
flexible, as you are expecting it now, I guess.

  Shridhar



Re: How to get the total number of rows returned by query

От
"Egor Shipovalov"
Дата:
Thanks. In fact, I've thought about that too, but it doesn't seem to work.
I'm using Perl Pg module which, I'm guessing, is just a wrapper around
libpq. This is the code I used to test it:

---------------
use Pg;

$conn = Pg::connectdb('user=my_user dbname=my_db');
$conn->exec('BEGIN');
$r = $conn->exec("DECLARE my_cursor CURSOR FOR SELECT nick FROM users ORDER
BY nick DESC");

print $r->ntuples; # prints 0

$r = $conn->exec('FETCH 10 FROM my_cursor');

print $r->ntuples; # prints 10
--------------

Too bad we seem to be stuck with COUNT(*)  for now. Are there any
fundamental problems getting to that cursor count data?
Judging from the lack of absolute cursor positioning it appears somewhat
like so ;(

Best regards,
Egor Shipovalov.


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Shridhar
> Daithankar
> Sent: Tuesday, September 23, 2003 18:59
> To: Pgsql-General
> Subject: Re: [GENERAL] How to get the total number of rows
> returned by query
>
>
> Egor Shipovalov wrote:
>
> > I'm implementing paging through search results using cursors. Is there a
> > better way to know total number of rows under a cursor than running a
> > separate COUNT(*) query? I think PostgreSQL is bound to know this number
> > after the first FETCH, isn't it?
>
> Using libpq, yes. You can use PQntuples. See
> http://developer.postgresql.org/docs/postgres/libpq-exec.html#LIBP
> Q-EXEC-SELECT-INFO
>
> Unfortunately this will retrieve the entire resultselt before
> returning the
> control. When libpq will have chuncking of data, this will be much more
> flexible, as you are expecting it now, I guess.
>
>   Shridhar
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Re: How to get the total number of rows returned by query when using a cursor?

От
Tom Lane
Дата:
"Egor Shipovalov" <pgsql_list@eonline.ru> writes:
> I'm implementing paging through search results using cursors. Is there a
> better way to know total number of rows under a cursor than running a
> separate COUNT(*) query? I think PostgreSQL is bound to know this number
> after the first FETCH, isn't it?

Why would you think that?  In general Postgres doesn't know the number
of rows until it reaches the end of the query.

> On a side note, why queries using LIMIT are SO terribly slow, compared to
> cursors and sometimes even ones without LIMIT?

Generally they're not particularly slow.  Perhaps you should show us the
EXPLAIN ANALYZE results for your problem case.

            regards, tom lane

Re: How to get the total number of rows returned by query when using a cursor?

От
"Egor Shipovalov"
Дата:
> > I'm implementing paging through search results using cursors. Is there a
> > better way to know total number of rows under a cursor than running a
> > separate COUNT(*) query? I think PostgreSQL is bound to know this number
> > after the first FETCH, isn't it?
>
> Why would you think that?  In general Postgres doesn't know the number
> of rows until it reaches the end of the query.

Sorry, I thought that because my query was sorted. For unsorted queries,
yes, there's usually no way to know until you reach the end.

> > On a side note, why queries using LIMIT are SO terribly slow,
> compared to
> > cursors and sometimes even ones without LIMIT?
>
> Generally they're not particularly slow.  Perhaps you should show us the
> EXPLAIN ANALYZE results for your problem case.

I did just that, but the plan was too complicated for me to comprehend  ;)
The interesting thing is that some queries are tens (!) of times slower with
LIMIT that without. These contain a number of joins on subselects. If you're
interested, I'll post examples.

Best regards,
Egor Shipovalov.


Re: How to get the total number of rows returned by query when using a cursor?

От
Tom Lane
Дата:
"Egor Shipovalov" <pgsql_list@eonline.ru> writes:
> The interesting thing is that some queries are tens (!) of times slower with
> LIMIT that without. These contain a number of joins on subselects. If you're
> interested, I'll post examples.

Sounds like the planner is misestimating costs and choosing the wrong plan.
Please show examples (EXPLAIN ANALYZE for same query with and without
LIMIT would be useful).

            regards, tom lane

Re: How to get the total number of rows returned by query when using a cursor?

От
"Egor Shipovalov"
Дата:
> Sounds like the planner is misestimating costs and choosing the
> wrong plan.
> Please show examples (EXPLAIN ANALYZE for same query with and without
> LIMIT would be useful).

Here it goes. This query is 6 times slower with LIMIT 10 added (I had slower
ones, but those are complex). I must say I rewrote it using subselects, thus
speeding it up in general and getting rid of the problem, but this is the
form I used on MySQL and it worked fast.

The purpose is to find users that list users 'asta' and 'furrr' as their
friends and are themselves listed as friends by the user 'chalky'.

SELECT
   users.*
  FROM
   users, friends AS f0, friends AS f1, friends AS fo0
  WHERE
   f0.nick = users.nick AND
   f1.nick = f0.nick AND
   fo0.friend_nick = f1.nick AND
   f0.friend_nick = 'asta' AND
   f1.friend_nick = 'furrr' AND
   fo0.nick = 'chalky'
;

Without LIMIT:
---------------------
 Merge Join  (cost=43404.20..45067.61 rows=1460751 width=660) (actual
time=4617.53..4633.82 rows=5 loops=1)
   Merge Cond: ("outer".nick = "inner".nick)
   ->  Merge Join  (cost=26432.36..27412.27 rows=75288 width=38) (actual
time=125.43..136.41 rows=17 loops=1)
         Merge Cond: ("outer".nick = "inner".friend_nick)
         ->  Sort  (cost=13216.18..13225.88 rows=3880 width=19) (actual
time=115.29..115.98 rows=388 loops=1)
               Sort Key: f0.nick
               ->  Index Scan using friend_nick__nick on friends f0
(cost=0.00..12984.87 rows=3880 width=19) (actual time=1.31..102.35 rows=391
loops=1)
                     Index Cond: (friend_nick = 'asta'::character varying)
         ->  Sort  (cost=13216.18..13225.88 rows=3880 width=19) (actual
time=9.76..9.94 rows=87 loops=1)
               Sort Key: fo0.friend_nick
               ->  Index Scan using nick__friend_nick on friends fo0
(cost=0.00..12984.87 rows=3880 width=19) (actual time=1.42..7.18 rows=87
loops=1)
                     Index Cond: (nick = 'chalky'::character varying)
   ->  Materialize  (cost=17089.36..17089.36 rows=3880 width=622) (actual
time=4491.16..4491.82 rows=337 loops=1)
         ->  Merge Join  (cost=16971.84..17089.36 rows=3880 width=622)
(actual time=4037.07..4485.64 rows=360 loops=1)
               Merge Cond: ("outer".nick = "inner".nick)
               ->  Sort  (cost=13216.18..13225.88 rows=3880 width=19)
(actual time=133.72..135.02 rows=448 loops=1)
                     Sort Key: f1.nick
                     ->  Index Scan using friend_nick__nick on friends f1
(cost=0.00..12984.87 rows=3880 width=19) (actual time=1.30..118.64 rows=448
loops=1)
                           Index Cond: (friend_nick = 'furrr'::character
varying)
               ->  Sort  (cost=3755.66..3780.47 rows=9923 width=603) (actual
time=3902.67..4109.18 rows=9921 loops=1)
                     Sort Key: users.nick
                     ->  Seq Scan on users  (cost=0.00..1134.23 rows=9923
width=603) (actual time=14.61..1839.57 rows=9923 loops=1)
Total runtime: 4643.07 msec
(23 rows)

---------------------

With LIMIT 10:

 Limit  (cost=0.00..24.79 rows=10 width=660) (actual time=8380.84..23224.93
rows=5 loops=1)
   ->  Merge Join  (cost=0.00..3620915.28 rows=1460751 width=660) (actual
time=8380.82..23224.88 rows=5 loops=1)
         Merge Cond: ("outer".nick = "inner".nick)
         ->  Nested Loop  (cost=0.00..2820642.36 rows=75288 width=38)
(actual time=234.64..15567.35 rows=17 loops=1)
               ->  Index Scan using nick__friend_nick on friends f0
(cost=0.00..2509966.71 rows=3880 width=19) (actual time=5.92..15360.73
rows=391 loops=1)
                     Filter: (friend_nick = 'asta'::character varying)
               ->  Index Scan using nick__friend_nick on friends fo0
(cost=0.00..79.82 rows=19 width=19) (actual time=0.48..0.48 rows=0
loops=391)
                     Index Cond: ((fo0.nick = 'chalky'::character varying)
AND (fo0.friend_nick = "outer".nick))
         ->  Materialize  (cost=799561.12..799561.12 rows=3880 width=622)
(actual time=7650.62..7651.35 rows=337 loops=1)
               ->  Nested Loop  (cost=0.00..799561.12 rows=3880 width=622)
(actual time=10.42..7640.60 rows=360 loops=1)
                     ->  Index Scan using journals_0_pkey on users
(cost=0.00..5102.15 rows=9923 width=603) (actual time=1.03..2548.42
rows=9923 loops=1)
                     ->  Index Scan using nick__friend_nick on friends f1
(cost=0.00..79.82 rows=19 width=19) (actual time=0.47..0.47rows=0
loops=9923)
                           Index Cond: ((f1.nick = "outer".nick) AND
(f1.friend_nick = 'furrr'::character varying))
Total runtime: 23227.58 msec
(14 rows)

-------------------

Relevant table descriptions:

                         Table "public.users"
      Column       |            Type             |     Modifiers
-------------------+-----------------------------+--------------------
 id                | integer                     |
 nick              | character varying(15)       | not null
 name              | character varying           |
 website           | character varying(1024)     |
 website_name      | character varying           |
 country           | character varying           |
 city              | character varying           |
 birth_date        | date                        |
 email             | character varying(50)       |
 icq               | bigint                      |
 bio               | text                        |
 account_type      | character varying           |
 is_community      | boolean                     | not null
 friends_num       | integer                     | not null default 0
 friend_of_num     | integer                     | not null default 0
 member_of_num     | integer                     | not null default 0
 memories_num      | integer                     | not null default 0
 time_created      | timestamp without time zone | not null
 time_updated      | timestamp without time zone |
 journal_entries   | integer                     | not null default 0
 comments_received | integer                     | not null default 0
 comments_posted   | integer                     | not null default 0
 picture           | character varying           |
 interests_num     | integer                     | not null default 0
 memored_num       | integer                     | not null default 0
 age               | smallint                    |
 rank_pos          | integer                     |
 rank              | numeric(10,3)               |
 location          | character varying(255)      |
Indexes: journals_0_pkey primary key btree (nick),
         age btree (age),
         country btree (country),
         is_community btree (is_community),
         location_info btree (country, city),
         picture btree (picture),
         rank btree (rank),
         time_updated btree (time_updated)



             Table "public.friends"
   Column    |         Type          | Modifiers
-------------+-----------------------+-----------
 nick        | character varying(15) | not null
 friend_nick | character varying(15) | not null
Indexes: friends2_pkey primary key btree (nick, friend_nick),
         friend_nick__nick unique btree (friend_nick, nick),
         nick__friend_nick unique btree (nick, friend_nick)



> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, September 23, 2003 19:43
> To: Egor Shipovalov
> Cc: Pgsql-General
> Subject: Re: [GENERAL] How to get the total number of rows returned by
> query when using a cursor?
>
>
> "Egor Shipovalov" <pgsql_list@eonline.ru> writes:
> > The interesting thing is that some queries are tens (!) of
> times slower with
> > LIMIT that without. These contain a number of joins on
> subselects. If you're
> > interested, I'll post examples.
>
>
>             regards, tom lane


Re: How to get the total number of rows returned by query

От
Manfred Koizar
Дата:
Egor Shipovalov wrote:
>  Limit  (cost=0.00..24.79 rows=10 width=660) (actual time=8380.84..23224.93
> rows=5 loops=1)
>[...]
>                ->  Index Scan using nick__friend_nick on friends f0
> (cost=0.00..2509966.71 rows=3880 width=19) (actual time=5.92..15360.73
> rows=391 loops=1)
 >                      Filter: (friend_nick = 'asta'::character varying)

AFAICS this estimated rows=3880 vs. actual rows=391 misestimation is the
source of your problem.  If you can convince the planner that
friend_nick='asta' gives only 400 rows, it probably will switch to an
index scan using friend_nick__nick with an estimated cost of ~ 1600 and
an actual time of ~ 100.

>                ->  Nested Loop  (cost=0.00..799561.12 rows=3880 width=622)
> (actual time=10.42..7640.60 rows=360 loops=1)
>                      ->  Index Scan using journals_0_pkey on users
> (cost=0.00..5102.15 rows=9923 width=603) (actual time=1.03..2548.42
> rows=9923 loops=1)
>                      ->  Index Scan using nick__friend_nick on friends f1
> (cost=0.00..79.82 rows=19 width=19) (actual time=0.47..0.47rows=0
> loops=9923)
>                            Index Cond: ((f1.nick = "outer".nick) AND
> (f1.friend_nick = 'furrr'::character varying))

With better statistics this might change to a much cheaper
-> Nested Loop
    -> Index Scan using friend_nick__nick on f1 (cost=..1600 rows=400)
    -> Index Scan using u_pkey on users ( ... loops=448)

... unless the planner finds an ever faster plan.

So try
    ALTER TABLE friends
    ALTER COLUMN friend_nick SET STATISTICS 100;
    ANALYSE friends;

and let us know how this affects your query.


>              Table "public.friends"
>    Column    |         Type          | Modifiers
> -------------+-----------------------+-----------
>  nick        | character varying(15) | not null
>  friend_nick | character varying(15) | not null
> Indexes: friends2_pkey primary key btree (nick, friend_nick),
>          friend_nick__nick unique btree (friend_nick, nick),
>          nick__friend_nick unique btree (nick, friend_nick)

BTW, this last index is useless because it duplicates the primary key.

Servus
  Manfred