Thread: selects with large offset really slow


selects with large offset really slow

From:
John Smith
Date:

There are 90K-100K records in each of two tables. This simple join is really slow and the larger the offset, the longer it takes. Anything I can do to speed it up (a lot)? I've double-checked and there are indexes on everything used for joins and ordering.

############################

explain analyze select l.id, l.url
from links l
inner join stats s
on l.id = s.link_id
 and s.referrer_id = 1
order by l.url
limit 100
offset 90000;

QUERY PLAN
--------------------------------------------------------------------------------------------------
Limit  (cost=19546.62..19546.87 rows=100 width=62) (actual time=20557.00..20558.00 rows=100 loops=1)
  ->  Sort  (cost=19321.62..19571.32 rows=99881 width=62) (actual time=19775.00..20410.00 rows=90101 loops=1)
        Sort Key: l.url
        ->  Hash Join  (cost=2471.00..7662.54 rows=99881 width=62) (actual time=3013.00..12002.00 rows=100000 loops=1)
              Hash Cond: ("outer".id = "inner".link_id)
              ->  Seq Scan on links l  (cost=0.00..2444.81 rows=99881 width=42) (actual time=65.00..1790.00 rows=99881 loops=1)
              ->  Hash  (cost=2221.00..2221.00 rows=100000 width=20) (actual time=2946.00..2946.00 rows=0 loops=1)
                    ->  Seq Scan on stats s  (cost=0.00..2221.00 rows=100000 width=20) (actual time=36.00..1936.00 rows=100000 loops=1)
                          Filter: (referrer_id = 1)

Total runtime: 20571.00 msec
(10 rows)




Re: selects with large offset really slow

From:
Richard Huxton
Date:
On Friday 07 Feb 2003 5:01 am, John Smith wrote:
> There are 90K-100K records in each of two tables. This simple join is
> really slow and the larger the offset, the longer it takes. Anything I can
> do to speed it up (a lot)? I've double-checked and there are indexes on
> everything used for joins and ordering.

> QUERY PLAN
> ----------------------------------------------------------------------
> Limit  (cost=19546.62..19546.87 rows=100 width=62) (actual time=20557.00..20558.00 rows=100 loops=1)
>   ->  Sort  (cost=19321.62..19571.32 rows=99881 width=62) (actual time=19775.00..20410.00 rows=90101 loops=1)
>         Sort Key: l.url
>         ->  Hash Join  (cost=2471.00..7662.54 rows=99881 width=62) (actual time=3013.00..12002.00 rows=100000 loops=1)
>               Hash Cond: ("outer".id = "inner".link_id)

It's the join and sort that's getting you. PG has to fetch and sort all the
rows so it can discard 90,000 of them. I can't think of a good way for it to
optimise this, though you might want to check your sort_mem is set high
enough.

> explain analyze select l.id, l.url
> from links l
> inner join stats s
> on l.id = s.link_id
>  and s.referrer_id = 1
> order by l.url
> limit 100
> offset 90000;

There are three options you might want to look at:

1. Use a temporary table, then select from that for each page.
2. Use a cursor, and just fetch 100 records at a time from it.
3. Cheat and fetch where l.url>=X, remembering X as the highest url from the
last set of results. This of course means pages of results will overlap.
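Option 3 is what is now often called "keyset" pagination, and it can be sketched against any SQL database. Below is a minimal illustration using SQLite from Python as a stand-in for PostgreSQL; the `links` table and column names follow the thread, but the row data is invented:

```python
import sqlite3

# Build a tiny stand-in for the links table (invented sample data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO links (id, url) VALUES (?, ?)",
                 [(i, f"http://example.com/{i:05d}") for i in range(1000)])
conn.execute("CREATE INDEX links_url ON links(url)")

PAGE = 100

def first_page():
    return conn.execute(
        "SELECT id, url FROM links ORDER BY url LIMIT ?", (PAGE,)).fetchall()

def next_page(last_url):
    # Seek past the last URL seen instead of using OFFSET; with the index
    # on url this costs O(page size) no matter how deep into the set we are.
    return conn.execute(
        "SELECT id, url FROM links WHERE url > ? ORDER BY url LIMIT ?",
        (last_url, PAGE)).fetchall()

page1 = first_page()
page2 = next_page(page1[-1][1])
```

Using `>` rather than Richard's `>=` avoids overlapping pages when urls are unique; the `>=` variant trades a repeated boundary row for correctness when the same url appears more than once.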

--
  Richard Huxton

Re: selects with large offset really slow

From:
greg@turnstep.com
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


> There are 90K-100K records in each of two tables. This simple join
> is really slow and the larger the offset, the longer it takes.
> Anything I can do to speed it up (a lot)? I've double-checked and
> there are indexes on everything used for joins and ordering.

Indexes won't really help here, and it's not the offset that is
killing you, it's the sort. There are some possibilities however,
depending on the nature of the tables. If they are fairly static,
you can speed it up drastically by removing the WHERE clause
and the ORDER BY clause.

Since all we care about is if referrer_id is "1", we can do this:

CREATE INDEX stats_id ON stats(link_id);
ALTER TABLE links ADD ref BOOL;
BEGIN;
UPDATE links SET ref='true' WHERE EXISTS
  (SELECT 1 FROM stats WHERE link_id=links.id AND referrer_id=1);
COMMIT;

If you don't care about the "non 1" referrer_ids, you can do this instead:

CREATE INDEX stats_id ON stats(link_id);
BEGIN;
DELETE FROM links WHERE NOT EXISTS
  (SELECT 1 FROM stats WHERE link_id=links.id AND referrer_id=1);
COMMIT;


Now we can do something like this:

CREATE INDEX links_url on links(url);
SELECT id, url FROM links ORDER BY url LIMIT 100 OFFSET 80000;

This is pretty fast, as it uses the links_url index.

We can tweak a little more speed out of it by making the
ORDER BY permanent with a CLUSTER command:

CLUSTER links_url ON links;

Now as long as you don't change that table, you can do this:

SELECT id, url FROM links LIMIT 100 OFFSET 90000;

Very fast, and probably worth all the overhead if you are
making multiple queries. Moving referrer_id into the links
table would be a good thing as well, as it would allow you
to use the pre-cluster SELECT above with the links_url
index and still get a good response.
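The denormalization described above can be mimicked in any SQL engine. Here is a minimal sketch using SQLite from Python (the sample data is invented; in it, only even ids have a row with referrer_id 1), showing the precomputed flag column and the resulting single-table paged select:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT);
CREATE TABLE stats (link_id INTEGER, referrer_id INTEGER);
CREATE INDEX stats_id ON stats(link_id);
""")
conn.executemany("INSERT INTO links VALUES (?, ?)",
                 [(i, f"http://example.com/{i:04d}") for i in range(500)])
# Invented sample data: only even ids have referrer 1.
conn.executemany("INSERT INTO stats VALUES (?, 1)",
                 [(i,) for i in range(0, 500, 2)])

# Precompute the join result as a boolean column, as in the post above.
conn.execute("ALTER TABLE links ADD COLUMN ref BOOL")
conn.execute("""UPDATE links SET ref = 1 WHERE EXISTS
                (SELECT 1 FROM stats WHERE link_id = links.id
                 AND referrer_id = 1)""")
conn.execute("CREATE INDEX links_url ON links(url)")

# The paged query no longer needs the join at all.
rows = conn.execute("""SELECT id, url FROM links WHERE ref = 1
                       ORDER BY url LIMIT 10 OFFSET 100""").fetchall()
```

The OFFSET cost remains, but each skipped row is now a cheap index entry rather than a join-plus-sort participant, which is where the original query spent its time.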

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200302071246

-----BEGIN PGP SIGNATURE-----
Comment: http://www.turnstep.com/pgp.html

iD8DBQE+RAAlvJuQZxSWSsgRAqM+AKDX5d7vCxFDRDybNgXinuq9coF/SgCg4OFy
qRQKb6w693Yyt1dZfCFKCpQ=
=q0og
-----END PGP SIGNATURE-----



Re: selects with large offset really slow

From:
John Smith
Date:

Thanks, I'll try those suggestions. But...

Why can't PG just use an index? Say, look at the index for 'url', go to entry 90000, then get the next 100 entries? I was surprised that it retrieves *all* records then sorts them (when there's already a sorted index). I'm trying to switch from MySQL - the same exact query with it is very fast with 100-500K+ rows, and a large offset doesn't seem to affect the query's speed.

John


Re: selects with large offset really slow

From:
John Smith
Date:

  Richard Huxton <dev@archonet.com> wrote:

> explain analyze select l.id, l.url
> from links l
> inner join stats s
> on l.id = s.link_id
> and s.referrer_id = 1
> order by l.url
> limit 100
> offset 90000;

There are three options you might want to look at:

1. Use a temporary table, then select from that for each page.
2. Use a cursor, and just fetch 100 records at a time from it.
3. Cheat and fetch where l.url>=X, remembering X as the highest url from the
last set of results. This of course means pages of results will overlap.

I tried 1 & 2 - both take about the same amount of time as the original query :(. How do people deal with paging results from large tables? As is, web site pages take around 30 seconds to load (often timing out).

John




Re: selects with large offset really slow

From:
wsheldah@lexmark.com
Date:
If PG uses the url index before it does the join, it may be fetching rows
that won't satisfy the join criteria; to be accurate, it really needs to do
the join first, before doing the limit and offset. Since the index is on
the whole column and not just on the join results, I don't think it can be
used the way you're thinking. Does this make sense at all?

Wes Sheldahl




Re: selects with large offset really slow

From:
Stephan Szabo
Date:
On Thu, 6 Feb 2003, John Smith wrote:

>
> There are 90K-100K records in each of two tables. This simple join is
> really slow and the larger the offset, the longer it takes. Anything I
> can do to speed it up (a lot)? I've double-checked and there are
> indexes on everything used for joins and ordering.

>
> ############################
>
> explain analyze select l.id, l.url
> from links l
> inner join stats s
> on l.id = s.link_id
>  and s.referrer_id = 1
> order by l.url
> limit 100
> offset 90000;

Hmm, do you have an index on the combination of (url, id)?  I don't know
if that'd result in a better plan or not, but it may be able to use that
to help.


Re: selects with large offset really slow

From:
John Smith
Date:

This is interesting:

explain analyze select * from links order by id limit 100 offset 90000;
Index Scan...Total runtime: 2951.00 msec

explain analyze select * from links order by id limit 100 offset 9000;
Index Scan...Total runtime: 232.00 msec

explain analyze select * from links order by id limit 100 offset 900;
Index Scan...Total runtime: 39.00 msec

explain analyze select * from links order by id limit 100 offset 90;
Index Scan...Total runtime: 6.00 msec

I don't get it - why's it take longer the larger the offset is? Disk seeks to find index entries?

John

 Dennis Gearon <gearond@cvc.net> wrote:

Seems like it should do intelligent retrieval of records for that stage of processing.


Re: selects with large offset really slow

From:
John Smith
Date:

 Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:

On Thu, 6 Feb 2003, John Smith wrote:

>
> There are 90K-100K records in each of two tables. This simple join is
> really slow and the larger the offset, the longer it takes. Anything I
> can do to speed it up (a lot)? I've double-checked and there are
> indexes on everything used for joins and ordering.

>
> ############################
>
> explain analyze select l.id, l.url
> from links l
> inner join stats s
> on l.id = s.link_id
> and s.referrer_id = 1
> order by l.url
> limit 100
> offset 90000;

Hmm, do you have an index on the combination of (url, id)? I don't know
if that'd result in a better plan or not, but it may be able to use that
to help.


Just tried it - no speed improvement I'm afraid :(.

John

 




Re: selects with large offset really slow

From:
Jean-Luc Lachance
Date:
Why don't you use a temp table with a serial field and use it instead of
offset?

Re: selects with large offset really slow

From:
John Smith
Date:

how?

 Jean-Luc Lachance <jllachan@nsd.ca> wrote:

Why don't you use a temp table with a serial field and use it instead of
offset?




Re: selects with large offset really slow

From:
Jean-Luc Lachance
Date:
John,

CREATE TEMP TABLE results (
  ndx SERIAL,
  { plus all the fields you get from your query}
  );

INSERT INTO results( {all your fields except ndx})
  SELECT ...
  ;

then

select ... from results where ndx >= start order by ndx limit 100;

JLL
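The scheme above can be made concrete. Here is a minimal sketch using SQLite from Python as a stand-in for PostgreSQL, with the rowid-style `INTEGER PRIMARY KEY` playing the role of the SERIAL column (table contents are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany("INSERT INTO links VALUES (?, ?)",
                 [(i, f"http://example.com/{i:04d}") for i in range(500)])

# Materialize the sorted result once, numbering each row as it arrives;
# ndx is assigned sequentially, like a SERIAL column in PostgreSQL.
conn.executescript("""
CREATE TEMP TABLE results (ndx INTEGER PRIMARY KEY, id INTEGER, url TEXT);
INSERT INTO results (id, url) SELECT id, url FROM links ORDER BY url;
""")

def page(start, size=100):
    # ndx is 1-based and dense, so paging is a cheap indexed range scan
    # instead of an OFFSET that rescans everything before the page.
    return conn.execute(
        "SELECT id, url FROM results WHERE ndx >= ? ORDER BY ndx LIMIT ?",
        (start, size)).fetchall()

rows = page(301)
```

The sort is paid once at INSERT time; every page after that is an index range scan on `ndx`. The trade-off is that the temp table must be rebuilt whenever the underlying data changes.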



