Re: Performance on large, append-only tables
От | David Yeu |
---|---|
Тема | Re: Performance on large, append-only tables |
Дата | |
Msg-id | 4205A216-5AD0-4F23-826F-542537B27E62@skype.net обсуждение исходный текст |
Ответ на | Performance on large, append-only tables (David Yeu <david.yeu@skype.net>) |
Ответы |
Re: Performance on large, append-only tables
(Claudio Freire <klaussfreire@gmail.com>)
|
Список | pgsql-performance |
Yeah, Reply-All... Begin forwarded message: > From: David Yeu <david.yeu@skype.net> > Subject: Re: [PERFORM] Performance on large, append-only tables > Date: February 10, 2012 10:59:04 AM EST > To: Merlin Moncure <mmoncure@gmail.com> > > On Feb 10, 2012, at 10:19 AM, Merlin Moncure wrote: > >> You can probably significantly optimize this. But first, can we see >> some explain analyze for the affected queries? > > Sorry, we should have included these in the original post. Here's the EXPLAIN output for a "id < ?" query: > > > => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id <?) ORDER BY id DESC LIMIT 20 OFFSET 0; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=9267.44..9267.45 rows=20 width=1321) (actual time=348.844..348.877 rows=20 loops=1) > -> Sort (cost=9267.44..9269.76 rows=4643 width=1321) (actual time=348.840..348.852 rows=20 loops=1) > Sort Key: id > Sort Method: top-N heapsort Memory: 29kB > -> Index Scan using index_lines_on_group_id on lines (cost=0.00..9242.73 rows=4643 width=1321) (actual time=6.131..319.835rows=23038 loops=1) > Index Cond: (group_id = ?) > Filter: ((deleted_at IS NULL) AND (id < ?)) > Total runtime: 348.987 ms > > > A quick suggestion from Heroku yesterday was a new index on (group_id, id). After adding it to a database fork, we endedup with: > > > => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) AND (id <?) ORDER BY id DESC LIMIT 20 OFFSET 0; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------ > Limit (cost=0.00..28.88 rows=20 width=1321) (actual time=17.216..109.905 rows=20 loops=1) > -> Index Scan Backward using index_lines_on_group_id_and_id on lines (cost=0.00..6416.04 rows=4443 width=1321) (actualtime=17.207..109.867 rows=20 loops=1) > Index Cond: ((group_id = ?) AND (id < ?)) > Filter: (deleted_at IS NULL) > Total runtime: 110.039 ms > > > The result has been pretty dramatic for the "id <> ?" queries, which make up the bulk of the queries. Running a whole bunchof EXPLAIN ANAYLZE queries also showed that some queries were actually choosing to use the index on `id' instead of`group_id', and that performed about as poorly as expected. Thankfully, the new index on (group_id, id) seems to be preferablenearly always. > > And for reference, here's the EXPLAIN for the LIMIT, OFFSET query: > > > => EXPLAIN ANALYZE SELECT "lines".* FROM "lines" WHERE (lines.deleted_at IS NULL) AND ("lines".group_id = ?) ORDER BYid DESC LIMIT 20 OFFSET 60; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=9274.45..9274.46 rows=20 width=1321) (actual time=109.674..109.708 rows=20 loops=1) > -> Sort (cost=9274.42..9276.75 rows=4646 width=1321) (actual time=109.606..109.657 rows=80 loops=1) > Sort Key: id > Sort Method: top-N heapsort Memory: 43kB > -> Index Scan using index_lines_on_group_id on lines (cost=0.00..9240.40 rows=4646 width=1321) (actual time=0.117..98.905rows=7999 loops=1) > Index Cond: (group_id = ?) > Filter: (deleted_at IS NULL) > Total runtime: 109.753 ms > > > - Dave >
В списке pgsql-performance по дате отправления: