Discussion: Changing ORDER BY column slows query dramatically

Changing ORDER BY column slows query dramatically

From:
brick pglists
Date:
Hi,

(I have put the contents of this email at http://pastebin.com/6h49zrYE
for better readability)

I have been circling around the following problem for a few hours and
I am wondering if anyone has any suggestions about why the second
query is taking so much longer than the first.
Both tables have similar row counts, all tables are partitions, though
the queries are querying the partitions directly, not the parent
tables.
The 'DELIVERED' status comprises almost 100% of the rows in the
notification_xxx tables.
ANALYZE was run right before each query was issued.
This test database is on EC2. PG 9.2.4, CentOS release 5.9, 70GB RAM.
GUC changes listed at the bottom. Are there any other details I can supply?
Note that the same behaviour occurred on 9.2.3, but I cannot say for
sure whether it occurred on 9.2.2.
Dropping random_page_cost to 1 helped the slow query (once cached, it
drops to about 1.7s).
Setting enable_mergejoin to false causes the slow query to choose a
Nested Loop and shaves off about 30% of the execution time.
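
For reference, the session-level experiments mentioned above were along
these lines (a reconstruction, not a copy of the exact test script):

SET random_page_cost = 1;      -- shows up as 1 in the GUC list further down
SET enable_mergejoin = off;    -- pushes the planner towards the Nested Loop
-- re-run EXPLAIN (ANALYZE, BUFFERS) on the slow query here to compare plans
RESET enable_mergejoin;
RESET random_page_cost;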



-- Fast query, first run, uncached, ORDER BY nearly-unique tstamp_utc
explain (analyze, buffers)
SELECT e.id,e.device,e.engine,e.form,e.device_type_id,ncbs.delivery_tstamp_utc
FROM  event_20130404 e
INNER JOIN notifications_20130404 ncbs ON (e.id = ncbs.event_id)
WHERE TRUE
AND e.date_utc = '2013-04-04'
AND e.tstamp_utc BETWEEN '2013-04-04 10:00:00' AND '2013-04-04 18:00:00'
AND e.org_id = 216471
AND ncbs.event_creation_tstamp_utc BETWEEN '2013-04-04 10:00:00' AND
'2013-04-04 18:00:00'
AND ncbs.status = 'DELIVERED'
ORDER BY e.tstamp_utc desc
offset 10000 limit 100;

 Limit  (cost=29639.24..29935.63 rows=100 width=50) (actual
time=159.103..167.637 rows=100 loops=1)
   Buffers: shared hit=46588 read=1055
   I/O Timings: read=63.416
   ->  Nested Loop  (cost=0.00..1882500.73 rows=635138 width=50)
(actual time=0.240..159.693 rows=10100 loops=1)
         Buffers: shared hit=46588 read=1055
         I/O Timings: read=63.416
         ->  Index Scan Backward using
event_20130404_tstamp_utc_org_id_idx on event_20130404 e
(cost=0.00..315412.34 rows=1876877 width=42) (actual
time=0.129..61.981 rows=10100 loops=1)
               Index Cond: ((tstamp_utc >= '2013-04-04
10:00:00'::timestamp without time zone) AND (tstamp_utc <= '2013-04-04
18:00:00'::timestamp without time zone) AND (org_id = 216471))
               Filter: (date_utc = '2013-04-04'::date)
               Buffers: shared hit=6309 read=833
               I/O Timings: read=40.380
         ->  Index Scan using
notifications_20130404_event_id_org_id_pk on notifications_20130404
ncbs  (cost=0.00..0.82 rows=1 width=16) (actual time=0.006..0.006
rows=1 loops=10100)
               Index Cond: (event_id = e.id)
               Filter: ((event_creation_tstamp_utc >= '2013-04-04
10:00:00'::timestamp without time zone) AND (event_creation_tstamp_utc
<= '2013-04-04 18:00:00'::timestamp without time zone) AND (status =
'DELIVERED'::text))
               Buffers: shared hit=40279 read=222
               I/O Timings: read=23.036
 Total runtime: 170.436 ms
(17 rows)




-- Slow query, uncached, ORDER BY primary key
explain (analyze,buffers)
SELECT e.device,e.id,e.engine,e.form,e.device_type_id,ncbs.delivery_tstamp_utc
FROM  event_20130405 e
INNER JOIN notifications_20130405 ncbs ON (e.id = ncbs.event_id)
WHERE TRUE
AND e.date_utc = '2013-04-05'
AND e.tstamp_utc BETWEEN '2013-04-05 10:00:00' AND '2013-04-05 18:00:00'
AND e.org_id = 216471
AND ncbs.event_creation_tstamp_utc BETWEEN '2013-04-05 10:00:00' AND
'2013-04-05 18:00:00'
AND ncbs.status = 'DELIVERED'
ORDER BY e.id desc
OFFSET 10000 LIMIT 100;

 Limit  (cost=14305.61..14447.99 rows=100 width=42) (actual
time=13949.028..13950.353 rows=100 loops=1)
   Buffers: shared hit=2215465 read=141796 written=12128
   I/O Timings: read=11308.341 write=116.673
   ->  Merge Join  (cost=67.31..887063.35 rows=622965 width=42)
(actual time=13761.135..13933.879 rows=10100 loops=1)
         Merge Cond: (e.id = ncbs.event_id)
         Buffers: shared hit=2215465 read=141796 written=12128
         I/O Timings: read=11308.341 write=116.673
         ->  Index Scan Backward using event_20130405_id_pk on
event_20130405 e  (cost=0.00..612732.34 rows=1889715 width=34) (actual
time=2076.812..2111.274 rows=10100 loops=1)
               Filter: ((tstamp_utc >= '2013-04-05
10:00:00'::timestamp without time zone) AND (tstamp_utc <= '2013-04-05
18:00:00'::timestamp without time zone) AND (date_utc =
'2013-04-05'::date) AND (org_id = 216471))
               Rows Removed by Filter: 1621564
               Buffers: shared hit=1176391 read=113344 written=11918
               I/O Timings: read=774.769 write=113.095
         ->  Index Scan Backward using
notifications_20130405_event_id_org_id_pk on notifications_20130405
ncbs  (cost=0.00..258079.61 rows=2135847 width=16) (actual
time=11684.312..11784.738 rows=11653 loops=1)
               Filter: ((event_creation_tstamp_utc >= '2013-04-05
10:00:00'::timestamp without time zone) AND (event_creation_tstamp_utc
<= '2013-04-05 18:00:00'::timestamp without time zone) AND (status =
'DELIVERED'::text))
               Rows Removed by Filter: 1620011
               Buffers: shared hit=1039074 read=28452 written=210
               I/O Timings: read=10533.572 write=3.578
 Total runtime: 13950.458 ms
(18 rows)


select * from admin.changed_guc_settings ;
              name               |                        current_setting
---------------------------------+---------------------------------------------------------------------------------------------------------------
 version                         | PostgreSQL 9.2.4 on
x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red
Hat 4.1.2-52), 64-bit
 autovacuum_analyze_scale_factor | 0.01
 autovacuum_naptime              | 20s
 autovacuum_vacuum_cost_delay    | 5ms
 autovacuum_vacuum_cost_limit    | 1000
 autovacuum_vacuum_scale_factor  | 0.01
 autovacuum_vacuum_threshold     | 50
 bgwriter_lru_maxpages           | 400
 checkpoint_completion_target    | 0.8
 checkpoint_segments             | 64
 client_encoding                 | UTF8
 commit_delay                    | 10
 effective_cache_size            | 54GB
 effective_io_concurrency        | 4
 lc_collate                      | en_US.UTF-8
 lc_ctype                        | en_US.UTF-8
 log_autovacuum_min_duration     | 0
 log_filename                    | postgresql-%Y-%m-%d.log
 log_line_prefix                 | %t [%p]: [%l-1] (user=%u) (rhost=%h) (db=%d)
 log_lock_waits                  | on
 log_min_duration_statement      | 100ms
 log_temp_files                  | 0
 logging_collector               | on
 maintenance_work_mem            | 6GB
 max_connections                 | 200
 max_locks_per_transaction       | 100
 max_stack_depth                 | 6MB
 pg_stat_statements.max          | 1000
 pg_stat_statements.track        | all
 random_page_cost                | 1
 search_path                     | "$user",public,admin
 server_encoding                 | UTF8
 shared_buffers                  | 6GB
 shared_preload_libraries        | pg_stat_statements
 track_activity_query_size       | 6144
 track_functions                 | pl
 track_io_timing                 | on
 vacuum_cost_limit               | 2000
 wal_buffers                     | 16MB
 wal_keep_segments               | 100
 wal_level                       | minimal
 work_mem                        | 900MB




test01=# \d event_20130406
                                              Table "public.event_20130406"
        Column         |            Type             |                             Modifiers
-----------------------+-----------------------------+-------------------------------------------------------------------
 id                    | bigint                      | not null
default nextval('event_id_seq'::regclass)
 org_id                | integer                     | not null
 date_utc              | date                        | not null
default (timezone('UTC'::text, clock_timestamp()))::date
 tstamp_utc            | timestamp without time zone | not null
default timezone('UTC'::text, clock_timestamp())
 device                | text                        | not null
 provider              | text                        | not null
 form                  | text                        |
 engine                | text                        |
 user_service_provider | text                        |
 sender                | text                        |
 properties            | json                        |
 delivery_windows      | json                        |
 device_type_id        | smallint                    |
Indexes:
    "event_20130406_id_pk" PRIMARY KEY, btree (id) WITH (fillfactor=100)
    "event_20130406_date_utc_idx" btree (date_utc) WITH (fillfactor=100)
    "event_20130406_device_type_id_idx" btree (device_type_id) WITH
(fillfactor=100)
    "event_20130406_org_id_idx" btree (org_id) WITH (fillfactor=100)
    "event_20130406_tstamp_utc_org_id_idx" btree (tstamp_utc, org_id)
WITH (fillfactor=100)
Check constraints:
    "event_20130406_date_utc_check" CHECK (date_utc = '2013-04-06'::date)


test01=# \d notification_counts_by_status_20130406
        Table "public.notification_counts_by_status_20130406"
          Column           |            Type             | Modifiers
---------------------------+-----------------------------+-----------
 event_id                  | bigint                      | not null
 org_id                    | integer                     | not null
 status                    | text                        | not null
 tstamp_utc                | timestamp without time zone | not null
 event_creation_tstamp_utc | timestamp without time zone | not null
 delivery_tstamp_utc       | timestamp without time zone |
Indexes:
    "notification_counts_by_status_20130406_event_id_org_id_pk"
PRIMARY KEY, btree (event_id, org_id)
    "ncbs_20120406_delivery_tstamp_utc_pidx" btree
(delivery_tstamp_utc) WHERE delivery_tstamp_utc IS NOT NULL
    "notification_counts_by_status_20130406_ectu_idx" btree
(event_creation_tstamp_utc) WITH (fillfactor=66)
    "notification_counts_by_status_20130406_org_id_idx" btree (org_id)
WITH (fillfactor=98)
    "notification_counts_by_status_20130406_status_idx" btree (status)
WITH (fillfactor=66)
Check constraints:
    "ncbs_20130406_ectu_ck" CHECK (event_creation_tstamp_utc >=
'2013-04-06 00:00:00'::timestamp without time zone AND
event_creation_tstamp_utc < '2013-04-07 00:00:00'::timestamp without
time zone)
Inherits: notification_counts_by_status


-- Row counts, in case that is useful
test01=# select count(*) from event_20130404;
6,354,940
test01=# select count(*) from event_20130404 where tstamp_utc BETWEEN
'2013-04-04 10:00:00' AND '2013-04-04 18:00:00';
2,157,002

test01=# select count(*) from event_20130405 where tstamp_utc BETWEEN
'2013-04-05 10:00:00' AND '2013-04-05 18:00:00';
2,160,009
test01=# select count(*) from event_20130405;
6,479,278


test01=# select count(*) from notification_counts_by_status_20130404
where event_creation_tstamp_utc BETWEEN '2013-04-04 10:00:00' AND
'2013-04-04 18:00:00';
2,157,000

test01=# select count(*) from notification_counts_by_status_20130405
where event_creation_tstamp_utc BETWEEN '2013-04-05 10:00:00' AND
'2013-04-05 18:00:00';
2,160,009


Re: Changing ORDER BY column slows query dramatically

From:
Shaun Thomas
Date:
On 04/12/2013 11:51 AM, brick pglists wrote:

>           ->  Index Scan Backward using event_20130405_id_pk on
> event_20130405 e  (cost=0.00..612732.34 rows=1889715 width=34) (actual
> time=2076.812..2111.274 rows=10100 loops=1)
>                Filter: ((tstamp_utc >= '2013-04-05
> 10:00:00'::timestamp without time zone) AND (tstamp_utc <= '2013-04-05
> 18:00:00'::timestamp without time zone) AND (date_utc =
> '2013-04-05'::date) AND (org_id = 216471))

This right here is your culprit. The planner thinks it'll be faster to
grab 100 rows by scanning your primary key backwards and filtering out
rows that don't match the utc timestamps and other criteria.

Since it doesn't show up in your GUC list, you should probably increase
your default_statistics_target to 400 or more, analyze, and try again.
The heuristics for the dates aren't complete enough, so it thinks there
are few matches. If that doesn't work and you want a quick, but ugly fix
for this, you can create the following index:

CREATE INDEX event_20130406_id_desc_tstamp_utc_idx
     ON event_20130406 (id DESC, tstamp_utc);
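
Something like the following would apply the statistics suggestion (a
sketch against the event table from your plan; a per-column SET
STATISTICS is an alternative if you only want to touch tstamp_utc):

SET default_statistics_target = 400;   -- or set it in postgresql.conf
ALTER TABLE event_20130405
    ALTER COLUMN tstamp_utc SET STATISTICS 400;  -- per-column alternative
ANALYZE event_20130405;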

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-676-8870
sthomas@optionshouse.com



Re: Changing ORDER BY column slows query dramatically

From:
brick pglists
Date:
Hi Shaun,

On Fri, Apr 12, 2013 at 12:59 PM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> On 04/12/2013 11:51 AM, brick pglists wrote:
>
> Since it doesn't show up in your GUC list, you should probably increase your
> default_statistics_target to 400 or more, analyze, and try again. The
> heuristics for the dates aren't complete enough, so it thinks there are few
> matches. If that doesn't work and you want a quick, but ugly fix for this,
> you can create the following index:
>
> CREATE INDEX event_20130406_id_desc_tstamp_utc_idx
>     ON event_20130406 (id DESC, tstamp_utc);


Thanks for your suggestions. Bumping up the default_statistics_target
several times, all the way to 4000 (with an ANALYZE each time), did not
help; however, adding the index you suggested did help with that query.
It is still more than an order of magnitude slower than the version that
sorts by tstamp_utc, but it's a start. I also created a similar index
with both columns sorted DESC (CREATE INDEX
event_20130406_id_desc_tstamp_utc_desc_idx ON event_20130406 (id DESC,
tstamp_utc DESC)), and given the choice between the two, the planner
chose the latter.
Setting enable_mergejoin to false results in a plan much closer to the
original fast one, and additionally raising cpu_tuple_cost to 1 results
in a query about 3x slower than the original fast one.
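
Roughly the setup for the plan below (a reconstruction, since I did not
keep the exact script):

SET enable_mergejoin = off;
SET cpu_tuple_cost = 1;      -- default is 0.01
-- then re-ran EXPLAIN (ANALYZE, BUFFERS) on the ORDER BY e.id DESC query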

The ORDER BY e.id query, with the new index, enable_mergejoin
disabled, and cpu_tuple_cost bumped up to 1:

 Limit  (cost=125386.16..126640.02 rows=100 width=42) (actual
time=220.807..221.864 rows=100 loops=1)
   Buffers: shared hit=49171 read=6770
   I/O Timings: read=44.980
   ->  Nested Loop  (cost=0.00..7734858.92 rows=616883 width=42)
(actual time=110.718..213.923 rows=10100 loops=1)
         Buffers: shared hit=49171 read=6770
         I/O Timings: read=44.980
         ->  Index Scan using
event_20130406_id_desc_tstamp_utc_desc_idx on event_20130406 e
(cost=0.00..2503426.81 rows=1851068 width=34) (actual
time=110.690..139.001 rows=10100 loops=1)
               Index Cond: ((tstamp_utc >= '2013-04-06
10:00:00'::timestamp without time zone) AND (tstamp_utc <= '2013-04-06
18:00:00'::timestamp without time zone))
               Filter: ((date_utc = '2013-04-06'::date) AND (org_id = 216471))
               Rows Removed by Filter: 1554
               Buffers: shared hit=8647 read=6770
               I/O Timings: read=44.980
         ->  Index Scan using
notification_counts_by_status_20130406_event_id_org_id_pk on
notification_counts_by_status_20130406 ncbs  (cost=0.00..1.83 rows=1
width=16) (actual time=0.003..0.004 rows=1 loops=10100)
               Index Cond: (event_id = e.id)
               Filter: ((event_creation_tstamp_utc >= '2013-04-06
10:00:00'::timestamp without time zone) AND (event_creation_tstamp_utc
<= '2013-04-06 18:00:00'::timestamp without time zone) AND (status =
'DELIVERED'::text))
               Buffers: shared hit=40524
 Total runtime: 222.127 ms
(17 rows)

Still not at the ~90ms from the "ORDER BY e.tstamp_utc DESC" version,
but not too bad.  Now I need to figure out how I can get the best plan
choice without monkeying around with enable_mergejoin and changing
cpu_tuple_cost too much.
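
In the meantime, one way to at least keep those tweaks from leaking into
other queries (a sketch, untested here) is to scope them to a single
transaction with SET LOCAL:

BEGIN;
SET LOCAL enable_mergejoin = off;   -- reverts automatically at COMMIT/ROLLBACK
SET LOCAL cpu_tuple_cost = 1;
SELECT e.device, e.id, e.engine, e.form, e.device_type_id, ncbs.delivery_tstamp_utc
FROM event_20130406 e
INNER JOIN notification_counts_by_status_20130406 ncbs ON (e.id = ncbs.event_id)
WHERE e.date_utc = '2013-04-06'
  AND e.tstamp_utc BETWEEN '2013-04-06 10:00:00' AND '2013-04-06 18:00:00'
  AND e.org_id = 216471
  AND ncbs.event_creation_tstamp_utc BETWEEN '2013-04-06 10:00:00' AND '2013-04-06 18:00:00'
  AND ncbs.status = 'DELIVERED'
ORDER BY e.id DESC
OFFSET 10000 LIMIT 100;
COMMIT;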

If any more suggestions are forthcoming, I am all ears!