Slow select performance despite seemingly reasonable query plan

Поиск
Список
Период
Сортировка
От David Brain
Тема Slow select performance despite seemingly reasonable query plan
Дата
Msg-id 849c74160905070714w638a2277tedfd099476793cbd@mail.gmail.com
обсуждение исходный текст
Ответы Re: Slow select performance despite seemingly reasonable query plan  (Scott Mead <scott.lists@enterprisedb.com>)
Re: Slow select performance despite seemingly reasonable query plan  (Matthew Wakeling <matthew@flymine.org>)
Re: Slow select performance despite seemingly reasonable query plan  (Nikolas Everett <nik9000@gmail.com>)
Список pgsql-performance
Hi,

Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' -
the 'datatable' in the example below although in order to improve
performance this table is partitioned (by date range) into a number of
partition tables.  Each partition contains up to 20GB of data (tens of
millons of rows), with an additional ~3GB of indexes, all this is
served off a fairly high performance server (8 core 32Gb, with FC
attached SAN storage).  PostgreSQL version is 8.3.5 (running on 64bit
RHEL 5.2)

This has been working reasonably well, however in the last few days
I've been seeing extremely slow performance on what are essentially
fairly simple 'index hitting' selects on this data.  From the host
side I see that the postgres query process is mostly in IO wait,
however there is very little data actually being transferred (maybe
2-4 MB/s) - when a different query (say a select count(*) form
datatable) will yield a sustained 150+ MB/s.  There have been no
configuration changes during this time, although of course the
database has grown as data is added on a daily basis.

I'm not sure of the best way to diagnose this issue - the possible
causes  I can think of are:

1. Problem with random versus sequential reads on storage system.
2. 'Something' with PostgreSQL itself.
3. Problem with the host environment - one suspicion I have here is
that we are >90% full on the storage drives (ext3), I'm not sure if
that is impacting performance.

Any thoughts as to how to procede from here would be very welcome.

Here is an example query plan - looks reasonable to me, seems is
making use of the indexes and the constraint exclusion on the
partition tables:

Nested Loop Left Join  (cost=0.00..6462463.96 rows=1894 width=110)
   ->  Append  (cost=0.00..6453365.66 rows=1894 width=118)
         ->  Seq Scan on datatable sum  (cost=0.00..10.75 rows=1 width=118)
               Filter: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = 'xxxx'::text) AND (NOT CASE WHEN (NOT obsolete)
THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN
(obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true
WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
         ->  Index Scan using
datatable_20090328_customeriddatapointdate_idx on datatable_20090328
sum  (cost=0.00..542433.51 rows=180 width=49)
               Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = 'xxxx'::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
         ->  Index Scan using
datatable_20090404_customeriddatapointdate_idx on datatable_20090404
sum  (cost=0.00..1322098.74 rows=405 width=48)
               Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = 'xxxx'::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
         ->  Index Scan using
datatable_20090411_customeriddatapointdate_idx on datatable_20090411
sum  (cost=0.00..1612744.29 rows=450 width=48)
               Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = 'xxxx'::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
         ->  Index Scan using
datatable_20090418_customeriddatapointdate_idx on datatable_20090418
sum  (cost=0.00..1641913.58 rows=469 width=49)
               Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = 'xxxx'::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
         ->  Index Scan using
datatable_20090425_customeriddatapointdate_idx on datatable_20090425
sum  (cost=0.00..1334164.80 rows=389 width=49)
               Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = 'xxxx'::text))
               Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
   ->  Index Scan using pk_cdrextension on cdrextension ext
(cost=0.00..4.77 rows=1 width=8)
         Index Cond: (sum.id = ext.datatableid)


Thanks,

David.

В списке pgsql-performance по дате отправления:

Предыдущее
От: Matthew Wakeling
Дата:
Сообщение: Re: GiST index performance
Следующее
От: Scott Mead
Дата:
Сообщение: Re: Slow select performance despite seemingly reasonable query plan