Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

Поиск

Список

Период

Сортировка

От	Tory M Blue
Тема	Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
Дата	22 января 2010 г. 17:03:50
Msg-id	8a547c841001221003n90f3420va3c719bde59bae51@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL (Richard Huxton <dev@archonet.com>)
Ответы	Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL (Richard Huxton <dev@archonet.com>) Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL (Matthew Wakeling <matthew@flymine.org>)
Список	pgsql-performance

Дерево обсуждения

On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton <dev@archonet.com> wrote:

On 21/01/10 22:15, Tory M Blue wrote:
· Data distribution = In the 98mill records, there are 7000 unique

makeid's, and 21mill unique UID's. About 41mill of the records have
tagged=true

· Time to execute the following query with indices on makeid and
tagged = 90-120 seconds. The planner uses the webid index and filters on
tagged and then rechecks the webid index

* SELECT COUNT(DISTINCT uid ) AS active_users FROM
pixelpool.userstats WHERE makeid ='bmw-ferman' AND tagged =true*

· Time to execute the the same query with a combined index on makeid
and tagged = 60-100 seconds. The planner uses the combined index and then
filters tagged.

Two things:

1. You have got the combined index on (makeid, tagged) and not (tagged, makeid) haven't you? Just checking.

Yes we do

2. If it's mostly tagged=true you are interested in you can always use a partial index: CREATE INDEX ... (makeid) WHERE tagged
This might be a win even if you need a second index with WHERE NOT tagged.

Partial index doesn't seem to fit here due to the fact that there are 35-40% Marked True.

Didn't think about creating a second index for false, may give that a shot.

Also, either I've not had enough cofee yet, or a bitmap scan is an odd choice for only ~ 13000 rows out of 100 million.

* " -> Bitmap Index Scan on

idx_retargetuserstats_makeidtag (cost=0.00..360.20 rows=13175 width=0)"*

* " Index Cond: ((makeid = 'b1mw-ferman'::text)
AND (tagged = true))"*

Otherwise, see what Craig said.

I'm assuming this isn't the query that is CPU bound for a long time. Unless your table is horribly bloated, there's no reason for that judging by this plan.

It is, but not always, only when there are 10K more matches. And the explain unfortunately is sometimes way high or way low, so the expalin is hit and miss.

But the same sql that returns maybe 500 rows is pretty fast, it's the return of 10K+ rows that seems to stall and is CPU Bound.

Thanks

Tory

В списке pgsql-performance по дате отправления:

Предыдущее

От: Tory M Blue
Дата: 22 января 2010 г., 16:59:46
Сообщение: Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

Следующее

От: Richard Huxton
Дата: 22 января 2010 г., 17:25:13
Сообщение: Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

Предыдущее

Следующее