Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: Tom Lane
Subject: Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running
Date:
Msg-id: 15496.1189535000@sss.pgh.pa.us
In reply to: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running  (El-Lotso <el.lotso@gmail.com>)
Responses: Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running  (El-Lotso <el.lotso@gmail.com>)
List: pgsql-performance
El-Lotso <el.lotso@gmail.com> writes:
> sorry.. I sent this as I was about to go to bed and the explain analyse
> of the query w/ 4 tables joined per subquery came out.

It's those factor-of-1000 misestimates of the join sizes that are
killing you, eg this one:

>               ->  Hash Join  (cost=249.61..512.56 rows=1 width=87) (actual time=15.139..32.858 rows=969 loops=1)
>                     Hash Cond: (((test_db.ts.id)::text = (test_db.d.id)::text) AND (test_db.ts.start_timestamp = test_db.trd.start_timestamp) AND (test_db.ts.ttype = test_db.trd.ttype))
>                     ->  Seq Scan on ts  (cost=0.00..226.44 rows=3244 width=40) (actual time=0.135..6.916 rows=3244 loops=1)
>                     ->  Hash  (cost=235.00..235.00 rows=835 width=47) (actual time=14.933..14.933 rows=1016 loops=1)

The single-row-result estimate persuades it to use a nestloop at the
next level up, and then when the output is actually 969 rows, that
means 969 executions of the other side of the upper join.

The two input size estimates are reasonably close to reality, so
the problem seems to be in the estimate of selectivity of the
join condition.  First off, do you have up-to-date statistics
for all the columns being joined here?  It might be that
increasing the statistics targets for those columns would help.
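A minimal sketch of what raising the targets looks like (the table and column names are taken from the plan fragment above, so check them against the actual schema; 1000 is just an aggressive illustrative value, and the same would be done for the corresponding columns of the other tables in the join):

```sql
-- Raise the per-column statistics target above the default,
-- then re-ANALYZE so the larger sample takes effect.
ALTER TABLE ts ALTER COLUMN id SET STATISTICS 1000;
ALTER TABLE ts ALTER COLUMN start_timestamp SET STATISTICS 1000;
ALTER TABLE ts ALTER COLUMN ttype SET STATISTICS 1000;
ANALYZE ts;
```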

But what I'm a bit worried about is the idea that the join
conditions are correlated or even outright redundant; the
planner will not know that, and will make an unrealistic
estimate of their combined selectivity.  If that's the
case, you might need to redesign the table schema to
eliminate the redundancy before you'll get good plans.
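To make the concern concrete: the planner treats the ANDed join clauses as independent and multiplies their individual selectivities together. If, say, ttype is functionally determined by (id, start_timestamp), the third clause filters nothing in reality, yet its selectivity still gets multiplied in, deflating the row estimate by orders of magnitude. A hypothetical rewrite (assuming that dependency actually holds in the schema) would join on the minimal key only:

```sql
-- Sketch: if (id, start_timestamp) already uniquely identifies a row,
-- join on just those columns instead of also comparing ttype,
-- which the planner would otherwise count as extra filtering.
SELECT ...
FROM ts
JOIN trd ON ts.id = trd.id
        AND ts.start_timestamp = trd.start_timestamp
-- rather than additionally:  AND ts.ttype = trd.ttype
```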

            regards, tom lane
