Re: DELETE performance problem

Поиск
Список
Период
Сортировка
От Luca Tettamanti
Тема Re: DELETE performance problem
Дата
Msg-id 20091125161309.GA19636@nb-core2.darkstar.lan
обсуждение исходный текст
Ответ на Re: DELETE performance problem  (marcin mank <marcin.mank@gmail.com>)
Ответы Re: DELETE performance problem
Список pgsql-performance
On Wed, Nov 25, 2009 at 04:22:47PM +0100, marcin mank wrote:
> On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti <kronos.it@gmail.com> wrote:
> >         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8) (a
> > ctual time=571807.575..610178.552 rows=26185953 loops=1)
>
>
> This is Your problem. The system`s estimate for the number of distinct
> annotation_ids in t2 is wildly off.

Ah, I see.

> The disk activity is almost certainly swapping (You can check it
> iostat on the linux machine).

Nope, zero swap activity. Under Linux postgres tops up at about 4.4GB, leaving
3.6GB of page cache (nothing else is running right now).

> Can You try "analyze t2" just before the delete quety? maybe try
> raising statistics target for the annotation_id column.

I already tried, the estimation is still way off.

> If all else fails, You may try "set enable_hashagg to false" just
> before the query.

 Hash IN Join  (cost=1879362.27..11080576.17 rows=202376 width=6) (actual time=250281.607..608638.141 rows=26185953
loops=1)
    Hash Cond: (t1.annotation_id = t2.annotation_id)
       ->  Seq Scan on t1  (cost=0.00..661734.12 rows=45874812 width=14) (actual time=0.017..193661.353 rows=45874812
loops=1)
          ->  Hash  (cost=879289.12..879289.12 rows=60956812 width=8) (actual time=250271.012..250271.012 rows=60956812
loops=1)
               ->  Seq Scan on t2  (cost=0.00..879289.12 rows=60956812 width=8) (actual time=0.023..178297.862
rows=60956812loops=1) 
 Total runtime: 900019.033 ms
(6 rows)

This is after an analyze.

The alternative query suggested by Shrirang Chitnis:

DELETE FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t1.annotation_id = t2.annotation_id)

performs event better:

 Seq Scan on t1  (cost=0.00..170388415.89 rows=22937406 width=6) (actual time=272.625..561241.294 rows=26185953
loops=1)
    Filter: (subplan)
       SubPlan
            ->  Index Scan using t2_idx on t2  (cost=0.00..1113.63 rows=301 width=0) (actual time=0.008..0.008 rows=1
loops=45874812)
                   Index Cond: ($0 = annotation_id)
 Total runtime: 629426.014 ms
(6 rows)

Will try on the full data set.

thanks,
Luca

В списке pgsql-performance по дате отправления:

Предыдущее
От: marcin mank
Дата:
Сообщение: Re: DELETE performance problem
Следующее
От: Grzegorz Jaśkiewicz
Дата:
Сообщение: Re: DELETE performance problem