Re: Long Running Update

Поиск

Список

Период

Сортировка

От	Vitalii Tymchyshyn
Тема	Re: Long Running Update
Дата	24 июня 2011 г. 08:45:43
Msg-id	4E0478CB.8080002@gmail.com обсуждение исходный текст
Ответ на	Re: Long Running Update (Harry Mantheakis <harry.mantheakis@riskcontrollimited.com>)
Список	pgsql-performance

Дерево обсуждения

24.06.11 14:16, Harry Mantheakis написав(ла):
>
> > EXPLAIN the statement
>
> Here is the EXPLAIN result:
>
> ----------------------------------------------------------------------
> QUERY PLAN
> ----------------------------------------------------------------------
> Hash Join (cost=2589312.08..16596998.47 rows=74558048 width=63)
> Hash Cond: (table_A.id = table_B.id)
> -> Seq Scan on table_A(cost=0.00..1941825.05 rows=95612705 width=47)
> -> Hash (cost=1220472.48..1220472.48 rows=74558048 width=20)
> -> Seq Scan on table_B(cost=0.00..1220472.48 rows=74558048 width=20)
> ----------------------------------------------------------------------
>
> The documentation says the 'cost' numbers are 'units of disk page
> fetches'.
>
> Do you, by any chance, have any notion of how many disk page fetches
> can be processed per second in practice - at least a rough idea?
>
> IOW how do I convert - guesstimate! - these numbers into (plausible)
> time values?
No chance. This are "virtual values" for planner only.
If I read correctly, your query should go into two phases: build hash
map on one table, then update second table using the map. Not that this
all valid unless you have any constraints (including foreign checks,
both sides) to check on any field of updated table. If you have, you'd
better drop them.
Anyway, this is two seq. scans. For a long query I am using a tool like
ktrace (freebsd) to get system read/write calls backend is doing. Then
with catalog tables you can map file names to relations
(tables/indexes). Then you can see which stage you are on and how fast
is it doing.
Note that partially cached tables are awful (in FreeBSD, dunno for
linux) for such a query - I suppose this is because instead on
sequential read, you get a lot of random reads that fools prefetch
logic. "dd if=table_file of=/dev/null bs=8m" helps me a lot. You can see
it it helps if CPU time goes up.

Best regards, Vitalii Tymchyshyn

В списке pgsql-performance по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Long Running Update