Re: Long-running and non-finishing VACUUM ANALYZE on large table

From: Jan
Subject: Re: Long-running and non-finishing VACUUM ANALYZE on large table
Date:
Msg-id: 5614649D.6050906@j.mk-contact.de
In response to: Re: Long-running and non-finishing VACUUM ANALYZE on large table (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-admin
On 10/07/2015 01:53 AM, Tom Lane wrote:
> Jan <pgsql.admin@j.mk-contact.de> writes:
>> I'm still not getting the math behind it. The below tuple stats show a
>> dead tuple count of 63,187,655 whereas the PGadmin output (see my
>> initial e-mail) reported the message "scanned index protein_hsps_pkey to
>> remove 178956753 row versions" nine times before I cancelled it. That
>> is, if one multiplies 178,956,753 by 9 it yields 1,610,610,777 (dead)
>> rows. But the latter number is much higher than the above 63m rows? Do I
>> compare the wrong numbers?
> There's something awfully wacky about that.  I suspect that pgstattuple
> is somehow giving wrong answers, but I don't see any plausible theory
> as to why.
>
>> Some more background: the whole database was recently migrated to a new
>> database server and thus restored from a dump file. That is, the table
>> 'protein_hsps' and its index were built from scratch. Since then, the
>> only operations on that table were some column type conversions (e.g.,
>> integer to smallint, double to real). Data-wise, the only operations
>> were UPDATES on a single boolean column by adding precalculated values
>> (true/false) to EACH row in the database (~ 16bn rows). These UPDATEs
>> were most likely the cause for the (huge) number of dead tuples (0.32%,
>> see above), weren't they?
> Such an UPDATE should have left the table 50% dead tuples, since every
> row would leave behind a dead version.  On the other hand, an ALTER
> COLUMN TYPE operation should rewrite the whole table and leave no dead
> tuples behind.  No matter which one you did last, it doesn't square with
> 0.32% dead tuples.
>
> My best guess at this point is that what you did last is an UPDATE,
> so you have 50% dead tuples, and for some reason pgstattuple is not
> telling you the truth about that.  But the VACUUM is showing reality.

Exactly, I altered the column types first. (That had been on my list for
a long time, and I used the planned database server downtime for those
optimizations.) After that, back in production, the aforementioned
UPDATEs were applied, not all at once but in batches of rows. So I, too,
assume roughly 50% dead tuples right now.
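
A quick way to sanity-check that assumption (just a sketch; note that
pgstattuple scans the entire table, so on ~16bn rows it is itself an
expensive operation):

    -- Exact figures from pgstattuple (full table scan, slow on a table this size):
    SELECT tuple_count, dead_tuple_count, dead_tuple_percent
    FROM pgstattuple('protein_hsps');

    -- The statistics collector's estimate, for comparison (cheap but approximate):
    SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
    WHERE relname = 'protein_hsps';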

> How long did those UPDATEs and ALTER TABLEs take?  If an ALTER seemed
> tolerable then maybe what you want to do is VACUUM FULL, which would
> be roughly the same cost.

I recorded the exact execution time of each ALTER statement
(unfortunately the numbers aren't at hand right now because I'm at home);
they took roughly 1-2 days each.
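
For reference, the conversions were of this general shape (the column
names below are placeholders, not the real ones); psql's \timing records
the execution time of each statement:

    \timing on
    ALTER TABLE protein_hsps ALTER COLUMN some_int_column TYPE smallint;
    ALTER TABLE protein_hsps ALTER COLUMN some_double_column TYPE real;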
Now I will go for the VACUUM FULL and will report back here once it's
done. *fingers crossed*
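
The rough plan looks like this (a sketch only; VACUUM FULL rewrites the
table under an ACCESS EXCLUSIVE lock and needs roughly the table's size
in free disk space, and the maintenance_work_mem value is just a guess):

    -- size before
    SELECT pg_size_pretty(pg_total_relation_size('protein_hsps'));

    SET maintenance_work_mem = '2GB';  -- helps the index rebuilds
    VACUUM FULL VERBOSE protein_hsps;

    -- size after
    SELECT pg_size_pretty(pg_total_relation_size('protein_hsps'));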

Many thanks for your time!

Kind regards,
Jan

