Re: Long-running and non-finishing VACUUM ANALYZE on large table

From: Jan
Subject: Re: Long-running and non-finishing VACUUM ANALYZE on large table
Msg-id: 56145236.2030805@j.mk-contact.de
In response to: Re: Long-running and non-finishing VACUUM ANALYZE on large table  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Long-running and non-finishing VACUUM ANALYZE on large table
List: pgsql-admin
Tom, thanks a lot for your assessment, and please excuse my late response
(I had to gather some data and think this over)!

On 10/02/2015 07:16 PM, Tom Lane wrote:
> Jan <pgsql.admin@j.mk-contact.de> writes:
>> That is, autovacuum on that table does not finish (even after six days
>> it is still running). To narrow down the problem, I stopped autovacuum,
>> temporarily disabled it (but ONLY on that particular table; not a
>> permanent solution, I know) and started a manual VACUUM ANALYZE on that
>> table via PGAdmin instead. In analogy to the autovacuum, the manually
>> started process is also running virtually endless (until I terminate it
>> after some days) but PGadmin's verbose output (see below) tells me at
>> least that it (apparently) tries to process the same table over and over
>> again. I'm definitely not a Postgres guru, but this can't be the
>> expected behaviour, even on such a big table, right? Did I overlook
>> something?
> No, that looks fairly normal: each time it fills up memory with TIDs
> of dead tuples, it has to go and scan the indexes to remove index entries
> pointing at those tuples.

I'm still not getting the math behind it. The tuple stats below show a
dead tuple count of 63,187,655, whereas the pgAdmin output (see my
initial e-mail) reported the message "scanned index protein_hsps_pkey to
remove 178956753 row versions" nine times before I cancelled it. That
is, multiplying 178,956,753 by 9 yields 1,610,610,777 (dead) rows, which
is much higher than the ~63 million dead rows above. Am I comparing the
wrong numbers? Two additional related questions here:

(i) Is it possible to infer, from the intermediate VACUUM ANALYZE
output, how many passes the VACUUM ANALYZE will need in total?
(ii) Since you mention the indexes on that table, would an index
rebuild be a faster alternative for getting rid of old row versions?
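
For what it's worth, here is my rough sanity check of that 178,956,753
figure, assuming (and this is only my assumption) that VACUUM buffers
about 6 bytes per dead-tuple TID in maintenance_work_mem before each
round of index scans:

-- my assumption: ~6 bytes per dead-tuple TID held in maintenance_work_mem
SELECT (1024::bigint * 1024 * 1024) / 6 AS max_dead_tids_per_pass;
-- yields 178956970, suspiciously close to the 178956753 "row versions"
-- reported per index scan pass

If that assumption holds, each pass is simply limited by memory, and the
total number of passes would be roughly the total number of dead tuples
to be removed divided by that per-pass figure.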

> You could increase maintenance_work_mem to reduce the number of scans
> through the indexes, but it looks like you've already got that set to
> 1GB or so, so I'm not sure that increasing it further would be a good
> idea.

Exactly, maintenance_work_mem is currently set to 1GB.
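
For completeness, this is how the setting looks here, and how I would
(hypothetically) override it for a single manual run only, rather than
touching postgresql.conf:

SHOW maintenance_work_mem;               -- currently: 1GB

-- hypothetical session-level override, for one manual run only
SET maintenance_work_mem = '2GB';
VACUUM (VERBOSE, ANALYZE) protein_hsps;

Given your comment, though, I'm not planning to actually raise it further.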

> The good news is you are making progress.  Perhaps you should just wait it
> out.  However, it'd be good to try to estimate how many dead tuples there
> are (have you checked the physical size of the table?).  If there are
> vastly more dead tuples than live ones, a VACUUM FULL might be a less
> painful way to get out of this, though it would lock out other accesses
> to the table so you might have to take some downtime to do it.

I executed the following two queries to get the current tuple statistics
and the physical sizes of the table and its indexes.

1st)
-- CREATE EXTENSION pgstattuple;
SELECT * FROM pgstattuple('protein_hsps');

-- yielded:

table_len    1484828352512
tuple_count    11353611882
tuple_len    874228114914
tuple_percent    58.88
dead_tuple_count    63187655
dead_tuple_len    4810998304
dead_tuple_percent    0.32
free_space    495246133064
free_percent    33.35


2nd)
SELECT nspname || '.' || relname AS "relation",
        pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C LEFT JOIN pg_namespace N ON N.oid = C.relnamespace
WHERE relname IN ('protein_hsps',
       'protein_hsps_clustidx_on_origin',
       'protein_hsps_pkey')
ORDER BY pg_relation_size(C.oid) DESC;

-- yielded:

public.protein_hsps    1383 GB
public.protein_hsps_clustidx_on_origin    499 GB
public.protein_hsps_pkey    494 GB
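
I could also cross-check the dead/live tuple counters that the
statistics collector keeps for this table (assuming they are reasonably
fresh), e.g.:

SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'protein_hsps';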

> While you're waiting, you should try to figure out how you got into a
> state with so many dead tuples, and think about how to readjust things
> so it doesn't happen again.

Some more background: the whole database was recently migrated to a new
database server and therefore restored from a dump file. That is, the
table 'protein_hsps' and its indexes were built from scratch. Since then,
the only schema changes on that table were some column type conversions
(e.g., integer to smallint, double precision to real). Data-wise, the
only operations were UPDATEs of a single boolean column, setting
precalculated values (true/false) for EACH row in the table (~16bn rows).
These UPDATEs were most likely the cause of the (huge) number of dead
tuples (0.32%, see above), weren't they?
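
In hindsight, I probably should have done that bulk UPDATE in smaller
batches with intermediate VACUUMs, roughly along these lines (the column
names "id" and "my_bool_flag" are made-up placeholders, and the single
constant value is a simplification of the precalculated per-row values):

-- hypothetical batched variant of the bulk UPDATE
UPDATE protein_hsps
SET    my_bool_flag = true         -- precalculated value for this range
WHERE  id BETWEEN 1 AND 10000000;  -- next batch: 10000001..20000000, etc.

VACUUM protein_hsps;               -- reclaim dead versions before the next batch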

My original "reasoning" behind not partitioning the table was:

1st) Postgres supports bigint/serial, so billions of rows are possible
in principle ...
2nd) A clustered index will help me to efficiently pick the row set by a
certain criterion (column 'origin' in this case).
3rd) I'm actually never deleting rows but only doing INSERTs and UPDATEs
(however, I just realized that UPDATEs also cause dead tuples).
4th) Partitioning still involves a fair amount of manual work and
planning (setting up triggers etc.; a rough sketch follows below).
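
Regarding the "manual work" in point 4th), what I have in mind is
roughly the classic inheritance-plus-trigger setup, along these lines
(the child table name, the range on 'origin' and the routing logic are
made up purely for illustration):

-- sketch of inheritance-based partitioning (names and ranges are made up)
CREATE TABLE protein_hsps_origin_1 (
    CHECK (origin >= 1 AND origin < 1000)
) INHERITS (protein_hsps);

CREATE OR REPLACE FUNCTION protein_hsps_insert_router()
RETURNS trigger AS $$
BEGIN
    IF NEW.origin >= 1 AND NEW.origin < 1000 THEN
        INSERT INTO protein_hsps_origin_1 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'origin % has no partition yet', NEW.origin;
    END IF;
    RETURN NULL;  -- the row has been routed to a child table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER protein_hsps_insert_trigger
    BEFORE INSERT ON protein_hsps
    FOR EACH ROW EXECUTE PROCEDURE protein_hsps_insert_router();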

Is there any good rule of thumb for how large a table can get at most
(row count-wise) before partitioning becomes more or less mandatory?


Kind regards,
Jan

