Re: Some ideas about Vacuum

Поиск
Список
Период
Сортировка
От Markus Schiltknecht
Тема Re: Some ideas about Vacuum
Дата
Msg-id 4784BC49.5090701@bluegap.ch
обсуждение исходный текст
Ответ на Some ideas about Vacuum  ("Gokulakannan Somasundaram" <gokul007@gmail.com>)
Ответы Re: Some ideas about Vacuum  ("Gokulakannan Somasundaram" <gokul007@gmail.com>)
Re: Some ideas about Vacuum  (Gregory Stark <stark@enterprisedb.com>)
Список pgsql-hackers
Hi,

Gokulakannan Somasundaram wrote:
> If we can ask the Vacuum process to scan 
> the WAL log, it can get all the relevant details on where it needs to 
> go.

You seem to be assuming that only few tuples have changed between 
vacuums, so that WAL could quickly guide the VACUUM processes to the 
areas where cleaning is necessary.

Let's drop that assumption, because by default, autovacuum_scale_factor 
is 20%, so a VACUUM process normally kicks in after 20% of tuples 
changed (disk space is cheap, I/O isn't). Additionally, there's a 
default nap time of one minute - and VACUUM is forced to take at least 
that much of a nap.

So it's easily possible having more dead tuples, than live ones. In such 
cases, scanning the WAL can easily takes *longer* than scanning the 
table, because the amount of WAL to read would be bigger.

>     One main restriction it places on the WAL Logs is that the WAL Log 
> needs to be archived only after all the transactions in it completes. In 
> other words, WAL logs need to be given enough space, to survive the 
> longest transaction of the database. It is possible to avoid this 
> situation by asking the Vacuum process to take the necessary information 
> out of WAL log and store it somewhere and wait for the long running 
> transaction to complete.

That would result in even more I/O...

>     The information of interest in WAL is only the table 
> inserts/updates/deletes. So if everyone accepts that this is a good 
> idea, till this point, there is a point in reading further.

Well, that's the information of interest, the question is where to store 
that information. Maintaining a dead space map looks a lot cheaper to 
me, than relying on the WAL to store that information.

>     Ultimately, what has been achieved till now is that we have made the 
> sequential scans made by the Vacuum process on each table into a few 
> random i/os. Of course there are optimizations possible to group the 
> random i/os and find some sequential i/o out of it. But still we need to 
> do a full index scan for all those indexes out there. HOT might have 
> saved some work over there. But i am pessimistic here and wondering how 
> it could have been improved. So it just strikes me, we can do the same 
> thing which we did just with the tables. Convert a seq scan of the 
> entire table into a random scan of few blocks. We can read the necessary 
> tuple information from the tuples, group them and hit at the index in 
> just those blocks and clean it up.

Sorry, I don't quite get what you are talking about here. What do 
indexes have to do with dead space? Why not just keep acting on the 
block level?

>    I can already hear people, saying that it is not always possible to 
> go back to index from table. There is this culprit called unstable 
> function based indexes.

No, there's no such thing. Citing [1]: "All functions and operators used 
in an index definition must be "immutable", that is, their results must 
depend only on their arguments and never on any outside influence".

Of course, you can mark any function IMMUTABLE and get unstable function 
based indexes, but that turns into a giant foot gun very quickly.

> P.S.:  Let the objections/opposing views have a subtle reduction in its 
> harshness.

I'm just pointing at things that are in conflict with my knowledge, 
assumptions and believes, all which might be erroneous, plain wrong or 
completely mad. ;-)

Regards

Markus

[1]: the Very Fine Postgres Manual on CREATE INDEX:
http://www.postgresql.org/docs/8.3/static/sql-createindex.html


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Akinde
Дата:
Сообщение: Re: VACUUM FULL out of memory
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Dynamic Partitioning using Segment Visibility Maps