Re: AW: AW: Plans for solving the VACUUM problem

From: Tom Lane
Subject: Re: AW: AW: Plans for solving the VACUUM problem
Msg-id: 15440.990196536@sss.pgh.pa.us
In reply to: AW: AW: Plans for solving the VACUUM problem (Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>)
List: pgsql-hackers

Zeugswetter Andreas SB  <ZeugswetterA@wien.spardat.at> writes:
> It was my understanding, that the heap xtid is part of the key now,

It is not.

There was some discussion of doing that, but it fell down on the little
problem that in normal index-search cases you *don't* know the heap tid
you are looking for.

> And in above case, the keys (since identical except xtid) will stick close 
> together, thus caching will be good.

Even without key-collision problems, deleting N tuples out of a total of
M index entries will require search costs like this:

bulk delete in linear scan way:
    O(M)          I/O costs (read all the pages)
    O(M log N)    CPU costs (lookup each TID in sorted list)

successive index probe way:
    O(N log M)    I/O costs for probing index
    O(N log M)    CPU costs for probing index (key comparisons)
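
As an illustration only, the linear-scan bulk delete could be sketched roughly as below. This is a toy model under stated assumptions, not PostgreSQL code: the index is assumed to be a list of pages, each page a list of hypothetical (key, heap_tid) pairs, and `bulk_delete` is a made-up name. It shows where the O(M) page reads and the O(log N) lookup per entry come from.

```python
from bisect import bisect_left


def bulk_delete(index_pages, dead_tids):
    """Scan every index page once and drop entries whose heap TID is dead.

    index_pages: list of pages, each a list of (key, heap_tid) pairs
                 (hypothetical layout, for illustration only).
    dead_tids:   iterable of the N heap TIDs to remove.
    Returns the number of index entries removed.
    """
    dead = sorted(dead_tids)  # sort the N dead TIDs once

    def is_dead(tid):
        # Binary search in the sorted list: O(log N) per index entry.
        i = bisect_left(dead, tid)
        return i < len(dead) and dead[i] == tid

    removed = 0
    for page in index_pages:  # every page is read exactly once
        kept = [(key, tid) for key, tid in page if not is_dead(tid)]
        removed += len(page) - len(kept)
        page[:] = kept  # rewrite the page in place
    return removed
```

Note that the scan never needs the index key of a dead tuple, only its TID, which is why it sidesteps the problem of not knowing where in the index an entry lives.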

For N << M, the latter looks like a win, but you have to keep in mind
that the constant factors hidden by the O() notation are a lot different
in the two cases.  In particular, if there are T index entries per page,
the former I/O cost is really M/T * sequential read cost whereas the
latter is N log M * random read cost, yielding a difference in constant
factors of probably a thousand or two.  You get some benefit in the
latter case from caching the upper btree levels, but that's by
definition not a large part of the index bulk.  So where's the breakeven
point in reality?  I don't know but I suspect that it's at pretty small
N.  Certainly far less than one percent of the table, whereas I would
think that people would try to schedule VACUUMs at an interval where
they'd be reclaiming several percent of the table.
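
To get a feel for the breakeven point, here is a rough back-of-the-envelope model of the I/O costs only. Everything in it is an assumption made for illustration: the function names are hypothetical, the 10:1 random-to-sequential read cost ratio is a guess, and the per-probe cost is taken to be the btree depth, approximated as ceil(log_T(M)), minus however many upper levels are assumed to stay cached.

```python
import math


def bulk_delete_io_cost(m, t, seq_read_cost=1.0):
    """Linear scan: read all ceil(M/T) index pages sequentially."""
    return math.ceil(m / t) * seq_read_cost


def probe_io_cost(n, m, t, rand_read_cost=10.0, cached_levels=0):
    """N separate probes, each reading the uncached btree levels at random.

    Depth is approximated as ceil(log_T(M)); this is an assumption,
    not a measurement of any real index.
    """
    depth = max(1, math.ceil(math.log(m, t)))
    uncached = max(1, depth - cached_levels)  # always read at least the leaf
    return n * uncached * rand_read_cost


def breakeven_n(m, t, seq_read_cost=1.0, rand_read_cost=10.0, cached_levels=0):
    """Number of deleted tuples N at which both strategies cost the same I/O."""
    per_probe = probe_io_cost(1, m, t, rand_read_cost, cached_levels)
    return bulk_delete_io_cost(m, t, seq_read_cost) / per_probe


if __name__ == "__main__":
    m, t = 1_000_000, 200  # assumed index size and entries per page
    n = breakeven_n(m, t)
    print(f"breakeven N = {n:.0f} ({100 * n / m:.4f}% of entries)")
```

With these assumed numbers (a million entries, 200 per page) the breakeven comes out at well under a tenth of a percent of the index, even if the two upper btree levels are treated as cached. Different constants will move the figure, but it takes quite optimistic ones to push it anywhere near one percent.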

So, as I said to Hiroshi, this alternative looks to me like a possible
future refinement, not something we need to do in the first version.
        regards, tom lane

