Re: Vacuum: allow usage of more than 1GB of work mem

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Vacuum: allow usage of more than 1GB of work mem
Дата
Msg-id f8d1dd03-c632-86fa-15cf-6f5633350b78@iki.fi
обсуждение исходный текст
Ответ на Re: Vacuum: allow usage of more than 1GB of work mem  (Claudio Freire <klaussfreire@gmail.com>)
Ответы Re: Vacuum: allow usage of more than 1GB of work mem  (Claudio Freire <klaussfreire@gmail.com>)
Список pgsql-hackers
On 03/04/18 17:20, Claudio Freire wrote:
> Ok, rebased patches attached

Thanks! I took a look at this.

First, now that the data structure is more complicated, I think it's 
time to abstract it, and move it out of vacuumlazy.c. The Tid Map needs 
to support the following operations:

* Add TIDs, in order (in 1st phase of vacuum)
* Random lookup, by TID (when scanning indexes)
* Iterate through all TIDs, in order (2nd pass over heap)

Let's add a new source file to hold the code for the tid map data 
structure, with functions corresponding those operations.

I took a stab at doing that, and I think it makes vacuumlazy.c nicer.

Secondly, I'm not a big fan of the chosen data structure. I think the 
only reason that the segmented "multi-array" was chosen is that each 
"segment" works is similar to the simple array that we used to have. 
After putting it behind the abstraction, it seems rather ad hoc. There 
are many standard textbook data structures that we could use instead, 
and would be easier to understand and reason about, I think.

So, I came up with the attached patch. I used a B-tree as the data 
structure. Not sure it's the best one, I'm all ears for suggestions and 
bikeshedding on alternatives, but I'm pretty happy with that. I would 
expect it to be pretty close to the simple array with binary search in 
performance characteristics. It'd be pretty straightforward to optimize 
further, and e.g. use a bitmap of OffsetNumbers or some other more dense 
data structure in the B-tree leaf pages, but I resisted doing that as 
part of this patch.

I haven't done any performance testing of this (and not much testing in 
general), but at least the abstraction seems better this way. 
Performance testing would be good, too. In particular, I'd like to know 
how this might affect the performance of lazy_tid_reaped(). That's a hot 
spot when vacuuming indexes, so we don't want to add any cycles there. 
Was there any ready-made test kits on that in this thread? I didn't see 
any at a quick glance, but it's a long thread..

- Heikki

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [HACKERS] MERGE SQL Statement for PG11
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: pgsql: New files for MERGE