optimizing vacuum truncation scans

From: Jeff Janes
Subject: optimizing vacuum truncation scans
Date:
Msg-id: CAMkU=1wK_At0JfE4Esm8Mgrfrr8mLTxpEYnvpVggPawKv6VC0A@mail.gmail.com
Replies: Re: optimizing vacuum truncation scans  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
         Re: optimizing vacuum truncation scans  (Qingqing Zhou <zhouqq.postgres@gmail.com>)
List: pgsql-hackers
After a large bulk load aborted near the end, I decided to vacuum the main table, so as not to leave a huge chunk of free space in the middle of it, before re-running the bulk load.  This vacuum took a frustratingly long time, as the backwards scan over the table to truncate the space did not trigger read-ahead (or read-behind) in the file system (which was ext4).

In hindsight, the best thing to do would have been to open a psql session and take an access-share lock on the main table, so that it would prevent the truncation from occurring, leaving cleaned, reusable space at the end of the table to be refilled on the next (successful) run of the bulk load.  But it would be nice if the system could do some of this optimization for me, as hindsight never arrives in time.

I did literally the simplest thing I could think of as a proof-of-concept patch, to see if it would actually fix things.  I just jumped back a certain number of blocks occasionally and prefetched them forward, then resumed the regular backward scan.  The patch and driving script are attached.
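
In outline, the change looks something like this (an illustrative sketch, not the attached patch; it assumes the surrounding context of count_nondeletable_pages() in vacuumlazy.c, where "onerel" is the relation being vacuumed, and "prefetch_size" stands in for the patch's JJFETCH knob):

    /*
     * Illustrative sketch only; assumes the backward truncation scan in
     * count_nondeletable_pages(), which walks blkno down from the end
     * of the relation looking for the last nonempty page.
     */
    while (blkno > vacrelstats->nonempty_pages)
    {
        blkno--;

        /*
         * Every prefetch_size'th block, issue prefetch requests for the
         * chunk of blocks we are about to visit, in ascending order, so
         * the kernel's sequential read-ahead can kick in even though
         * the scan itself is moving backward.
         */
        if (prefetch_size > 0 && (blkno % prefetch_size) == 0)
        {
            BlockNumber pblkno;
            BlockNumber start = (blkno >= prefetch_size) ?
                blkno - prefetch_size + 1 : 0;

            for (pblkno = start; pblkno <= blkno; pblkno++)
                PrefetchBuffer(onerel, MAIN_FORKNUM, pblkno);
        }

        /* ... existing per-block emptiness check continues here ... */
    }

PrefetchBuffer() ultimately boils down to posix_fadvise(POSIX_FADV_WILLNEED) on platforms that support it, so issuing the requests in ascending block order is what gives the kernel a forward-looking pattern it can coalesce.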

It was very successful in this test.  I've reported the fastest truncation time for each setting of the prefetch size, out of about 10 runs each, after removing a handful of cases where buffer pins prevented a full-size truncation.  (I used the minimum because the server has other duties that would occasionally kick in and cause a spike in time for a few runs.)  A setting of 0 disables the feature.

JJFETCH    trunc_time (s)
      0    355.78
      1    350.5
      2    277.61
      4    213.9
      8    172.79
     16    138.87
     32    133.02
     64     76.25
    128     47.63
    256     35.28
    512     20.44
   1024     18.6
   2048     15.63
   4096     15.34
   8192     14.72
  16384     13.45
  32768     13.13

For context, the forward-scan part of this vacuum took about 80 sec (IQR 75 to 88).

Do we want something like this at all?  If so, what would we have to do to implement it for real?  I thought that maybe the back-off amount could start at one block and double each time, until it hits a maximum, where the maximum would be either seg_size or 1/8 of effective_cache_size, whichever is lower.  (I don't think we would want to prefetch more than a fraction of effective_cache_size, or else it could have already been displaced by the time the scan gets back to it.)
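
As a sketch of that ramp-up policy (RELSEG_SIZE and the effective_cache_size GUC are real and both measured in blocks; the other names are illustrative):

    /* Sketch of the proposed back-off, not code from the patch. */
    BlockNumber prefetch_dist = 1;
    BlockNumber max_dist = Min((BlockNumber) RELSEG_SIZE,
                               (BlockNumber) effective_cache_size / 8);

    /* ... after each prefetch burst, double the distance, capped ... */
    if (prefetch_dist < max_dist)
        prefetch_dist = Min(prefetch_dist * 2, max_dist);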

Also, what would be the best way to drill a hole through bufmgr.c into md.c so that the prefetch could specify an entire range, rather than looping over each individual block?
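
Something like the following, purely hypothetical, range variant is the shape I have in mind; neither PrefetchBufferRange() nor smgrprefetchrange() exists today, and md.c would need a corresponding entry point:

    /*
     * Hypothetical API, purely illustrative.  The point is that md.c
     * could clamp the range to segment boundaries and issue one larger
     * advisory per segment, instead of one posix_fadvise() per block.
     */
    void
    PrefetchBufferRange(Relation reln, ForkNumber forkNum,
                        BlockNumber firstBlock, BlockNumber nblocks)
    {
        RelationOpenSmgr(reln);

        /* hypothetical range-aware sibling of smgrprefetch() */
        smgrprefetchrange(reln->rd_smgr, forkNum, firstBlock, nblocks);
    }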

What would have to be done to detect people running on SSD and disable the feature, if anything?

I'll add this to the next commitfest as a WIP patch.

Cheers,

Jeff
Attachments
