Re: Vacuum: allow usage of more than 1GB of work mem

From: Pavan Deolasee
Subject: Re: Vacuum: allow usage of more than 1GB of work mem
Date:
Msg-id: CABOikdNVfVkY8hy4A94Kzkeu8-MQ4UFB7WXc8n+TgA876=dVWg@mail.gmail.com
In response to: Re: Vacuum: allow usage of more than 1GB of work mem  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses: Re: Vacuum: allow usage of more than 1GB of work mem  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers


On Wed, Sep 14, 2016 at 8:47 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:


Sawada-san offered to reimplement the patch based on what I proposed upthread. In the new scheme of things, we will allocate a fixed-size bitmap of length 2D bits per page, where D is the average page density of live + dead tuples. (The rationale behind multiplying D by a factor of 2 is to consider the worst-case scenario where every tuple also has an LP_REDIRECT line pointer.) The value of D in most real-world, large tables should not go much beyond, say, 100, assuming 80-byte-wide tuples and an 8K block size. That translates to about 25 bytes/page. So all TIDs with offset less than 2D can be represented by a single bit. We augment this with an overflow area to track tuples which fall outside this limit. I believe this area will be small, say 10% of the total allocation.
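Roughly, the lookup side could look like the sketch below. This is just an illustration of the idea, not code from any patch; the structure and function names (DeadTupleBitmap, dtbm_test) are made up. A TID whose offset fits within the 2D bits allocated for its page is tested with a single bit; anything beyond spills into the small overflow array.

/*
 * Sketch only.  One bit per offset, up to 2D bits per heap page, plus a
 * small overflow array of plain TIDs for offsets beyond that.
 */
#include "postgres.h"
#include "storage/itemptr.h"

typedef struct DeadTupleBitmap
{
    int          bits_per_page;   /* 2 * D, fixed when vacuum starts */
    BlockNumber  nblocks;         /* heap pages covered by the bitmap */
    uint8       *bitmap;          /* nblocks * bits_per_page bits */
    ItemPointerData *overflow;    /* TIDs with offset > bits_per_page */
    int          noverflow;
} DeadTupleBitmap;

/* Was this TID recorded as dead? (called from the index bulk-delete callback) */
static bool
dtbm_test(DeadTupleBitmap *dtbm, ItemPointer tid)
{
    BlockNumber  blk = ItemPointerGetBlockNumber(tid);
    OffsetNumber off = ItemPointerGetOffsetNumber(tid);

    if (off <= dtbm->bits_per_page)
    {
        uint64  bitno = (uint64) blk * dtbm->bits_per_page + (off - 1);

        return (dtbm->bitmap[bitno / 8] & (1 << (bitno % 8))) != 0;
    }

    /* rare case: search the overflow area (could be binary search) */
    for (int i = 0; i < dtbm->noverflow; i++)
    {
        if (ItemPointerEquals(&dtbm->overflow[i], tid))
            return true;
    }
    return false;
}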


So I cooked up the attached patch to track the number of live/dead tuples found at each offset from 1 to MaxOffsetNumber. The idea was to see how many tuples actually go beyond the threshold of 2D offsets per page. Note that I am proposing to track the first 2D offsets via the bitmap and the rest via a regular TID array.
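The instrumentation itself is trivial: in essence it just maintains two per-offset counters while scanning heap pages, along these lines (a sketch of the idea only, not the exact code from the attached patch):

/* Sketch of the instrumentation idea: histogram of tuples per offset. */
static uint64 tuples_at_offset[MaxOffsetNumber + 1];
static uint64 dead_at_offset[MaxOffsetNumber + 1];

/* Called for every item pointer examined while scanning a heap page. */
static void
count_tuple_at_offset(OffsetNumber offnum, bool is_dead)
{
    tuples_at_offset[offnum]++;
    if (is_dead)
        dead_at_offset[offnum]++;
}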

So I ran a pgbench test for 2 hours with scale factor 500. The autovacuum scale factor was set to 0.1 (i.e. 10%).

Some interesting bits:

postgres=# select relname, n_tup_ins, n_tup_upd, n_tup_hot_upd, n_live_tup, n_dead_tup, pg_relation_size(relid)/8192 as relsize, (n_live_tup+n_dead_tup)/(pg_relation_size(relid)/8192) as density from pg_stat_user_tables ;
     relname      | n_tup_ins | n_tup_upd | n_tup_hot_upd | n_live_tup | n_dead_tup | relsize | density 
------------------+-----------+-----------+---------------+------------+------------+---------+---------
 pgbench_tellers  |      5000 |  95860289 |      87701578 |       5000 |          0 |    3493 |       1
 pgbench_branches |       500 |  95860289 |      94158081 |        967 |          0 |    1544 |       0
 pgbench_accounts |  50000000 |  95860289 |      93062567 |   51911657 |    3617465 |  865635 |      64
 pgbench_history  |  95860289 |         0 |             0 |   95258548 |          0 |  610598 |     156
(4 rows)

The smaller tellers and branches tables bloat so much that the density, as computed from live + dead tuples, falls close to 1 tuple/page. So for such tables, the idea of 2D bits/page will fail miserably. But I think these tables are worst-case representatives, and I would be extremely surprised if we ever found a very large table bloated so much. Even then, this probably tells us that we can't rely solely on the density measure.

Another interesting bit about these small tables is that the largest used offset never went beyond 291, which is the value of MaxHeapTuplesPerPage. I don't know if there is something that prevents using more than MaxHeapTuplesPerPage offsets per heap page, and I don't know at this point whether this gives us an upper limit for bits per page (maybe it does).
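For reference, MaxHeapTuplesPerPage is derived purely from the block geometry; with the default 8K block size it evaluates to 291, which matches what I saw. From htup_details.h:

#define MaxHeapTuplesPerPage    \
    ((int) ((BLCKSZ - SizeOfPageHeaderData) / \
            (MAXALIGN(SizeofHeapTupleHeader) + sizeof(ItemIdData))))

/* With BLCKSZ = 8192: (8192 - 24) / (24 + 4) = 291 */

If the heap code indeed refuses to use offsets beyond that for heap pages (I believe PageAddItem enforces this), it would cap the number of bits we could ever need per page at MaxHeapTuplesPerPage.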

For the pgbench_accounts table, the maximum offset used was 121, though the bulk of the used offsets were at the start of the page (see attached graph). The test did not create enough dead tuples to trigger autovacuum on pgbench_accounts, so I ran a manual vacuum at the end. (There were about 5% dead tuples in the table by the time the test finished.)

postgres=# VACUUM VERBOSE pgbench_accounts ;
INFO:  vacuuming "public.pgbench_accounts"
INFO:  scanned index "pgbench_accounts_pkey" to remove 2797722 row versions
DETAIL:  CPU 0.00s/9.39u sec elapsed 9.39 sec.
INFO:  "pgbench_accounts": removed 2797722 row versions in 865399 pages
DETAIL:  CPU 0.10s/7.01u sec elapsed 7.11 sec.
INFO:  index "pgbench_accounts_pkey" now contains 50000000 row versions in 137099 pages
DETAIL:  2797722 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pgbench_accounts": found 852487 removable, 50000000 nonremovable row versions in 865635 out of 865635 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 802256 unused item pointers.
Skipped 0 pages due to buffer pins.
0 pages are entirely empty.
CPU 0.73s/27.20u sec elapsed 27.98 sec.

tuple count at each offset (offnum:all_tuples:dead_tuples)

For 2797722 dead line pointers, the current representation would have used 2797722 x 6 = 16786332 bytes of memory. The most optimal bitmap would have used 121 bits/page x 865399 pages = 13089159 bytes, whereas if we had provisioned 2D bits/page, assuming D = 64 based on the above calculation, we would have used 13846384 bytes of memory. This is about 18% less than the current representation. Of course, we would have allocated some space for the overflow region, which would make the difference smaller, perhaps negligible. But the bitmap would be extremely cheap to look up during index scans.
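Just to have the arithmetic in one reusable place, here is a throwaway snippet that redoes the comparison above: 6 bytes per TID for the current array versus 2D bits per page for the bitmap. The 10% overflow provisioning is an assumption, not a measured number.

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint64_t ndead = 2797722;        /* dead line pointers removed */
    uint64_t npages = 865399;        /* heap pages in the table */
    uint64_t density = 64;           /* D: live + dead tuples per page */
    double   overflow_frac = 0.10;   /* assumed overflow provisioning */

    uint64_t tid_array = ndead * 6;                 /* current: 6 bytes per TID */
    uint64_t bitmap = npages * (2 * density) / 8;   /* 2D bits per page */
    uint64_t with_overflow = (uint64_t) (bitmap * (1.0 + overflow_frac));

    printf("TID array:          %llu bytes\n", (unsigned long long) tid_array);
    printf("bitmap (2D bits):   %llu bytes\n", (unsigned long long) bitmap);
    printf("bitmap + overflow:  %llu bytes\n", (unsigned long long) with_overflow);
    return 0;
}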

Now maybe I got lucky, maybe I did not run the tests long enough (though I believe that may have worked in favour of the bitmap), maybe mostly-HOT-updated tables are not good candidates for testing, and maybe there are situations where the proposed bitmap representation will fail badly. But these tests show that the idea is at least worth considering and that we can improve things for at least some workloads. The question is whether we can avoid regressions in the not-so-good cases.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Attachments
