Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

Поиск
Список
Период
Сортировка
От Anastasia Lubennikova
Тема Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Дата
Msg-id cafbd48d-9b66-145d-6804-dfa6ed7e31fa@postgrespro.ru
обсуждение исходный текст
Ответ на Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
13.09.2019 4:04, Peter Geoghegan wrote:
On Wed, Sep 11, 2019 at 2:04 PM Peter Geoghegan <pg@bowt.ie> wrote:
I think that the new WAL record has to be created once per posting
list that is generated, not once per page that is deduplicated --
that's the only way that I can see that avoids a huge increase in
total WAL volume. Even if we assume that I am wrong about there being
value in making deduplication incremental, it is still necessary to
make the WAL-logging behave incrementally.
It would be good to hear your thoughts on this _bt_dedup_one_page()
WAL volume/"write amplification" issue.

Attached is v14 based on v12 (v13 changes are not merged).

In this version, I fixed the bug you mentioned and also fixed nbtinsert,
so that it doesn't save newposting in xlog record anymore.

I tested patch with nbtree_wal_test, and found out that the real issue is
not the dedup WAL records themselves, but the full page writes that they trigger.
Here are test results (config is standard, except fsync=off to speedup tests):

'FPW on' and 'FPW off' are tests on v14.
NO_IMAGE is the test on v14 with REGBUF_NO_IMAGE in bt_dedup_one_page().

+-------------------+-----------+-----------+----------------+-----------+
|        ---        |   FPW on  |  FPW off  | FORCE_NO_IMAGE |   master  |
+-------------------+-----------+-----------+----------------+-----------+
| time              | 09:12 min | 06:56 min | 06:24 min      | 08:10 min |
| nbtree_wal_volume | 8083 MB   | 2128 MB   | 2327 MB        | 2439 MB   |
| index_size        | 169 MB    | 169 MB    | 169 MB         | 1118 MB   |
+-------------------+-----------+-----------+----------------+-----------+

With random insertions into btree it's highly possible that deduplication will often be
the first write after checkpoint, and thus will trigger FPW, even if only a few tuples were compressed.
That's why there is no significant difference with log_newpage_buffer() approach.
And that's why "lazy" deduplication doesn't help to decrease amount of WAL.

Also, since the index is packed way better than before, it probably benefits less of wal_compression.

One possible "fix" to decrease WAL amplification is to add REGBUF_NO_IMAGE flag to XLogRegisterBuffer in bt_dedup_one_page().
As you can see from test result, it really eliminates the problem of inadequate WAL amount.
However, I doubt that it is a crash-safe idea.

Another, and more realistic approach is to make deduplication less intensive:
if freed space is less than some threshold, fall back to not changing page at all and not generating xlog record.

Probably that was the reason, why patch became faster after I added BT_COMPRESS_THRESHOLD in early versions,
not because deduplication itself is cpu bound or something, but because WAL load decreased.

So I propose to develop this idea. The question is how to choose threshold.
I wouldn't like to introduce new user settings.  Any ideas?

I also noticed that the number of checkpoints differ between tests:
select checkpoints_req from pg_stat_bgwriter ;
+-----------------+---------+---------+----------------+--------+
|       ---       |  FPW on | FPW off | FORCE_NO_IMAGE | master |
+-----------------+---------+---------+----------------+--------+
| checkpoints_req |      16 |       7 |              8 |     10 |
+-----------------+---------+---------+----------------+--------+
And I struggle to explain the reason of this.
Do you understand what can cause the difference?

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Define jsonpath functions as stable
Следующее
От: Robert Haas
Дата:
Сообщение: Re: block-level incremental backup