Preventing free space from being reused

Поиск
Список
Период
Сортировка
От Noah Bergbauer
Тема Preventing free space from being reused
Дата
Msg-id CABjy+RhbFu_Hs8ZEiOzaPaJSGB9jqFF0gDU5gtwCLiurG3NLjQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Preventing free space from being reused  (Michael Lewis <mlewis@entrata.com>)
Список pgsql-general
Hello,

I am working on a system to store and analyze JSON-based log data. The idea is to store tuples with a monotonically increasing timestamp (i.e. CURRENT_TIMESTAMP) in a table and run some queries that focus on specific time windows (~one hour). It seems to me that a BRIN index is designed exactly for this type of application.

However, I find that a BRIN index with standard settings (pages_per_range = 128) with very simple filters (e.g. ts > (CURRENT_TIMESTAM - INTERVAL '1 hour')) causes ~20x more pages to be scanned than I would expect. A majority of these tuples is then discarded due to index rechecking. Inspecting the BRIN pages using the pageinspect extension reveals the following problem: it seems that if a page is left with some free space that can not be filled right away (because the tuples being inserted are too large), then this hole can be filled at some arbitrary later point in time (sometimes hours later) once a small enough tuple comes along. This substantially reduces the BRIN index's effectiveness.

I confirmed this theory by CLUSTERing the table using a temporary btree index. Suddenly the query performance exactly matched my estimates.

The JSON data structure is often similar, so the table is stored on ZFS with compression. Hence, filling these holes brings no benefit - they were taking up no space thanks to the compression. On the other hand, rewriting old pages in order to fill up holes also creates a lot of fragmentation because of ZFS's copy-on-write semantics.

In summary, all of these problems would be solved if there was some way to stop the database system from ever reusing free space.

Bonus question: what's the best TOAST configuration for this use case? Is there any point in out-of-line storage when all tuples are going to be quite large (i.e. > 1kB)? Is there any benefit in having postgresql compress the data when ZFS runs compression regardless?

Thank you,
Noah Bergbauer

В списке pgsql-general по дате отправления:

Предыдущее
От: Guy Burgess
Дата:
Сообщение: Re: PostgreSQL occasionally unable to rename WAL files (NTFS)
Следующее
От: Tom Lane
Дата:
Сообщение: Re: kernel.shmmax and kernel.shmall for Linux server