Rapid disk usage spikes when updating large tables with GIN indexes

Поиск
Список
Период
Сортировка
От Jonathan Marks
Тема Rapid disk usage spikes when updating large tables with GIN indexes
Дата
Msg-id F16EF890-B574-487B-973F-4D8C13366DAB@gmail.com
обсуждение исходный текст
Ответы Re: Rapid disk usage spikes when updating large tables with GIN indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Hello!

We have a mid-sized database on RDS running 10.1 (32 cores, 240 GB RAM, 5TB total disk space, 20k PIOPS) with several
large(100GB+, tens of millions of rows) tables that use GIN indexes for full-text search. We at times need to index
verylarge (hundreds of pages) documents and as a result our tables have a mix of small (tens of tokens) to very large
(hundredsof thousands of tokens near to the tsvector 1MB limit). All our GIN indexes have fastupdate turned off — we
foundthat turning fastupdate on led to significant blocking and that we get better average performance with it turned
off.We’ve put a lot of effort into tuning our database over the last several years to the point where we have
acceptableread and write performance for these tables.  

One recurring, and predictable, issue that we have experienced regularly for multiple years is that inserting or
updatingrows in any table with GIN indexes results in extremely large drops in free disk space — i.e. inserting 10k
rowswith a total size of 10GB can result in the temporary loss of several hundred gigabytes of free disk space over 2-3
hours(and it could be more — we try to keep a 10-15% buffer of free disk space so that often represents almost all
availabledisk space). Once we stop the operation, free disk space rapidly recovers, which makes us believe that this
occursdue to logs, or some kind of temporary table. Our work_mem and maintenance_work_mem settings are pretty large
(12GBand 62GB, respectively). The database’s size on disk scarcely budges during this process. 

Unfortunately, we’re on RDS, so we’re unable to ssh directly into the instance to see what files are so large, and none
ofthe logs we can see (nor the wal logs) are large enough to explain this process. Any suggestions about where to look
tosee the cause of this problem (or about any settings we can tune or changes we could make to stop it) would be
greatlyappreciated. 

Thank you!

В списке pgsql-general по дате отправления:

Предыдущее
От: Christoph Moench-Tegeder
Дата:
Сообщение: Re: Recommended way to copy database files on Windows OS (to performfile system level backup)
Следующее
От: Andreas Kretschmer
Дата:
Сообщение: Re: Recommended way to copy database files on Windows OS (to perform file system level backup)