Re: logical decoding / rewrite map vs. maxAllocatedDescs

From: Tomas Vondra
Subject: Re: logical decoding / rewrite map vs. maxAllocatedDescs
Date:
Msg-id: b346c3eb-0e65-29ec-47db-0d9a46feaa2c@2ndquadrant.com
In reply to: Re: logical decoding / rewrite map vs. maxAllocatedDescs  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses: Re: logical decoding / rewrite map vs. maxAllocatedDescs
List: pgsql-hackers

On 08/09/2018 07:47 PM, Alvaro Herrera wrote:
> On 2018-Aug-09, Tomas Vondra wrote:
> 
>> I suppose there are reasons why it's done this way, and admittedly the test
>> that happens to trigger this is a bit extreme (essentially running pgbench
>> concurrently with 'vacuum full pg_class' in a loop). I'm not sure it's
>> extreme enough to deem it not an issue, because people using many temporary
>> tables often deal with bloat by doing frequent vacuum full on catalogs.
> 
> Actually, it seems to me that ApplyLogicalMappingFile is just leaking
> the file descriptor for no good reason.  There's a different
> OpenTransientFile call in ReorderBufferRestoreChanges that is not
> intended to be closed immediately, but the other one seems a plain bug,
> easy enough to fix.
> 

Indeed. Adding a CloseTransientFile to ApplyLogicalMappingFile solves
the issue with hitting maxAllocatedDescs. Barring objections I'll commit
this shortly.
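
For reference, the shape of the change (a trimmed sketch, not the
verbatim patch; error handling and the per-entry application logic are
elided, and OpenTransientFile is shown with its current two-argument
signature):

/*
 * Sketch only: the real ApplyLogicalMappingFile lives in
 * src/backend/replication/logical/reorderbuffer.c.
 */
#include "postgres.h"

#include <fcntl.h>
#include <unistd.h>

#include "access/rewriteheap.h"
#include "storage/fd.h"
#include "utils/hsearch.h"

static void
ApplyLogicalMappingFile(HTAB *tuplecid_data, Oid relid, const char *fname)
{
    char        path[MAXPGPATH];
    int         fd;
    int         readBytes;
    LogicalRewriteMappingData map;

    snprintf(path, sizeof(path), "pg_logical/mappings/%s", fname);
    fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
    if (fd < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open file \"%s\": %m", path)));

    for (;;)
    {
        readBytes = read(fd, &map, sizeof(LogicalRewriteMappingData));
        if (readBytes == 0)     /* reached EOF */
            break;
        /* ... validate readBytes, apply the mapping to tuplecid_data ... */
    }

    /*
     * The one-line fix: previously the function returned without this,
     * leaking one transient fd per mapping file and eventually tripping
     * the maxAllocatedDescs check in fd.c.
     */
    CloseTransientFile(fd);
}

The OpenTransientFile call in ReorderBufferRestoreChanges is left
alone, since that one is intentionally kept open.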

>> But wait, there's more - what happens after hitting the limit? We restart
>> the decoding process, and end up getting this:
>>
>>     ERROR:  53000: exceeded maxAllocatedDescs (492) while trying to open
>>     file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"
>>     LOCATION:  OpenTransientFile, fd.c:2161
>>     LOG:  00000: starting logical decoding for slot "s"
>>     DETAIL:  streaming transactions committing after 1/6097DD48, reading
>>     WAL from 1/60275848
>>     LOCATION:  CreateDecodingContext, logical.c:414
>>     LOG:  00000: logical decoding found consistent point at 1/60275848
>>     DETAIL:  Logical decoding will begin using saved snapshot.
>>     LOCATION:  SnapBuildRestore, snapbuild.c:1872
>>     ERROR:  XX000: subtransaction logged without previous top-level txn
>>     record
> 
> Hmm, I wonder if we introduced some bug in f49a80c481f7.
> 

Not sure. I've tried reproducing it both on b767b3f2e5b7 (that's f49...
in the REL_10_STABLE branch) and 09879f7536350 (that's the commit
immediately before it), and I can't reproduce it for some reason. I'll
try on Monday on the same laptop I used before (it's in the office, so I
can't try now).

But while running the tests on this machine, I repeatedly got pgbench
failures like this:

client 2 aborted in command 0 of script 0; ERROR:  could not read block
3 in file "base/16384/24573": read only 0 of 8192 bytes

That kinda reminds me of the issues we're observing on some buildfarm
machines; I wonder if it's the same thing.

I suppose relfilenode 24573 is pg_class, which is the only table I'm
running vacuum full on here.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

