Re: logical decoding / rewrite map vs. maxAllocatedDescs
From | Tomas Vondra
Subject | Re: logical decoding / rewrite map vs. maxAllocatedDescs
Date |
Msg-id | b346c3eb-0e65-29ec-47db-0d9a46feaa2c@2ndquadrant.com
In reply to | Re: logical decoding / rewrite map vs. maxAllocatedDescs (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses | Re: logical decoding / rewrite map vs. maxAllocatedDescs
List | pgsql-hackers
On 08/09/2018 07:47 PM, Alvaro Herrera wrote:
> On 2018-Aug-09, Tomas Vondra wrote:
>
>> I suppose there are reasons why it's done this way, and admittedly the
>> test that happens to trigger this is a bit extreme (essentially running
>> pgbench concurrently with 'vacuum full pg_class' in a loop). I'm not
>> sure it's extreme enough to deem it not an issue, because people using
>> many temporary tables often deal with bloat by doing frequent vacuum
>> full on catalogs.
>
> Actually, it seems to me that ApplyLogicalMappingFile is just leaking
> the file descriptor for no good reason. There's a different
> OpenTransientFile call in ReorderBufferRestoreChanges that is not
> intended to be closed immediately, but the other one seems a plain bug,
> easy enough to fix.
>

Indeed. Adding a CloseTransientFile to ApplyLogicalMappingFile solves
the issue with hitting maxAllocatedDescs. Barring objections, I'll
commit this shortly.

>> But wait, there's more - what happens after hitting the limit? We
>> restart the decoding process, and end up getting this:
>>
>> ERROR: 53000: exceeded maxAllocatedDescs (492) while trying to open
>> file "pg_logical/mappings/map-4000-4eb-1_60DE1E08-5376b5-537c6b"
>> LOCATION: OpenTransientFile, fd.c:2161
>> LOG: 00000: starting logical decoding for slot "s"
>> DETAIL: streaming transactions committing after 1/6097DD48, reading
>> WAL from 1/60275848
>> LOCATION: CreateDecodingContext, logical.c:414
>> LOG: 00000: logical decoding found consistent point at 1/60275848
>> DETAIL: Logical decoding will begin using saved snapshot.
>> LOCATION: SnapBuildRestore, snapbuild.c:1872
>> ERROR: XX000: subtransaction logged without previous top-level txn
>> record
>
> Hmm, I wonder if we introduced some bug in f49a80c481f7.
>

Not sure. I've tried reproducing it both on b767b3f2e5b7 (that's f49...
in the REL_10_STABLE branch) and on 09879f7536350 (the commit
immediately before it), and I can't reproduce it for some reason. I'll
try on Monday on the same laptop I used before (it's in the office, so
I can't try now).

But while running the tests on this machine, I repeatedly got pgbench
failures like this:

client 2 aborted in command 0 of script 0; ERROR: could not read block
3 in file "base/16384/24573": read only 0 of 8192 bytes

That kinda reminds me of the issues we're observing on some buildfarm
machines, and I wonder if it's the same thing. I suppose relfilenode
24573 is pg_class, which is the only table I'm running vacuum full on
here.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
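[For reference, a minimal sketch of the leak and the one-line fix under
discussion. This is a simplified illustration, not the actual patch: the
real ApplyLogicalMappingFile() in reorderbuffer.c takes the tuplecid hash
table and a relation OID as well, and the exact OpenTransientFile()
signature differs slightly between branches.]

/*
 * Simplified sketch of ApplyLogicalMappingFile(): each call opened a
 * rewrite-map file with OpenTransientFile() but never closed it, so
 * every file processed leaked one slot in fd.c's allocatedDescs array
 * until OpenTransientFile() failed with "exceeded maxAllocatedDescs".
 */
#include "postgres.h"

#include <fcntl.h>

#include "storage/fd.h"

static void
ApplyLogicalMappingFile(const char *fname)	/* simplified signature */
{
	char	path[MAXPGPATH];
	int		fd;

	snprintf(path, sizeof(path), "pg_logical/mappings/%s", fname);

	/*
	 * OpenTransientFile() tracks the descriptor in fd.c so it can be
	 * cleaned up on transaction abort; each open descriptor counts
	 * against maxAllocatedDescs.
	 */
	fd = OpenTransientFile(path, O_RDONLY | PG_BINARY);
	if (fd < 0)
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not open file \"%s\": %m", path)));

	/* ... read the rewrite mapping records from fd and apply them ... */

	/*
	 * The fix: release the transient descriptor once the file has been
	 * fully consumed, instead of leaking it.
	 */
	CloseTransientFile(fd);
}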