Re: [HACKERS] WAL logging problem in 9.4.3?
От | Noah Misch |
---|---|
Тема | Re: [HACKERS] WAL logging problem in 9.4.3? |
Дата | |
Msg-id | 20200316034647.GA1121601@rfd.leadboat.com обсуждение исходный текст |
Ответ на | Re: [HACKERS] WAL logging problem in 9.4.3? (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Ответы |
Re: [HACKERS] WAL logging problem in 9.4.3?
(Noah Misch <noah@leadboat.com>)
|
Список | pgsql-hackers |
On Wed, Mar 04, 2020 at 04:29:19PM +0900, Kyotaro Horiguchi wrote: > The attached is back-patches from 9.5 through master. Thanks. I've made some edits. I'll plan to push the attached patches on Friday or Saturday. > log_newpage_range has been introduced at PG12. Fortunately the > required infrastructure is introduced at PG9.5 so what I need to do > for PG95-PG11 is back-patching the function and its counter part in > xlog_redo. It doen't WAL format itself but XLOG_FPI gets to have 2 or > more backup pages so the compatibility is forward only. That is, newer > minor versions read WAL from older minor versions, but not vise > versea. I'm not sure it is back-patchable so in the attached the > end-of-xact WAL feature is separated for PG9.5-PG11. > (000x-Add-end-of-xact-WAL-feature-of-WAL-skipping.patch) The main patch's introduction of XLOG_GIST_ASSIGN_LSN already creates a WAL upgrade hazard. Changing XLOG_FPI is riskier, because an old server will apply the first FPI and ignore the rest. For v11 and earlier, I decided to introduce XLOG_FPI_MULTI. It behaves exactly like XLOG_FPI, but this PANICs if one reads post-update WAL with a pre-update server. The main alternative would be to issue one XLOG_FPI per page, but I was concerned that would cause a notable performance loss. > In the patchset for 12, I let the functions heap_sync, > heapam_methods.finish_bulk_insert and table_finish_bulk_insert left > as-is. As the result heapam_finish_bulk_insert becomes no-op. heapam_finish_bulk_insert() is a static function, so I deleted it. > begin_heap_rewrite is a public function but the last parameter is > useless and rather harmful as it looks as if it works. So I removed > the parameter. Agreed. Also, pgxn contains no references to begin_heap_rewrite(). > For 9.6, mdexists() creates the specified file while bootstrap mode > and that leads to assertion failure of smgrDoPendingSyncs. So I made > CreateStorage not register pending sync while bootstrap mode. > gistbuild generates the LSN for root page of a newly created index > using gistGetFakeLSN(heap), which fires assertion failure in > gistGetFakeLSN. I think we should use index instead of heap there, > but it doesn't matter if we don't have the new pending sync mechanism, > so I didn't split it as a separate patch. v11 and v10, too, had the gistGetFakeLSN(heap) problem. I saw that and other problems by running the following on each branch: make check-world printf '%s\n%s\n%s\n' 'log_statement = all' 'wal_level = minimal' 'max_wal_senders = 0' >/tmp/minimal.conf make check-world TEMP_CONFIG=/tmp/minimal.conf make -C doc # catch breakage when XML changes don't work in SGML > For 9.5, pg_visibility does not exist so I dropped the test for the > module. The test would have required further changes to work in v11 or earlier, so I deleted the test. It was a low-importance test. > It lacks a part of TAP infrastructure nowadays we have, but I > want to have the test (and it actually found a bug I made during this > work). So I added a patch to back-patch TestLib.pm, PostgresNode.pm > and RecursiveCopy.pm along with 018_wal_optimize.pl. > (0004-Add-TAP-test-for-WAL-skipping-feature.patch) That is a good idea. Rather than make it specific to this test, I would like to back-patch all applicable test files from 9.6 src/test/recovery. I'll plan to push that one part on Thursday. Other notable changes: - Like you suggested earlier, I moved restoration of old*Subid from index_create() back to ATExecAddIndex(). I decided to do this when I observed that pg_idx_advisor calls index_create(). That's not a strong reason, but it was enough to change a decision that had been arbitrary. - Updated the wal_skip_threshold GUC category to WAL_SETTINGS, for consistency with the documentation move. Added the GUC to postgresql.conf.sample. - In released branches, I moved the new public struct fields to the end. This reduces the number of extensions requiring a recompile. From a grep of pgxn, one extension ("citus") relies on sizeof(RelationData), and nothing relies on sizeof(IndexStmt). - In 9.6, I rewrote the mdimmedsync() changes so the function never ignores FileSync() failure. Other observations: - The new test file takes ~62s longer on 9.6 and 9.5, mostly due to commit c61559ec3 first appearing in v10. I am fine with this. - This is the most-demanding back-branch fix I've ever attempted. Hopefully I've been smarter than usual while reviewing it, but that is unlikely. Thanks, nm
Вложения
В списке pgsql-hackers по дате отправления: