Re: [HACKERS] pg_upgrade to clusters with a different WAL segment size
From | Jeremy Schneider |
---|---|
Subject | Re: [HACKERS] pg_upgrade to clusters with a different WAL segment size |
Date | |
Msg-id | CA+fnDAb+8wfxT6zU_pVQCi1TXxJvfFGqZVsut6aZ3R4+T=sfnA@mail.gmail.com |
In reply to | Re: [HACKERS] pg_upgrade to clusters with a different WAL segment size (Michael Paquier <michael.paquier@gmail.com>) |
List | pgsql-hackers |
On Fri, Nov 10, 2017 at 4:04 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sat, Nov 11, 2017 at 12:46 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
>> Allowing changes to the WAL segment size during pg_upgrade seems like a
>> nice way to avoid needing a dump and load, so I would like to propose
>> adding support for this. I'd be happy to submit patches for this in the
>> next commitfest.
>
> That's a worthy goal.

I'm also interested in this item and I helped Nathan with a little of the initial testing. Also, having changed redo sizes on other database platforms a couple of times (a simple and safe runtime operation there), it seems to me that a feature like this would benefit PostgreSQL.

I would add that we increased the max segment size in pg10, but the handful of users who are in the most pain with very high activity rates on running systems are still limited to logical upgrades or dump-and-load to get the benefit of larger WAL segment sizes.

From a technical perspective, it doesn't seem like it should be too complicated to implement this in pg_upgrade since you're moving into a new cluster anyway.

On Fri, Nov 10, 2017 at 7:46 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> We've had success with our initial testing of upgrades to larger WAL
> segment sizes, including post-upgrade pgbench runs.

Just to fill this out a little: our very first test was to take a 9.6.5 database with 16MB WAL segments that had already been through pgbench and pg_upgrade it to 10.0 with 128MB WAL segments. The only change was removing the WAL size check from check_control_data(). We then did more pgbench runs on the same database post-upgrade and checked for errors or problematic variation in TPS. More of a smoke test than a thorough test, but everything looks good so far.
On Fri, Nov 10, 2017 at 7:46 AM, Bossart, Nathan <bossartn@amazon.com> wrote:
> Beyond adjusting
> check_control_data(), it looks like the 'pg_resetwal -l' call in
> copy_xact_xlog_xid() may need to be adjusted to ensure that the WAL
> starting address is set to a valid value.

This was related to one interesting quirk we observed. pg_upgrade tried to call pg_resetwal on the *new* database with a WAL file name that assumes the *old* WAL segment size. In our test, it called "pg_resetwal -l 000000010000000200000071", which is an invalid filename with 128MB WAL segments. In order to get a sensible filename, PostgreSQL took the "71", wrapped three times and added the carry to the middle field, giving a new WAL filename of "000000010000000500000011".

This actually raises a really interesting concern with pg_upgrade and different WAL segment sizes. We have WAL filenames and then we have XLogSegNo.

If pg_upgrade just chooses the next valid filename, then XLogSegNo will decrease and overlap when the WAL segment size goes up. If pg_upgrade calculates the next XLogSegNo, then the WAL segment filename will decrease and overlap when the WAL segment size goes down.

from xlog_internal.h:

    #define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \
        snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, \
                 (uint32) ((logSegNo) / XLogSegmentsPerXLogId(wal_segsz_bytes)), \
                 (uint32) ((logSegNo) % XLogSegmentsPerXLogId(wal_segsz_bytes)))
    ...
    #define XLogFromFileName(fname, tli, logSegNo, wal_segsz_bytes) \
        do { \
            uint32 log; \
            uint32 seg; \
            sscanf(fname, "%08X%08X%08X", tli, &log, &seg); \
            *logSegNo = (uint64) log * XLogSegmentsPerXLogId(wal_segsz_bytes) + seg; \
        } while (0)

If there's an archive_command script that simply copies WAL files somewhere, then it might overwrite old logs when filenames overlap.
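The filename arithmetic above can be reproduced outside the server with a small standalone sketch. It mirrors the two macros from xlog_internal.h but is not the server code: the function names `segs_per_xlogid`, `segno_from_name` and `name_from_segno` are made up for illustration, and it assumes each "xlogid" spans 4GB (0x100000000 bytes), so a 16MB segment size gives 256 segments per xlogid and 128MB gives 32.

```c
#include <stdint.h>
#include <stdio.h>

#define WAL_NAME_LEN 25         /* 24 hex digits plus terminating NUL */

/* Segments per 4GB xlogid; assumed equivalent of XLogSegmentsPerXLogId(). */
uint64_t
segs_per_xlogid(uint64_t wal_segsz_bytes)
{
    return UINT64_C(0x100000000) / wal_segsz_bytes;
}

/*
 * Same arithmetic as XLogFromFileName(): the right-hand field is NOT
 * range-checked, so an out-of-range value simply carries into the total.
 * Returns 0 on success, -1 if the name cannot be parsed.
 */
int
segno_from_name(const char *fname, uint32_t *tli, uint64_t *segno,
                uint64_t wal_segsz_bytes)
{
    unsigned int t, log, seg;

    if (sscanf(fname, "%08X%08X%08X", &t, &log, &seg) != 3)
        return -1;
    *tli = t;
    *segno = (uint64_t) log * segs_per_xlogid(wal_segsz_bytes) + seg;
    return 0;
}

/* Same arithmetic as XLogFileName(); fname must hold WAL_NAME_LEN bytes. */
void
name_from_segno(char *fname, uint32_t tli, uint64_t segno,
                uint64_t wal_segsz_bytes)
{
    uint64_t per = segs_per_xlogid(wal_segsz_bytes);

    snprintf(fname, WAL_NAME_LEN, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / per),
             (unsigned int) (segno % per));
}
```

Parsing "000000010000000200000071" with a 128MB segment size gives segment number 2 * 32 + 0x71 = 177, and formatting 177 back with the same size yields "000000010000000500000011", the wrapped name we saw. Parsing the same name with the old 16MB size gives 2 * 256 + 0x71 = 625, so taking the wrapped filename at face value moves XLogSegNo backwards from 625 to 177.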
I haven't surveyed all the postgres backup tools and scripts out there, but it also seems conceivable that some tools will do the equivalent of XLogFromFileName() so that they can be aware of whether there are missing logs in a recovery scenario. Those tools could conceivably get broken by an overlapping/decremented XLogSegNo.

I haven't fully thought through replication to consider whether anything could break there, but that's another open question.

There are a few different approaches that could be taken to determine the next WAL sequence number.

1) simplest: increment the filename's middle field by 1, zero out the right field. No filename overlap, don't need to know the WAL segment size. Has XLogSegNo overlap.

2) use the next valid WAL filename with segment size awareness. No filename overlap, has XLogSegNo overlap.

3) translate the old DB filename to XLogSegNo, XLogSegNo++, translate to the new DB filename. No XLogSegNo overlap, has filename overlap.

4) most complex: XLogSegNo++, translate to the new DB filename, then increase the filename until it's greater than the last used filename in the old DB. Always has gaps, never overlaps.

I'm thinking option 4 sounds the most correct. Any thoughts from others to the contrary?

Anything else that is worth testing to look for potential problems after pg_upgrade with different WAL segment sizes?

-Jeremy

--
http://about.me/jeremy_schneider
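To make option 4 concrete, here is a rough sketch of how the starting segment could be computed. This is only one reading of the idea, not a patch against pg_upgrade: `next_segno_no_overlap` and `wal_file_name` are hypothetical names, an xlogid is assumed to span 4GB, and instead of looping until the filename is large enough it takes the larger of two candidates (old XLogSegNo + 1, and the first segment whose new-style filename sorts after the old filename).

```c
#include <stdint.h>
#include <stdio.h>

/* Segments per 4GB xlogid; assumed equivalent of XLogSegmentsPerXLogId(). */
uint64_t
segs_per_xlogid(uint64_t wal_segsz_bytes)
{
    return UINT64_C(0x100000000) / wal_segsz_bytes;
}

/*
 * Hypothetical helper for option 4.  old_log and old_seg are the middle and
 * right fields of the last WAL filename used by the old cluster.  Returns a
 * segment number for the new cluster that is greater than the old XLogSegNo
 * and whose filename sorts after the old filename.
 */
uint64_t
next_segno_no_overlap(uint32_t old_log, uint32_t old_seg,
                      uint64_t old_segsz, uint64_t new_segsz)
{
    uint64_t old_per = segs_per_xlogid(old_segsz);
    uint64_t new_per = segs_per_xlogid(new_segsz);

    /* Candidate A (option 3): old XLogSegNo plus one. */
    uint64_t by_segno = (uint64_t) old_log * old_per + old_seg + 1;

    /* Candidate B (option 2): first valid new filename after the old one. */
    uint64_t by_name;

    if ((uint64_t) old_seg + 1 < new_per)
        by_name = (uint64_t) old_log * new_per + old_seg + 1;
    else
        by_name = ((uint64_t) old_log + 1) * new_per;

    /* Filenames grow with the segment number, so the larger one wins. */
    return by_segno > by_name ? by_segno : by_name;
}

/* Format a WAL filename; fname must hold at least 25 bytes. */
void
wal_file_name(char *fname, size_t len, uint32_t tli, uint64_t segno,
              uint64_t segsz)
{
    uint64_t per = segs_per_xlogid(segsz);

    snprintf(fname, len, "%08X%08X%08X", (unsigned int) tli,
             (unsigned int) (segno / per), (unsigned int) (segno % per));
}
```

With the numbers from our test (old filename fields 2 and 0x71, 16MB to 128MB) this picks segment 626, i.e. "000000010000001300000012", which leaves a gap in filenames but keeps XLogSegNo increasing. Going the other way (old fields 5 and 0x11, 128MB to 16MB) it picks segment 1298, i.e. "000000010000000500000012", which leaves a gap in XLogSegNo but keeps filenames increasing.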