Re: Statistics Import and Export
От | Nathan Bossart |
---|---|
Тема | Re: Statistics Import and Export |
Дата | |
Msg-id | Z-3x2AnPCP331JA3@nathan обсуждение исходный текст |
Ответ на | Re: Statistics Import and Export (Jeff Davis <pgsql@j-davis.com>) |
Ответы |
Re: Statistics Import and Export
|
Список | pgsql-hackers |
On Tue, Apr 01, 2025 at 10:44:19PM -0700, Jeff Davis wrote: > On Tue, 2025-04-01 at 22:21 -0500, Nathan Bossart wrote: >> We might be able to improve this by inventing a new callback that fails for >> all formats except for custom with feesko() available. That would at least >> ensure hard failures if these assumptions change. That problably wouldn't >> be terribly invasive. I'm curious what you think. > > That sounds fine, I'd say do that if it feels reasonable, and if the > extra callbacks get too messy, we can just document the assumptions > instead. I did write a version with callbacks, but it felt a bit silly because it is very obviously intended for this one case. So, I removed them in the attached patch set. >> Hm. One thing we could do is to send the TocEntry to the callback and >> verify that matches the one we were expecting to see next (as set by a >> previous call). Does that sound like a strong enough check? > > Again, I'd just be practical here and do the check if it feels natural, > and if not, improve the comments so that someone modifying the code > would know where to look. Okay, here is an updated patch set. I did add some verification code, which ended up being a really good idea because it revealed a couple of cases we weren't handling: * Besides custom format calling WriteToc() twice to update the data offsets, tar format calls WriteToc() followed by RestoreArchive() to write restore.sql. I couldn't think of a great way to avoid executing the queries twice in this case, so I settled on allowing it for only that mode. While we don't expect the second set of queries to result in different stats definitions, even if it did, the worst case is that the content of restore.sql (which isn't used by pg_restore) would be different. I noticed some past discussion that seems to suggest that this format might be a candidate for deprecation [0], so I'm not sure it's worth doing anything fancier. * Our batching code assumes that stats entries are dumped in TOC order, which unfortunately wasn't true for formats that use RestoreArchive() for dumping. This is because RestoreArchive() does multiple passes through the TOC and selectively dumps certain entries each time. This is particularly troublesome for index stats and a subset of matview stats; both are in SECTION_POST_DATA, but matview stats that depend on matview data are dumped in RESTORE_PASS_POST_ACL, while all other stats data is dumped in RESTORE_PASS_MAIN. To deal with this, I propose moving all stats entries in SECTION_POST_DATA to RESTORE_PASS_POST_ACL, which ensures that we always dump stats in TOC order. One convenient side effect of this change is that we can revert a decent chunk of commit a0a4601765. It might be possible to do better via smarter lookahead code or a more sophisticated cache, but it's a bit late in the game for that. [0] https://postgr.es/m/20180727015306.fzlo4inv5i3zqr2c%40alap3.anarazel.de -- nathan
Вложения
В списке pgsql-hackers по дате отправления: