Discussion: Synchronized snapshots versus multiple databases
I've thought of another nasty problem for the sync-snapshots patch. Consider the following sequence of events:

1. Transaction A, which is about to export a snapshot, is running in database X.
2. Transaction B is making some changes in database Y.
3. A takes and exports a snapshot showing B's xid as running.
4. Transaction B ends.
5. Autovacuum launches in database Y. It sees nothing running in Y, so it decides it can vacuum dead rows right up to nextXid, including anything B deleted.
6. Transaction C starts in database Y, and imports the snapshot from A. Now it thinks it can see rows deleted by B ... but vacuum is busy removing them, or maybe already finished doing so.

The problem here is that A's xmin is ignored by GetOldestXmin when calculating cutoff XIDs for non-shared tables in database Y, so it doesn't protect would-be adoptees of the exported snapshot.

I can see a few alternatives, none of them very pleasant:

1. Restrict exported snapshots to be loaded only by transactions running in the same database as the exporter. This would fix the problem, but it cuts out one of the main use-cases for sync snapshots, namely getting cluster-wide-consistent dumps in pg_dumpall.

2. Allow a snapshot exported from another database to be loaded so long as this doesn't cause the DB-local value of GetOldestXmin to go backwards. However, in scenarios such as the above, C is certain to fail such a test. To make it work, pg_dumpall would have to start "advance guard" transactions in each database before it takes the intended-to-be-shared snapshot, and probably even wait for these to be oldest. Ick.

3. Remove the optimization that lets GetOldestXmin ignore XIDs outside the current database. This sounds bad, but OTOH I don't think there's ever been any proof that this optimization is worth much in real-world usage. We've already had to lobotomize that optimization for walsender processes, anyway.

4. Somehow mark the xmin of a process that has exported a snapshot so that it will be honored in all DBs not just the current one. The difficulty here is that we'd need to know *at the time the snap is taken* that it's going to be exported. (Consider the scenario above, except that A doesn't get around to exporting the snapshot it took in step 3 until between steps 5 and 6. If the xmin wasn't already marked as globally applicable when vacuum looked at it in step 5, we lose.) This is do-able but it will contort the user-visible API of the sync snapshots feature. One way we could do it is to require that transactions that want to export snapshots set a transaction mode before they take their first snapshot.

Thoughts, better ideas?

regards, tom lane
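
To make the six-step scenario concrete, here is a toy Python model of it. It is not PostgreSQL code: the `Backend` class, the `get_oldest_xmin` function, and the XID values 100 and 105 are all made up for illustration, and the real GetOldestXmin does considerably more. The only thing the sketch tries to show is how a database-local cutoff can move past the xmin of a snapshot that was taken in a different database.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Backend:
    name: str
    database: str
    xmin: Optional[int]  # oldest XID this backend's snapshot still treats as running


def get_oldest_xmin(backends: Iterable[Backend], next_xid: int,
                    database: Optional[str] = None) -> int:
    """Toy vacuum cutoff: dead rows left by XIDs below the result may be removed.

    With `database` set, backends connected to other databases are ignored
    (the optimization discussed above). With database=None every backend counts.
    """
    cutoff = next_xid
    for backend in backends:
        if backend.xmin is None:
            continue
        if database is not None and backend.database != database:
            continue
        cutoff = min(cutoff, backend.xmin)
    return cutoff


B_XID = 100  # hypothetical XID of transaction B in database Y

# Steps 1-3: A runs in database X; its snapshot shows B as running, so xmin = 100.
a = Backend("A", database="X", xmin=B_XID)

# Step 4: B ends, so only A is left in the list of running backends.
backends = [a]
next_xid = 105

# Step 5: autovacuum in Y computes a cutoff that ignores A, because A is in X.
local_cutoff = get_oldest_xmin(backends, next_xid, database="Y")
global_cutoff = get_oldest_xmin(backends, next_xid)

# Step 6: C starts in Y and adopts A's snapshot, xmin included.
c = Backend("C", database="Y", xmin=a.xmin)
rows_c_needs_may_be_gone = c.xmin < local_cutoff
```

In this model the database-local cutoff for Y is 105 while the cluster-wide cutoff is 100, so rows deleted by B (XID 100) are fair game for vacuum in Y even though C's imported snapshot still expects to see them.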
On Oct21, 2011, at 17:36 , Tom Lane wrote: > 1. Restrict exported snapshots to be loaded only by transactions running > in the same database as the exporter. This would fix the problem, but > it cuts out one of the main use-cases for sync snapshots, namely getting > cluster-wide-consistent dumps in pg_dumpall. Isn't the use-case getting consistent *parallel* dumps of a single database rather than a consistent dump of multiple databases? Since we don't have atomic cross-database commits, what does using the same snapshot to dump multiple databases buy us? On those grounds, +1 for option 1 here. > 3. Remove the optimization that lets GetOldestXmin ignore XIDs outside > the current database. This sounds bad, but OTOH I don't think there's > ever been any proof that this optimization is worth much in real-world > usage. We've already had to lobotomize that optimization for walsender > processes, anyway. Hm, we've told people who wanted cross-database access to tables in the past to either * use dblink or * not split their tables over multiple databases in the first place, and to use schemas instead. If we remove the GetOldestXmin optimization, we're essentially reversing course on this. Do we really wanna go there? best regards, Florian Pflug
On 10/21/2011 12:05 PM, Florian Pflug wrote: > On Oct21, 2011, at 17:36 , Tom Lane wrote: >> 1. Restrict exported snapshots to be loaded only by transactions running >> in the same database as the exporter. This would fix the problem, but >> it cuts out one of the main use-cases for sync snapshots, namely getting >> cluster-wide-consistent dumps in pg_dumpall. > Isn't the use-case getting consistent *parallel* dumps of a single database > rather than consistent dump of multiple databases? Since we don't have atomic > cross-database commits, what does using the same snapshot to dump multiple > databases buy us? That was my understanding of the use case. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > On 10/21/2011 12:05 PM, Florian Pflug wrote: >> On Oct21, 2011, at 17:36 , Tom Lane wrote: >>> 1. Restrict exported snapshots to be loaded only by transactions running >>> in the same database as the exporter. This would fix the problem, but >>> it cuts out one of the main use-cases for sync snapshots, namely getting >>> cluster-wide-consistent dumps in pg_dumpall. >> Isn't the use-case getting consistent *parallel* dumps of a single database >> rather than consistent dump of multiple databases? Since we don't have atomic >> cross-database commits, what does using the same snapshot to dump multiple >> databases buy us? > That was my understanding of the use case. Um, which one are you supporting? Anyway, the value of using the same snapshot across all of a pg_dumpall run would be that you could be sure that what you'd dumped concerning role and tablespace objects was consistent with what you then dump about database-local objects. (In principle, anyway --- I'm not sure how much of that happens under SnapshotNow rules because of use of backend functions. But you'll most certainly never be able to guarantee it if pg_dumpall can't export its snapshot to each subsidiary pg_dump run.) regards, tom lane
Florian Pflug <fgp@phlo.org> writes: > On Oct21, 2011, at 17:36 , Tom Lane wrote: >> 3. Remove the optimization that lets GetOldestXmin ignore XIDs outside >> the current database. This sounds bad, but OTOH I don't think there's >> ever been any proof that this optimization is worth much in real-world >> usage. We've already had to lobotomize that optimization for walsender >> processes, anyway. > Hm, we've told people who wanted cross-database access to tables in the > past to either > * use dblink or > * not split their tables over multiple databases in the first place, > and to use schemas instead > If we remove the GetOldestXmin optimization, we're essentially reversing > course on this. Do we really wanna go there? Huh? The behavior of GetOldestXmin is purely a backend-internal matter. I don't see how it's related to cross-database access --- or at least, changing this would not represent a significant move towards supporting that. regards, tom lane
On Fri, Oct 21, 2011 at 11:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've thought of another nasty problem for the sync-snapshots patch. > > 1. Restrict exported snapshots to be loaded only by transactions running > in the same database as the exporter. This would fix the problem, but > it cuts out one of the main use-cases for sync snapshots, namely getting > cluster-wide-consistent dumps in pg_dumpall. > > 2. Allow a snapshot exported from another database to be loaded so long > as this doesn't cause the DB-local value of GetOldestXmin to go > backwards. However, in scenarios such as the above, C is certain to > fail such a test. To make it work, pg_dumpall would have to start > "advance guard" transactions in each database before it takes the > intended-to-be-shared snapshot, and probably even wait for these to be > oldest. Ick. > > 3. Remove the optimization that lets GetOldestXmin ignore XIDs outside > the current database. This sounds bad, but OTOH I don't think there's > ever been any proof that this optimization is worth much in real-world > usage. We've already had to lobotomize that optimization for walsender > processes, anyway. > > 4. Somehow mark the xmin of a process that has exported a snapshot so > that it will be honored in all DBs not just the current one. The > difficulty here is that we'd need to know *at the time the snap is > taken* that it's going to be exported. (Consider the scenario above, > except that A doesn't get around to exporting the snapshot it took in > step 3 until between steps 5 and 6. If the xmin wasn't already marked > as globally applicable when vacuum looked at it in step 5, we lose.) > This is do-able but it will contort the user-visible API of the sync > snapshots feature. One way we could do it is to require that > transactions that want to export snapshots set a transaction mode > before they take their first snapshot. I am unexcited by #2 on usability grounds. 
I agree with you that #3 might end up being a fairly small pessimization in practice, but I'd be inclined to just do #1 for now and revisit the issue when and if someone shows an interest in revamping pg_dumpall to do what you're proposing (and hopefully a bunch of other cleanup too). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
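
The usability concern with #2 can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: backends are modelled as plain `(database, xmin)` tuples, and `local_oldest_xmin` and `import_allowed` are hypothetical helpers, not functions in the patch. The sketch encodes the rule "an import is allowed only if it would not move the DB-local cutoff backwards" and shows that the import fails unless an advance-guard transaction was already holding an old enough xmin in the target database.

```python
from typing import List, Tuple

Backend = Tuple[str, int]  # (database name, xmin); a hypothetical representation


def local_oldest_xmin(backends: List[Backend], next_xid: int, database: str) -> int:
    """Cutoff for one database, ignoring backends connected elsewhere."""
    return min((xmin for db, xmin in backends if db == database), default=next_xid)


def import_allowed(backends: List[Backend], next_xid: int,
                   database: str, snapshot_xmin: int) -> bool:
    """Option 2's test: adopting the snapshot must not pull the cutoff backwards."""
    return snapshot_xmin >= local_oldest_xmin(backends, next_xid, database)


NEXT_XID = 105
SNAPSHOT_XMIN = 100  # xmin of the snapshot exported from database X

# No guard: nothing is running in Y, so the local cutoff is already at NEXT_XID.
no_guard = import_allowed([("X", 100)], NEXT_XID, "Y", SNAPSHOT_XMIN)

# A guard transaction started in Y before the shared snapshot was taken.
early_guard = import_allowed([("Y", 98), ("X", 100)], NEXT_XID, "Y", SNAPSHOT_XMIN)

# A guard that started too late has a newer xmin and does not help.
late_guard = import_allowed([("Y", 103), ("X", 100)], NEXT_XID, "Y", SNAPSHOT_XMIN)
```

Only the early guard lets the import through, which is why a dump tool would have to open a connection to every database and establish those transactions before taking the snapshot it wants to share.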
On Oct21, 2011, at 19:09 , Tom Lane wrote: > Florian Pflug <fgp@phlo.org> writes: >> On Oct21, 2011, at 17:36 , Tom Lane wrote: >>> 3. Remove the optimization that lets GetOldestXmin ignore XIDs outside >>> the current database. This sounds bad, but OTOH I don't think there's >>> ever been any proof that this optimization is worth much in real-world >>> usage. We've already had to lobotomize that optimization for walsender >>> processes, anyway. > >> Hm, we've told people who wanted cross-database access to tables in the >> past to either > >> * use dblink or > >> * not split their tables over multiple databases in the first place, >> and to use schemas instead > >> If we remove the GetOldestXmin optimization, we're essentially reversing >> course on this. Do we really wanna go there? > > Huh? The behavior of GetOldestXmin is purely a backend-internal matter. > I don't see how it's related to cross-database access --- or at least, > changing this would not represent a significant move towards supporting > that. AFAIR, the performance hit we'd take by making the vacuum cutoff point (i.e. GetOldestXmin()) global instead of database-local has been repeatedly used in the past as an argument against cross-database queries. I have to admit that I currently cannot seem to find an entry in the archives to back that up, though. best regards, Florian Pflug
On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug <fgp@phlo.org> wrote: > AFAIR, the performance hit we'd take by making the vacuum cutoff point > (i.e. GetOldestXmin()) global instead of database-local has been repeatedly > used in the past as an argument against cross-database queries. I have to > admit that I currently cannot seem to find an entry in the archives to > back that up, though. I think the main argument against cross-database queries is that every place in the backend that, for example, uses an OID to identify a table would need to be modified to use a database OID and a table OID. Even if the distributed performance penalty of such a change doesn't bother you, the amount of code churn that it would take to make such a change is mind-boggling. I haven't seen anyone explain why they really need this feature anyway, and I think it's going in the wrong direction. IMHO, anyone who wants to be doing cross-database queries should be using schemas instead, and if that's not workable for some reason, then we should improve the schema implementation until it becomes workable. I think that the target use case for separate databases ought to be multi-tenancy, but what is needed there is actually more isolation (e.g. wrt/role names, cluster-wide visibility of pg_database contents, etc.), not less. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10/21/2011 01:06 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 10/21/2011 12:05 PM, Florian Pflug wrote: >>> On Oct21, 2011, at 17:36 , Tom Lane wrote: >>>> 1. Restrict exported snapshots to be loaded only by transactions running >>>> in the same database as the exporter. This would fix the problem, but >>>> it cuts out one of the main use-cases for sync snapshots, namely getting >>>> cluster-wide-consistent dumps in pg_dumpall. >>> Isn't the use-case getting consistent *parallel* dumps of a single database >>> rather than consistent dump of multiple databases? Since we don't have atomic >>> cross-database commits, what does using the same snapshot to dump multiple >>> databases buy us? >> That was my understanding of the use case. > Um, which one are you supporting? #1 seemed OK from this POV. Everything else looks ickier and/or more fragile, at first glance anyway. > Anyway, the value of using the same snapshot across all of a pg_dumpall > run would be that you could be sure that what you'd dumped concerning > role and tablespace objects was consistent with what you then dump about > database-local objects. (In principle, anyway --- I'm not sure how > much of that happens under SnapshotNow rules because of use of backend > functions. But you'll most certainly never be able to guarantee it if > pg_dumpall can't export its snapshot to each subsidiary pg_dump run.) For someone who is concerned with that, maybe pg_dumpall could have an option to take an EXCLUSIVE lock on the shared catalogs? cheers andrew
On Oct21, 2011, at 19:47 , Robert Haas wrote: > On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug <fgp@phlo.org> wrote: >> AFAIR, the performance hit we'd take by making the vacuum cutoff point >> (i.e. GetOldestXmin()) global instead of database-local has been repeatedly >> used in the past as an argument against cross-database queries. I have to >> admit that I currently cannot seem to find an entry in the archives to >> back that up, though. > I haven't seen anyone explain why they really need this feature > anyway, and I think it's going in the wrong direction. IMHO, anyone > who wants to be doing cross-database queries should be using schemas > instead, and if that's not workable for some reason, then we should > improve the schema implementation until it becomes workable. I think > that the target use case for separate databases ought to be > multi-tenancy, but what is needed there is actually more isolation > (e.g. wrt/role names, cluster-wide visibility of pg_database contents, > etc.), not less. Agreed. I wasn't trying to argue for cross-database queries - quite the opposite, actually. My point was more that since we've used database isolation as an argument against cross-database queries in the past, we shouldn't sacrifice it now for synchronized snapshots. best regards, Florian Pflug
Florian Pflug <fgp@phlo.org> writes: > AFAIR, the performance hit we'd take by making the vacuum cutoff point > (i.e. GetOldestXmin()) global instead of database-local has been repeatedly > used in the past as an argument against cross-database queries. I have to > admit that I currently cannot seem to find an entry in the archives to > back that up, though. To my mind, the main problem with cross-database queries is that none of the backend is set up to deal with more than one set of system catalogs. regards, tom lane
On Fri, Oct 21, 2011 at 2:06 PM, Florian Pflug <fgp@phlo.org> wrote: > On Oct21, 2011, at 19:47 , Robert Haas wrote: >> On Fri, Oct 21, 2011 at 1:40 PM, Florian Pflug <fgp@phlo.org> wrote: >>> AFAIR, the performance hit we'd take by making the vacuum cutoff point >>> (i.e. GetOldestXmin()) global instead of database-local has been repeatedly >>> used in the past as an argument against cross-database queries. I have to >>> admit that I currently cannot seem to find an entry in the archives to >>> back that up, though. > >> I haven't seen anyone explain why they really need this feature >> anyway, and I think it's going in the wrong direction. IMHO, anyone >> who wants to be doing cross-database queries should be using schemas >> instead, and if that's not workable for some reason, then we should >> improve the schema implementation until it becomes workable. I think >> that the target use case for separate databases ought to be >> multi-tenancy, but what is needed there is actually more isolation >> (e.g. wrt/role names, cluster-wide visibility of pg_database contents, >> etc.), not less. > > Agreed. I wasn't trying to argue for cross-database queries - quite the opposite, > actually. My point was more that since we've used database isolation as an > argument against cross-database queries in the past, we shouldn't sacrifice > it now for synchronized snapshots. Right, I agree. It might be nice to take a cluster-wide dump that is guaranteed to be transactionally consistent, but I bet a lot of people would actually be happier to see us go the opposite direction - e.g. give each database its own XID space, so that activity in one database doesn't accelerate the need for anti-wraparound vacuums in another database.
Not sure that could ever actually happen, but the point is that people probably should not be relying on serializability across databases too much, because the whole point of the multiple databases feature is to have multiple, independent databases in one cluster that are thoroughly isolated from each other, and any future changes we make should probably lean in that direction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Oct 21, 2011 at 11:36 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. Restrict exported snapshots to be loaded only by transactions running >> in the same database as the exporter. This would fix the problem, but >> it cuts out one of the main use-cases for sync snapshots, namely getting >> cluster-wide-consistent dumps in pg_dumpall. > I am unexcited by #2 on usability grounds. I agree with you that #3 > might end up being a fairly small pessimization in practice, but I'd > be inclined to just do #1 for now and revisit the issue when and if > someone shows an interest in revamping pg_dumpall to do what you're > proposing (and hopefully a bunch of other cleanup too). Seems like that is the consensus view, so that's what I'll do. regards, tom lane
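
A minimal sketch of what option 1 amounts to, written as a toy Python model. The `ExportedSnapshot` record, the `CrossDatabaseImportError` exception, and the `import_snapshot` function are hypothetical names chosen for illustration; the thread does not describe how the patch actually implements the check. The assumption here is simply that an exported snapshot remembers which database it came from and that the importer refuses a mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportedSnapshot:
    snapshot_id: str  # hypothetical identifier handed to importers
    database: str     # database the exporting transaction was connected to
    xmin: int


class CrossDatabaseImportError(Exception):
    """Raised when a snapshot is imported outside its source database."""


def import_snapshot(snapshot: ExportedSnapshot, current_database: str) -> int:
    """Adopt the snapshot's xmin, but only inside the exporter's database."""
    if snapshot.database != current_database:
        raise CrossDatabaseImportError(
            f"snapshot {snapshot.snapshot_id} was exported from "
            f"{snapshot.database!r}, not {current_database!r}"
        )
    return snapshot.xmin


snap = ExportedSnapshot(snapshot_id="snap-1", database="X", xmin=100)

# A parallel dump worker connected to the same database can adopt it.
adopted_xmin = import_snapshot(snap, "X")

# A transaction in another database is turned away.
try:
    import_snapshot(snap, "Y")
    cross_db_rejected = False
except CrossDatabaseImportError:
    cross_db_rejected = True
```

Under this rule the exporter's own xmin already holds back the cutoff in the only database where the snapshot can be used, so the scenario from the first message cannot arise, at the cost of ruling out a shared snapshot across a whole pg_dumpall run.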
On Fri, Oct 21, 2011 at 4:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I can see a few alternatives, none of them very pleasant: > > 1. Restrict exported snapshots to be loaded only by transactions running > in the same database as the exporter. This would fix the problem, but > it cuts out one of the main use-cases for sync snapshots, namely getting > cluster-wide-consistent dumps in pg_dumpall. > 4. Somehow mark the xmin of a process that has exported a snapshot so > that it will be honored in all DBs not just the current one. The > difficulty here is that we'd need to know *at the time the snap is > taken* that it's going to be exported. (Consider the scenario above, > except that A doesn't get around to exporting the snapshot it took in > step 3 until between steps 5 and 6. If the xmin wasn't already marked > as globally applicable when vacuum looked at it in step 5, we lose.) > This is do-able but it will contort the user-visible API of the sync > snapshots feature. One way we could do it is to require that > transactions that want to export snapshots set a transaction mode > before they take their first snapshot. 1 *and* 4 please. So, unless explicitly requested, an exported snapshot is limited to just one database. If explicitly requested to be transportable, we can use the snapshot in other databases. This allows us to do parallel pg_dump in both 1+ databases, as well as allowing pg_dumpall to be fully consistent across all dbs. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > 1 *and* 4 please. Given the lack of enthusiasm I'm not going to do anything about #4 now. Somebody else can add it later. > So, unless explicitly requested, an exported snapshot is limited to > just one database. If explicitly requested to be transportable, we can > use the snapshot in other databases. Yeah, we could make it work like that when it gets added. regards, tom lane
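
A toy Python sketch of how options 1 and 4 might combine, along the lines Simon describes. All names (`Txn`, `set_transportable`, `take_snapshot`, `oldest_xmin`, `may_import`) are hypothetical and the "transportable" mode is not an existing setting; the sketch only illustrates the constraint from the first message that the mode has to be chosen before the first snapshot is taken, so that the xmin is honored in every database from the start.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Txn:
    database: str
    transportable: bool = False
    xmin: Optional[int] = None

    def set_transportable(self) -> None:
        # Must happen before the first snapshot; otherwise vacuum in another
        # database may already have looked at (and ignored) this xmin.
        if self.xmin is not None:
            raise RuntimeError("transportable mode must be set before the first snapshot")
        self.transportable = True

    def take_snapshot(self, oldest_running_xid: int) -> None:
        if self.xmin is None:
            self.xmin = oldest_running_xid


def oldest_xmin(txns: Iterable[Txn], next_xid: int, database: str) -> int:
    """Local cutoff that also honors transportable transactions elsewhere."""
    xmins = [t.xmin for t in txns
             if t.xmin is not None and (t.database == database or t.transportable)]
    return min(xmins, default=next_xid)


def may_import(exporter: Txn, importing_database: str) -> bool:
    """Default is option 1; only a transportable snapshot may cross databases."""
    return exporter.transportable or exporter.database == importing_database


# Default export: limited to database X, ignored by the cutoff in Y.
plain = Txn(database="X")
plain.take_snapshot(100)
plain_cutoff_in_y = oldest_xmin([plain], 105, "Y")
plain_importable_in_y = may_import(plain, "Y")

# Explicitly transportable export: holds back the cutoff everywhere.
shared = Txn(database="X")
shared.set_transportable()
shared.take_snapshot(100)
shared_cutoff_in_y = oldest_xmin([shared], 105, "Y")
shared_importable_in_y = may_import(shared, "Y")

# Asking for the mode after the first snapshot is too late.
try:
    plain.set_transportable()
    late_request_rejected = False
except RuntimeError:
    late_request_rejected = True
```

In this model only the transaction that asked for the mode up front pays the cost of a cluster-wide xmin, while ordinary exports keep the database-local cutoff.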