Josh Berkus <josh@agliodbs.com> writes:
>> However, if you were doing something like parallel pg_dump you could
>> just run the parent and child instances all against the slave, so the
>> pg_dump scenario doesn't seem to offer much of a supporting use-case
>> for worrying about this.  When would you really need to be able to
>> do it?
> If you had several standbys, you could distribute the work of the
> pg_dump among them.  This would be a huge speedup for a large
> database, potentially, thanks to parallelization of I/O and network.
> Imagine doing a pg_dump of a 300GB database in 10min.
That does sound kind of attractive. But to do that I think we'd have to go with the pass-the-snapshot-through-the-client approach. Shipping internal snapshot files through the WAL stream doesn't seem attractive to me.
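Concretely, the client-side flow might look something like this.
pg_export_snapshot() and SET TRANSACTION SNAPSHOT are just stand-in
names for whatever primitives we'd settle on, and whether a snapshot
exported on one standby could be adopted on another is of course the
open question; error handling omitted:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *parent = PQconnectdb("host=standby1 dbname=test");
        PGconn     *child = PQconnectdb("host=standby2 dbname=test");
        PGresult   *res;
        char        sql[128];

        /* Parent opens its transaction and exports a snapshot ID. */
        PQclear(PQexec(parent,
                       "BEGIN ISOLATION LEVEL REPEATABLE READ"));
        res = PQexec(parent, "SELECT pg_export_snapshot()");

        /* The client, not the WAL stream, carries the ID over. */
        snprintf(sql, sizeof(sql), "SET TRANSACTION SNAPSHOT '%s'",
                 PQgetvalue(res, 0, 0));
        PQclear(res);

        PQclear(PQexec(child,
                       "BEGIN ISOLATION LEVEL REPEATABLE READ"));
        PQclear(PQexec(child, sql));

        /* ... both connections now scan the same database state ... */

        PQfinish(child);
        PQfinish(parent);
        return 0;
    }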
While I see Robert's point about preferring not to expose the snapshot contents to clients, I don't think it outweighs all other considerations here; and every other one is pointing to doing it the other way.
How about the publishing transaction puts the snapshot in a (new)
system table and passes a UUID to its children, and the joining
transactions look for that UUID in the system table using a dirty
snapshot (SnapshotAny), via a security-definer function owned by a
superuser.
No shared memory is used, and if the insert is WAL-logged, the
snapshot would get to the slaves too.
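The lookup side could then be a thin wrapper around a catalog scan
that deliberately uses a nonstandard snapshot, roughly like below.
SnapshotRelationId and Anum_pg_snapshot_uuid are made-up names for
the new system table and its key column; this is a sketch, not
tested:

    #include "postgres.h"
    #include "access/genam.h"
    #include "access/heapam.h"
    #include "access/skey.h"
    #include "utils/builtins.h"
    #include "utils/fmgroids.h"
    #include "utils/tqual.h"

    /*
     * Fetch the snapshot row published under the given UUID.  Scanning
     * with SnapshotAny (rather than an MVCC snapshot) is what lets a
     * joining backend see a row whose inserting transaction has not
     * committed yet.  SnapshotRelationId and Anum_pg_snapshot_uuid are
     * hypothetical.
     */
    static HeapTuple
    find_published_snapshot(const char *uuid)
    {
        Relation    rel;
        ScanKeyData key;
        SysScanDesc scan;
        HeapTuple   tup;

        rel = heap_open(SnapshotRelationId, AccessShareLock);

        ScanKeyInit(&key,
                    Anum_pg_snapshot_uuid,
                    BTEqualStrategyNumber, F_TEXTEQ,
                    CStringGetTextDatum(uuid));

        scan = systable_beginscan(rel, InvalidOid, false,
                                  SnapshotAny, 1, &key);
        tup = systable_getnext(scan);
        if (HeapTupleIsValid(tup))
            tup = heap_copytuple(tup);  /* copy before ending the scan */

        systable_endscan(scan);
        heap_close(rel, AccessShareLock);
        return tup;
    }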
I realize SnapshotAny wouldn't be sufficient, since we want the tuple
to become invisible as soon as the publishing transaction ends (commit
or rollback); hence something akin to a (new)
HeapTupleSatisfiesStillRunning() would be needed.
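For concreteness, the rule such a routine would implement is just
"visible while the inserter is in progress", along the lines of the
existing HeapTupleSatisfies* functions in tqual.c.  A sketch only; the
real thing would also have to worry about setting hint bits, xmax,
and so on:

    #include "postgres.h"
    #include "access/htup.h"
    #include "access/transam.h"
    #include "access/xact.h"
    #include "storage/procarray.h"
    #include "utils/tqual.h"

    /*
     * Visible only while the inserting transaction is still running;
     * the row vanishes at commit and rollback alike.
     */
    bool
    HeapTupleSatisfiesStillRunning(HeapTupleHeader tuple,
                                   Snapshot snapshot, Buffer buffer)
    {
        TransactionId xmin = HeapTupleHeaderGetXmin(tuple);

        /* Inserter already known committed or aborted: gone. */
        if (tuple->t_infomask & (HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID))
            return false;

        /* Our own uncommitted insert stays visible to us. */
        if (TransactionIdIsCurrentTransactionId(xmin))
            return true;

        /* Anyone else's row: visible exactly while their xact runs. */
        return TransactionIdIsInProgress(xmin);
    }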