Обсуждение: Injection point locking
InjectionPointRun() acquires InjectionPointLock, looks up the hash entry, and releases the lock: > LWLockAcquire(InjectionPointLock, LW_SHARED); > entry_by_name = (InjectionPointEntry *) > hash_search(InjectionPointHash, name, > HASH_FIND, &found); > LWLockRelease(InjectionPointLock); Later, it reads fields from the entry it looked up: > /* not found in local cache, so load and register */ > snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path, > entry_by_name->library, DLSUFFIX); Isn't that a straightforward race condition, if the injection point is detached in between? Another thing: I wanted use injection points to inject an error early at backend startup, to write a test case for the scenarios that Jacob point out at https://www.postgresql.org/message-id/CAOYmi%2Bnwvu21mJ4DYKUa98HdfM_KZJi7B1MhyXtnsyOO-PB6Ww%40mail.gmail.com. But I can't do that, because InjectionPointRun() requires a PGPROC entry, because it uses an LWLock. That also makes it impossible to use injection points in the postmaster. Any chance we could allow injection points to be triggered without a PGPROC entry? Could we use a simple spinlock instead? With a fast path for the case that no injection points are attached or something? -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > ... I can't do that, because InjectionPointRun() requires a PGPROC > entry, because it uses an LWLock. That also makes it impossible to use > injection points in the postmaster. Any chance we could allow injection > points to be triggered without a PGPROC entry? Could we use a simple > spinlock instead? With a fast path for the case that no injection points > are attached or something? Even taking a spinlock in the postmaster is contrary to project policy. Maybe we could look the other way for debug-only code, but it seems like a pretty horrible precedent. Given your point that the existing locking is just a fig leaf anyway, maybe we could simply not have any? Then it's on the test author to be sure that the injection point won't be getting invoked when it's about to be removed. Or just rip out the removal capability, and say that once installed an injection point is there until postmaster shutdown (or till shared memory reinitialization, anyway). regards, tom lane
On Mon, Jun 24, 2024 at 01:29:38PM +0300, Heikki Linnakangas wrote: > InjectionPointRun() acquires InjectionPointLock, looks up the hash entry, > and releases the lock: > > > LWLockAcquire(InjectionPointLock, LW_SHARED); > > entry_by_name = (InjectionPointEntry *) > > hash_search(InjectionPointHash, name, > > HASH_FIND, &found); > > LWLockRelease(InjectionPointLock); > > Later, it reads fields from the entry it looked up: > > > /* not found in local cache, so load and register */ > > snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path, > > entry_by_name->library, DLSUFFIX); > > Isn't that a straightforward race condition, if the injection point is > detached in between? This is a feature, not a bug :) Jokes apart, this is a behavior that Noah was looking for so as it is possible to detach a point to emulate what a debugger would do with a breakpoint for some of his tests with concurrent DDL bugs, so not taking a lock while running a point is important. It's true, though, that we could always delay the LWLock release once the local cache is loaded, but would it really matter? -- Michael
Вложения
On Mon, Jun 24, 2024 at 11:03:09AM -0400, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: > > ... I can't do that, because InjectionPointRun() requires a PGPROC > > entry, because it uses an LWLock. That also makes it impossible to use > > injection points in the postmaster. Any chance we could allow injection > > points to be triggered without a PGPROC entry? Could we use a simple > > spinlock instead? That sounds fine to me. If calling hash_search() with a spinlock feels too awful, a list to linear-search could work. > > With a fast path for the case that no injection points > > are attached or something? > > Even taking a spinlock in the postmaster is contrary to project > policy. Maybe we could look the other way for debug-only code, > but it seems like a pretty horrible precedent. If you're actually using an injection point in the postmaster, that would be the least of concerns. It is something of a concern for running an injection point build while not attaching any injection point. One solution could be a GUC to control whether the postmaster participates in injection points. Another could be to make the data structure readable with atomics only. > Given your point that the existing locking is just a fig leaf > anyway, maybe we could simply not have any? Then it's on the > test author to be sure that the injection point won't be > getting invoked when it's about to be removed. That's tricky with injection_points_set_local() plus an injection point at a frequently-reached location. It's one thing to control when the injection_points_set_local() process reaches the injection point. It's too hard to control all the other processes that just reach the injection point and conclude it's not for their PID. > Or just rip > out the removal capability, and say that once installed an > injection point is there until postmaster shutdown (or till > shared memory reinitialization, anyway). That could work. Tests do need a way to soft-disable, but it's okay with me if nothing can reclaim the resources.
On Mon, Jun 24, 2024 at 11:03:09AM -0400, Tom Lane wrote: > Given your point that the existing locking is just a fig leaf > anyway, maybe we could simply not have any? Then it's on the > test author to be sure that the injection point won't be > getting invoked when it's about to be removed. That would work for me to put the responsibility to the test author, ripping out the LWLock. I was wondering when somebody would come up with a case where they'd want to point to the postmaster to do something, without really coming down to a case, so there was that from my side originally. Looking at all the points currently in the tree, nothing cares about the concurrent locking when attaching or detaching a point, so perhaps it is a good thing based on these experiences to just let this LWLock go. This should not impact the availability of the tests, either. > Or just rip > out the removal capability, and say that once installed an > injection point is there until postmaster shutdown (or till > shared memory reinitialization, anyway). But not that. Being able to remove points on the fly can be important in some cases, for example where you'd still want to issue an ERROR (partial write path is one case) in a SQL test, then remove it in a follow-up SQL query to not trigger the same ERROR. -- Michael
Вложения
On Tue, Jun 25, 2024 at 11:14:57AM +0900, Michael Paquier wrote: > On Mon, Jun 24, 2024 at 01:29:38PM +0300, Heikki Linnakangas wrote: > > InjectionPointRun() acquires InjectionPointLock, looks up the hash entry, > > and releases the lock: > > > > > LWLockAcquire(InjectionPointLock, LW_SHARED); > > > entry_by_name = (InjectionPointEntry *) > > > hash_search(InjectionPointHash, name, > > > HASH_FIND, &found); > > > LWLockRelease(InjectionPointLock); > > > > Later, it reads fields from the entry it looked up: > > > > > /* not found in local cache, so load and register */ > > > snprintf(path, MAXPGPATH, "%s/%s%s", pkglib_path, > > > entry_by_name->library, DLSUFFIX); > > > > Isn't that a straightforward race condition, if the injection point is > > detached in between? > > This is a feature, not a bug :) > > Jokes apart, this is a behavior that Noah was looking for so as it is > possible to detach a point to emulate what a debugger would do with a > breakpoint for some of his tests with concurrent DDL bugs, so not > taking a lock while running a point is important. It's true, though, > that we could always delay the LWLock release once the local cache is > loaded, but would it really matter? I think your last sentence is what Heikki is saying should happen, and I agree. Yes, it matters. As written, InjectionPointRun() could cache an entry_by_name->function belonging to a different injection point.
On Tue, Jun 25, 2024 at 09:10:06AM -0700, Noah Misch wrote: > I think your last sentence is what Heikki is saying should happen, and I > agree. Yes, it matters. As written, InjectionPointRun() could cache an > entry_by_name->function belonging to a different injection point. That's true, we could delay the release of the lock to happen just before a callback is run. Now, how much do people wish to see for the postmaster bits mentioned upthread? Taking a spinlock for so long is not going to work, so we could just remove it and let developers deal with that and feed on the flexibility with the lock removal to allow this stuff in more areas. All the existing tests are OK with that, and I think that also the case of what you have proposed for the concurrency issues with in-place updates of catalogs. Or we could live with a no-lock path when going through that with the postmaster, but that's a bit weird. Note that with the current callbacks in the module, assuming that a point is added within BackendStartup() in the postmaster like the attached, an ERROR is promoted to a FATAL, taking down the cluster. A NOTICE of course works find. Waits with conditional variables are not really OK. How much are you looking for here? The shmem state being initialized in the DSM registry is not something that's going to work in the context of the postmaster, but we could tweak the module so as it can be loaded, initializing the shared state with the shmem hooks and falling back to a DSM registry when the library is not loaded with shared_preload_libraries. For example, see the POC attached, where I've played with injection points in BackendStartup(), which is the area I'm guessing Heikki was looking at. -- Michael
Вложения
On Wed, Jun 26, 2024 at 10:56:12AM +0900, Michael Paquier wrote: > That's true, we could delay the release of the lock to happen just > before a callback is run. I am not sure what else we can do for the postmaster case for now, so I've moved ahead with the concern regarding the existing locking release delay when running a point, and pushed a patch for it. -- Michael