Discussion: Set log_lock_waits=on by default
Here is a patch to implement this.

Being stuck behind a lock for more than a second is almost always a problem, so it is reasonable to turn this on by default.

Yours,
Laurenz Albe
Attachments
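For readers following along, the whole proposal amounts to flipping the shipped default of one GUC. A minimal sketch of the relevant postgresql.conf settings (values shown are illustrative, with the proposed change marked):

```ini
# postgresql.conf -- settings discussed in this thread (sketch)
log_lock_waits = on        # proposed new default (currently shipped as 'off')
deadlock_timeout = 1s      # existing default; also the threshold after which
                           # a lock wait is logged, since both share one timer
```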
On Thu, Dec 21, 2023 at 8:29 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.

I think it depends somewhat on the lock type, and also on your threshold for what constitutes a problem. For example, you can wait for 1 second for a relation extension lock pretty easily, I think, just because the I/O system is busy. Or I think also a VXID lock held by some transaction that has a tuple locked could be not particularly exciting. A conflict on a relation lock seems more likely to represent a real issue, but I guess it's all kind of a judgement call. A second isn't really all that long on an overloaded system, and I see an awful lot of overloaded systems (because those are the people who call me).

Just a random idea but what if we separated log_lock_waits from deadlock_timeout? Say, it becomes time-valued rather than Boolean-valued, but it has to be >= deadlock_timeout? Because I'd probably be more interested in hearing about a lock wait that was more than say 10 seconds, but I don't necessarily want to wait 10 seconds for the deadlock detector to trigger.

In general, I do kind of like the idea of trying to log more problem situations by default, so that when someone has a major issue, you don't have to start by having them change all the logging settings and then wait until they get hosed a second time before you can troubleshoot anything. I'm just concerned that 1s might be too sensitive for a lot of users who aren't as, let's say, diligent about keeping the system healthy as you probably are.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Dec 21, 2023 at 05:29 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.
I think it's a very good idea. On all heavily loaded systems I have observed so far, we have always turned it on. 1s (the default deadlock_timeout) is quite a large value for web/mobile apps, meaning that the default frequency of logging is quite low, so any potential suffering from the observer effect doesn't happen -- saturation of the number of active sessions happens much, much earlier, even if you have very slow disk I/O for logging.
At the same time, I like Robert's idea to separate the logging of lock waits from the deadlock_timeout logic -- the current implementation is quite confusing for new users. I also had cases where people wanted to log lock waits earlier than deadlock detection. And also, lock wait logging almost always lacks information about the blocking session (its state and last query, first of all), but that is maybe off-topic, worth a separate improvement effort.
Nik
On 21/12/2023 at 14:29, Laurenz Albe wrote:
> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.

I think it's a really good idea. At Dalibo, we advise our customers to switch it on. AFAICT, it's never been a problem.

Best regards,
Frédéric
Re: Robert Haas
> On Thu, Dec 21, 2023 at 8:29 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > Here is a patch to implement this.
> > Being stuck behind a lock for more than a second is almost
> > always a problem, so it is reasonable to turn this on by default.
>
> I think it depends somewhat on the lock type, and also on your threshold for what constitutes a problem. For example, you can wait for 1 second for a relation extension lock pretty easily, I think, just because the I/O system is busy. Or I think also a VXID lock held by some transaction that has a tuple locked could be not particularly exciting. A conflict on a relation lock seems more likely to represent a real issue, but I guess it's all kind of a judgement call. A second isn't really all that long on an overloaded system, and I see an awful lot of overloaded systems (because those are the people who call me).

If a system is so busy that it's waiting so long for the disk, I would like PG to tell me about it. Likewise, if my transactions are slow because they are waiting for each other, I'd also like PG to tell me. Especially as the second condition can't be seen by "it's slow because CPU or IO is at 100%". In any case, setting log_lock_waits=on by default helps.

In fact, everyone I talked to was wondering why log_checkpoints was turned on by default, and not this parameter. The info provided by log_lock_waits is much more actionable than the stream of log_checkpoint messages.

> Just a random idea but what if we separated log_lock_waits from deadlock_timeout? Say, it becomes time-valued rather than Boolean-valued, but it has to be >= deadlock_timeout? Because I'd probably be more interested in hearing about a lock wait that was more than say 10 seconds, but I don't necessarily want to wait 10 seconds for the deadlock detector to trigger.

That's also a good point, but I'd like to see log_lock_waits default to 'on' independently from having this extra change.

> In general, I do kind of like the idea of trying to log more problem situations by default, so that when someone has a major issue, you don't have to start by having them change all the logging settings and then wait until they get hosed a second time before you can troubleshoot anything. I'm just concerned that 1s might be too sensitive for a lot of users who aren't as, let's say, diligent about keeping the system healthy as you probably are.

I don't think 1s would be too sensitive by default.

Christoph
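For anyone who has not run with the parameter enabled: a blocked backend emits log lines of roughly the following shape once it has waited deadlock_timeout. The exact wording, lock names, and numbers below are illustrative from memory, not verbatim server output, and vary by lock type and version:

```
LOG:  process 12345 still waiting for ShareLock on transaction 987654 after 1000.123 ms
DETAIL:  Process holding the lock: 12340. Wait queue: 12345.
STATEMENT:  UPDATE accounts SET balance = balance - 100 WHERE id = 42;
LOG:  process 12345 acquired ShareLock on transaction 987654 after 2345.678 ms
```

The "still waiting" / "acquired" pair is what makes the output actionable: it tells you both that a wait happened and how long it ultimately lasted.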
On Thu, 2023-12-21 at 09:14 -0500, Robert Haas wrote:
> I think it depends somewhat on the lock type, and also on your threshold for what constitutes a problem. For example, you can wait for 1 second for a relation extension lock pretty easily, I think, just because the I/O system is busy. Or I think also a VXID lock held by some transaction that has a tuple locked could be not particularly exciting. A conflict on a relation lock seems more likely to represent a real issue, but I guess it's all kind of a judgement call. A second isn't really all that long on an overloaded system, and I see an awful lot of overloaded systems (because those are the people who call me).

Sure, you don't want "log_lock_waits = on" in all conceivable databases. I have seen applications that use database locks to synchronize application threads (*shudder*). If it is normal for your database to experience long lock waits, disable the parameter.

My point is that in the vast majority of cases, long lock waits indicate a problem that you would like to know about, so the parameter should default to "on".

(Out of curiosity: what would ever wait for a VXID lock?)

> Just a random idea but what if we separated log_lock_waits from deadlock_timeout? Say, it becomes time-valued rather than Boolean-valued, but it has to be >= deadlock_timeout? Because I'd probably be more interested in hearing about a lock wait that was more than say 10 seconds, but I don't necessarily want to wait 10 seconds for the deadlock detector to trigger.

That is an appealing thought, but as far as I know, "log_lock_waits" is implemented by the deadlock detector, which is why it is tied to "deadlock_timeout". So if we want that, we'd need a separate "live lock detector". I don't know if we want to go there.

Yours,
Laurenz Albe
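To make the coupling Laurenz describes concrete, here is a hedged Python sketch of the single-timer arrangement: one timeout fires after deadlock_timeout, and the same callback both runs the deadlock check and, if log_lock_waits is on, logs the wait. All names here are invented for illustration; the real logic lives in the C timeout and deadlock-check code.

```python
# Sketch (assumed structure, not the actual backend code): because one
# callback serves both purposes, the logging threshold cannot currently
# differ from the deadlock check's deadlock_timeout.
def run_deadlock_check(waiting_for):
    # Stand-in: the real check walks the lock wait-for graph for cycles.
    return waiting_for == "a cyclic lock"

def on_deadlock_timeout(waiting_for, log_lock_waits=True):
    messages = []
    if run_deadlock_check(waiting_for):
        messages.append("ERROR: deadlock detected")
    elif log_lock_waits:
        messages.append(f"LOG: still waiting for {waiting_for}")
    return messages

print(on_deadlock_timeout("ShareLock on transaction 42"))
# ['LOG: still waiting for ShareLock on transaction 42']
```

Separating the two thresholds would mean either two queued timeouts per lock wait or re-arming logic inside this one callback, which is the detangling cost discussed later in the thread.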
On 2023-12-22 20:00, Laurenz Albe wrote:
> My point is that in the vast majority of cases, long lock waits
> indicate a problem that you would like to know about, so the parameter
> should default to "on".

+1. I always set log_lock_waits=on, so I agree with this.

>> Just a random idea but what if we separated log_lock_waits from
>> deadlock_timeout? Say, it becomes time-valued rather than
>> Boolean-valued, but it has to be >= deadlock_timeout? Because I'd
>> probably be more interested in hearing about a lock wait that was more
>> than say 10 seconds, but I don't necessarily want to wait 10 seconds
>> for the deadlock detector to trigger.
>
> That is an appealing thought, but as far as I know, "log_lock_waits"
> is implemented by the deadlock detector, which is why it is tied to
> "deadlock_timeout". So if we want that, we'd need a separate "live
> lock detector". I don't know if we want to go there.

Personally, I thought it was a good idea to separate log_lock_waits and deadlock_timeout, but I have not checked how that is implemented.

--
Regards,
Shinya Kato
NTT DATA GROUP CORPORATION
On 12/21/23 6:58 AM, Nikolay Samokhvalov wrote:
> On Thu, Dec 21, 2023 at 05:29 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > Here is a patch to implement this.
> > Being stuck behind a lock for more than a second is almost
> > always a problem, so it is reasonable to turn this on by default.
>
> I think it's a very good idea. On all heavily loaded systems I have observed so far, we always have turned it on. 1s (default deadlock_timeout) is quite large value for web/mobile apps, meaning that default frequency of logging is quite low, so any potential suffering from observer effect doesn't happen -- saturation related active session number happens much, much earlier, even if you have very slow disk IO for logging.

FWIW, enabling this setting has also been a long-time "happiness hint" that I've passed along to people.

What would be the worst case amount of logging that we're going to generate at scale? I think the worst case would largely scale according to connection count? So if someone had a couple thousand backends on a busy top-end system, then I guess they might generate up to a couple thousand log messages every second or two under load after this parameter became enabled with a 1 second threshold? I'm not aware of any cases where enabling this parameter with a 1 second threshold overwhelmed the logging collector (unlike, for example, log_statement=all), but I wanted to pose the question in the interest of being careful.

> At the same time, I like the idea by Robert to separate logging of lock waits and deadlock_timeout logic -- the current implementation is quite confusing for new users. I also had cases when people wanted to log lock waits earlier than deadlock detection. And also, lock wait logging almost always lacks information about the blocking session (its state, and last query, first of all), but that is maybe off-topic, worth a separate improvement effort.

I agree with this, though it's equally true that proliferation of new GUCs is confusing for new users. I hope the project avoids too low of a bar for adding new GUCs. But using the deadlock_timeout GUC for this completely unrelated log threshold really doesn't make sense.

-Jeremy

--
http://about.me/jeremy_schneider
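Jeremy's worst-case question can be bounded with simple arithmetic. A back-of-envelope sketch follows; the backend count matches his scenario, while the bytes-per-line figure is an assumption, not a measurement:

```python
# Worst case: every backend is blocked and emits one "still waiting"
# line per second. All figures are illustrative assumptions.
backends = 2000            # busy top-end system, per Jeremy's example
msgs_per_backend_per_s = 1 # one log line per second per blocked backend
bytes_per_msg = 200        # assumed average line size, incl. DETAIL

msgs_per_s = backends * msgs_per_backend_per_s
mb_per_hour = msgs_per_s * bytes_per_msg * 3600 / 1024 / 1024

print(msgs_per_s)          # 2000 messages/second at the absolute worst
print(round(mb_per_hour))  # ~1373 MB/hour if that worst case persisted
```

Even this deliberately pessimistic ceiling is far below what log_statement=all produces on the same system, which is consistent with the observation that the logging collector is not known to be overwhelmed by log_lock_waits.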
Hi,

On Thu, Dec 21, 2023 at 02:29:05PM +0100, Laurenz Albe wrote:
> Here is a patch to implement this.
> Being stuck behind a lock for more than a second is almost
> always a problem, so it is reasonable to turn this on by default.

I also think that this should be set to on.

I had a look at the patch and it works fine.

Regarding the documentation, maybe the back-reference at deadlock_timeout could be made a bit more explicit then as well, as in the attached patch, but this is mostly bikeshedding.

Michael
Attachments
Hi,

On Thu, Jan 11, 2024 at 03:24:55PM +0100, Michael Banck wrote:
> On Thu, Dec 21, 2023 at 02:29:05PM +0100, Laurenz Albe wrote:
> > Here is a patch to implement this.
> > Being stuck behind a lock for more than a second is almost
> > always a problem, so it is reasonable to turn this on by default.
>
> I also think that this should be set to on.
>
> I had a look at the patch and it works fine.
>
> Regarding the documentation, maybe the back-reference at
> deadlock_timeout could be made a bit more explicit then as well, as in
> the attached patch, but this is mostly bikeshedding.

I've marked it ready for committer now, as the above really is bikeshedding.

Michael
I saw this was marked ready-for-committer, so I took a look at the thread. It looks like there are two ideas:

* Separate log_lock_waits from deadlock_timeout. It looks like this idea has a decent amount of support, but I didn't see any patch for it so far. IMHO this is arguably a prerequisite for setting log_lock_waits on by default, as we could then easily set it higher by default to help address concerns about introducing too much noise in the logs.

* Set log_lock_waits on by default. The folks on this thread seem to support this idea, but given the lively discussion for enabling log_checkpoints by default [0], I'm hesitant to commit something like this without further community discussion.

[0] https://postgr.es/m/CALj2ACX-rW_OeDcp4gqrFUAkf1f50Fnh138dmkd0JkvCNQRKGA%40mail.gmail.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Nathan Bossart <nathandbossart@gmail.com> writes:
> It looks like there are two ideas:
> * Separate log_lock_waits from deadlock_timeout. It looks like this idea
> has a decent amount of support, but I didn't see any patch for it so far.

I think the support comes from people who have not actually looked at the code. The reason they are not separate is that the same timeout service routine does both things. To pull them apart, you would have to (1) detangle that code and (2) incur the overhead of two timeout events queued for every lock wait. It's not clear to me that it's worth it. In some sense, deadlock_timeout is exactly the length of time after which you want to get concerned.

> IMHO this is arguably a prerequisite for setting log_lock_waits on by
> default, as we could then easily set it higher by default to help address
> concerns about introducing too much noise in the logs.

Well, that's the question --- would it be sane to enable log_lock_waits by default if we don't separate them?

> * Set log_lock_waits on by default. The folks on this thread seem to
> support this idea, but given the lively discussion for enabling
> log_checkpoints by default [0], I'm hesitant to commit something like
> this without further community discussion.

I was, and remain, of the opinion that that was a bad idea that we'll eventually revert, just like we previously got rid of most inessential log chatter in the default configuration. So I doubt you'll have much trouble guessing my opinion of this one. I think the default logging configuration should be chosen with the understanding that nobody ever looks at the logs of most installations, and we should be more worried about their disk space consumption than anything else. Admittedly, log_lock_waits is less bad than log_checkpoints, because no such events will occur in a well-tuned configuration ... but still, it's going to be useless chatter in the average installation.

regards, tom lane
On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
> Nathan Bossart <nathandbossart@gmail.com> writes:
> > It looks like there are two ideas:
> > [...]
> > * Set log_lock_waits on by default. The folks on this thread seem to
> > support this idea, but given the lively discussion for enabling
> > log_checkpoints by default [0], I'm hesitant to commit something like
> > this without further community discussion.
>
> I was, and remain, of the opinion that that was a bad idea that
> we'll eventually revert, just like we previously got rid of most
> inessential log chatter in the default configuration. So I doubt
> you'll have much trouble guessing my opinion of this one. I think
> the default logging configuration should be chosen with the
> understanding that nobody ever looks at the logs of most
> installations, and we should be more worried about their disk space
> consumption than anything else. Admittedly, log_lock_waits is less
> bad than log_checkpoints, because no such events will occur in a
> well-tuned configuration ... but still, it's going to be useless
> chatter in the average installation.

Unsurprisingly, I want to argue against that.

You say that it is less bad than "log_checkpoints = on", and I agree. I can't remember seeing any complaints by users about "log_checkpoints", and I predict that the complaints about "log_lock_waits = on" will be about as loud.

I am all for avoiding useless chatter in the log. In my personal experience, that is usually 'database "typo" does not exist' and constraint violation errors. I always recommend that people enable "log_lock_waits", and so far I have not seen it spam the logs.

I agree that usually nobody ever looks into the log file. The time when people *do* look into the log file is when they encounter trouble, and my stance is that the default configuration should be such that the log contains clues as to what may be the problem.

If a statement sometimes takes unreasonably long, it is very valuable corroborative information that the statement occasionally waits more than a second for a lock.

Yours,
Laurenz Albe
Greetings,

* Laurenz Albe (laurenz.albe@cybertec.at) wrote:
> On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
> > I was, and remain, of the opinion that that was a bad idea that
> > we'll eventually revert, just like we previously got rid of most
> > inessential log chatter in the default configuration. So I doubt
> > you'll have much trouble guessing my opinion of this one. I think
> > the default logging configuration should be chosen with the
> > understanding that nobody ever looks at the logs of most
> > installations, and we should be more worried about their disk space
> > consumption than anything else. Admittedly, log_lock_waits is less
> > bad than log_checkpoints, because no such events will occur in a
> > well-tuned configuration ... but still, it's going to be useless
> > chatter in the average installation.
>
> Unsurprisingly, I want to argue against that.

I tend to agree with this position -- log_checkpoints being on has been a recommended configuration for a very long time and is valuable information to have about what's been happening when someone does go and look at the log. Having log_lock_waits on by default is likely to be less noisy and even more useful for going back in time to figure out what happened.

> You say that it is less bad than "log_checkpoints = on", and I agree.
> I can't remember seeing any complaints by users about
> "log_checkpoints", and I predict that the complaints about
> "log_lock_waits = on" will be about as loud.

Yeah, agreed.

> I am all for avoiding useless chatter in the log. In my personal
> experience, that is usually "database typo does not exist" and
> constraint violation errors. I always recommend people to enable
> "log_lock_waits", and so far I have not seen it spam the logs.

I really wish we could separate out the messages about typos and constraint violations from these logs about processes waiting a long time for locks, or about checkpoints, or even PANICs or other really important messages. That said, that's a different problem and not something this change needs to concern itself with.

As for whether we want to separate out log_lock_waits from deadlock_timeout -- no, I don't think we do, for the reasons that Tom mentioned. I don't see it as necessary either for enabling log_lock_waits by default. Waiting deadlock_timeout amount of time for a lock certainly is a problem already, and once we've waited that amount of time, I can't see the time spent logging about it as being a serious issue. +1 for simply enabling log_lock_waits by default.

All that said ... if we did come up with a nice way to separate out the timing for deadlock_timeout and log_lock_waits, I wouldn't necessarily be against it. Perhaps one approach to that would be to set just one timer but have it be the lower of the two, and then set another when that fires (if there's more waiting to do) and then catch it when it happens... Again, I'd view this as an independent improvement though and not a requirement for just enabling log_lock_waits by default.

Thanks,
Stephen
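Stephen's "arm the lower timer first" approach can be sketched in a few lines. This is a pure illustration in Python under assumed names; it only models the firing order, not the actual timeout machinery:

```python
# Sketch: arm only the earlier of the two timeouts; when it fires,
# arm the other for its remaining time. Times are in milliseconds,
# measured from the start of the lock wait.
def schedule_events(deadlock_timeout, log_lock_waits_after):
    """Return the (time, event) firing order for one lock wait."""
    first, second = sorted([(deadlock_timeout, "deadlock_check"),
                            (log_lock_waits_after, "log_lock_wait")])
    fired = [first]
    # The second timer is only armed once the first has fired, so at
    # most one timeout event is ever queued at a time.
    remaining = second[0] - first[0]
    fired.append((first[0] + remaining, second[1]))
    return fired

# With log_lock_waits raised to 10s, the deadlock check still runs at 1s:
print(schedule_events(1000, 10000))
# [(1000, 'deadlock_check'), (10000, 'log_lock_wait')]
```

This avoids Tom's objection about queueing two timeout events per lock wait, at the cost of the re-arming bookkeeping inside the handler.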
Attachments
Stephen Frost <sfrost@snowman.net> writes:
>> On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
>>> I was, and remain, of the opinion that that was a bad idea that
>>> we'll eventually revert, just like we previously got rid of most
>>> inessential log chatter in the default configuration.

>> Unsurprisingly, I want to argue against that.

> I tend to agree with this position- log_checkpoints being on has been a
> recommended configuration for a very long time and is valuable
> information to have about what's been happening when someone does go and
> look at the log.

We turned on default log_checkpoints in v15, which means that behavior has been in the field for about sixteen months. I don't think that that gives it the status of a settled issue; my bet is that most users still have not seen it.

regards, tom lane
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> >> On Tue, 2024-02-06 at 12:29 -0500, Tom Lane wrote:
> >>> I was, and remain, of the opinion that that was a bad idea that
> >>> we'll eventually revert, just like we previously got rid of most
> >>> inessential log chatter in the default configuration.
>
> >> Unsurprisingly, I want to argue against that.
>
> > I tend to agree with this position- log_checkpoints being on has been a
> > recommended configuration for a very long time and is valuable
> > information to have about what's been happening when someone does go and
> > look at the log.
>
> We turned on default log_checkpoints in v15, which means that behavior
> has been in the field for about sixteen months. I don't think that
> that gives it the status of a settled issue; my bet is that most
> users still have not seen it.

Apologies for not being clear -- log_checkpoints being on has been a configuration setting that I (and many others I've run into) have been recommending since as far back as I can remember. I was not referring to the change in the default.

Thanks,
Stephen