Обсуждение: lazy_truncate_heap()
While watching WAL records float by I noticed some AccessExclusiveLocks occurring unnecessarily during VACUUMs. This is caused by lines 186-189 in lazy_vacuum_rel(), vacuumlazy.c possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages; if (possibly_freeable >= REL_TRUNCATE_MINIMUM|| possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION) lazy_truncate_heap(onerel,vacrelstats); If you look closely you'll see that if rel_pages is small then we will attempt to truncate the heap even if possibly_freeable == 0. While looking at this some more, I notice there is another bug. When VACUUM has nothing at all to do, then it appears that vacrelstats->nonempty_pages is zero, so that possibly_freeable is always set to vacrelstats->rel_pages. vacrelstats->nonempty_pages doesn't appear to be used anywhere else, so nobody notices it is wrongly set. Both of these bugs are minor, but the effect of either/both of them is to cause more AccessExclusiveLocks than we might expect. For Hot Standby this means that many VACUUMs take AccessExclusiveLocks on relations, which would potentially lead to having queries cancelled for no reason at all. Does anybody think any of the above is intentional? Can I fix? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > While watching WAL records float by I noticed some AccessExclusiveLocks > occurring unnecessarily during VACUUMs. > > This is caused by lines 186-189 in lazy_vacuum_rel(), vacuumlazy.c > > possibly_freeable = vacrelstats->rel_pages - > vacrelstats->nonempty_pages; > if (possibly_freeable >= REL_TRUNCATE_MINIMUM || > possibly_freeable >= vacrelstats->rel_pages / > REL_TRUNCATE_FRACTION) > lazy_truncate_heap(onerel, vacrelstats); > > If you look closely you'll see that if rel_pages is small then we will > attempt to truncate the heap even if possibly_freeable == 0. Good catch! And it goes all the way back to 7.4. > While looking at this some more, I notice there is another bug. When > VACUUM has nothing at all to do, then it appears that > vacrelstats->nonempty_pages is zero, so that possibly_freeable is always > set to vacrelstats->rel_pages. vacrelstats->nonempty_pages doesn't > appear to be used anywhere else, so nobody notices it is wrongly set. Hmm, this is a new issue with the visibility map. For pages at the end that are skipped, we don't know if they're empty or not. Currently we assume that they are, but perhaps it would be better to assume they're not? On the other hand, that makes it much less likely that we even try to truncate a relation, and when we do, we wouldn't truncate it as far as we could. > Does anybody think any of the above is intentional? Can I fix? No. Yes please. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote: > > Both of these bugs are minor, but the effect of either/both of them is > to cause more AccessExclusiveLocks than we might expect. > > For Hot Standby this means that many VACUUMs take AccessExclusiveLocks > on relations, which would potentially lead to having queries cancelled > for no reason at all. Well by default it would just cause wal to pause briefly until the queries with those locks finish, no?
Greg Stark wrote: > > On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote: >> >> Both of these bugs are minor, but the effect of either/both of them is >> to cause more AccessExclusiveLocks than we might expect. >> >> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks >> on relations, which would potentially lead to having queries cancelled >> for no reason at all. > > Well by default it would just cause wal to pause briefly until the > queries with those locks finish, no? Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries or pausing WAL application? I thought it'd just block other queries trying to acquire a conflicting lock in the standby, just like holding an AccessExclusiveLock on the primary does. It's unrelated to the xmin horizon issue. There is a noteworthy point though. In the primary, vacuum trying to truncate takes AccessExclusiveLock conditionally, so that it doesn't disturb queries accessing the table, and only truncates the table if it got the lock. But in standby, we have to truncate the table, and therefore have to acquire the lock, waiting until we get it. I guess we have to stop applying WAL while waiting. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2008-12-31 at 14:45 -0500, Greg Stark wrote: > On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote: > > > > Both of these bugs are minor, but the effect of either/both of them is > > to cause more AccessExclusiveLocks than we might expect. > > > > For Hot Standby this means that many VACUUMs take AccessExclusiveLocks > > on relations, which would potentially lead to having queries cancelled > > for no reason at all. > > Well by default it would just cause wal to pause briefly until the > queries with those locks finish, no? Yes, but why allow pointless actions in the first place? -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
On Wed, 2008-12-31 at 21:45 +0200, Heikki Linnakangas wrote: > > Can I fix? > > Yes please. Fix attached. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Вложения
On Thu, 2009-01-01 at 12:00 +0200, Heikki Linnakangas wrote: > Greg Stark wrote: > > > > On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote: > >> > >> Both of these bugs are minor, but the effect of either/both of them is > >> to cause more AccessExclusiveLocks than we might expect. > >> > >> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks > >> on relations, which would potentially lead to having queries cancelled > >> for no reason at all. > > > > Well by default it would just cause wal to pause briefly until the > > queries with those locks finish, no? > > Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries > or pausing WAL application? I thought it'd just block other queries > trying to acquire a conflicting lock in the standby, just like holding > an AccessExclusiveLock on the primary does. It's unrelated to the xmin > horizon issue. Yes, it is unrelated to the xmin horizon issue. There are two reasons for delaying WAL apply: * locks * xmin horizon When a lock is acquired on the primary it almost always precedes an action which cannot occur concurrently. For example, if VACUUM did truncate a table then queries could get errors because parts of their table disappear from under them. Others are drop table etc.. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > On Thu, 2009-01-01 at 12:00 +0200, Heikki Linnakangas wrote: >> Greg Stark wrote: >>> On 31 Dec 2008, at 13:21, Simon Riggs <simon@2ndQuadrant.com> wrote: >>>> Both of these bugs are minor, but the effect of either/both of them is >>>> to cause more AccessExclusiveLocks than we might expect. >>>> >>>> For Hot Standby this means that many VACUUMs take AccessExclusiveLocks >>>> on relations, which would potentially lead to having queries cancelled >>>> for no reason at all. >>> Well by default it would just cause wal to pause briefly until the >>> queries with those locks finish, no? >> Wait a minute. Why does an AccessExclusiveLock lead to cancelled queries >> or pausing WAL application? I thought it'd just block other queries >> trying to acquire a conflicting lock in the standby, just like holding >> an AccessExclusiveLock on the primary does. It's unrelated to the xmin >> horizon issue. > > Yes, it is unrelated to the xmin horizon issue. There are two reasons > for delaying WAL apply: > * locks > * xmin horizon > > When a lock is acquired on the primary it almost always precedes an > action which cannot occur concurrently. For example, if VACUUM did > truncate a table then queries could get errors because parts of their > table disappear from under them. Others are drop table etc.. Have you implemented the query cancellation mechanism for that scenario too? (I'm cool either way, just curious..) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Sun, 2009-01-04 at 13:01 +0200, Heikki Linnakangas wrote: > Why does an AccessExclusiveLock lead to cancelled queries > >> or pausing WAL application? I thought it'd just block other queries > >> trying to acquire a conflicting lock in the standby, just like holding > >> an AccessExclusiveLock on the primary does. It's unrelated to the xmin > >> horizon issue. > > > > Yes, it is unrelated to the xmin horizon issue. There are two reasons > > for delaying WAL apply: > > * locks > > * xmin horizon > > > > When a lock is acquired on the primary it almost always precedes an > > action which cannot occur concurrently. For example, if VACUUM did > > truncate a table then queries could get errors because parts of their > > table disappear from under them. Others are drop table etc.. > > Have you implemented the query cancellation mechanism for that scenario > too? (I'm cool either way, just curious..) Yes, they both lead to a conflict between WAL and standby queries, so are treated the same, currently: if conflict occurs, wait until max_standby_delay expires, then cancel. Logically, "xmin horizon" conflicts could be flexible/soft. That is, if we implemented the idea to store a lastCleanedLSN for each buffer then "xmin horizon" conflicts would be able to continue executing until they see a buffer with buffer.lastCleanedLSN > conflictLSN. Whereas the lock would be a hard limit beyond which a query could not progress. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
> Logically, "xmin horizon" conflicts could be flexible/soft. > That is, if we implemented the idea to store a lastCleanedLSN for each buffer then > "xmin horizon" conflicts would be able to continue executing until they > see a buffer with buffer.lastCleanedLSN > conflictLSN. I think the trouble is, that HOT can put extremely recent lastCleanedLSN's on pages. It would need some knobs to avoid this, that most likely reduce efficiency of HOT. What about using the page LSN after max_standby_delay ? Using the page LSN cancels queries earlier than the lastCleanedLSN, but probably in many cases later than an immediate cancel after max_standby_delay. Of course that only helps when reading static parts of tables :-( Instead of a cancel message, the replay would need to send (set in shmem) the first LSN applied after max_standby_delay to the relevant backend for it's LSN checks (if buffer.LSN >= received_max_delay_lsn cancel). Andreas
On Mon, 2009-01-05 at 18:24 +0100, Zeugswetter Andreas OSB sIT wrote: > > Logically, "xmin horizon" conflicts could be flexible/soft. > > That is, if we implemented the idea to store a lastCleanedLSN for each buffer then > > "xmin horizon" conflicts would be able to continue executing until they > > see a buffer with buffer.lastCleanedLSN > conflictLSN. > > I think the trouble is, that HOT can put extremely recent lastCleanedLSN's on pages. > It would need some knobs to avoid this, that most likely reduce efficiency of HOT. > > What about using the page LSN after max_standby_delay ? > Using the page LSN cancels queries earlier than the lastCleanedLSN, > but probably in many cases later than an immediate cancel after max_standby_delay. > Of course that only helps when reading static parts of tables :-( > > Instead of a cancel message, the replay would need to send (set in shmem) the first > LSN applied after max_standby_delay to the relevant backend for it's LSN checks > (if buffer.LSN >= received_max_delay_lsn cancel). I like your train of thought there: If HOT is the problem then lastCleanedLSN approx= LSN on HOT blocks, so having lastCleanedLSN doesn't help much. OK, so I'll skip that idea and go with what you suggest. Design: When conflict occurs we set a RecoveryConflictLSN on the Proc of the backend to be cancelled. Every time we read a block in recovery we check MyProc.RecoveryConflictLSN against the LSN on the block and backend will commit suicide (ERROR) if block LSN is equal or greater. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Wed, 2008-12-31 at 21:45 +0200, Heikki Linnakangas wrote:
>>> Can I fix?
>> Yes please.
> 
> Fix attached.
> --- 183,192 ----
>        * number of pages.  Otherwise, the time taken isn't worth it.
>        */
>       possibly_freeable = vacrelstats->rel_pages - vacrelstats->nonempty_pages;
> !     if (vacrelstats->tuples_deleted > 0 &&
> !         (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
> !          (possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION &&
> !           possibly_freeable > 0)))
>           lazy_truncate_heap(onerel, vacrelstats);
>   
Where did that "tuples_deleted > 0" condition come from? It seems 
counter-productive; if a previous vacuum failed to acquire the lock, 
subsequent vacuums wouldn't even try if they don't remove any tuples. 
How about simply:
***************
*** 183,190 ****       * number of pages.  Otherwise, the time taken isn't worth it.       */      possibly_freeable =
vacrelstats->rel_pages- 
 
vacrelstats->nonempty_pages;
!     if (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
!         possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION)          lazy_truncate_heap(onerel,
vacrelstats);
      /* Vacuum the Free Space Map */
--- 183,191 ----       * number of pages.  Otherwise, the time taken isn't worth it.       */      possibly_freeable =
vacrelstats->rel_pages- 
 
vacrelstats->nonempty_pages;
!     if (possibly_freeable > 0 &&
!         (possibly_freeable >= REL_TRUNCATE_MINIMUM ||
!          possibly_freeable >= vacrelstats->rel_pages / REL_TRUNCATE_FRACTION))          lazy_truncate_heap(onerel,
vacrelstats);
      /* Vacuum the Free Space Map */
--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com
			
		On Tue, 2009-01-06 at 15:48 +0200, Heikki Linnakangas wrote: > How about simply: OK -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
Simon Riggs wrote: > On Tue, 2009-01-06 at 15:48 +0200, Heikki Linnakangas wrote: >> How about simply: > > OK Committed and backpatched all the way back to 7.4 stable. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com