Обсуждение: vacuum internals and performance affect
dear all, i am trying to understand if i am missing something on how vacuum works. i ve read the manual, and did some research on the web about that but i am still not sure. to my understanding, vacuum just marks the dead rows of a table so that from that point on that space would be re-used for new inserts and new updates on that specific table. however, if there is an open transaction, vacuum can only do what is described above up to the point that the open transaction was started. so if for example there is a query running for 1 day, no matter how many times i will have vacuumed the table (manual or auto), the dead rows wont be possible to be marked as re-usable space. -is the above correct? -is there something more about vacuum in that case i am describing? would for example mark the rows as 'semi-dead' so that when a scan would be made these rows wouldn't be checked and so the queries would be faster? is there anything else for this specific case? -would there be any effect from the vacuum on the indexes of the table?like i said above for the table, would the entries of the index not be scanned for a query, due to some reason? if there is a something i could read to answer these questions plz point me to that direction, otherwise i would really appreciate any information you may have. thx in advance -- View this message in context: http://postgresql.1045698.n5.nabble.com/vacuum-internals-and-performance-affect-tp5033043p5033043.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
MX, > to my understanding, vacuum just marks the dead rows of a table so that from > that point on that space would be re-used for new inserts and new updates on > that specific table. however, if there is an open transaction, vacuum can > only do what is described above up to the point that the open transaction > was started. so if for example there is a query running for 1 day, no matter > how many times i will have vacuumed the table (manual or auto), the dead > rows wont be possible to be marked as re-usable space. > -is the above correct? More or less. The transactionID isn't a timestamp, so the "stop point" is based on snapshots rather than a point-in-time. But that's a fine distinction. > -is there something more about vacuum in that case i am describing? would > for example mark the rows as 'semi-dead' so that when a scan would be made > these rows wouldn't be checked and so the queries would be faster? is there > anything else for this specific case? Well, vacuum does some other work, yes. > -would there be any effect from the vacuum on the indexes of the table?like > i said above for the table, would the entries of the index not be scanned > for a query, due to some reason? Vacuum also does some pruning dead index pointers. Otherwise, I'm not sure what you're asking. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
thx a lot for your answer :) so when a transaction is still open from a while back (according to the transactionID), no 'new dead' tuples can be marked as re-usable space for new rows, right? by 'new dead' i mean that for example there is a transaction running from 10.00am(with a specific transactionID). when i delete rows at 11.00am these are the ones i am referring to. the same thing happens with the index, right? the dead enties for the rows that were deleted at 11.00am cannot be removed yet (this is not based on the timestamp, i get it, i just want to point out that due to MVCC these rows should be visible to the old transaction and by using timestamps this is more obvious) but, for these rows, the 'deleted' ones. does vacuum do anything at all at that time? and if so, what is it? thx in advance -- View this message in context: http://postgresql.1045698.n5.nabble.com/vacuum-internals-and-performance-affect-tp5033043p5036800.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.
MirrorX, > so when a transaction is still open from a while back (according to the > transactionID), no 'new dead' tuples can be marked as re-usable space for > new rows, right? by 'new dead' i mean that for example there is a > transaction running from 10.00am(with a specific transactionID). when i > delete rows at 11.00am these are the ones i am referring to. With the understanding that what we're actually checking is snapshots (which are not completely linear) and not timestamps, yes, that's a good simplification for what happens. > but, for these rows, the 'deleted' ones. does vacuum do anything at all at > that time? and if so, what is it? thx in advance No, it does nothing. What would it do? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
from what i ve read and have i ve seen in practice, i expected it to do nothing at all. i just wanted to be absolutely sure and that's why i asked here. thank you very much for the clarification -- View this message in context: http://postgresql.1045698.n5.nabble.com/vacuum-internals-and-performance-affect-tp5033043p5039677.html Sent from the PostgreSQL - performance mailing list archive at Nabble.com.