Discussion: Re[2]: Weird indices
Joseph,

I think you're going a bit too far... Tom and Stephan have been very
patient explaining the basics of indices to you.

>> The name of the game here is to make a plan *without* actually going
>> out and expending large amounts of time to find out the true state of
>> affairs; by the time you know for sure, you've already done the query.

Believe this. All the best DB engines, including PostgreSQL, work that
way. This is based on measurements, on real life.

JS> Well I'd hope that extracting the count from the index should be very
JS> low cost. That is what indices are for.

No, indices are made for finding a record in one go or for isolating a
small range of values.

JS> But certain things could be done. Like planning for the case of there
JS> being a single not null value, and updating the indices not to point at
JS> expired rows.

And then you'll ask what happens when there are 2 not-null values...?

JS> Isn't the point of a vacuum to get rid of old rows? Then
JS> why doesn't it update the index as well?

It does. Look at VACUUM VERBOSE.

JS> I mean the explain shows that getting the count(*) from the field that
JS> is indexed has to do a seq scan, presumably to determine if the rows are
JS> in fact valid.

count(*) means you want all the rows that have all the fields "not
null". Read carefully: ALL THE FIELDS.

JS> That is ridiculous.

Ahem. One solution to the problem is known as "optimizer hints" in
Oracle: you specify directly in the query HOW the optimizer should
execute the query. It's very useful in various situations. I have asked
Tom many times if that exists in PostgreSQL but didn't get any answer.
I guess it's on a TODO list somewhere ;-)

--
Jean-Christophe Boggio
cat@thefreecat.org
Independent Consultant and Developer
Delphi, Linux, Perl, PostgreSQL
Jean-Christophe Boggio <cat@thefreecat.org> writes:
> JS> I mean the explain shows that getting the count(*) from the field that
> JS> is indexed has to do a seq scan, presumably to determine if the rows are
> JS> in fact valid.

> count(*) means you want all the rows that have all the fields "not
> null". Read carefully : ALL THE FIELDS.

No, actually it just means "count the rows". count(f) for a field f (or
more generally any expression f) counts the number of non-null values of
f, but "*" just indicates count the rows.

Nonetheless, it's not that easy to keep a running count(*) total for a
table, even if we thought that select count(*) with no WHERE clause was
a sufficiently critical operation to justify slowing down every other
operation to keep the count(*) stats up to date. Think about committed
vs. not-committed transactions. In the worst case, each active
transaction could have a different view of the table and thus a
different idea of what count(*) should yield; and each transaction might
have different pending updates that should affect the count(*) total
when and if it commits.

> ahem. One solution to the problem is known as "optimizer hints" in
> Oracle : you specify directly in the query HOW the optimizer should
> execute the query. It's very useful in various situations. I have
> asked Tom many times if that exists in PostgreSQL but didn't get any
> answer. I guess it's on a TODO list somewhere ;-)

Not on mine ;-). I don't believe in the idea, first because it's not
standard SQL, and second because I don't trust the user to know better
than the system what the best plan is in a particular context. Hints
that you put in last year may have been the right thing at the time (or
not...), but they'll still be lying there, forgotten in your code, when
the table contents and the Postgres implementation have changed beyond
recognition. Yes, the optimizer needs work, and it'll get that work over
time --- but a hint that's wrong is worse than no hint.
I'd rather have Postgres blamed for performance problems of its own
making than those of the user's making.

			regards, tom lane
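Tom's worst-case scenario (each active transaction sees a different table, so no single running count(*) total can serve everyone) can be sketched in a few lines. This is a toy illustration of the idea, not PostgreSQL's actual visibility code; all names here are made up, and the visibility rule is heavily simplified (it ignores subtleties like in-progress and aborted transactions).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Row:
    xmin: int               # ID of the transaction that inserted this version
    xmax: Optional[int]     # ID of the transaction that deleted it, if any

def visible(row, snapshot_xid, committed):
    """Simplified rule: a version is visible if its inserter committed
    before our snapshot and its deleter (if any) did not."""
    inserted = row.xmin in committed and row.xmin < snapshot_xid
    deleted = (row.xmax is not None and row.xmax in committed
               and row.xmax < snapshot_xid)
    return inserted and not deleted

def count_star(table, snapshot_xid, committed):
    # count(*) must be evaluated per snapshot -- there is no one true total
    return sum(visible(r, snapshot_xid, committed) for r in table)

# Transaction 10 inserted two rows; transaction 20 later deleted one.
table = [Row(xmin=10, xmax=20), Row(xmin=10, xmax=None)]
committed = {10, 20}

count_before = count_star(table, snapshot_xid=15, committed=committed)
count_after = count_star(table, snapshot_xid=25, committed=committed)
print(count_before, count_after)   # -> 2 1
```

Two snapshots of the same table yield different counts, which is exactly why a single maintained total can't answer count(*) for every transaction.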
Jean-Christophe Boggio wrote:
>
> Joseph,
>
> I think you're going a bit too far... Tom and Stephan have been very
> patient explaining the basics of indices to you.

They are being patient. I thank them. I'm not trying to point fingers at
anybody and say 'you idiot! why aren't you thinking like me?'. It's just
that this whole visibility-meets-index-meets-planner thing is still
eluding me.

> >> The name of the game here is to make a plan *without* actually going
> >> out and expending large amounts of time to find out the true state of
> >> affairs; by the time you know for sure, you've already done the query.
>
> Believe this. All the best DB engines including PostgreSQL work that
> way. This is based on measurements, on real life.

I believe it. It just seemed to me that looking at the index should have
been before the cutoff point. And it actually seemed to be, in my little
experiment.

> JS> Well I'd hope that extracting the count from the index should be very
> JS> low cost. That is what indices are for.
>
> No, indices are made for finding a record in one go or for isolating a
> small range of values.

When I learned data structures in CS 102, I learned how to use an index
to quickly get a count. I assume the Postgres developers know this as
well.

> JS> But certain things could be done. Like planning for the case of there
> JS> being a single not null value, and updating the indices not to point at
> JS> expired rows.
>
> And then you'll ask when there are 2 not null values...?

I think I misspoke there. I meant the case of there being a single
repeated value. I can think offhand of a few ways to do it, but I don't
know how Postgres uses indices.

> JS> Isn't the point of a vacuum to get rid of old rows? Then
> JS> why doesn't it update the index as well?
>
> It does. Look at vacuum verbose.

In my test case it didn't. My database is vacuumed every night, and I
hadn't touched it that day before I did my test. Therefore it should
have known there were only 16 matches and not 52.

> JS> I mean the explain shows that getting the count(*) from the field that
> JS> is indexed has to do a seq scan, presumably to determine if the rows are
> JS> in fact valid.
>
> count(*) means you want all the rows that have all the fields "not
> null". Read carefully : ALL THE FIELDS.

Uh, no. Tom said so in a different reply.

I understand that keeping different views for different open
transactions can be difficult, but after a transaction that updates a
row is over, why isn't the row marked as 'universally visible' for all
new transactions until another update occurs?

Maybe I'm not making myself understood. Another way of asking the same
thing: say there is a transaction that is looking at a non-current
version of a row. 'Non-current' could mean the value it had at the start
of the transaction (and which was updated by another transaction), or a
value updated by this transaction but not committed yet. When this
transaction is over, is it really that hard to get rid of the reference
to the old version of the row? There should be a 1-bit field: 'is old
value and isn't being used by any transaction'. Is that really hard?

Maybe this is part of the whole 'vacuum later' vs. 'update now'
philosophy. If the point of vacuum-later is to put off the performance
hit until later, and it is causing these performance hits on queries
because index scans aren't being used, then doesn't that mean 'update
now' is more likely to pay off in the short run?

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com
Joseph Shraibman <jks@selectacast.net> writes:
> Maybe I'm not making myself understood. Another way of asking the same
> thing:
> Say there is a transaction that is looking at a non-current version of a
> row. 'non-current' could be the value it was at the start of the
> transaction (and was updated by another transaction) or was updated by
> this transaction but not committed yet. When this transaction is over
> is it really that hard to get rid of the reference to the old version of
> the row? There should be a 1 bit field 'is old value and isn't being
> used by any transaction'. Is that really hard?

Sure, it's easy to do that sort of bookkeeping ... on a per-row basis.
And we do. What's not so easy (an index helps not at all) is to
summarize N per-row status values into a single count(*) statistic that
you can maintain in a way significantly cheaper than just scanning the
rows when you need the count(*) value. Especially when the per-row
status values interact with the state values of the observing process to
determine what it should think count(*) really is.

The issue is not really "could we make count(*) fast?" Yeah, we probably
could, if that were the only measure of performance we cared about. The
real issue is "can we do it at a price we're willing to pay, considering
the costs of slowdown of insert/update/delete operations, extra storage
space, and extra system complexity?" So far the answer's been "no".

You might want to look at the manual's discussion of MVCC and at the
Postgres internals talks that were given at OSDN (see slides at
http://www.postgresql.org/osdn/index.html) to learn more about how
things work.

			regards, tom lane
Joseph Shraibman <jks@selectacast.net> writes:

A caveat on this reply: I've been studying the Postgres internals, but I
have not mastered them.

> I understand that keeping different views for different open
> transactions can be difficult, but after a transaction that updates a
> row is over why isn't the row marked as 'universally visible' for all
> new transactions until another update occurs?

It is. This mark is on the tuple in the heap. When a tuple is current,
and not locked for update, HEAP_XMAX_INVALID is set. After the tuple is
removed, HEAP_XMAX_COMMITTED is set.

> Maybe I'm not making myself understood. Another way of asking the same
> thing:
> Say there is a transaction that is looking at a non-current version of a
> row. 'non-current' could be the value it was at the start of the
> transaction (and was updated by another transaction) or was updated by
> this transaction but not committed yet. When this transaction is over
> is it really that hard to get rid of the reference to the old version of
> the row? There should be a 1 bit field 'is old value and isn't being
> used by any transaction'. Is that really hard?

There is a 1-bit field indicating that a tuple is an old value. Postgres
can also determine whether any transaction can see the tuple. It does
this by storing the transaction ID in the t_xmax field. If all current
transactions are newer than that transaction ID, then that tuple is no
longer visible to any transaction.

In fact, I believe that is what the VACUUM command looks for.

> Maybe this is part of the whole 'vacuum later' vs. 'update now'
> philosophy. If the point of vacuum later is to put off the performance
> hit until later if it is causing these performance hits on queries
> because index scans aren't being used then doesn't that mean 'update
> now' is more likely to pay off in the short run?

I don't follow. A simple VACUUM doesn't update the statistics. VACUUM
ANALYZE has to do more work.

Are you suggesting that the statistics should be updated continuously? I
guess that would be doable, but it would clearly slow down the database.
For some applications, it would be an obviously bad idea.

Ian
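The t_xmax rule Ian describes (a dead version becomes reclaimable once the transaction that deleted it is older than every transaction still running) can be sketched as a one-line check. This is our illustrative simplification, not PostgreSQL's internal API; the real test is against the oldest snapshot, which involves more cases than a plain ID comparison.

```python
# Sketch of the reclaim rule: if the deleting transaction (t_xmax) is
# older than every transaction still running, no snapshot can still see
# the old version, so it is safe to remove. Names here are ours.

def reclaimable(t_xmax, running_xids):
    if t_xmax is None:                     # tuple was never deleted
        return False
    horizon = min(running_xids, default=float("inf"))
    return t_xmax < horizon                # deleter predates every live xact

running = {105, 110}
old_delete = reclaimable(100, running)     # deleted before anyone live began
recent_delete = reclaimable(107, running)  # xact 105 might still see it
print(old_delete, recent_delete)           # -> True False
```

This is essentially the condition VACUUM evaluates when deciding which old tuples it may remove.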
Ian Lance Taylor wrote:
>
> Joseph Shraibman <jks@selectacast.net> writes:
>
> A caveat on this reply: I've been studying the Postgres internals, but
> I have not mastered them.
>
> > I understand that keeping different views for different open
> > transactions can be difficult, but after a transaction that updates a
> > row is over why isn't the row marked as 'universally visible' for all
> > new transactions until another update occurs?
>
> It is. This mark is on the tuple in the heap. When a tuple is
> current, and not locked for update, HEAP_XMAX_INVALID is set. After
> the tuple is removed, HEAP_XMAX_COMMITTED is set.

On the heap, but when is the index updated? Not until the next vacuum?

> > Maybe I'm not making myself understood. Another way of asking the same
> > thing:
> > Say there is a transaction that is looking at a non-current version of a
> > row. 'non-current' could be the value it was at the start of the
> > transaction (and was updated by another transaction) or was updated by
> > this transaction but not committed yet. When this transaction is over
> > is it really that hard to get rid of the reference to the old version of
> > the row? There should be a 1 bit field 'is old value and isn't being
> > used by any transaction'. Is that really hard?
>
> There is a 1-bit field indicating that a tuple is an old value.
> Postgres can also determine whether any transaction can see the
> tuple. It does this by storing the transaction ID in the t_xmax
> field. If all current transactions are newer than that transaction
> ID, then that tuple is no longer visible to any transaction.
>
> In fact, I believe that is what the VACUUM command looks for.
>
> > Maybe this is part of the whole 'vacuum later' vs. 'update now'
> > philosophy. If the point of vacuum later is to put off the performance
> > hit until later if it is causing these performance hits on queries
> > because index scans aren't being used then doesn't that mean 'update
> > now' is more likely to pay off in the short run?
>
> I don't follow. A simple VACUUM doesn't update the statistics.
> VACUUM ANALYZE has to do more work.

I'm talking about indices. The index should be updated to only point at
valid rows. But if the index isn't used by the planner then the point is
moot.

> Are you suggesting that the statistics should be updated continuously?
> I guess that would be doable, but it would clearly slow down the
> database. For some applications, it would be an obviously bad idea.

No, I'm suggesting that indices should be updated continuously so the
planner can use them without taking a performance hit from checking
whether tuples are valid or not.

BTW, is there any way to tell Postgres to do an update at every commit
without waiting for a vacuum? I understand that the Postgres core team
thinks it is a bad idea, but there are ways to (sort of) force using
index scans when the planner doesn't want to, so is there something
similar to force incremental vacuuming at the end of each query?

Java has this option:
    -Xincgc    enable incremental garbage collection
That isn't recommended, but the Java developers recognized that
sometimes a user might want it anyway, for whatever reason.

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com
Err, I wasn't complaining about count(*) per se, I was just using that
as a simple example of something that should be done with an index.
Because if the index doesn't have to worry about rows that aren't
current, then you don't even have to go into the heap: the index alone
should have enough information to do that. If it doesn't have to worry
about rows that aren't visible.

Tom Lane wrote:
>
> Joseph Shraibman <jks@selectacast.net> writes:
> > Maybe I'm not making myself understood. Another way of asking the same
> > thing:
> > Say there is a transaction that is looking at a non-current version of a
> > row. 'non-current' could be the value it was at the start of the
> > transaction (and was updated by another transaction) or was updated by
> > this transaction but not committed yet. When this transaction is over
> > is it really that hard to get rid of the reference to the old version of
> > the row? There should be a 1 bit field 'is old value and isn't being
> > used by any transaction'. Is that really hard?
>
> Sure, it's easy to do that sort of bookkeeping ... on a per-row basis.
> And we do. What's not so easy (an index helps not at all) is to
> summarize N per-row status values into a single count(*) statistic that
> you can maintain in a way significantly cheaper than just scanning the
> rows when you need the count(*) value. Especially when the per-row
> status values interact with the state values of the observing process
> to determine what it should think count(*) really is.
>
> The issue is not really "could we make count(*) fast?" Yeah, we
> probably could, if that were the only measure of performance we cared
> about. The real issue is "can we do it at a price we're willing to pay,
> considering the costs of slowdown of insert/update/delete operations,
> extra storage space, and extra system complexity?" So far the answer's
> been "no".
>
> You might want to look at the manual's discussion of MVCC and at the
> Postgres internals talks that were given at OSDN (see slides at
> http://www.postgresql.org/osdn/index.html) to learn more about how
> things work.

Ugh, PDF format. I'll have to remember to look at them next time I'm on
a Windows box.

> 			regards, tom lane

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com
Joseph Shraibman <jks@selectacast.net> writes:

> > > I understand that keeping different views for different open
> > > transactions can be difficult, but after a transaction that updates a
> > > row is over why isn't the row marked as 'universally visible' for all
> > > new transactions until another update occurs?
> >
> > It is. This mark is on the tuple in the heap. When a tuple is
> > current, and not locked for update, HEAP_XMAX_INVALID is set. After
> > the tuple is removed, HEAP_XMAX_COMMITTED is set.
>
> On the heap, but when is the index updated? Not until the next vacuum?

The index is updated right away. Otherwise it could never be used, since
it would be inaccurate.

When a new tuple is inserted in the heap, the index is updated to point
to the new tuple. This involves inserting new tuples into the index. The
old tuples in the index are left untouched, and continue to point to the
old tuple in the heap.

The old tuples in the index, and in the heap, are removed by VACUUM.
VACUUM walks through the index and checks the heap tuple corresponding
to each index tuple. If the heap tuple is gone, or can no longer be seen
by any transaction, then the index tuple can be removed.

Note that this all implies that when walking through the index to find
heap tuples, you must check the current validity of each heap tuple. It
is normal for an index tuple to point to a heap tuple which has been
deleted.

> > > Maybe this is part of the whole 'vacuum later' vs. 'update now'
> > > philosophy. If the point of vacuum later is to put off the performance
> > > hit until later if it is causing these performance hits on queries
> > > because index scans aren't being used then doesn't that mean 'update
> > > now' is more likely to pay off in the short run?
> >
> > I don't follow. A simple VACUUM doesn't update the statistics.
> > VACUUM ANALYZE has to do more work.
>
> I'm talking about indices. The index should be updated to only point at
> valid rows.

When should the index be updated to only point at valid rows? That is
only possible when a heap tuple is finally and completely removed. But
currently that is only known at the time of a VACUUM.

Consider a transaction which sits around for a while and then aborts. At
the moment the transaction aborts, Postgres may become able to remove
some heap tuples and some index tuples, and it might be invalid for
Postgres to remove those tuples before the transaction aborts.

But the transaction might never have looked at those tuples. So, given
the Postgres data structures, the only way to keep an index fully up to
date would be to effectively run a VACUUM over the entire database every
time a transaction aborts.

OK, that's not quite true. It would be possible to keep a list in shared
memory of tuples which were recently dropped. Then, as transactions
dropped out, it would be possible to see which ones could be completely
removed. This list of tuples would presumably include index tuples.

> But if the index isn't used by the planner then the point
> is moot.

As far as I know, the index itself isn't used by the planner.

> > Are you suggesting that the statistics should be updated
> > continuously? I guess that would be doable, but it would clearly
> > slow down the database. For some applications, it would be an
> > obviously bad idea.
>
> No, I'm suggesting that indices should be updated continuously so the
> planner can use them without having a performance hit from checking if
> tuples are valid or not.

Well, as far as I know, the planner doesn't use the index. It only uses
the statistics.

> BTW is there any way to tell postgres to do an update at every commit
> without waiting for a vacuum? I understand that the postgres core team
> thinks it is a bad idea, but there are ways to (sort of) force using
> index scans when the planner doesn't want to, so is there something
> similar to force incremental vacuuming at the end of each query?
>
> Java has this option:
>     -Xincgc    enable incremental garbage collection
>
> That isn't recommended, but the Java developers recognized that
> sometimes a user might want it anyway, for whatever reason.

I don't think there is any way to do that today. It would be possible to
implement something along the lines I suggest above. I have no idea if
the Postgres maintainers have any plans along these lines.

Ian
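The update-then-vacuum life cycle Ian describes (an UPDATE inserts a new index entry right away, the stale entry keeps pointing at the dead heap tuple, and VACUUM later sweeps entries whose heap tuples are globally dead) can be modeled with a couple of dictionaries. This is a toy model of the described behavior, not PostgreSQL source; the structures and names are ours.

```python
# Toy model: an UPDATE leaves two index entries for the same key -- one
# pointing at the old heap tuple, one at the new. VACUUM later drops
# index entries whose heap tuples can no longer be seen by anyone.

heap = {1: "old version", 2: "new version"}   # tuple id -> row data
globally_dead = {1}                           # no live snapshot can see tuple 1
index = [("key", 1), ("key", 2)]              # both versions present in the index

def vacuum(index, heap, globally_dead):
    """Remove dead heap tuples, then drop index entries that point nowhere."""
    for tid in globally_dead:
        heap.pop(tid, None)
    return [(k, tid) for (k, tid) in index if tid in heap]

index = vacuum(index, heap, globally_dead)
print(index)   # -> [('key', 2)]  (only the live version remains)
```

Until that vacuum runs, any scan of this index would encounter the entry for tuple 1 and would have to consult the heap to learn that it is dead, which is the validity check Ian mentions.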
On Tue, 20 Feb 2001, Joseph Shraibman wrote:

> Err, I wasn't complaining about count(*) per se, I was just using that
> as a simple example of something that should be done with an index.
> Because if the index doesn't have to worry about rows that aren't
> current then you don't even have to go into the heap because the index
> alone should have enough information to do that. If it doesn't have to
> worry about rows that aren't visible.

But the problem is: how do you deal with concurrency? At any given point
in time there are different sets of rows that are "current" for
different transactions. They all need to be in the index so that index
scans work for those transactions (unless you were to do something hacky
to get around it), but not all of them are valid for each transaction;
you still have to get that information somehow. You could keep the
transaction information in the index, but I know Tom has talked this
idea down in the past (it's come up on -hackers before); I don't really
remember what the full arguments were both ways.
Ian Lance Taylor wrote:
>
> Joseph Shraibman <jks@selectacast.net> writes:
>
> > > > I understand that keeping different views for different open
> > > > transactions can be difficult, but after a transaction that updates a
> > > > row is over why isn't the row marked as 'universally visible' for all
> > > > new transactions until another update occurs?
> > >
> > > It is. This mark is on the tuple in the heap. When a tuple is
> > > current, and not locked for update, HEAP_XMAX_INVALID is set. After
> > > the tuple is removed, HEAP_XMAX_COMMITTED is set.
> >
> > On the heap, but when is the index updated? Not until the next vacuum?
>
> The index is updated right away. Otherwise it could never be used,
> since it would be inaccurate.

I meant cleaned up. Which you answered: when vacuumed.

<snip>

> Note that this all implies that when walking through the index to find
> heap tuples, you must check the current validity of each heap tuple.
> It is normal for an index tuple to point to a heap tuple which has
> been deleted.

<snip>

> > I'm talking about indices. The index should be updated to only point at
> > valid rows.
>
> When should the index be updated to only point at valid rows? That is
> only possible when a heap tuple is finally and completely removed.
> But currently that is only known at the time of a VACUUM.

You just said above 'It is normal for an index tuple to point to a heap
tuple which has been deleted.'

> Consider a transaction which sits around for a while and then aborts.
> At the moment that the transaction aborts, Postgres may become able to
> remove some heap tuples and some index tuples, and it might be invalid
> for Postgres to remove those tuples before the transaction aborts.
>
> But the transaction might never have looked at those tuples. So,
> given the Postgres data structures, the only way to keep an index
> fully up to date would be to effectively run a VACUUM over the entire
> database every time a transaction aborts.

Why? There is a mechanism for keeping track of which heap tuples are
valid, why not index tuples? It is the nature of indices to be updated
on inserts, why not on deletes? I would think that the advantage of
being able to use the index in the planner would outweigh the immediate
cost of doing the update.

<snip>

> > But if the index isn't used by the planner then the point
> > is moot.
>
> As far as I know the index itself isn't used by the planner.

But it could be. As I understand it, the reason the index isn't used by
the planner is that the index could point at non-visible rows (row =
heap tuple). If the index could be used, many things that are now seq
scans could be converted to faster index scans.

<snip>

> I don't think there is any way to do that today. It would be possible
> to implement something along the lines I suggest above. I have no
> idea if the Postgres maintainers have any plans along these lines.

At the end of a transaction, when it sets the bit that says a tuple
isn't valid, couldn't it at the same time also remove the tuple if it
was no longer visible to any transaction? It wouldn't remove the need
for vacuum, because there may be another transaction that prevents the
tuple from being removed right then and there. But for index tuples we
could use your list system, because (as I see it) the value of being
able to use the index in the planner would outweigh the cost of the
list system.

> Ian

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com
Joseph Shraibman <jks@selectacast.net> writes:
> Why? There is a mechanism for keeping track of which heap tuples are
> valid, why not index tuples? It is the nature of indices to be updated
> on inserts, why not deletes?

An index is a hint: these tuples *might* be of interest to your
transaction. It's OK for an index to point to some irrelevant tuples,
but it's useless if it fails to point to all the possibly relevant
tuples. Therefore, it's OK to insert an index entry at the earliest
possible instant (as soon as a yet-uncommitted heap tuple is inserted);
and contrariwise, the index entry can't be deleted until the heap tuple
can be proven to be no longer of interest to any still-alive
transaction.

Currently, proving that a heap tuple is globally no-longer-of-interest
and removing it and its associated index tuples is the task of VACUUM.
Intermediate state transitions (e.g., this tuple has been deleted by a
not-yet-committed transaction) are recorded in the heap tuple, but we
don't try to look around and update all the associated index tuples at
the same time.

Maintaining that same state in all the index tuples would be expensive
and would bloat the indexes. An index that's not a lot smaller than the
associated heap is of little value, so extra bits in an index entry are
to be feared.

These are very fundamental system design decisions. If you'd like to
show us the error of our ways, step right up to the plate and swing
away; but unsubstantiated suggestions that these choices are wrong are
not going to be taken with any seriousness. Postgres has come pretty
far on the basis of these design choices.

			regards, tom lane
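Tom's "an index is a hint" framing means an index scan over-approximates: it may return pointers to versions a given snapshot cannot see, so every candidate must be rechecked against the heap. A minimal sketch of that recheck, with structures and names of our own invention (not PostgreSQL's):

```python
# An index scan returns candidate tuple ids; the heap is the authority
# on which of those a snapshot may actually see. The index may point to
# extra (dead) tuples, but must never miss a possibly-relevant one.

index = [("apple", 1), ("apple", 2)]   # two versions of the same key
snapshot_visible = {2}                 # tuple ids this snapshot can see

def index_scan(index, key, snapshot_visible):
    candidates = [tid for (k, tid) in index if k == key]   # hint: maybe relevant
    return [tid for tid in candidates if tid in snapshot_visible]  # heap recheck

result = index_scan(index, "apple", snapshot_visible)
print(result)   # -> [2]
```

The recheck is why the index alone could not answer count(*) in this design: the heap visit is where visibility is decided.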
Joseph Shraibman <jks@selectacast.net> writes:

> > Note that this all implies that when walking through the index to find
> > heap tuples, you must check the current validity of each heap tuple.
> > It is normal for an index tuple to point to a heap tuple which has
> > been deleted.
>
> <snip>
>
> > > I'm talking about indices. The index should be updated to only point at
> > > valid rows.
> >
> > When should the index be updated to only point at valid rows? That is
> > only possible when a heap tuple is finally and completely removed.
> > But currently that is only known at the time of a VACUUM.
>
> You just said above 'It is normal for an index tuple to point to a heap
> tuple which has been deleted.'

I'm not sure what your point is here. I can see that there is an
ambiguity in what I said. I said it is normal for an index tuple to
point to a heap tuple which has been deleted. However, although the heap
tuple has been deleted in present time, there may still be transactions
which do not see that heap tuple as having been deleted. So the index
tuple is still meaningful until all those transactions have completed.
When all such transactions have completed is the case I meant when I
described the heap tuple as finally and completely removed.

> > Consider a transaction which sits around for a while and then aborts.
> > At the moment that the transaction aborts, Postgres may become able to
> > remove some heap tuples and some index tuples, and it might be invalid
> > for Postgres to remove those tuples before the transaction aborts.
> >
> > But the transaction might never have looked at those tuples. So,
> > given the Postgres data structures, the only way to keep an index
> > fully up to date would be to effectively run a VACUUM over the entire
> > database every time a transaction aborts.
>
> Why? There is a mechanism for keeping track of which heap tuples are
> valid, why not index tuples? It is the nature of indices to be updated
> on inserts, why not deletes? I would think that the advantage of being
> able to use the index in the planner would outweigh the immediate cost
> of doing the update.

You're right. The mechanism used to preserve multiple versions of heap
tuples could be extended to index tuples as well.

Based on the heap tuple implementation, this would require adding two
transaction IDs and a few flags to each index tuple. That's not
insignificant. In a B-tree, right now, I think there are 8 bytes plus
the key for each item in the tree. This would require adding another 10
bytes or so. That's a lot.

Also, more work would be required for every update. Right now an update
requires a B-tree insert for each index. With this change, every update
would require an additional B-tree lookup and write for each index. That
would require, on average, a bit less than one additional block write
per index. That's a lot.

In exchange, certain queries would become faster. Specifically, any
query which only needed the information found in an index would become
faster. Each such query would save, on average, a bit less than one
additional block read per value found in the index. But since the
indices would be less efficient, some of the advantage would be lost due
to extra block reads of the index.

What you are suggesting seems possible, but it does not seem to be
obviously better. If you feel strongly about this, the most reasonable
thing would be for you to implement it and test the results. Since, as
far as I can see, what you are suggesting is not clearly better, it's
unlikely that anybody else is going to go to the considerable effort of
implementing it on your behalf.

> > > But if the index isn't used by the planner then the point
> > > is moot.
> >
> > As far as I know the index itself isn't used by the planner.
>
> But it could be. As I understand it, the reason the index isn't used by
> the planner is that the index could point at non-visible rows (row =
> heap tuple). If the index could be used, many things that are now seq
> scans could be converted to faster index scans.

Some things could, sure. It's not obvious to me that many things could.
The planner can't spend a lot of time looking at an index to decide
whether or not to use it. If it's going to do that, it's better off just
deciding to use the index in the first place. Index examination is not
free. It requires disk reads just like everything else.

> > I don't think there is any way to do that today. It would be possible
> > to implement something along the lines I suggest above. I have no
> > idea if the Postgres maintainers have any plans along these lines.
>
> At the end of a transaction, when it sets the bit that this tuple isn't
> valid, couldn't it at the same time also remove it if it was no longer
> visible to any transaction? It wouldn't remove the need for vacuum
> because there may be another transaction that prevents it from being
> removed right then and there.

Yes, this could be done. It wouldn't speed things up, though. In fact,
it would slow them down. The only advantage would be that VACUUM would
be required less often -- an advantage which is not insignificant.

I would guess that in the average multi-user database, less than half of
the tuples could be deleted at that point. It would be easy to
instrument Postgres to test this -- why don't you try that?

Ian
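Ian's size estimates can be checked with back-of-the-envelope arithmetic. Taking his figures at face value (roughly 8 bytes of overhead plus the key per B-tree item today, and an extra ~10 bytes per item for two transaction IDs plus flags) and assuming a hypothetical 4-byte integer key:

```python
# Worked example of Ian's estimate: how much would per-entry MVCC info
# bloat a B-tree index? The 4-byte key is our assumption for illustration.

key_bytes = 4                      # e.g. a 4-byte integer key
today = 8 + key_bytes              # ~8 bytes overhead + key, per Ian
with_mvcc = today + 10             # two xids + flags, per Ian's estimate
growth = (with_mvcc - today) / today

print(f"{today} -> {with_mvcc} bytes per entry (+{growth:.0%})")
```

For small keys the index entries grow by well over half, which illustrates Tom's point that an index which is "not a lot smaller than the associated heap is of little value": fewer entries fit per block, so every index scan reads more blocks.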
Ian Lance Taylor wrote:
>
> <snip>
>
> You're right.  The mechanism used to preserve multiple versions of
> heap tuples could be extended to index tuples as well.
>
> Based on the heap tuple implementation, this would require adding two
> transaction IDs and a few flags to each index tuple.  That's not
> insignificant.  In a B-tree, right now, I think there are 8 bytes plus
> the key for each item in the tree.  This would require adding another
> 10 bytes or so.  That's a lot.

OK, now this is starting to make sense to me.  I think.  I guess I'll
really have to sift through the code again to figure out the rest.

> Also, more work would be required for every update.  Right now an
> update requires a B-tree insert for each index.  With this change,
> every update would require an additional B-tree lookup and write for
> each index.  That would require on average a bit less than one
> additional block write per index.  That's a lot.
>
> In exchange, certain queries would become faster.  Specifically, any
> query which only needed the information found in an index would become
> faster.  Each such query would save on average a bit less than one
> additional block read per value found in the index.  But since the
> indices would be less efficient, some of the advantage would be lost
> due to extra block reads of the index.
>
> What you are suggesting seems possible, but it does not seem to be
> obviously better.

It may not be as obvious as it first seemed to me, but I bet there are
certain databases out there that have just the right pattern of data
that would benefit from this.  I suppose this is a balance that query
planners have had to strike all along.  Maybe there could be a
different type of index that could be manually added by admins who
wanted to fiddle around with their database.

> If you feel strongly about this, the most reasonable thing would be
> for you to implement it and test the results.
> Since as far as I can see what you are suggesting is not clearly
> better, it's unlikely that anybody else is going to go to the
> considerable effort of implementing it on your behalf.
>
> <snip>
>
> Some things could, sure.  It's not obvious to me that many things
> could.  The planner can't spend a lot of time looking at an index to
> decide whether or not to use it.  If it's going to do that, it's
> better off to just decide to use the index in the first place.  Index
> examination is not free.  It requires disk reads just like everything
> else.

Not free, but possibly worth it if it saves a seq scan.

> > > I don't think there is any way to do that today.  It would be possible
> > > to implement something along the lines I suggest above.  I have no
> > > idea if the Postgres maintainers have any plans along these lines.
> >
> > At the end of a transaction, when it sets the bit that this tuple isn't
> > valid, couldn't it at the same time also remove it if it was no longer
> > visible to any transaction?  It wouldn't remove the need for vacuum
> > because there may be another transaction that prevents it from being
> > removed right then and there.
>
> Yes, this could be done.  It wouldn't speed things up, though.  In
> fact, it would slow them down.  The only advantage would be that
> VACUUM would be required less often--an advantage which is not
> insignificant.
>
> I would guess that in the average multi-user database less than half
> of the tuples could be deleted at that point.  It would be easy to
> instrument Postgres to test this--why don't you try that?

I just might.  I've been thinking of hacking on Postgres, but for
adding XML support.  That seems to be mostly a matter of parsing.

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio.  http://www.targabot.com
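The "possibly worth it if it saves a seq scan" trade-off can be
sketched with a toy cost model.  The constants mimic the spirit of
PostgreSQL's sequential-vs-random page cost settings but are
illustrative assumptions, not the planner's actual formulas:

```python
# Toy planner cost model: an index scan only beats a sequential scan
# when the query touches a small enough fraction of the table.

SEQ_PAGE_COST = 1.0     # cost of one sequential block read
RANDOM_PAGE_COST = 4.0  # cost of one random block read (heap fetches)

def seq_scan_cost(heap_pages: int) -> float:
    return heap_pages * SEQ_PAGE_COST

def index_scan_cost(heap_pages: int, index_pages: int,
                    selectivity: float) -> float:
    # Read the relevant slice of the index, then fetch the matching
    # heap pages in essentially random order.
    return (index_pages + heap_pages) * selectivity * RANDOM_PAGE_COST

# Fetching 1% of a 10,000-page table: the index wins easily.
print(index_scan_cost(10_000, 300, 0.01), seq_scan_cost(10_000))
# Fetching 90% of it: the sequential scan is far cheaper.
print(index_scan_cost(10_000, 300, 0.90), seq_scan_cost(10_000))
```

This is why the planner's choice hinges on the selectivity estimate,
and why a blanket "always use the index" rule would hurt as often as
it helps.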
> > Also, more work would be required for every update.  Right now an
> > update requires a B-tree insert for each index.  With this change,
> > every update would require an additional B-tree lookup and write for
> > each index.  That would require on average a bit less than one
> > additional block write per index.  That's a lot.
> >
> > In exchange, certain queries would become faster.  Specifically, any
> > query which only needed the information found in an index would become
> > faster.  Each such query would save on average a bit less than one
> > additional block read per value found in the index.  But since the
> > indices would be less efficient, some of the advantage would be lost
> > due to extra block reads of the index.
> >
> > What you are suggesting seems possible, but it does not seem to be
> > obviously better.
>
> It may not be as obvious as it first seemed to me, but I bet there are
> certain databases out there that have just the right pattern of data
> that would benefit from this.  I suppose this is something that
> compilers have tried to balance all along.  Maybe there could be a
> different type of index that could be manually added by admins who
> wanted to fiddle around with their database.

If you want performance options, MySQL is the champ.  We usually
require a clear reason to give users more options, because too many
options can be quite confusing.  It is more a design philosophy.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026