Обсуждение: stats for failed transactions (was Re: [GENERAL] VACUUM Question)
"Matthew T. O'Connor" <matthew@zeut.net> writes: >> Also, somebody made a real good point about rolled-back insertions. >> Even if the only command you ever apply to the table is INSERT, you >> could still have dead rows in the table if some of those transactions >> occasionally roll back. > hmm... That's true. I don't think autovacuum doesn't anything to account > for the concept of rolledback inserts. I think this is the fault of the stats system design. AFAICT from a quick look at the code, inserted/updated/deleted tuples are reported to the collector in the same way regardless of whether the sending transaction committed or rolled back. I think this is unquestionably a bug, at least for autovacuum's purposes --- though it might be OK for the original intent of the stats system, which was simply to track activity levels. Any thoughts about how it ought to work? regards, tom lane
Tom Lane wrote: > I think this is the fault of the stats system design. AFAICT from a > quick look at the code, inserted/updated/deleted tuples are reported > to the collector in the same way regardless of whether the sending > transaction committed or rolled back. I think this is unquestionably > a bug, at least for autovacuum's purposes --- though it might be OK > for the original intent of the stats system, which was simply to track > activity levels. > > Any thoughts about how it ought to work? I don't remember exactly how it works -- I think the activity (insert, update, delete) counters are kept separately from commit/rollback status, right? Maybe we should keep three separate counters: "current transaction counters" and "counters for transactions that were aborted/committed". We only send the latter counts, and the former are added to them when the transaction ends. -- Alvaro Herrera http://www.advogato.org/person/alvherre "Aprende a avergonzarte más ante ti que ante los demás" (Demócrito)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Tom Lane wrote: >> I think this is unquestionably >> a bug, at least for autovacuum's purposes --- though it might be OK >> for the original intent of the stats system, which was simply to track >> activity levels. >> >> Any thoughts about how it ought to work? > I don't remember exactly how it works -- I think the activity (insert, > update, delete) counters are kept separately from commit/rollback > status, right? Maybe we should keep three separate counters: "current > transaction counters" and "counters for transactions that were > aborted/committed". We only send the latter counts, and the former are > added to them when the transaction ends. My question was at a higher level, actually: *what* should we be counting? I think doubling the number of counters in the stats system, which is what you seem to be proposing, is probably not acceptable --- we've already got a problem with the stats file becoming unreasonably bulky. We need to figure out exactly which counts there is adequate reason to be tracking. I don't, for instance, see any percentage in tracking block-level I/O operations separately for committed and rolled-back transactions. Those numbers are certainly things you watch only for total activity, and a failed xact is just as much system load as a committed one. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Tom Lane wrote: > >> I think this is unquestionably > >> a bug, at least for autovacuum's purposes --- though it might be OK > >> for the original intent of the stats system, which was simply to track > >> activity levels. > >> > >> Any thoughts about how it ought to work? > > > I don't remember exactly how it works -- I think the activity (insert, > > update, delete) counters are kept separately from commit/rollback > > status, right? Maybe we should keep three separate counters: "current > > transaction counters" and "counters for transactions that were > > aborted/committed". We only send the latter counts, and the former are > > added to them when the transaction ends. > > My question was at a higher level, actually: *what* should we be > counting? Oh, I see. Do you think small incremental improvements to the stat system will buy us much? I think we should be thinking big here, i.e. rewrite most stuff instead. In the meantime, we should fix the minor issues but not spend too much time on it; IMHO anyway. I can devote some time to it starting from, say, mid february, which is when I think I'm going to have more time to spend on community stuff. (I've been spending the last couple of months on PL/php and internal Command Prompt stuff.) -- Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC "XML!" Exclaimed C++. "What are you doing here? You're not a programming language." "Tell that to the people who use me," said XML.
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Tom Lane wrote: >> My question was at a higher level, actually: *what* should we be >> counting? > Oh, I see. Do you think small incremental improvements to the stat > system will buy us much? I think we should be thinking big here, i.e. > rewrite most stuff instead. Uh, I wasn't aware of any proposals for a major rewrite. What did you have in mind? regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Tom Lane wrote: > >> My question was at a higher level, actually: *what* should we be > >> counting? > > > Oh, I see. Do you think small incremental improvements to the stat > > system will buy us much? I think we should be thinking big here, i.e. > > rewrite most stuff instead. > > Uh, I wasn't aware of any proposals for a major rewrite. What did you > have in mind? Nothing yet really. But we mentioned it in the recent past. (About not using UDP, and also about the property of not having each backend load the whole stat file every time.) -- Alvaro Herrera http://www.PlanetPostgreSQL.org "This is a foot just waiting to be shot" (Andrew Dunstan)
Tom Lane wrote: >> hmm... That's true. I don't think autovacuum doesn't anything to account >> for the concept of rolledback inserts. >> > > I think this is the fault of the stats system design. AFAICT from a > quick look at the code, inserted/updated/deleted tuples are reported > to the collector in the same way regardless of whether the sending > transaction committed or rolled back. I think this is unquestionably > a bug, at least for autovacuum's purposes --- though it might be OK > for the original intent of the stats system, which was simply to track > activity levels. > > Any thoughts about how it ought to work? The fact that autovacuum bases it's decisions on info from the stats system is arguably an abuse of the original design. However I don't know of a better source of information at the moment. It has always been my vision that autovacuum will eventually incorporate additional information sources to make better informed decisions. There has always been discussion of using the FSM to help clue us in as to when we need another vacuum. Perhaps the addition of the vacuum space map that people are talking about will also help. None of this directly addresses the question of what the stats system *should* track, but perhaps it is wrongheaded to totally redesign the stats system for the purposes of autovacuum. As a quick semi-fix, perhaps autovacuum should look at the number of rollbacks vs. commits in an attempt to determine the accuracy of the stats. For example if 50% of the transactions are getting rolled back, then autovacuum might include 50% of the inserts in the count towards the vacuum threshold. Obviously this isn't perfect, but it probably gets us closer to reality with the information already available. Thoughts? Matt
"Matthew T. O'Connor" <matthew@zeut.net> writes: > None of this directly addresses the question of what the stats system > *should* track, but perhaps it is wrongheaded to totally redesign the > stats system for the purposes of autovacuum. I'd argue it's fine: there are tons of people using row-level stats via autovacuum, and (AFAICT) just about nobody using 'em for any other purpose. Certainly you never see anyone suggesting them as a tool for investigating problems on pgsql-performance. Sure, it's a repurposing of the stats subsystem, but we should be willing to do that when field experience teaches us what's really needed. > As a quick semi-fix, perhaps autovacuum should look at the number of > rollbacks vs. commits in an attempt to determine the accuracy of the > stats. For example if 50% of the transactions are getting rolled > back, then autovacuum might include 50% of the inserts in the count > towards the vacuum threshold. Obviously this isn't perfect, but it > probably gets us closer to reality with the information already > available. But all that we have is *global* counts. Inferring ratios applicable to particular tables seems a big stretch to me. Any given application is likely to have some types of transactions that roll back much more often than others. One thing we could do is tie the stats message sending more tightly to top-level transaction commit/abort. (It's already effectively true that we send stats only after commit/abort, but we send 'em from the wrong place, ie PostgresMain.) Then the semantics of the message could be read as "here's what I did before committing" or "here's what I did before aborting" and the collector could interpret the counts accordingly. However, this still fails in the case where a committed top-level transaction includes some failed subtransactions. I think the only full solution will involve backends doing some extra work at subtransaction commit/abort so that they can report properly classified update counts. regards, tom lane
Tom Lane wrote: > I'd argue it's fine: there are tons of people using row-level stats > via autovacuum, and (AFAICT) just about nobody using 'em for any other > purpose. Certainly you never see anyone suggesting them as a tool for > investigating problems on pgsql-performance. Sure, it's a repurposing > of the stats subsystem, but we should be willing to do that when field > experience teaches us what's really needed. > OK. > But all that we have is *global* counts. Inferring ratios applicable to > particular tables seems a big stretch to me. Any given application is > likely to have some types of transactions that roll back much more often > than others. > Certainly a stretch, but it's far from fatal, the worst case scenario is we occasionally vacuum a table that might not need it, whereas the way things stand right now, the worst case is that we never vacuum that might have a lot of slack space. BTW, I'm only arguing this based on what is in the stats system now and pondering how we might improve things if the stats system isn't changed to directly address this problem. > One thing we could do is tie the stats message sending more tightly to > top-level transaction commit/abort. (It's already effectively true that > we send stats only after commit/abort, but we send 'em from the wrong > place, ie PostgresMain.) Then the semantics of the message could be > read as "here's what I did before committing" or "here's what I did > before aborting" and the collector could interpret the counts > accordingly. However, this still fails in the case where a committed > top-level transaction includes some failed subtransactions. I think > the only full solution will involve backends doing some extra work at > subtransaction commit/abort so that they can report properly classified > update counts. Any guess as to the performance implications?
"Matthew T. O'Connor" <matthew@zeut.net> writes: > Tom Lane wrote: >> the only full solution will involve backends doing some extra work at >> subtransaction commit/abort so that they can report properly classified >> update counts. > Any guess as to the performance implications? Pushing some counts from one place to another doesn't seem that expensive, but it'd be nice to avoid scanning a lot of unrelated table-stats entries to find the ones that have to be adjusted. Not sure what it'll take exactly. Or we could blow it off for the time being. Certainly, getting things right at the top-transaction level would already be a big leg up in accuracy from where we are, and I don't think that would be hard at all. regards, tom lane
Tom, > I'd argue it's fine: there are tons of people using row-level stats > via autovacuum, and (AFAICT) just about nobody using 'em for any other > purpose. Certainly you never see anyone suggesting them as a tool for > investigating problems on pgsql-performance. Actually, I use the stats for performance tuning. However, I can't say that I care about the exact numbers; I'm just looking for columns which get "lots" of seq scans or indexes that don't get used. --Josh
On 1/27/2006 10:53 AM, Alvaro Herrera wrote: > Tom Lane wrote: > >> I think this is the fault of the stats system design. AFAICT from a >> quick look at the code, inserted/updated/deleted tuples are reported >> to the collector in the same way regardless of whether the sending >> transaction committed or rolled back. I think this is unquestionably >> a bug, at least for autovacuum's purposes --- though it might be OK >> for the original intent of the stats system, which was simply to track >> activity levels. >> >> Any thoughts about how it ought to work? > > I don't remember exactly how it works -- I think the activity (insert, > update, delete) counters are kept separately from commit/rollback > status, right? Maybe we should keep three separate counters: "current > transaction counters" and "counters for transactions that were > aborted/committed". We only send the latter counts, and the former are > added to them when the transaction ends. It's not a bug. More some "don't bother" attitude. The stats were not intended to be precise. Their sole purpose at the time of design and development was to give clues for missing or useless indexes, identify candidates to move onto different spindles and things like autovacuum. If the number of aborts you're riding significantly disturbs your statistic collection, you should look at some design issues with your own application instead. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On 1/27/2006 10:56 AM, Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >> Tom Lane wrote: >>> I think this is unquestionably >>> a bug, at least for autovacuum's purposes --- though it might be OK >>> for the original intent of the stats system, which was simply to track >>> activity levels. >>> >>> Any thoughts about how it ought to work? > >> I don't remember exactly how it works -- I think the activity (insert, >> update, delete) counters are kept separately from commit/rollback >> status, right? Maybe we should keep three separate counters: "current >> transaction counters" and "counters for transactions that were >> aborted/committed". We only send the latter counts, and the former are >> added to them when the transaction ends. > > My question was at a higher level, actually: *what* should we be > counting? > > I think doubling the number of counters in the stats system, which is > what you seem to be proposing, is probably not acceptable --- we've > already got a problem with the stats file becoming unreasonably bulky. > We need to figure out exactly which counts there is adequate reason > to be tracking. > > I don't, for instance, see any percentage in tracking block-level I/O > operations separately for committed and rolled-back transactions. > Those numbers are certainly things you watch only for total activity, > and a failed xact is just as much system load as a committed one. Right. The problem is that different questions need different counting semantics. If one looks at an application profile, he wants to know how many inserts, updates and deletes the app has issued, probably regardless of the final transaction outcome. But for autovac he wants to be able to figure out how many dead tuples there would be, so for that purpose inserts in a rolled back xact would count as ins+del where deletes would not count at all. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #