Discussion: vacuum analyze feedback
I know this topic has been rehashed a million times, but I just wanted to
add one datapoint. I have a database (150 tables, less than 20K tuples in
any one table) which I 'vacuum analyze' *HOURLY*, blocking all access, and
I still see frequent situations where my query times bloat by roughly 300%
(4 times slower) in the intervening time between vacuums.

All this is to say that I think a more strategic implementation of the
functionality of vacuum analyze (specifically, non-batched, automated,
on-the-fly vacuuming/analyzing) would be a major "value add". I haven't
educated myself as to the history of it, but I do wonder why the
performance focus is not on this. I'd imagine it would be a performance
hit (which argues for making it optional), but I'd gladly take a 10%
performance hit over the current highly undesirable degradation. You could
do a whole lotta optimization on the planner/parser/executor and not get
close to the end-user-perceptible gains from fixing this problem...

Regards,
Ed Loehr
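[Editorial aside: to make the "non-batched, automated, on-the-fly
vacuuming/analyzing" idea a bit more concrete, here is a minimal client-side
sketch that vacuums and analyzes user tables one at a time instead of in a
single database-wide, access-blocking pass. It assumes the psycopg2 Python
driver and a reachable database; the connection string, pause length, and
catalog query are illustrative assumptions, not a description of how
PostgreSQL itself does or should do this.]

    # Sketch: spread VACUUM ANALYZE over individual tables instead of one
    # database-wide, access-blocking hourly batch.  Assumes psycopg2 and a
    # reachable database; the DSN and sleep interval are made up.
    import time
    import psycopg2

    DSN = "dbname=mydb user=ed"   # hypothetical connection string
    PAUSE_SECONDS = 5             # breathing room between tables

    def vacuum_tables_one_by_one(dsn=DSN):
        conn = psycopg2.connect(dsn)
        conn.autocommit = True    # VACUUM cannot run inside a transaction block
        cur = conn.cursor()

        # List ordinary user tables (skip the system catalogs).
        cur.execute(
            "SELECT relname FROM pg_class "
            "WHERE relkind = 'r' AND relname NOT LIKE 'pg\\_%'"
        )
        tables = [row[0] for row in cur.fetchall()]

        for table in tables:
            # Each statement touches only this table for the duration of its
            # vacuum, instead of the whole database being tied up at once.
            cur.execute('VACUUM ANALYZE "%s"' % table)
            time.sleep(PAUSE_SECONDS)

        cur.close()
        conn.close()

    if __name__ == "__main__":
        vacuum_tables_one_by_one()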
> I know this topic has been rehashed a million times, but I just wanted to
> add one datapoint. I have a database (150 tables, less than 20K tuples
> in any one table) which I 'vacuum analyze' *HOURLY*, blocking all access,
> and I still see frequent situations where my query times bloat by roughly
> 300% (4 times slower) in the intervening time between vacuums. All this
> is to say that I think a more strategic implementation of the
> functionality of vacuum analyze (specifically, non-batched, automated,
> on-the-fly vacuuming/analyzing) would be a major "value add". I haven't
> educated myself as to the history of it, but I do wonder why the
> performance focus is not on this. I'd imagine it would be a performance
> hit (which argues for making it optional), but I'd gladly take a 10%
> performance hit over the current highly undesirable degradation. You
> could do a whole lotta optimization on the planner/parser/executor and
> not get close to the end-user-perceptible gains from fixing this
> problem...

Vadim is planning an overwrite storage manager for 7.2, which will allow
expired tuples to be reused without vacuum.

Or is the ANALYZE the issue for you?  You need hourly statistics?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
>
> > [...original report trimmed...]
>
> Vadim is planning an overwrite storage manager for 7.2, which will allow
> expired tuples to be reused without vacuum.

Sorry, I missed that in prior threads...that would be good.

> Or is the ANALYZE the issue for you?

Both, actually. More specifically, blocking end-user access during vacuum,
and degraded end-user performance as pg_statistics diverges from reality.
Both are losses of service from the system.

> You need hourly statistics?

My unstated point was that hourly stats have turned out *not* to be nearly
good enough in my case. Better would be if the system were smart enough to
recognize when the outcome of a query/plan was sufficiently divergent from
the statistics to warrant a system-initiated analyze (or whatever form it
would take). I'll probably end up doing this detection from the app/client
side, but that's not the right place for it, IMO.

Regards,
Ed Loehr
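[Editorial aside: the client-side detection Ed mentions might look roughly
like the sketch below. It compares the planner's row estimate against the
actual row count reported by EXPLAIN ANALYZE and issues an ANALYZE on the
table when the two diverge too far. It assumes psycopg2 and a server version
that supports EXPLAIN ANALYZE; the regular expressions, the 4x threshold, and
the table/query names are hypothetical and only illustrate the divergence
check, not a recommended production mechanism.]

    # Sketch: client-side detection of planner estimates drifting away from
    # reality, of the kind Ed describes doing "from the app/client side".
    # Assumes psycopg2 and a server that supports EXPLAIN ANALYZE; the 4x
    # threshold and the regexes are illustrative assumptions.
    import re
    import psycopg2

    DIVERGENCE_FACTOR = 4.0   # re-analyze when the estimate is off by more than 4x

    EST_RE = re.compile(r"rows=(\d+)")             # planner's estimated rows
    ACT_RE = re.compile(r"actual[^)]*rows=(\d+)")  # executor's actual rows

    def maybe_reanalyze(conn, table, query):
        """Run EXPLAIN ANALYZE on `query`; if the top plan node's estimated
        row count diverges from the actual row count by more than
        DIVERGENCE_FACTOR, issue ANALYZE on `table`."""
        cur = conn.cursor()
        cur.execute("EXPLAIN ANALYZE " + query)
        top_line = cur.fetchone()[0]       # first line = top node of the plan

        est_match = EST_RE.search(top_line)
        act_match = ACT_RE.search(top_line)
        if not (est_match and act_match):
            return False                   # could not parse; do nothing

        estimated = max(int(est_match.group(1)), 1)
        actual = max(int(act_match.group(1)), 1)
        ratio = max(estimated, actual) / float(min(estimated, actual))

        if ratio > DIVERGENCE_FACTOR:
            cur.execute('ANALYZE "%s"' % table)
            return True
        return False

A client would call maybe_reanalyze(conn, "orders", "SELECT ...") after a
query that came back suspiciously slow; the table name and query here are
placeholders.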
> Bruce Momjian wrote:
> >
> > > [...original report trimmed...]
> >
> > Vadim is planning an overwrite storage manager for 7.2, which will allow
> > expired tuples to be reused without vacuum.
>
> Sorry, I missed that in prior threads...that would be good.
>
> > Or is the ANALYZE the issue for you?
>
> Both, actually. More specifically, blocking end-user access during vacuum,
> and degraded end-user performance as pg_statistics diverges from reality.
> Both are losses of service from the system.
>
> > You need hourly statistics?
>
> My unstated point was that hourly stats have turned out *not* to be nearly
> good enough in my case. Better would be if the system were smart enough to
> recognize when the outcome of a query/plan was sufficiently divergent from
> the statistics to warrant a system-initiated analyze (or whatever form it
> would take). I'll probably end up doing this detection from the app/client
> side, but that's not the right place for it, IMO.

Yes, I think eventually we need to feed information about actual query
results back into the optimizer for use in later queries.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
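[Editorial aside: "feeding actual query results back into the optimizer" can
be illustrated with a tiny cardinality-feedback cache that records how far
off an estimate was and applies that correction the next time the same
table/predicate is planned. The class, keys, and correction rule below are
all hypothetical; nothing like this existed in the server at the time.]

    # Sketch of cardinality feedback: remember the observed error of an
    # estimate and scale future estimates for the same table/predicate.
    class CardinalityFeedback:
        def __init__(self):
            # (table, predicate) -> ratio of actual rows to estimated rows
            self._corrections = {}

        def record(self, table, predicate, estimated_rows, actual_rows):
            """Remember how wrong the estimate was for this table/predicate."""
            if estimated_rows > 0:
                self._corrections[(table, predicate)] = (
                    actual_rows / float(estimated_rows)
                )

        def corrected_estimate(self, table, predicate, estimated_rows):
            """Scale a fresh estimate by the previously observed error, if any."""
            factor = self._corrections.get((table, predicate), 1.0)
            return int(estimated_rows * factor)

    # Example: the planner guessed 200 rows but the query produced 5,000;
    # the next time the same predicate is planned, the estimate is scaled up.
    fb = CardinalityFeedback()
    fb.record("orders", "status = 'open'", estimated_rows=200, actual_rows=5000)
    print(fb.corrected_estimate("orders", "status = 'open'", estimated_rows=200))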
At 15:54 25/05/00 -0400, Bruce Momjian wrote:
>
> Yes, I think eventually we need to feed information about actual query
> results back into the optimizer for use in later queries.
>

You could be a little more ambitious and do what Dec/Rdb does - use the
results of current query execution to (possibly) cause a change in the
current strategy.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: +61-03-5367 7422
Fax: +61-03-5367 7430
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
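[Editorial aside: the idea of changing strategy in mid-execution can be shown
with a deliberately toy sketch, far removed from how either Dec/Rdb or
PostgreSQL actually works: an executor that starts with a nested-loop-style
join on the assumption that the outer input is small, and switches to
building a hash table once it has seen more outer rows than the plan
estimated. Everything here (data shapes, threshold, names) is hypothetical
and only meant to show the runtime-feedback idea.]

    # Toy sketch of a mid-execution strategy change: start with a nested-loop
    # join (cheap when the outer input is tiny) and switch to a hash join
    # once the outer input turns out to be larger than estimated.
    def adaptive_join(outer_rows, inner_rows, key, estimated_outer=100):
        results = []
        hash_table = None

        for i, o in enumerate(outer_rows):
            if hash_table is None and i >= estimated_outer:
                # Observed cardinality exceeded the estimate: switch strategy.
                # Build a hash table over the inner side once, instead of
                # rescanning it for every remaining outer row.
                hash_table = {}
                for r in inner_rows:
                    hash_table.setdefault(r[key], []).append(r)

            if hash_table is None:
                # Nested-loop phase: scan the inner side for each outer row.
                for r in inner_rows:
                    if r[key] == o[key]:
                        results.append((o, r))
            else:
                # Hash-join phase: probe the hash table instead.
                for r in hash_table.get(o[key], []):
                    results.append((o, r))
        return results

    # Example: the plan "expected" 100 outer rows but 10,000 arrive, so the
    # join flips to the hash strategy partway through.
    outer = [{"id": i % 50} for i in range(10000)]
    inner = [{"id": i} for i in range(50)]
    print(len(adaptive_join(outer, inner, key="id", estimated_outer=100)))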
> At 15:54 25/05/00 -0400, Bruce Momjian wrote:
> >
> > Yes, I think eventually we need to feed information about actual query
> > results back into the optimizer for use in later queries.
>
> You could be a little more ambitious and do what Dec/Rdb does - use the
> results of current query execution to (possibly) cause a change in the
> current strategy.

Yes.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026