Обсуждение: replace all with * in pg_hba.conf
If people are happy with Tom's suggestion of using '*' instead of 'all' in pg_hba.conf I will prepare a patch for it. (I will also replace the ugly long IP6 localhost netmask with a CIDR mask). cheers andrew
Andrew Dunstan wrote: > If people are happy with Tom's suggestion of using '*' instead of > 'all' in pg_hba.conf I will prepare a patch for it. Well, while we're breaking stuff in the name of improvement, what about databases named "*" or databases with spaces in their names?
Peter Eisentraut wrote: >Andrew Dunstan wrote: > > >>If people are happy with Tom's suggestion of using '*' instead of >>'all' in pg_hba.conf I will prepare a patch for it. >> >> > >Well, while we're breaking stuff in the name of improvement, what about >databases named "*" or databases with spaces in their names? > > Good point. Perhaps we need to provide for an escaping mechanism in the routines that parse the file, although personally I have little sympathy for anyone who names a database '*'. I think it comes into the category of "Doctor, it hurts when I do this" ... "Then stop doing that." Spaces are a more likely problem, especially when we get W32 native users. cheers andrew
I wrote: > Peter Eisentraut wrote: > >> Andrew Dunstan wrote: >> >> >>> If people are happy with Tom's suggestion of using '*' instead of >>> 'all' in pg_hba.conf I will prepare a patch for it. >>> >> >> >> Well, while we're breaking stuff in the name of improvement, what >> about databases named "*" or databases with spaces in their names? >> >> > > Good point. Perhaps we need to provide for an escaping mechanism in > the routines that parse the file, although personally I have little > sympathy for anyone who names a database '*'. I think it comes into > the category of "Doctor, it hurts when I do this" ... "Then stop doing > that." Spaces are a more likely problem, especially when we get W32 > native users. Looking at the code I discover that there is already provision covering spaces etc., because you can quote names. It's even documented ;-) The minimal disturbance change might be to teach the parser to distinguish between a quoted 'all' and an unquoted 'all', and forget the '*' idea. Alternatively, do the same sort of thing, but replacing 'all' with '*'. A patch for the first would be quite tiny - similar for '*' except for extra doc and sample file changes. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> The minimal disturbance change might be to teach the parser to 
> distinguish between a quoted 'all' and an unquoted 'all', and forget the 
> '*' idea.
Probably we ought to go with that, on backwards-compatibility grounds.
        regards, tom lane
			
		On Fri, 19 Dec 2003, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > The minimal disturbance change might be to teach the parser to > > distinguish between a quoted 'all' and an unquoted 'all', and forget the > > '*' idea. > > Probably we ought to go with that, on backwards-compatibility grounds. why not do both, but deprecate the use of all in the docs? say with an eventual goal of removing the use of all altogether in 2 releases? ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Marc G. Fournier wrote: >On Fri, 19 Dec 2003, Tom Lane wrote: > > > >>Andrew Dunstan <andrew@dunslane.net> writes: >> >> >>>The minimal disturbance change might be to teach the parser to >>>distinguish between a quoted 'all' and an unquoted 'all', and forget the >>>'*' idea. >>> >>> >>Probably we ought to go with that, on backwards-compatibility grounds. >> >> > >why not do both, but deprecate the use of all in the docs? say with an >eventual goal of removing the use of all altogether in 2 releases? > > > The extra value is fairly low IMNSHO if we can distinguish between a magical and a non-magical 'all' - see the patch I just submitted. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Marc G. Fournier wrote:
>> why not do both, but deprecate the use of all in the docs?  say with an
>> eventual goal of removing the use of all altogether in 2 releases?
> The extra value is fairly low IMNSHO if we can distinguish between a 
> magical and a non-magical 'all' - see the patch I just submitted.
Also, your point about the special sameuser and samegroup keywords is a
good one.  We should make those all work consistently (ie, quoting makes
it not a keyword).  Going in the "*" direction would only make sense
if that were the only special case --- but it isn't.  I don't think we
want to start choosing random symbols for sameuser, samegroup, and other
stuff we might think of in future.
        regards, tom lane
			
		Tom Lane wrote: >Andrew Dunstan <andrew@dunslane.net> writes: > > >>Marc G. Fournier wrote: >> >> >>>why not do both, but deprecate the use of all in the docs? say with an >>>eventual goal of removing the use of all altogether in 2 releases? >>> >>> > > > >>The extra value is fairly low IMNSHO if we can distinguish between a >>magical and a non-magical 'all' - see the patch I just submitted. >> >> > >Also, your point about the special sameuser and samegroup keywords is a >good one. We should make those all work consistently (ie, quoting makes >it not a keyword). Going in the "*" direction would only make sense >if that were the only special case --- but it isn't. I don't think we >want to start choosing random symbols for sameuser, samegroup, and other >stuff we might think of in future. > > > Right. Revised patch sent to patches list. cheers andrew
Is there a way to force database to load a frequently-accessed table into cache and keep it there? Thanks, Mike.
Michael Brusser <michael@synchronicity.com> writes: > Is there a way to force database to load a frequently-accessed table > into cache and keep it there? No. BTW, this idea has been suggested in the past, so check the archives for the prior discussions on this topic. The usual response is that the PostgreSQL bufmgr should already placing hot pages into the cache, so there isn't really a need for this mechanism. (And if the bufmgr doesn't do this well enough, we should improve the bufmgr -- as Jan has done for 7.5) -Neil
On Fri, 16 Jan 2004, Michael Brusser wrote: > Is there a way to force database to load > a frequently-accessed table into cache and keep it there? Nope. But there is a new cache buffer handler that may make it into 7.5 that would make that happen automagically.
On Fri, Jan 16, 2004 at 12:00:08PM -0500, Michael Brusser wrote: > Is there a way to force database to load > a frequently-accessed table into cache and keep it there? If it is frequently accessed, I guess it would be in the cachke permanently.... -- __________________________________________________ "Nothing is as subjective as reality" Reinoud van Leeuwen reinoud.v@n.leeuwen.net http://www.xs4all.nl/~reinoud __________________________________________________
This discussion seems likely to have a major effect on DBT-3 (DSS-type) performance from PostgreSQL... > On Fri, 16 Jan 2004, Michael Brusser wrote: > > > Is there a way to force database to load > > a frequently-accessed table into cache and keep it there? > > Scott Marlow replied... > > Nope. But there is a new cache buffer handler that may make it into 7.5 > that would make that happen automagically. The important question here is "what forces blocks out of cache?" rather than thinking about how to directly keep them there. > Jeroen T. Vermeulen wrote: > Sent: Friday, January 16, 2004 23:02 > Subject: [HACKERS] Reverse scans? > > Would it be doable, and would it be useful, to try to alternate the > directions of table and index scans every time each table/index was > fully scanned? > > I was thinking that it could help cache performance at various levels > in cases where data at the end of a large table, say, that remained in > memory after a scan, would otherwise be flushed out by a new scan of the > same table. If the next scan of the same table was to go in the other > direction, any remains of the last time around that were still in the > filesystem cache, buffer pool, hard disk cache etc. would stand a greater > chance of being reused. Jereon's idea is a good one when we consider the current behaviour, which is that large scans are placed into block buffer cache, which then forces other data out. I would like to question the latter behaviour, so we can address the cause and not just the symptom. Earlier versions of Oracle had a parameter called something like SMALL_TABLE_THRESHOLD. Any scan of a table larger than that threshold was NOT placed into buffer cache, but was consumed directly by the shadow process (the backend). Teradata also uses a similar buffer control technique for large table scans. If a table is too large to fit into buffer, it clearly wasn't going to be cached properly in the first place; Jereon's idea only works well for tables near to the size of the cache. If the table is MUCH bigger then it will have very little gain. Good original thinking, though I'm not sure its worth it. Oracle 9i now offers some variety for buffer cache management (as does DB2). You can specify at the tablespace and object level whether to use one of two buffer pools, KEEP or RECYCLE. The idea is to separate out the two types of blocks - ones that are there because they're well used and other blocks that are there at the moment, but unlikely to stay. My suggestion would be to: - split the buffer cache into two, just as Oracle does: KEEP & RECYCLE. This could default to KEEP=66% of total memory available, but could also be settable by init parameter. [changes to the memory management routines] - if we do a scan on a table whose size in blocks is more than some fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE bufferpool. This can be decided immediately following optimization, rather than including it within the optimizer decision process since we aren't going to change the way the statement executes, we're just going to stop it from having an adverse effect on other current or future statements. [additional test to set parameter, then work out where to note it] Notice that I haven't suggested that the KEEP/RECYCLE option could be specified at table level. That optionality sounds like a lot of extra work, when what is needed is the automatic avoidance of cache-spoiling behaviour. (This would still mean that very large indexes with random request patterns would still spoil cache...maybe implement that later?) This would remove most reasons for spoiling the cache and blocks would then leave the cache only when they were genuinely no longer wanted. Any comments?? Takers?
Simon, have you read src/backend/storage/buffer/README of current CVS tip? The algorithm in the new replacement strategy is an attempt to figure that SMALL_TABLE_THRESHOLD automatically. Do you see anything that can be improved in that algorithm? Jan Simon Riggs wrote: > This discussion seems likely to have a major effect on DBT-3 (DSS-type) > performance from PostgreSQL... > >> On Fri, 16 Jan 2004, Michael Brusser wrote: >> >> > Is there a way to force database to load >> > a frequently-accessed table into cache and keep it there? >> >> Scott Marlow replied... >> >> Nope. But there is a new cache buffer handler that may make it into > 7.5 >> that would make that happen automagically. > > The important question here is "what forces blocks out of cache?" rather > than thinking about how to directly keep them there. > >> Jeroen T. Vermeulen wrote: >> Sent: Friday, January 16, 2004 23:02 >> Subject: [HACKERS] Reverse scans? >> >> Would it be doable, and would it be useful, to try to alternate the >> directions of table and index scans every time each table/index was >> fully scanned? >> >> I was thinking that it could help cache performance at various levels >> in cases where data at the end of a large table, say, that remained in >> memory after a scan, would otherwise be flushed out by a new scan of > the >> same table. If the next scan of the same table was to go in the other >> direction, any remains of the last time around that were still in the >> filesystem cache, buffer pool, hard disk cache etc. would stand a > greater >> chance of being reused. > > Jereon's idea is a good one when we consider the current behaviour, > which is > that large scans are placed into block buffer cache, which then forces > other data out. I would like to question the latter behaviour, so we can > address the cause and not just the symptom. > > Earlier versions of Oracle had a parameter called something like > SMALL_TABLE_THRESHOLD. Any scan of a table larger than that threshold > was NOT placed into buffer cache, but was consumed directly by the > shadow process (the backend). Teradata also uses a similar buffer > control technique for large table scans. > > If a table is too large to fit into buffer, it clearly wasn't going to > be cached properly in the first place; Jereon's idea only works well for > tables near to the size of the cache. If the table is MUCH bigger then > it will have very little gain. Good original thinking, though I'm not > sure its worth it. > > Oracle 9i now offers some variety for buffer cache management (as does > DB2). You can specify at the tablespace and object level whether to use > one of two buffer pools, KEEP or RECYCLE. The idea is to separate out > the two types of blocks - ones that are there because they're well used > and other blocks that are there at the moment, but unlikely to stay. > > My suggestion would be to: > - split the buffer cache into two, just as Oracle does: KEEP & RECYCLE. > This could default to KEEP=66% of total memory available, but could also > be settable by init parameter. > [changes to the memory management routines] > - if we do a scan on a table whose size in blocks is more than some > fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE > bufferpool. This can be decided immediately following optimization, > rather than including it within the optimizer decision process since we > aren't going to change the way the statement executes, we're just going > to stop it from having an adverse effect on other current or future > statements. > [additional test to set parameter, then work out where to note it] > > Notice that I haven't suggested that the KEEP/RECYCLE option could be > specified at table level. That optionality sounds like a lot of extra > work, when what is needed is the automatic avoidance of cache-spoiling > behaviour. (This would still mean that very large indexes with random > request patterns would still spoil cache...maybe implement that later?) > > This would remove most reasons for spoiling the cache and blocks would > then leave the cache only when they were genuinely no longer wanted. > > Any comments?? Takers? > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Any comments?? Takers?
Um, did you read the discussion of the ARC buffer management algorithm
that's already been implemented for 7.5?
The main objection I have to doing it Oracle's way is that that creates
more parameters that DBAs have to get right for reasonable performance.
Self-tuning algorithms are better, when available.
        regards, tom lane
			
		> Jan Wieck wrote: > > have you read src/backend/storage/buffer/README of current CVS tip? > > The algorithm in the new replacement strategy is an attempt to figure > that SMALL_TABLE_THRESHOLD automatically. Do you see anything that can > be improved in that algorithm? > Jan, I've read src/backend/storage/buffer/README rev 1.6 as you suggest. The new algorithm looks great - many thanks for implementing that. I'm not able to improve on this for the general case - I especially like the automatic management that it gives, allowing you to avoid additional DBA set parameters (and the coding to add these option settings/keywords). My concern was for DBT-3 performance and general Decision Support (DSS) workloads, where large proportion of table scans occur (not on the DBT-3 single-threaded test). The new strategy is much better than the older one and is likely to have a positive effect in this area. I don't think, right now, that anything further should be changed, in the interests of stability. For the record/for the future: My observation was that two commercial databases focused on DSS use a strategy which in terms of the new ARC implementation is effectively: "place blocks in T1 (RECENCY/RECYCLE buffer) and NEVER promote them to T2 (FREQUENCY/KEEP buffer)" when they do large object scans. In the new README, you note that: > StrategyHintVacuum(bool vacuum_active) > > Because vacuum reads all relations of the entire database > through the buffer manager, it can greatly disturb the > buffer replacement strategy. This function is used by vacuum > to inform that all subsequent buffer lookups are caused > by vacuum scanning relations. ...I would say that scans of very large tables also "greatly disturb the buffer replacement strategy", i.e. have exactly the same effect on the cache as the Vacuum utility. You'd clearly thought of the idea before me, though with regard to Vacuum. If we know ahead of time that a large scan is going to have this effect, why wait for the ARC to play its course, why not take exactly the same action? Have large scans call StrategyHint also. (Maybe rename it...?)...of course, some extra code to establish it IS a large scan... ...large table lookup should wait until a shared catalog cache is implemented Anyway, this idea can wait at least until we have extensive performance tuning on DBT-3 with 7.5. Thanks again for adding the new algorithm. Best Regards, Simon ... > Simon Riggs wrote: ... > > > > My suggestion would be to: > > - split the buffer cache into two, just as Oracle does: KEEP & RECYCLE. > > This could default to KEEP=66% of total memory available, but could also > > be settable by init parameter. > > [changes to the memory management routines] > > - if we do a scan on a table whose size in blocks is more than some > > fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE > > bufferpool. This can be decided immediately following optimization, > > rather than including it within the optimizer decision process since we > > aren't going to change the way the statement executes, we're just going > > to stop it from having an adverse effect on other current or future > > statements. > > [additional test to set parameter, then work out where to note it] > >
> Jan Wieck wrote: > > have you read src/backend/storage/buffer/README of current CVS tip? > Tom Lane wrote: > > Um, did you read the discussion of the ARC buffer management algorithm > that's already been implemented for 7.5? > Tom, Jan, No, I hadn't read this. Thank you both for your time and trouble to point this out for me, which I was not aware of. My understanding, possibly faulty, was that, if work was completed, then it appears on the TODO list with a dash in front of it. The new cache management strategy isn't mentioned there, so was not aware that any work was completed (or even in progress). No finger pointing, just an observation of how the development process works... If the TODO-list-with-dash isn't the correct place to have looked, is there another list of committed changes for the next release? The latest README in CVS doesn't have a list of "what's new in 7.5" or similar. Do we need such a list? (I'd be happy to compile and maintain this if it agreed that it is a good idea to have such a document or process as separate from TODO - I'll be doing this anyway before I pass further comments!) Regards, Simon
Simon, thanks for the time to give this further thought. Simon Riggs wrote: > If we know ahead of time that a large scan is going to have this effect, > why wait for the ARC to play its course, why not take exactly the same > action? > Have large scans call StrategyHint also. (Maybe rename it...?)...of > course, some extra code to establish it IS a large scan... > ...large table lookup should wait until a shared catalog cache is > implemented The problem with this is a) how to detect that something will be a large scan, and b) how to decide what is a large scan in the first place. Large sequential scans in warehousing are often part of more complex join operations. And just because something returns a large number of result rows doesn't mean that the input data was that much. As for the definition of "large" itself, this depends on the size of the buffer cache and the access pattern of the application. As you surely have noticed, the usual sizes of B1+T1 = T2+B2 = C in the algorithm. Buffers evicted from T1 are remembered in B1, and because of that even repeated sequential scans of the same large relation will only cycle through T1 blocks, never cause any turbulence in T2 or B2. The only thing that will affect T2 and B2 dramatically by adjusting the cache split point is multiple scanning of more than one significantly large but smaller than C table. Scanning the same large but smaller C table over and over will have it after the second scan in T2, where it belongs. But having two tables A and B that are both just smaller C and having an access pattern like A, A, B, B, A, A, ... will cause many B1 hits and thereby increase the target T1 size. And it must be exactly that access pattern, because A, A, A, B, B, B, A, A, A, ... produces a complete MISS on the first, a B1 hit on the second and a B2 hit on the third scan, so it will up and down the split point evenly. Honestly, I don't even know what type of application could possibly produce such a screwed access pattern. And I am absolutely confident one can find corner cases to wring down Oracles complicated configuration harness more easily. > > Anyway, this idea can wait at least until we have extensive performance > tuning on DBT-3 with 7.5. Thanks again for adding the new algorithm. Everyone is always welcome to try and show that something can be improved. And we are in the middle of the 7.5 development cycle, so feel free to hack around. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
"Simon Riggs" <simon@2ndquadrant.com> writes:
> If the TODO-list-with-dash isn't the correct place to have looked, is
> there another list of committed changes for the next release?
We tend to rely on the CVS commit logs as the definitive source.  You
can pull the info from the CVS server (I use cvs2cl.pl to format the
results nicely), or read the archives of pgsql-committers.
In theory there should be a section at the head of release.sgml
mentioning the major changes done-so-far, but for various reasons this
hasn't gotten installed in the 7.5 branch yet.  (Look at the CVS
versions during 7.4 development to see how we did it last time.)
As far as the ARC change goes, I believe Jan still considers it a
work-in-progress, so it may not be appropriate to list yet anyway.
(Jan, where are you on that exactly?)
> Do we need such a list? (I'd be happy to compile and maintain this if it
> agreed that it is a good idea to have such a document or process as
> separate from TODO - I'll be doing this anyway before I pass further
> comments!)
If you wanted to go through the existing 7.5 commits and write up a
new done-so-far section, it'd save someone else (like me or Bruce)
from having to do it sometime soon ...
        regards, tom lane
			
		Tom Lane <tgl@sss.pgh.pa.us> writes: > In theory there should be a section at the head of release.sgml > mentioning the major changes done-so-far, but for various reasons > this hasn't gotten installed in the 7.5 branch yet. (Look at the > CVS versions during 7.4 development to see how we did it last time.) Well, keep in mind we didn't do it very effectively in 7.4 :-) The vast majority of changes weren't recorded there, and the ones that were had to be fleshed out quite a lot in the actual release notes. The last time that someone (Peter and myself, IIRC) suggested that we really incrementally maintain the release notes during the development cycle, Bruce said that he personally finds it more comfortable to summarize the CVS changelogs all at once shortly before we release the first beta. AFAIR that's where the discussion ended. -Neil
Neil Conway <neilc@samurai.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> In theory there should be a section at the head of release.sgml
>> mentioning the major changes done-so-far, but for various reasons
>> this hasn't gotten installed in the 7.5 branch yet.  (Look at the
>> CVS versions during 7.4 development to see how we did it last time.)
> Well, keep in mind we didn't do it very effectively in 7.4 :-) The
> vast majority of changes weren't recorded there, and the ones that
> were had to be fleshed out quite a lot in the actual release notes.
> The last time that someone (Peter and myself, IIRC) suggested that we
> really incrementally maintain the release notes during the development
> cycle, Bruce said that he personally finds it more comfortable to
> summarize the CVS changelogs all at once shortly before we release the
> first beta. AFAIR that's where the discussion ended.
It's fine with me if Bruce prefers to build the release notes directly
from the change logs.  As I saw it, the purpose of the temporary list of
things-done-so-far is not to be the raw material for the release notes.
It's to let alpha testers know about major changes that they might want
to test.  As such, it's fine that it's incomplete.
The other way we could handle this goal is to be a tad more vigorous about
checking off items as "done" in the TODO list.  However, Bruce generally
doesn't bother to make a new entry in the TODO list if someone does
something that wasn't in the list to begin with, and so I'm not sure
it's the right vehicle.
        regards, tom lane
			
		OK, I will attempt to draw together this information as currently stands. If this makes any sense, we can discuss what the requirement/process is for regular maintenance (daily/weekly/monthly etc). Understood to mean "changes in next release (current progress)" - items that have been completed/committed since last release, for the purpose of informing developers/testers what's new PRIOR to full release. Leaving unobstructed the functions of - TODO list - a combined list of desired work items (Bruce) - Release Notes - final list of features of a release (Bruce) This should help alpha testing, which should allow more control of what actually does get released (and therefore what the contents of Release Notes should be) Best Regards, Simon > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Friday, January 23, 2004 20:40 > To: Neil Conway > Cc: simon@2ndquadrant.com; 'Jan Wieck'; 'Postgresql Hackers' > Subject: Re: 7.5 change documentation > > Neil Conway <neilc@samurai.com> writes: > > Tom Lane <tgl@sss.pgh.pa.us> writes: > >> In theory there should be a section at the head of release.sgml > >> mentioning the major changes done-so-far, but for various reasons > >> this hasn't gotten installed in the 7.5 branch yet. (Look at the > >> CVS versions during 7.4 development to see how we did it last time.) > > > Well, keep in mind we didn't do it very effectively in 7.4 :-) The > > vast majority of changes weren't recorded there, and the ones that > > were had to be fleshed out quite a lot in the actual release notes. > > > The last time that someone (Peter and myself, IIRC) suggested that we > > really incrementally maintain the release notes during the development > > cycle, Bruce said that he personally finds it more comfortable to > > summarize the CVS changelogs all at once shortly before we release the > > first beta. AFAIR that's where the discussion ended. > > It's fine with me if Bruce prefers to build the release notes directly > from the change logs. As I saw it, the purpose of the temporary list of > things-done-so-far is not to be the raw material for the release notes. > It's to let alpha testers know about major changes that they might want > to test. As such, it's fine that it's incomplete. > > The other way we could handle this goal is to be a tad more vigorous about > checking off items as "done" in the TODO list. However, Bruce generally > doesn't bother to make a new entry in the TODO list if someone does > something that wasn't in the list to begin with, and so I'm not sure > it's the right vehicle. > > regards, tom lane
Jan, Happy to continue the discussion...though without changing my suggestion that we defer any further more specialised improvements for now. > Jan Wieck replied to... > Simon Riggs wrote: > > If we know ahead of time that a large scan is going to have this effect, > > why wait for the ARC to play its course, why not take exactly the same > > action? > > Have large scans call StrategyHint also. (Maybe rename it...?)...of > > course, some extra code to establish it IS a large scan... > > ...large table lookup should wait until a shared catalog cache is > > implemented > > The problem with this is a) how to detect that something will be a large > scan, and b) how to decide what is a large scan in the first place. > My thoughts are that we know immediately prior to execution whether or not a plan calls for a full table scan (FTS) (or not). We also know the table and therefore its size. A large table in this context is one that would disrupt the cache if it made it onto T2. We can discuss an appropriate and usefully simple rule, perhaps sizeoftable(T) > 2*C??? > Large sequential scans in warehousing are often part of more complex > join operations. Yes, I agree. PostgreSQL is particularly prone to this currently, because of the high number of plans that resolve to FTS. Complexity of plan shouldn't effect the basic situation that we are reading all the blocks of a table and putting them in sequentially into T1 and then working on them. Plan complexity may increase the time that a T1 block stays in memory, with subsequent increase in probability of promotion to T1. > And just because something returns a large number of > result rows doesn't mean that the input data was that much. I agree also that overall execution time may be unrelated to whether a "large" table is involved. The number of output rows shouldn't have any effect on input rows and thus data blocks that need to be cached. (Jan gives a detailed analysis...ending with) > Honestly, I don't even know what type of application could possibly > produce such a screwed access pattern. And I am absolutely confident one > can find corner cases to wring down Oracles complicated configuration > harness more easily. I agree with everything you say. The algorithm copes well with almost every sequential pattern of access and there is significant benefit from ignoring the very very very rare cases that might give it problems. My thoughts are about multiple concurrent accesses, specifically FTS on large tables, rather than sequential ones. > Buffers evicted from T1 are remembered in B1, and because of that even > repeated sequential scans of the same large relation will only cycle > through T1 blocks, never cause any turbulence in T2 or B2. If we have a situation where a single backend makes repeated scans of the same table, these will be sequential and will have no effect on T1. In a DW situation, you are likely to have one or more very popular large tables (maybe think of this as the "Fact table", if you have a dimensional design). The tables are large and therefore query execution times will be extended (and accepted by user). In this situation it is very likely that: i) a single user/app submits multiple requests from other windows/threads Or simply, ii) multiple users access the popular table The common effect will be concurrent, rather than sequential, access to the popular table. Different SQL statements will have different plans and will perform scans of the same table at different rates because of other joins, more complex WHERE clauses etc. Like waves at a beach moving at different rates. Every time one scan catches up with another, it will cause T1 hits for almost the whole of the T1 list, promoting all of these blocks to the top of the T2 MRU and thus spoiling the cache - if it hits one it will probably hit most of them. This will not happen ALL the time, but I don't want it to happen EVER. Even in DW situation, I still want to be inserting data regularly (that's how the table got big!), so I have index blocks and other stuff that I want almost permanently in memory. Concurrent access via an index might have the same effect, though less dramatically. The closer the size of a table I to C, the greater the likelihood that these spurious cache hits will occur. (Of course, it might be argued that these are worthwhile and genuine cache hits - I argue that they are not wanted and this is the main basis of my discussion). Of course, if a table does fit in memory than that is very good. If a table was, say 2*C, then spurious cache hits will occur often and spoil the whole of T2. The DBT-3 workload is very similar to TPC-D/TPC-H workload. The test consists of a power test (all queries sequentially in random order) and a throughput test (1 or more concurrent streams, each stream executing all queries in a random order). When this benchmark first came out most vendors chose to perform the throughput test with only 1 stream (though with parallel processing)...I would say one reason for this is poor cache management...hence recent changes in various commercial products. In summary, I believe there is a reasonably common effect in DW situations where concurrent query access to large and popular tables will result in undesirable cache spoiling. This effect will still occur even after the ARC improvements are introduced - though in every other case I can think of, the ARC code is a major improvement on earlier strategies and should be hailed as a major improvement in automatic performance adaptation. There are two solution ideas: i) change the code so that FTS on large tables use the "no cache" strategy that has already been developed to support Vaccuum. ii) more complex: synchronise the FTS of the large table so that all backends that want scans produce only one set of I/Os and they share the block many times (yet still don't put it in cache!). FTS don't start at "the beginning" every time, they start wherever a current scan has got to, then loop back round at end (so average of two concurrent scans is 1.5 times as much I/O as a single FTS - a 25% saving on I/O). A more detailed explanation may be required - this technique is in commercial use within the Teradata rdbms. Implementing it would take some doing... Best Regards Simon
Simon Riggs wrote: > Jan, [...] > My thoughts are about multiple concurrent accesses, specifically FTS on > large tables, rather than sequential ones. Single or multiple backends is irrelevant here because a data block only exists once, and therefore we have only one shared buffer cache. > >> Buffers evicted from T1 are remembered in B1, and because of that even >> repeated sequential scans of the same large relation will only cycle >> through T1 blocks, never cause any turbulence in T2 or B2. > > If we have a situation where a single backend makes repeated scans of > the same table, these will be sequential and will have no effect on T1. You really have to look at this a bit more global, not table related. The strategy of ARC is this: In an unknown access pattern, if a specific block is accessed less frequently than every C requests, then it will only go into T1, age, get evicted and the CDB moves to B1, will get removed from that and is forgotten. Every block that is accessed more frequently than C will be after it's last access in any of the four queues of the directory and immediately go into T2. The adjustment of the target T1 size is an attempt to catch as many newcomers as possible. If an application does many inserts, it will access new blocks very soon again, so that a small T1 is sufficient to hold them in memory until their next access where they move into T2. An application that does non-uniform random access to blocks (there are always bestsellers and less frequently asked items), then a larger T1 might better satisfy that access pattern. > > In a DW situation, you are likely to have one or more very popular large > tables (maybe think of this as the "Fact table", if you have a > dimensional design). The tables are large and therefore query execution > times will be extended (and accepted by user). In this situation it is > very likely that: i) a single user/app submits multiple requests from > other windows/threads > Or simply, > ii) multiple users access the popular table If that causes that it's blocks are more frequently requested than every C lookups, it belongs into T2. > > The common effect will be concurrent, rather than sequential, access to > the popular table. Different SQL statements will have different plans > and will perform scans of the same table at different rates because of > other joins, more complex WHERE clauses etc. Like waves at a beach > moving at different rates. Every time one scan catches up with another, > it will cause T1 hits for almost the whole of the T1 list, promoting all > of these blocks to the top of the T2 MRU and thus spoiling the cache - > if it hits one it will probably hit most of them. This will not happen > ALL the time, but I don't want it to happen EVER. Even in DW situation, > I still want to be inserting data regularly (that's how the table got > big!), so I have index blocks and other stuff that I want almost > permanently in memory. Concurrent access via an index might have the > same effect, though less dramatically. > > The closer the size of a table I to C, the greater the likelihood that > these spurious cache hits will occur. (Of course, it might be argued > that these are worthwhile and genuine cache hits - I argue that they are > not wanted and this is the main basis of my discussion). Of course, if a > table does fit in memory than that is very good. If a table was, say > 2*C, then spurious cache hits will occur often and spoil the whole of > T2. How can any generic algorithm ever sense that when the application is accessing the same blocks multiple times, it should NOT cache them? Are you asking for a fine granulated tuning of cache priorities and behaviour on a per table basis? > > The DBT-3 workload is very similar to TPC-D/TPC-H workload. The test > consists of a power test (all queries sequentially in random order) and > a throughput test (1 or more concurrent streams, each stream executing > all queries in a random order). When this benchmark first came out most > vendors chose to perform the throughput test with only 1 stream (though > with parallel processing)...I would say one reason for this is poor > cache management...hence recent changes in various commercial products. > > In summary, I believe there is a reasonably common effect in DW > situations where concurrent query access to large and popular tables > will result in undesirable cache spoiling. This effect will still occur > even after the ARC improvements are introduced - though in every other > case I can think of, the ARC code is a major improvement on earlier > strategies and should be hailed as a major improvement in automatic > performance adaptation. > > There are two solution ideas: > i) change the code so that FTS on large tables use the "no cache" > strategy that has already been developed to support Vaccuum. > ii) more complex: synchronise the FTS of the large table so that all > backends that want scans produce only one set of I/Os and they share the > block many times (yet still don't put it in cache!). FTS don't start at > "the beginning" every time, they start wherever a current scan has got > to, then loop back round at end (so average of two concurrent scans is > 1.5 times as much I/O as a single FTS - a 25% saving on I/O). A more > detailed explanation may be required - this technique is in commercial > use within the Teradata rdbms. Implementing it would take some doing... How will the configuration of all that look like? You are using several business terms a human brain can imagine to describe various access patterns you want to be treated specially. In the whole system catalog and all the way down to the buffer cache, we only have some file and block number, maybe the size of it too but that's not guaranteed (think of blind writes by a backend of another DB). So how do we express what you want in some algorithm that we can put into the strategy? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan, I think we should suspend further discussion for now...in summary: ARC Buffer management is an important new performance feature for 7.5; the implementation is a good one and should have positive benefit for everybody's workload. ARC will adapt to a variety of situations and has been designed to allow Vacuum to avoid interfering with user applications. That's the important bit: The implementation notes are detailed; I've read them a few times to ensure I've got it straight. I am confident that the situation I described CAN exist with regard to multiple concurrent queries performing full table scans upon a single large table. Further debate on that point is continuing because of my poor explanation of that situation; forgive me. Thanks very much for your further explanations and examples. I will take a more practical tack on this now: providing evidence of a real query mix that exhibits the described properties and quantifying the effects and their frequency. If it IS worth it, and I accept that it may not be, I'll have a hack at the very specialised improvement I was suggesting, for very specific workload types. Best Regards Simon Riggs
Tom Lane wrote: > "Simon Riggs" <simon@2ndquadrant.com> writes: > > If the TODO-list-with-dash isn't the correct place to have looked, is > > there another list of committed changes for the next release? > > We tend to rely on the CVS commit logs as the definitive source. You > can pull the info from the CVS server (I use cvs2cl.pl to format the > results nicely), or read the archives of pgsql-committers. > > In theory there should be a section at the head of release.sgml > mentioning the major changes done-so-far, but for various reasons this > hasn't gotten installed in the 7.5 branch yet. (Look at the CVS > versions during 7.4 development to see how we did it last time.) > > As far as the ARC change goes, I believe Jan still considers it a > work-in-progress, so it may not be appropriate to list yet anyway. > (Jan, where are you on that exactly?) > > > Do we need such a list? (I'd be happy to compile and maintain this if it > > agreed that it is a good idea to have such a document or process as > > separate from TODO - I'll be doing this anyway before I pass further > > comments!) > > If you wanted to go through the existing 7.5 commits and write up a > new done-so-far section, it'd save someone else (like me or Bruce) > from having to do it sometime soon ... Doesn't Robert Treat's News Bits list all the major changes weekly? That would b e a good source. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: > It's fine with me if Bruce prefers to build the release notes directly > from the change logs. As I saw it, the purpose of the temporary list of > things-done-so-far is not to be the raw material for the release notes. > It's to let alpha testers know about major changes that they might want > to test. As such, it's fine that it's incomplete. > > The other way we could handle this goal is to be a tad more vigorous about > checking off items as "done" in the TODO list. However, Bruce generally > doesn't bother to make a new entry in the TODO list if someone does > something that wasn't in the list to begin with, and so I'm not sure > it's the right vehicle. Right. I see TODO as a way for us to remember our limitations, and to document them for our users. Once an item is completed, it didn't seem necessary to put it on the TODO list. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
> Bruce Momjian wrote > >Tom Lane wrote: > > >"Simon Riggs" <simon@2ndquadrant.com> writes: > > > If the TODO-list-with-dash isn't the correct place to have looked, is > > > there another list of committed changes for the next release? > > > > We tend to rely on the CVS commit logs as the definitive source. You > > can pull the info from the CVS server (I use cvs2cl.pl to format the > > results nicely), or read the archives of pgsql-committers. > > > > In theory there should be a section at the head of release.sgml > > mentioning the major changes done-so-far, but for various reasons this > > hasn't gotten installed in the 7.5 branch yet. (Look at the CVS > > versions during 7.4 development to see how we did it last time.) > > > > As far as the ARC change goes, I believe Jan still considers it a > > work-in-progress, so it may not be appropriate to list yet anyway. > > (Jan, where are you on that exactly?) > > > > > Do we need such a list? (I'd be happy to compile and maintain this if > it > > > agreed that it is a good idea to have such a document or process as > > > separate from TODO - I'll be doing this anyway before I pass further > > > comments!) > > > > If you wanted to go through the existing 7.5 commits and write up a > > new done-so-far section, it'd save someone else (like me or Bruce) > > from having to do it sometime soon ... > > Doesn't Robert Treat's News Bits list all the major changes weekly? > That would b e a good source. > Bruce - The excellent work that both you and Robert do is a slightly different view to what I had in mind - I agree they are all aspects of the same information. I'm posting a first output of this now, so we can discuss whether such a thing is useful, and or whether it can ever be all of useful/accurate/timely. I'll happily add this to the HEAD of release.sgml, though lets agree the content/direction first, before I spend time on a more formal publication mechanism. Best Regards, Simon Riggs