Thread: replace all with * in pg_hba.conf


replace all with * in pg_hba.conf

From: Andrew Dunstan
Date:
If people are happy with Tom's suggestion of using '*' instead of 'all' 
in pg_hba.conf I will prepare a patch for it.

(I will also replace the ugly long IP6 localhost netmask with a CIDR mask).

cheers

andrew
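[For context, a hedged sketch of the sample-file change mentioned above; the exact lines are assumptions, not quoted from the tree. The IPv6 localhost entry written with an explicit netmask versus the CIDR form:

    # old style: address plus full netmask (assumed wording)
    host  all  all  ::1  ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff  trust
    # CIDR style
    host  all  all  ::1/128                                       trust
]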



Re: replace all with * in pg_hba.conf

From: Peter Eisentraut
Date:
Andrew Dunstan wrote:
> If people are happy with Tom's suggestion of using '*' instead of
> 'all' in pg_hba.conf I will prepare a patch for it.

Well, while we're breaking stuff in the name of improvement, what about 
databases named "*" or databases with spaces in their names?



Re: replace all with * in pg_hba.conf

From: Andrew Dunstan
Date:
Peter Eisentraut wrote:

>Andrew Dunstan wrote:
>
>>If people are happy with Tom's suggestion of using '*' instead of
>>'all' in pg_hba.conf I will prepare a patch for it.
>
>Well, while we're breaking stuff in the name of improvement, what about
>databases named "*" or databases with spaces in their names?

Good point. Perhaps we need to provide for an escaping mechanism in the 
routines that parse the file, although personally I have little sympathy 
for anyone who names a database '*'. I think it comes into the category 
of "Doctor, it hurts when I do this" ... "Then stop doing that." Spaces 
are a more likely problem, especially when we get W32 native users.

cheers

andrew



Re: replace all with * in pg_hba.conf

From: Andrew Dunstan
Date:
I wrote:

> Peter Eisentraut wrote:
>
>> Andrew Dunstan wrote:
>>
>>> If people are happy with Tom's suggestion of using '*' instead of
>>> 'all' in pg_hba.conf I will prepare a patch for it.
>>
>> Well, while we're breaking stuff in the name of improvement, what
>> about databases named "*" or databases with spaces in their names?
>
> Good point. Perhaps we need to provide for an escaping mechanism in 
> the routines that parse the file, although personally I have little 
> sympathy for anyone who names a database '*'. I think it comes into 
> the category of "Doctor, it hurts when I do this" ... "Then stop doing 
> that." Spaces are a more likely problem, especially when we get W32 
> native users.


Looking at the code I discover that there is already provision covering 
spaces etc., because you can quote names. It's even documented ;-)

The minimal disturbance change might be to teach the parser to 
distinguish between a quoted 'all' and an unquoted 'all', and forget the 
'*' idea. Alternatively, do the same sort of thing, but replacing 'all' 
with '*'. A patch for the first would be quite tiny - similar for '*' 
except for extra doc and sample file changes.

cheers

andrew
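[For illustration, a hedged sketch of the distinction being discussed; the entries below are hypothetical and not taken from the shipped sample file. An unquoted all acts as the keyword, while quoting would make it an ordinary name, and quoting already covers names containing spaces:

    # unquoted: keyword matching every database
    local   all            all       trust
    # quoted: would match only a database literally named "all"
    local   "all"          webuser   md5
    # quoting a name that contains spaces
    local   "sales data"   webuser   md5
]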





Re: replace all with * in pg_hba.conf

From: Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The minimal disturbance change might be to teach the parser to 
> distinguish between a quoted 'all' and an unquoted 'all', and forget the 
> '*' idea.

Probably we ought to go with that, on backwards-compatibility grounds.
        regards, tom lane


Re: replace all with * in pg_hba.conf

From: "Marc G. Fournier"
Date:
On Fri, 19 Dec 2003, Tom Lane wrote:

> Andrew Dunstan <andrew@dunslane.net> writes:
> > The minimal disturbance change might be to teach the parser to
> > distinguish between a quoted 'all' and an unquoted 'all', and forget the
> > '*' idea.
>
> Probably we ought to go with that, on backwards-compatibility grounds.

why not do both, but deprecate the use of all in the docs?  say with an
eventual goal of removing the use of all altogether in 2 releases?

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: replace all with * in pg_hba.conf

From: Andrew Dunstan
Date:
Marc G. Fournier wrote:

>On Fri, 19 Dec 2003, Tom Lane wrote:
>
>>Andrew Dunstan <andrew@dunslane.net> writes:
>>
>>>The minimal disturbance change might be to teach the parser to
>>>distinguish between a quoted 'all' and an unquoted 'all', and forget the
>>>'*' idea.
>>
>>Probably we ought to go with that, on backwards-compatibility grounds.
>
>why not do both, but deprecate the use of all in the docs?  say with an
>eventual goal of removing the use of all altogether in 2 releases?
>

The extra value is fairly low IMNSHO if we can distinguish between a 
magical and a non-magical 'all' - see the patch I just submitted.

cheers

andrew



Re: replace all with * in pg_hba.conf

From: Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Marc G. Fournier wrote:
>> why not do both, but deprecate the use of all in the docs?  say with an
>> eventual goal of removing the use of all altogether in 2 releases?

> The extra value is fairly low IMNSHO if we can distinguish between a 
> magical and a non-magical 'all' - see the patch I just submitted.

Also, your point about the special sameuser and samegroup keywords is a
good one.  We should make those all work consistently (ie, quoting makes
it not a keyword).  Going in the "*" direction would only make sense
if that were the only special case --- but it isn't.  I don't think we
want to start choosing random symbols for sameuser, samegroup, and other
stuff we might think of in future.
        regards, tom lane
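[A hedged sketch of the parsing rule Tom describes, not the actual pg_hba.conf code; the function and flag names below are made up. A token counts as one of the special keywords only when it was written without quotes:

    #include <stdbool.h>
    #include <string.h>

    /* Sketch only: treat a token as a keyword ("all", "sameuser",
     * "samegroup") only if it appeared unquoted in pg_hba.conf. */
    static bool
    token_matches_keyword(const char *token, bool was_quoted,
                          const char *keyword)
    {
        return !was_quoted && strcmp(token, keyword) == 0;
    }
]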


Re: replace all with * in pg_hba.conf

From: Andrew Dunstan
Date:
Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>>Marc G. Fournier wrote:
>>
>>>why not do both, but deprecate the use of all in the docs?  say with an
>>>eventual goal of removing the use of all altogether in 2 releases?
>>
>>The extra value is fairly low IMNSHO if we can distinguish between a
>>magical and a non-magical 'all' - see the patch I just submitted.
>
>Also, your point about the special sameuser and samegroup keywords is a
>good one.  We should make those all work consistently (ie, quoting makes
>it not a keyword).  Going in the "*" direction would only make sense
>if that were the only special case --- but it isn't.  I don't think we
>want to start choosing random symbols for sameuser, samegroup, and other
>stuff we might think of in future.
>

Right. Revised patch sent to patches list.

cheers

andrew





cache control?

From: Michael Brusser
Date:
Is there a way to force database to load
a frequently-accessed table into cache and keep it there?

Thanks,
Mike.





Re: cache control?

From: Neil Conway
Date:
Michael Brusser <michael@synchronicity.com> writes:
> Is there a way to force database to load a frequently-accessed table
> into cache and keep it there?

No.

BTW, this idea has been suggested in the past, so check the archives
for the prior discussions on this topic. The usual response is that
the PostgreSQL bufmgr should already be placing hot pages into the cache,
so there isn't really a need for this mechanism. (And if the bufmgr
doesn't do this well enough, we should improve the bufmgr -- as Jan
has done for 7.5.)

-Neil



Re: cache control?

From: "scott.marlowe"
Date:
On Fri, 16 Jan 2004, Michael Brusser wrote:

> Is there a way to force database to load
> a frequently-accessed table into cache and keep it there?

Nope.  But there is a new cache buffer handler that may make it into 7.5 
that would make that happen automagically.



Re: cache control?

From: Reinoud van Leeuwen
Date:
On Fri, Jan 16, 2004 at 12:00:08PM -0500, Michael Brusser wrote:
> Is there a way to force database to load
> a frequently-accessed table into cache and keep it there?

If it is frequently accessed, I guess it would be in the cache
permanently....

-- 
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen    reinoud.v@n.leeuwen.net
http://www.xs4all.nl/~reinoud
__________________________________________________


Re: cache control?

From: "Simon Riggs"
Date:
This discussion seems likely to have a major effect on DBT-3 (DSS-type)
performance from PostgreSQL...

> On Fri, 16 Jan 2004, Michael Brusser wrote:
> 
> > Is there a way to force database to load
> > a frequently-accessed table into cache and keep it there?
> 
> Scott Marlow replied...
>
> Nope.  But there is a new cache buffer handler that may make it into 7.5
> that would make that happen automagically.

The important question here is "what forces blocks out of cache?" rather
than thinking about how to directly keep them there.

> Jeroen T. Vermeulen wrote:
> Sent: Friday, January 16, 2004 23:02
> Subject: [HACKERS] Reverse scans?
> 
> Would it be doable, and would it be useful, to try to alternate the
> directions of table and index scans every time each table/index was
> fully scanned?
> 
> I was thinking that it could help cache performance at various levels
> in cases where data at the end of a large table, say, that remained in
> memory after a scan, would otherwise be flushed out by a new scan of the
> same table.  If the next scan of the same table was to go in the other
> direction, any remains of the last time around that were still in the
> filesystem cache, buffer pool, hard disk cache etc. would stand a greater
> chance of being reused.

Jeroen's idea is a good one when we consider the current behaviour, which is
that large scans are placed into block buffer cache, which then forces
other data out. I would like to question the latter behaviour, so we can
address the cause and not just the symptom.

Earlier versions of Oracle had a parameter called something like
SMALL_TABLE_THRESHOLD. Any scan of a table larger than that threshold
was NOT placed into buffer cache, but was consumed directly by the
shadow process (the backend). Teradata also uses a similar buffer
control technique for large table scans.

If a table is too large to fit into buffer, it clearly wasn't going to
be cached properly in the first place; Jeroen's idea only works well for
tables near to the size of the cache. If the table is MUCH bigger then
it will have very little gain. Good original thinking, though I'm not
sure it's worth it.

Oracle 9i now offers some variety for buffer cache management (as does
DB2). You can specify at the tablespace and object level whether to use
one of two buffer pools, KEEP or RECYCLE. The idea is to separate out
the two types of blocks - ones that are there because they're well used
and other blocks that are there at the moment, but unlikely to stay.

My suggestion would be to:
- split the buffer cache into two, just as Oracle does: KEEP & RECYCLE.
This could default to KEEP=66% of total memory available, but could also
be settable by init parameter.
[changes to the memory management routines]
- if we do a scan on a table whose size in blocks is more than some
fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE
bufferpool. This can be decided immediately following optimization,
rather than including it within the optimizer decision process since we
aren't going to change the way the statement executes, we're just going
to stop it from having an adverse effect on other current or future
statements.
[additional test to set parameter, then work out where to note it]

Notice that I haven't suggested that the KEEP/RECYCLE option could be
specified at table level. That optionality sounds like a lot of extra
work, when what is needed is the automatic avoidance of cache-spoiling
behaviour. (This would still mean that very large indexes with random
request patterns would still spoil cache...maybe implement that later?)

This would remove most reasons for spoiling the cache and blocks would
then leave the cache only when they were genuinely no longer wanted.

Any comments?? Takers?
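[To make the proposal concrete, a hedged, standalone sketch of the routing rule suggested above; every name here is hypothetical and none of it is existing PostgreSQL code:

    /* Sketch of the suggested rule: tables larger than some fraction
     * (25% here) of the KEEP pool have their scan blocks sent to the
     * RECYCLE pool instead, so a big scan cannot evict well-used pages. */
    typedef enum { POOL_KEEP, POOL_RECYCLE } BufferPool;

    static BufferPool
    choose_pool(long table_blocks, long keep_pool_blocks)
    {
        if (table_blocks > keep_pool_blocks / 4)
            return POOL_RECYCLE;
        return POOL_KEEP;
    }
]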



Re: cache control?

From: Jan Wieck
Date:
Simon,

have you read src/backend/storage/buffer/README of current CVS tip?

The algorithm in the new replacement strategy is an attempt to figure 
that SMALL_TABLE_THRESHOLD automatically. Do you see anything that can 
be improved in that algorithm?


Jan

Simon Riggs wrote:

> This discussion seems likely to have a major effect on DBT-3 (DSS-type)
> performance from PostgreSQL...
> 
>> On Fri, 16 Jan 2004, Michael Brusser wrote:
>> 
>> > Is there a way to force database to load
>> > a frequently-accessed table into cache and keep it there?
>> 
>> Scott Marlow replied...
>>
>> Nope.  But there is a new cache buffer handler that may make it into 7.5
>> that would make that happen automagically.
> 
> The important question here is "what forces blocks out of cache?" rather
> than thinking about how to directly keep them there.
> 
>> Jeroen T. Vermeulen wrote:
>> Sent: Friday, January 16, 2004 23:02
>> Subject: [HACKERS] Reverse scans?
>> 
>> Would it be doable, and would it be useful, to try to alternate the
>> directions of table and index scans every time each table/index was
>> fully scanned?
>> 
>> I was thinking that it could help cache performance at various levels
>> in cases where data at the end of a large table, say, that remained in
>> memory after a scan, would otherwise be flushed out by a new scan of the
>> same table.  If the next scan of the same table was to go in the other
>> direction, any remains of the last time around that were still in the
>> filesystem cache, buffer pool, hard disk cache etc. would stand a greater
>> chance of being reused.
> 
> Jeroen's idea is a good one when we consider the current behaviour, which is
> that large scans are placed into block buffer cache, which then forces
> other data out. I would like to question the latter behaviour, so we can
> address the cause and not just the symptom.
> 
> Earlier versions of Oracle had a parameter called something like
> SMALL_TABLE_THRESHOLD. Any scan of a table larger than that threshold
> was NOT placed into buffer cache, but was consumed directly by the
> shadow process (the backend). Teradata also uses a similar buffer
> control technique for large table scans.
> 
> If a table is too large to fit into buffer, it clearly wasn't going to
> be cached properly in the first place; Jeroen's idea only works well for
> tables near to the size of the cache. If the table is MUCH bigger then
> it will have very little gain. Good original thinking, though I'm not
> sure it's worth it.
> 
> Oracle 9i now offers some variety for buffer cache management (as does
> DB2). You can specify at the tablespace and object level whether to use
> one of two buffer pools, KEEP or RECYCLE. The idea is to separate out
> the two types of blocks - ones that are there because they're well used
> and other blocks that are there at the moment, but unlikely to stay.
> 
> My suggestion would be to:
> - split the buffer cache into two, just as Oracle does: KEEP & RECYCLE.
> This could default to KEEP=66% of total memory available, but could also
> be settable by init parameter.
> [changes to the memory management routines]
> - if we do a scan on a table whose size in blocks is more than some
> fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE
> bufferpool. This can be decided immediately following optimization,
> rather than including it within the optimizer decision process since we
> aren't going to change the way the statement executes, we're just going
> to stop it from having an adverse effect on other current or future
> statements.
> [additional test to set parameter, then work out where to note it]
> 
> Notice that I haven't suggested that the KEEP/RECYCLE option could be
> specified at table level. That optionality sounds like a lot of extra
> work, when what is needed is the automatic avoidance of cache-spoiling
> behaviour. (This would still mean that very large indexes with random
> request patterns would still spoil cache...maybe implement that later?)
> 
> This would remove most reasons for spoiling the cache and blocks would
> then leave the cache only when they were genuinely no longer wanted.
> 
> Any comments?? Takers?
> 
> 


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: cache control?

From: Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Any comments?? Takers?

Um, did you read the discussion of the ARC buffer management algorithm
that's already been implemented for 7.5?

The main objection I have to doing it Oracle's way is that that creates
more parameters that DBAs have to get right for reasonable performance.
Self-tuning algorithms are better, when available.
        regards, tom lane


Re: cache control?

From: "Simon Riggs"
Date:
> Jan Wieck wrote:
>
> have you read src/backend/storage/buffer/README of current CVS tip?
> 
> The algorithm in the new replacement strategy is an attempt to figure
> that SMALL_TABLE_THRESHOLD automatically. Do you see anything that can
> be improved in that algorithm?
> 

Jan,

I've read src/backend/storage/buffer/README rev 1.6 as you suggest. The
new algorithm looks great - many thanks for implementing that.

I'm not able to improve on this for the general case - I especially like
the automatic management that it gives, allowing you to avoid additional
DBA set parameters (and the coding to add these option
settings/keywords).

My concern was for DBT-3 performance and general Decision Support (DSS)
workloads, where a large proportion of table scans occur (not on the DBT-3
single-threaded test). The new strategy is much better than the older
one and is likely to have a positive effect in this area. I don't think,
right now, that anything further should be changed, in the interests of
stability.

For the record/for the future: My observation was that two commercial
databases focused on DSS use a strategy which in terms of the new ARC
implementation is effectively: "place blocks in T1 (RECENCY/RECYCLE
buffer) and NEVER promote them to T2 (FREQUENCY/KEEP buffer)" when they
do large object scans.

In the new README, you note that:
>    StrategyHintVacuum(bool vacuum_active)
>
>        Because vacuum reads all relations of the entire database
>        through the buffer manager, it can greatly disturb the
>        buffer replacement strategy. This function is used by vacuum
>        to inform that all subsequent buffer lookups are caused
>        by vacuum scanning relations.

...I would say that scans of very large tables also "greatly disturb the
buffer replacement strategy", i.e. have exactly the same effect on the
cache as the Vacuum utility. 

You'd clearly thought of the idea before me, though with regard to
Vacuum. 

If we know ahead of time that a large scan is going to have this effect,
why wait for the ARC to play its course, why not take exactly the same
action?
Have large scans call StrategyHint also. (Maybe rename it...?)...of
course, some extra code to establish it IS a large scan...
...large table lookup should wait until a shared catalog cache is
implemented
Anyway, this idea can wait at least until we have extensive performance
tuning on DBT-3 with 7.5. Thanks again for adding the new algorithm.
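[For illustration, a hedged sketch of that suggestion. StrategyHintVacuum() is the call quoted from the README; the threshold, scan routine, and wrapper names are made up for this sketch and are not existing backend code:

    #include <stdbool.h>

    extern void StrategyHintVacuum(bool vacuum_active); /* from the README */
    extern void run_large_seqscan(void);                /* hypothetical */

    #define LARGE_SCAN_THRESHOLD_BLOCKS 100000           /* hypothetical */

    static void
    seqscan_with_hint(long rel_nblocks)
    {
        bool big = rel_nblocks > LARGE_SCAN_THRESHOLD_BLOCKS;

        if (big)
            StrategyHintVacuum(true);   /* read without disturbing T2 */
        run_large_seqscan();
        if (big)
            StrategyHintVacuum(false);  /* back to normal replacement */
    }
]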

Best Regards, Simon 

... 
> Simon Riggs wrote:
... 
> >
> > My suggestion would be to:
> > - split the buffer cache into two, just as Oracle does: KEEP & RECYCLE.
> > This could default to KEEP=66% of total memory available, but could also
> > be settable by init parameter.
> > [changes to the memory management routines]
> > - if we do a scan on a table whose size in blocks is more than some
> > fraction (25%?) of KEEP bufferpool then we place the blocks into RECYCLE
> > bufferpool. This can be decided immediately following optimization,
> > rather than including it within the optimizer decision process since we
> > aren't going to change the way the statement executes, we're just going
> > to stop it from having an adverse effect on other current or future
> > statements.
> > [additional test to set parameter, then work out where to note it]
> >



Re: cache control?

From: "Simon Riggs"
Date:
> Jan Wieck wrote:
>
> have you read src/backend/storage/buffer/README of current CVS tip?

> Tom Lane wrote:
>
> Um, did you read the discussion of the ARC buffer management algorithm
> that's already been implemented for 7.5?
> 

Tom, Jan, 

No, I hadn't read this. Thank you both for your time and trouble in
pointing this out to me; I was not aware of it.

My understanding, possibly faulty, was that, if work was completed, then
it appears on the TODO list with a dash in front of it. The new cache
management strategy isn't mentioned there, so I was not aware that any
work was completed (or even in progress). No finger pointing, just an
observation of how the development process works...

If the TODO-list-with-dash isn't the correct place to have looked, is
there another list of committed changes for the next release? The latest
README in CVS doesn't have a list of "what's new in 7.5" or similar.

Do we need such a list? (I'd be happy to compile and maintain this if it
is agreed that it is a good idea to have such a document or process as
separate from TODO - I'll be doing this anyway before I pass further
comments!)

Regards, Simon




Re: cache control?

From: Jan Wieck
Date:
Simon,

thanks for the time to give this further thought.


Simon Riggs wrote:
> If we know ahead of time that a large scan is going to have this effect,
> why wait for the ARC to play its course, why not take exactly the same
> action?
> Have large scans call StrategyHint also. (Maybe rename it...?)...of
> course, some extra code to establish it IS a large scan...
> ...large table lookup should wait until a shared catalog cache is
> implemented

The problem with this is a) how to detect that something will be a large 
scan, and b) how to decide what is a large scan in the first place.

Large sequential scans in warehousing are often part of more complex 
join operations. And just because something returns a large number of 
result rows doesn't mean that the input data was that much.

As for the definition of "large" itself, this depends on the size of the
buffer cache and the access pattern of the application. As you surely
have noticed, the usual sizes are B1+T1 = T2+B2 = C in the algorithm.
Buffers evicted from T1 are remembered in B1, and because of that even
repeated sequential scans of the same large relation will only cycle
through T1 blocks, never causing any turbulence in T2 or B2.

The only thing that will affect T2 and B2 dramatically by adjusting the
cache split point is multiple scanning of more than one significantly
large but smaller-than-C table. Scanning the same large but smaller-than-C
table over and over will have it after the second scan in T2, where it
belongs. But having two tables A and B that are both just smaller than C and
having an access pattern like A, A, B, B, A, A, ... will cause many B1
hits and thereby increase the target T1 size. And it must be exactly
that access pattern, because A, A, A, B, B, B, A, A, A, ... produces a
complete MISS on the first, a B1 hit on the second and a B2 hit on the
third scan, so it will move the split point up and down evenly.

Honestly, I don't even know what type of application could possibly
produce such a screwed access pattern. And I am absolutely confident one
can find corner cases to wring down Oracle's complicated configuration
harness more easily.

>  
> Anyway, this idea can wait at least until we have extensive performance
> tuning on DBT-3 with 7.5. Thanks again for adding the new algorithm.

Everyone is always welcome to try and show that something can be 
improved. And we are in the middle of the 7.5 development cycle, so feel 
free to hack around.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
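[A hedged, standalone sketch of the adaptation Jan describes; the variable names are illustrative and this is not the backend's implementation. A hit in B1 grows the T1 target, a hit in B2 shrinks it in favour of frequency:

    /* ARC-style target adaptation: B1 hits favour recency (grow T1's
     * target), B2 hits favour frequency (shrink it). */
    static long
    adapt_t1_target(long t1_target, long b1_len, long b2_len,
                    int hit_in_b1, long c)
    {
        long delta;

        if (hit_in_b1)
        {
            delta = (b1_len > 0 && b2_len > b1_len) ? b2_len / b1_len : 1;
            t1_target = (t1_target + delta < c) ? t1_target + delta : c;
        }
        else /* hit in B2 */
        {
            delta = (b2_len > 0 && b1_len > b2_len) ? b1_len / b2_len : 1;
            t1_target = (t1_target - delta > 0) ? t1_target - delta : 0;
        }
        return t1_target;
    }
]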



7.5 change documentation (was Re: cache control?)

From: Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> If the TODO-list-with-dash isn't the correct place to have looked, is
> there another list of committed changes for the next release?

We tend to rely on the CVS commit logs as the definitive source.  You
can pull the info from the CVS server (I use cvs2cl.pl to format the
results nicely), or read the archives of pgsql-committers.

In theory there should be a section at the head of release.sgml
mentioning the major changes done-so-far, but for various reasons this
hasn't gotten installed in the 7.5 branch yet.  (Look at the CVS
versions during 7.4 development to see how we did it last time.)

As far as the ARC change goes, I believe Jan still considers it a
work-in-progress, so it may not be appropriate to list yet anyway.
(Jan, where are you on that exactly?)

> Do we need such a list? (I'd be happy to compile and maintain this if it
> agreed that it is a good idea to have such a document or process as
> separate from TODO - I'll be doing this anyway before I pass further
> comments!)

If you wanted to go through the existing 7.5 commits and write up a
new done-so-far section, it'd save someone else (like me or Bruce)
from having to do it sometime soon ...
        regards, tom lane


Re: 7.5 change documentation

From: Neil Conway
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> In theory there should be a section at the head of release.sgml
> mentioning the major changes done-so-far, but for various reasons
> this hasn't gotten installed in the 7.5 branch yet.  (Look at the
> CVS versions during 7.4 development to see how we did it last time.)

Well, keep in mind we didn't do it very effectively in 7.4 :-) The
vast majority of changes weren't recorded there, and the ones that
were had to be fleshed out quite a lot in the actual release notes.

The last time that someone (Peter and myself, IIRC) suggested that we
really incrementally maintain the release notes during the development
cycle, Bruce said that he personally finds it more comfortable to
summarize the CVS changelogs all at once shortly before we release the
first beta. AFAIR that's where the discussion ended.

-Neil



Re: 7.5 change documentation

From: Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> In theory there should be a section at the head of release.sgml
>> mentioning the major changes done-so-far, but for various reasons
>> this hasn't gotten installed in the 7.5 branch yet.  (Look at the
>> CVS versions during 7.4 development to see how we did it last time.)

> Well, keep in mind we didn't do it very effectively in 7.4 :-) The
> vast majority of changes weren't recorded there, and the ones that
> were had to be fleshed out quite a lot in the actual release notes.

> The last time that someone (Peter and myself, IIRC) suggested that we
> really incrementally maintain the release notes during the development
> cycle, Bruce said that he personally finds it more comfortable to
> summarize the CVS changelogs all at once shortly before we release the
> first beta. AFAIR that's where the discussion ended.

It's fine with me if Bruce prefers to build the release notes directly
from the change logs.  As I saw it, the purpose of the temporary list of
things-done-so-far is not to be the raw material for the release notes.
It's to let alpha testers know about major changes that they might want
to test.  As such, it's fine that it's incomplete.

The other way we could handle this goal is to be a tad more vigorous about
checking off items as "done" in the TODO list.  However, Bruce generally
doesn't bother to make a new entry in the TODO list if someone does
something that wasn't in the list to begin with, and so I'm not sure
it's the right vehicle.
        regards, tom lane


Re: 7.5 change documentation

From: "Simon Riggs"
Date:
OK, I will attempt to draw together this information as it currently
stands. If this makes any sense, we can discuss what the
requirement/process is for regular maintenance (daily/weekly/monthly
etc).

Understood to mean "changes in next release (current progress)" - items
that have been completed/committed since last release, for the purpose
of informing developers/testers what's new PRIOR to full release. 

Leaving unobstructed the functions of 
- TODO list - a combined list of desired work items (Bruce)
- Release Notes - final list of features of a release (Bruce)

This should help alpha testing, which should allow more control of what
actually does get released (and therefore what the contents of Release
Notes should be)

Best Regards, Simon

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, January 23, 2004 20:40
> To: Neil Conway
> Cc: simon@2ndquadrant.com; 'Jan Wieck'; 'Postgresql Hackers'
> Subject: Re: 7.5 change documentation
> 
> Neil Conway <neilc@samurai.com> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> In theory there should be a section at the head of release.sgml
> >> mentioning the major changes done-so-far, but for various reasons
> >> this hasn't gotten installed in the 7.5 branch yet.  (Look at the
> >> CVS versions during 7.4 development to see how we did it last time.)
> 
> > Well, keep in mind we didn't do it very effectively in 7.4 :-) The
> > vast majority of changes weren't recorded there, and the ones that
> > were had to be fleshed out quite a lot in the actual release notes.
> 
> > The last time that someone (Peter and myself, IIRC) suggested that we
> > really incrementally maintain the release notes during the development
> > cycle, Bruce said that he personally finds it more comfortable to
> > summarize the CVS changelogs all at once shortly before we release the
> > first beta. AFAIR that's where the discussion ended.
> 
> It's fine with me if Bruce prefers to build the release notes directly
> from the change logs.  As I saw it, the purpose of the temporary list of
> things-done-so-far is not to be the raw material for the release notes.
> It's to let alpha testers know about major changes that they might want
> to test.  As such, it's fine that it's incomplete.
> 
> The other way we could handle this goal is to be a tad more vigorous about
> checking off items as "done" in the TODO list.  However, Bruce generally
> doesn't bother to make a new entry in the TODO list if someone does
> something that wasn't in the list to begin with, and so I'm not sure
> it's the right vehicle.
> 
>             regards, tom lane



Re: cache control?

From: "Simon Riggs"
Date:
Jan,

Happy to continue the discussion...though without changing my suggestion
that we defer any further more specialised improvements for now.

> Jan Wieck replied to...
> Simon Riggs wrote:
> > If we know ahead of time that a large scan is going to have this effect,
> > why wait for the ARC to play its course, why not take exactly the same
> > action?
> > Have large scans call StrategyHint also. (Maybe rename it...?)...of
> > course, some extra code to establish it IS a large scan...
> > ...large table lookup should wait until a shared catalog cache is
> > implemented
> 
> The problem with this is a) how to detect that something will be a large
> scan, and b) how to decide what is a large scan in the first place.
> 

My thoughts are that we know immediately prior to execution whether or
not a plan calls for a full table scan (FTS). We also know the
table and therefore its size. A large table in this context is one that
would disrupt the cache if it made it onto T2. We can discuss an
appropriate and usefully simple rule, perhaps sizeoftable(T) > 2*C???

> Large sequential scans in warehousing are often part of more complex
> join operations. 

Yes, I agree. PostgreSQL is particularly prone to this currently,
because of the high number of plans that resolve to FTS. Complexity of
plan shouldn't affect the basic situation that we are reading all the
blocks of a table and putting them sequentially into T1 and then
working on them. Plan complexity may increase the time that a T1 block
stays in memory, with a subsequent increase in the probability of promotion
to T2.

> And just because something returns a large number of
> result rows doesn't mean that the input data was that much.

I agree also that overall execution time may be unrelated to whether a
"large" table is involved. The number of output rows shouldn't have any
effect on input rows and thus data blocks that need to be cached.

(Jan gives a detailed analysis...ending with)
> Honestly, I don't even know what type of application could possibly
> produce such a screwed access pattern. And I am absolutely confident one
> can find corner cases to wring down Oracle's complicated configuration
> harness more easily.

I agree with everything you say. The algorithm copes well with almost
every sequential pattern of access and there is significant benefit from
ignoring the very very very rare cases that might give it problems.

My thoughts are about multiple concurrent accesses, specifically FTS on
large tables, rather than sequential ones.

> Buffers evicted from T1 are remembered in B1, and because of that even
> repeated sequential scans of the same large relation will only cycle
> through T1 blocks, never cause any turbulence in T2 or B2.

If we have a situation where a single backend makes repeated scans of
the same table, these will be sequential and will have no effect on T1.

In a DW situation, you are likely to have one or more very popular large
tables (maybe think of this as the "Fact table", if you have a
dimensional design). The tables are large and therefore query execution
times will be extended (and accepted by user). In this situation it is
very likely that: i) a single user/app submits multiple requests from
other windows/threads
Or simply,
ii) multiple users access the popular table

The common effect will be concurrent, rather than sequential, access to
the popular table. Different SQL statements will have different plans
and will perform scans of the same table at different rates because of
other joins, more complex WHERE clauses etc. Like waves at a beach
moving at different rates. Every time one scan catches up with another,
it will cause T1 hits for almost the whole of the T1 list, promoting all
of these blocks to the top of the T2 MRU and thus spoiling the cache -
if it hits one it will probably hit most of them. This will not happen
ALL the time, but I don't want it to happen EVER. Even in DW situation,
I still want to be inserting data regularly (that's how the table got
big!), so I have index blocks and other stuff that I want almost
permanently in memory. Concurrent access via an index might have the
same effect, though less dramatically.

The closer the size of a table is to C, the greater the likelihood that
these spurious cache hits will occur. (Of course, it might be argued
that these are worthwhile and genuine cache hits - I argue that they are
not wanted and this is the main basis of my discussion). Of course, if a
table does fit in memory then that is very good. If a table was, say,
2*C, then spurious cache hits will occur often and spoil the whole of
T2.

The DBT-3 workload is very similar to TPC-D/TPC-H workload. The test
consists of a power test (all queries sequentially in random order) and
a throughput test (1 or more concurrent streams, each stream executing
all queries in a random order). When this benchmark first came out most
vendors chose to perform the throughput test with only 1 stream (though
with parallel processing)...I would say one reason for this is poor
cache management...hence recent changes in various commercial products.

In summary, I believe there is a reasonably common effect in DW
situations where concurrent query access to large and popular tables
will result in undesirable cache spoiling. This effect will still occur
even after the ARC improvements are introduced - though in every other
case I can think of, the ARC code is a major improvement on earlier
strategies and should be hailed as a major improvement in automatic
performance adaptation.

There are two solution ideas:
i) change the code so that FTS on large tables use the "no cache"
strategy that has already been developed to support Vacuum.
ii) more complex: synchronise the FTS of the large table so that all
backends that want scans produce only one set of I/Os and they share the
block many times (yet still don't put it in cache!). FTS don't start at
"the beginning" every time, they start wherever a current scan has got
to, then loop back round at the end (so the average of two concurrent scans
is 1.5 times as much I/O as a single FTS - a 25% saving on I/O). A more
detailed explanation may be required - this technique is in commercial
use within the Teradata rdbms. Implementing it would take some doing...

Best Regards

Simon
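[A hedged, standalone sketch of solution idea (ii) above; all names are hypothetical and this is not backend code. A new scan joins at whatever block the shared position has reached and wraps around, so concurrent scans of the same table tend to request the same blocks at the same time:

    /* Shared across scanning processes in a real system; a plain
     * variable here for illustration only. */
    static long shared_scan_position = 0;

    static void
    synchronized_scan(long nblocks, void (*process_block)(long blkno))
    {
        long start = shared_scan_position;   /* join the current "wave" */
        long i;

        for (i = 0; i < nblocks; i++)
        {
            long blkno = (start + i) % nblocks;
            process_block(blkno);
            shared_scan_position = blkno;    /* publish progress */
        }
    }
]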




Re: cache control?

From: Jan Wieck
Date:
Simon Riggs wrote:
> Jan,

[...]

> My thoughts are about multiple concurrent accesses, specifically FTS on
> large tables, rather than sequential ones.

Single or multiple backends is irrelevant here because a data block only 
exists once, and therefore we have only one shared buffer cache.

> 
>> Buffers evicted from T1 are remembered in B1, and because of that even
>> repeated sequential scans of the same large relation will only cycle
>> through T1 blocks, never cause any turbulence in T2 or B2.
> 
> If we have a situation where a single backend makes repeated scans of
> the same table, these will be sequential and will have no effect on T1.

You really have to look at this a bit more globally, not table related.
The strategy of ARC is this:

In an unknown access pattern, if a specific block is accessed less
frequently than every C requests, then it will only go into T1, age, get
evicted, the CDB moves to B1, it will get removed from that, and it is
forgotten. Every block that is accessed more frequently than every C
requests will still be, after its last access, in one of the four queues
of the directory, and will immediately go into T2.

The adjustment of the target T1 size is an attempt to catch as many
newcomers as possible. If an application does many inserts, it will
access new blocks very soon again, so that a small T1 is sufficient to
hold them in memory until their next access, where they move into T2. For an
application that does non-uniform random access to blocks (there are
always bestsellers and less frequently asked items), a larger T1
might better satisfy that access pattern.

> 
> In a DW situation, you are likely to have one or more very popular large
> tables (maybe think of this as the "Fact table", if you have a
> dimensional design). The tables are large and therefore query execution
> times will be extended (and accepted by user). In this situation it is
> very likely that: i) a single user/app submits multiple requests from
> other windows/threads
> Or simply,
> ii) multiple users access the popular table

If that causes its blocks to be requested more frequently than every
C lookups, they belong in T2.

> 
> The common effect will be concurrent, rather than sequential, access to
> the popular table. Different SQL statements will have different plans
> and will perform scans of the same table at different rates because of
> other joins, more complex WHERE clauses etc. Like waves at a beach
> moving at different rates. Every time one scan catches up with another,
> it will cause T1 hits for almost the whole of the T1 list, promoting all
> of these blocks to the top of the T2 MRU and thus spoiling the cache -
> if it hits one it will probably hit most of them. This will not happen
> ALL the time, but I don't want it to happen EVER. Even in DW situation,
> I still want to be inserting data regularly (that's how the table got
> big!), so I have index blocks and other stuff that I want almost
> permanently in memory. Concurrent access via an index might have the
> same effect, though less dramatically.
> 
> The closer the size of a table is to C, the greater the likelihood that
> these spurious cache hits will occur. (Of course, it might be argued
> that these are worthwhile and genuine cache hits - I argue that they are
> not wanted and this is the main basis of my discussion). Of course, if a
> table does fit in memory then that is very good. If a table was, say,
> 2*C, then spurious cache hits will occur often and spoil the whole of
> T2.

How can any generic algorithm ever sense that when the application is
accessing the same blocks multiple times, it should NOT cache them? Are
you asking for fine-grained tuning of cache priorities and
behaviour on a per-table basis?

> 
> The DBT-3 workload is very similar to TPC-D/TPC-H workload. The test
> consists of a power test (all queries sequentially in random order) and
> a throughput test (1 or more concurrent streams, each stream executing
> all queries in a random order). When this benchmark first came out most
> vendors chose to perform the throughput test with only 1 stream (though
> with parallel processing)...I would say one reason for this is poor
> cache management...hence recent changes in various commercial products.
> 
> In summary, I believe there is a reasonably common effect in DW
> situations where concurrent query access to large and popular tables
> will result in undesirable cache spoiling. This effect will still occur
> even after the ARC improvements are introduced - though in every other
> case I can think of, the ARC code is a major improvement on earlier
> strategies and should be hailed as a major improvement in automatic
> performance adaptation.
> 
> There are two solution ideas:
> i) change the code so that FTS on large tables use the "no cache"
> strategy that has already been developed to support Vacuum.
> ii) more complex: synchronise the FTS of the large table so that all
> backends that want scans produce only one set of I/Os and they share the
> block many times (yet still don't put it in cache!). FTS don't start at
> "the beginning" every time, they start wherever a current scan has got
> to, then loop back round at the end (so the average of two concurrent scans
> is 1.5 times as much I/O as a single FTS - a 25% saving on I/O). A more
> detailed explanation may be required - this technique is in commercial
> use within the Teradata rdbms. Implementing it would take some doing...

What will the configuration of all that look like? You are using several
business terms a human brain can imagine to describe various access
patterns you want to be treated specially. In the whole system catalog
and all the way down to the buffer cache, we only have some file and
block number, maybe the size of it too, but that's not guaranteed (think
of blind writes by a backend of another DB). So how do we express what
you want in some algorithm that we can put into the strategy?


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #



Re: cache control?

From: "Simon Riggs"
Date:
Jan,

I think we should suspend further discussion for now...in summary:

ARC Buffer management is an important new performance feature for 7.5;
the implementation is a good one and should have positive benefit for
everybody's workload. ARC will adapt to a variety of situations and has
been designed to allow Vacuum to avoid interfering with user
applications.

That's the important bit: The implementation notes are detailed; I've
read them a few times to ensure I've got it straight. I am confident
that the situation I described CAN exist with regard to multiple
concurrent queries performing full table scans upon a single large
table. Further debate on that point is continuing because of my poor
explanation of that situation; forgive me. Thanks very much for your
further explanations and examples.

I will take a more practical tack on this now: providing evidence of a
real query mix that exhibits the described properties and quantifying
the effects and their frequency. If it IS worth it, and I accept that it
may not be, I'll have a hack at the very specialised improvement I was
suggesting, for very specific workload types.

Best Regards

Simon Riggs




Re: 7.5 change documentation (was Re: cache control?)

From: Bruce Momjian
Date:
Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > If the TODO-list-with-dash isn't the correct place to have looked, is
> > there another list of committed changes for the next release?
> 
> We tend to rely on the CVS commit logs as the definitive source.  You
> can pull the info from the CVS server (I use cvs2cl.pl to format the
> results nicely), or read the archives of pgsql-committers.
> 
> In theory there should be a section at the head of release.sgml
> mentioning the major changes done-so-far, but for various reasons this
> hasn't gotten installed in the 7.5 branch yet.  (Look at the CVS
> versions during 7.4 development to see how we did it last time.)
> 
> As far as the ARC change goes, I believe Jan still considers it a
> work-in-progress, so it may not be appropriate to list yet anyway.
> (Jan, where are you on that exactly?)
> 
> > Do we need such a list? (I'd be happy to compile and maintain this if it
> > is agreed that it is a good idea to have such a document or process as
> > separate from TODO - I'll be doing this anyway before I pass further
> > comments!)
> 
> If you wanted to go through the existing 7.5 commits and write up a
> new done-so-far section, it'd save someone else (like me or Bruce)
> from having to do it sometime soon ...

Doesn't Robert Treat's News Bits list all the major changes weekly? 
That would be a good source.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: 7.5 change documentation

From: Bruce Momjian
Date:
Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> It's fine with me if Bruce prefers to build the release notes directly
> from the change logs.  As I saw it, the purpose of the temporary list of
> things-done-so-far is not to be the raw material for the release notes.
> It's to let alpha testers know about major changes that they might want
> to test.  As such, it's fine that it's incomplete.
> 
> The other way we could handle this goal is to be a tad more vigorous about
> checking off items as "done" in the TODO list.  However, Bruce generally
> doesn't bother to make a new entry in the TODO list if someone does
> something that wasn't in the list to begin with, and so I'm not sure
> it's the right vehicle.

Right. I see TODO as a way for us to remember our limitations, and to
document them for our users.  Once an item is completed, it didn't seem
necessary to put it on the TODO list.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073


Re: 7.5 change documentation (was Re: cache control?)

From: "Simon Riggs"
Date:
> Bruce Momjian wrote:
> >Tom Lane wrote:
> > >"Simon Riggs" <simon@2ndquadrant.com> writes:
> > > If the TODO-list-with-dash isn't the correct place to have looked, is
> > > there another list of committed changes for the next release?
> >
> > We tend to rely on the CVS commit logs as the definitive source.  You
> > can pull the info from the CVS server (I use cvs2cl.pl to format the
> > results nicely), or read the archives of pgsql-committers.
> >
> > In theory there should be a section at the head of release.sgml
> > mentioning the major changes done-so-far, but for various reasons this
> > hasn't gotten installed in the 7.5 branch yet.  (Look at the CVS
> > versions during 7.4 development to see how we did it last time.)
> >
> > As far as the ARC change goes, I believe Jan still considers it a
> > work-in-progress, so it may not be appropriate to list yet anyway.
> > (Jan, where are you on that exactly?)
> >
> > > Do we need such a list? (I'd be happy to compile and maintain this if it
> > > is agreed that it is a good idea to have such a document or process as
> > > separate from TODO - I'll be doing this anyway before I pass further
> > > comments!)
> >
> > If you wanted to go through the existing 7.5 commits and write up a
> > new done-so-far section, it'd save someone else (like me or Bruce)
> > from having to do it sometime soon ...
> 
> Doesn't Robert Treat's News Bits list all the major changes weekly?
> That would be a good source.
> 

Bruce - The excellent work that both you and Robert do is a slightly
different view to what I had in mind - I agree they are all aspects of
the same information. I'm posting a first output of this now, so we can
discuss whether such a thing is useful, and/or whether it can ever be
all of useful/accurate/timely.

I'll happily add this to the HEAD of release.sgml, though let's agree the
content/direction first, before I spend time on a more formal
publication mechanism.

Best Regards, Simon Riggs