Обсуждение: Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

Поиск
Список
Период
Сортировка

Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Tom Lane
Дата:
[ redirecting thread to -hackers ]

Neil Conway <neilc@samurai.com> writes:
> On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
>> I liked the "synchronized_sequential_scans" idea myself.

> I think that's a bit too long. How about "synchronized_scans", or
> "synchronized_seqscans"?

We have enable_seqscan already, so that last choice seems to fit in.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Michael Glaesemann
Дата:
On Jan 27, 2008, at 21:04 , Tom Lane wrote:

> [ redirecting thread to -hackers ]
>
> Neil Conway <neilc@samurai.com> writes:
>> On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
>>> I liked the "synchronized_sequential_scans" idea myself.
>
>> I think that's a bit too long. How about "synchronized_scans", or
>> "synchronized_seqscans"?
>
> We have enable_seqscan already, so that last choice seems to fit in.

Would it make sense to match the plural as well?

Michael Glaesemann
grzm seespotcode net




Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Tom Lane
Дата:
Michael Glaesemann <grzm@seespotcode.net> writes:
>> Neil Conway <neilc@samurai.com> writes:
>>> I think that's a bit too long. How about "synchronized_scans", or
>>> "synchronized_seqscans"?

> Would it make sense to match the plural as well?

Actually, the best suggestion I've seen so far is Guillaume's
"synchronize_seqscans" --- make it a verb phrase.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
"Zeugswetter Andreas ADI SD"
Дата:
> >> I liked the "synchronized_sequential_scans" idea myself.
>
> > I think that's a bit too long. How about "synchronized_scans", or
> > "synchronized_seqscans"?
>
> We have enable_seqscan already, so that last choice seems to fit in.

Yes looks good, how about synchronized_seqscan without plural ?

Andreas



Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Simon Riggs
Дата:
On Sun, 2008-01-27 at 21:04 -0500, Tom Lane wrote:
> [ redirecting thread to -hackers ]
> 
> Neil Conway <neilc@samurai.com> writes:
> > On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
> >> I liked the "synchronized_sequential_scans" idea myself.
> 
> > I think that's a bit too long. How about "synchronized_scans", or
> > "synchronized_seqscans"?
> 
> We have enable_seqscan already, so that last choice seems to fit in.

If we're going to have a GUC, we may as well make it as useful as
possible.

Currently we set synch scan on when the table is larger than 25% of
shared_buffers. So increasing shared_buffers can actually turn this
feature off.

Rather than having a boolean GUC, we should have a number and make the
parameter "synchronised_scan_threshold". This would then be the size of
a table above which we would perform synch scans. If its set to -1, then
this would be the same as "off" in all cases. The default value would be
25% of shared_buffers. (Think we can only do that at initdb time
currently).

If we do that, its clearly different from the enable_* parameters, so
the name is easier to decide ;-)

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com



Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Hans-Juergen Schoenig
Дата:

On Jan 28, 2008, at 6:14 PM, Simon Riggs wrote:

On Sun, 2008-01-27 at 21:04 -0500, Tom Lane wrote:
[ redirecting thread to -hackers ]

Neil Conway <neilc@samurai.com> writes:
On Sun, 2008-01-27 at 21:54 +0000, Gregory Stark wrote:
I liked the "synchronized_sequential_scans" idea myself.

I think that's a bit too long. How about "synchronized_scans", or
"synchronized_seqscans"?

We have enable_seqscan already, so that last choice seems to fit in.

If we're going to have a GUC, we may as well make it as useful as
possible.

Currently we set synch scan on when the table is larger than 25% of
shared_buffers. So increasing shared_buffers can actually turn this
feature off.

Rather than having a boolean GUC, we should have a number and make the
parameter "synchronised_scan_threshold". This would then be the size of
a table above which we would perform synch scans. If its set to -1, then
this would be the same as "off" in all cases. The default value would be
25% of shared_buffers. (Think we can only do that at initdb time
currently).

If we do that, its clearly different from the enable_* parameters, so
the name is easier to decide ;-)


+1
This is in fact a lot more flexible and transparent.
It gives us a lot more control over the process and it is easy to explain / understand.

best regards,

hans

--
Cybertec Schönig & Schönig GmbH
PostgreSQL Solutions and Support
Gröhrmühlgasse 26, 2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Tom Lane
Дата:
Simon Riggs <simon@2ndquadrant.com> writes:
> Rather than having a boolean GUC, we should have a number and make the
> parameter "synchronised_scan_threshold".

This would open up a can of worms I'd prefer not to touch, having to do
with whether the buffer-access-strategy behavior should track that or
not.  As the note in heapam.c says,
    * If the table is large relative to NBuffers, use a bulk-read access    * strategy and enable synchronized scanning
(seesyncscan.c).  Although    * the thresholds for these features could be different, we make them the    * same so
thatthere are only two behaviors to tune rather than four.
 

It's a bit late in the cycle to be revisiting that choice.  Now we do
already have three behaviors to worry about (BAS on and syncscan off)
but throwing in a randomly settable knob will take it back to four,
and we have no idea how that fourth case will behave.  The other tack we
could take (having the one GUC variable control both thresholds) is
not good since it will result in pg_dump trashing the buffer cache.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Simon Riggs
Дата:
On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Rather than having a boolean GUC, we should have a number and make the
> > parameter "synchronised_scan_threshold".
> 
> This would open up a can of worms I'd prefer not to touch, having to do
> with whether the buffer-access-strategy behavior should track that or
> not.  As the note in heapam.c says,
> 
>      * If the table is large relative to NBuffers, use a bulk-read access
>      * strategy and enable synchronized scanning (see syncscan.c).  Although
>      * the thresholds for these features could be different, we make them the
>      * same so that there are only two behaviors to tune rather than four.
> 
> It's a bit late in the cycle to be revisiting that choice.  Now we do
> already have three behaviors to worry about (BAS on and syncscan off)
> but throwing in a randomly settable knob will take it back to four,
> and we have no idea how that fourth case will behave.  The other tack we
> could take (having the one GUC variable control both thresholds) is
> not good since it will result in pg_dump trashing the buffer cache.

OK, good points. 

I'm still concerned that the thresholds gets higher as we increase
shared_buffers. We may be removing performance features as fast as we
gain performance when we set shared_buffers higher.

Might we agree that the threshold should be fixed at 8MB, rather than
varying upwards as we try to tune? 

The objective of having a tuning hook would have been simply to
normalise the effects of increasing shared_buffers anyway, so fixing it
would solve the problem I see.

(8MB is the default, based upon 25% of 32MB).

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
"Heikki Linnakangas"
Дата:
Simon Riggs wrote:
> On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote:
>> Simon Riggs <simon@2ndquadrant.com> writes:
>>> Rather than having a boolean GUC, we should have a number and make the
>>> parameter "synchronised_scan_threshold".
>> This would open up a can of worms I'd prefer not to touch, having to do
>> with whether the buffer-access-strategy behavior should track that or
>> not.  As the note in heapam.c says,
>>
>>      * If the table is large relative to NBuffers, use a bulk-read access
>>      * strategy and enable synchronized scanning (see syncscan.c).  Although
>>      * the thresholds for these features could be different, we make them the
>>      * same so that there are only two behaviors to tune rather than four.
>>
>> It's a bit late in the cycle to be revisiting that choice.  Now we do
>> already have three behaviors to worry about (BAS on and syncscan off)
>> but throwing in a randomly settable knob will take it back to four,
>> and we have no idea how that fourth case will behave.  The other tack we
>> could take (having the one GUC variable control both thresholds) is
>> not good since it will result in pg_dump trashing the buffer cache.
> 
> OK, good points. 
> 
> I'm still concerned that the thresholds gets higher as we increase
> shared_buffers. We may be removing performance features as fast as we
> gain performance when we set shared_buffers higher.
> 
> Might we agree that the threshold should be fixed at 8MB, rather than
> varying upwards as we try to tune? 

Synchronized scans, and the bulk-read strategy, don't help if the table 
fits in cache. If it fits in shared buffers, you're better off keeping 
it there, than swap pages between the OS cache and shared buffers, or 
spend any effort synchronizing scans. That's why we agreed back then 
that the threshold should be X% of shared_buffers.

It's a good point that we don't want pg_dump to screw up the cluster 
order, but that's the only use case I've seen this far for disabling 
sync scans. Even that wouldn't matter much if our estimate for 
"clusteredness" didn't get screwed up by a table that looks like this: 
"5 6 7 8 9 1 2 3 4"

Now, maybe there's more use cases where you'd want to tune the 
threshold, but I'd like to see some before we add more knobs.

To benefit from a lower threshold, you'd need to have a table large 
enough that its cache footprint matters, but is still smaller than 25% 
of shared_buffers, and have seq scans on it. In that scenario, you might 
benefit from a lower threshold, because that would leave some 
shared_buffers free for other use. Even that is quite hand-wavey; the 
buffer cache LRU algorithm handles that kind of scenarios reasonably 
well already, and whether or not

To benefit from a larger threshold, you'd need to have a table larger 
than 25% of shared_buffers, but still smaller than shared_buffers, and 
seq scan it often enough that you want to keep it in shared buffers. If 
you're frequently seq scanning a table of that size, you're most likely 
suffering from a bad plan. Even then, the performance difference 
shouldn't be that great, the table surely fits in OS cache anyway, with 
typical shared_buffers settings.

Tables that are seq scanned are typically very small, like a summary 
table with just a few rows, or huge tables in a data warehousing 
system. Between the extremes, I don't think the threshold actually has a 
very big impact.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Jeff Davis
Дата:
On Mon, 2008-01-28 at 23:13 +0000, Heikki Linnakangas wrote:
> It's a good point that we don't want pg_dump to screw up the cluster 
> order, but that's the only use case I've seen this far for disabling 
> sync scans. Even that wouldn't matter much if our estimate for 
> "clusteredness" didn't get screwed up by a table that looks like this: 
> "5 6 7 8 9 1 2 3 4"

It doesn't seem like there is any reason for the estimate to get
confused, but it apparently does. I loaded a test table with a similar
distribution to your example, and it shows a correlation of about -0.5,
but it should be as good as something near -1 or +1.

I am not a statistics expert, but it seems like a better measurement
would be: "what is the chance that, if the tuples are close together in
index order, the corresponding heap tuples are close together?".

The answer to that question in your example is "very likely", so there
would be no problem.

Is there a reason we don't do this?

Regards,Jeff Davis



Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Simon Riggs
Дата:
On Mon, 2008-01-28 at 23:13 +0000, Heikki Linnakangas wrote:
> Tables that are seq scanned are typically very small, like a summary 
> table with just a few rows, or huge tables in a data warehousing 
> system. Between the extremes, I don't think the threshold actually has
> a very big impact.

And if you have a partitioned table with partitions inconveniently
sized? You'd need to *reduce* shared_buffers specifically to get synch
scans and BAS to kick in. Or increase partition size. Both of which
reduce the impact of the benefits we've added.

I don't think the argument that "a table is smaller than shared buffers
therefore it is already in shared buffers" holds true in all cases. I/O
does matter.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Ron Mayer
Дата:
Jeff Davis wrote:
> On Mon, 2008-01-28 at 23:13 +0000, Heikki Linnakangas wrote:
>   
>> "clusteredness" didn't get screwed up by a table that looks like this: 
>> "5 6 7 8 9 1 2 3 4"
>>     
> ...test table with a similar
> distribution to your example, and it shows a correlation of about -0.5,
> but it should be as good as something near -1 or +1.
>
> I am not a statistics expert, but it seems like a better measurement
> would be: "what is the chance that, if the tuples are close together in
> index order, the corresponding heap tuples are close together?".
>   
Same applies for data clustered by zip-code.

All rows for any State or City or County or SchoolZone
are "close together" on the same pages; yet postgres's
stats think they're totally unclustered.
> The answer to that question in your example is "very likely", so there
> would be no problem.
> Is there a reason we don't do this?
>   
I've been tempted to do things like
  update pg_statistic set stanumbers3='{1.0}' where starelid=2617 and
staattnum=7;

after every analyze when I have data like this from tables clustered
by zip.  Seems it'd help more plans than it hurts, but haven't been
brave enough to try in production.



Re: [PATCHES] Proposed patch: synchronized_scanningGUCvariable

От
"Heikki Linnakangas"
Дата:
Jeff Davis wrote:
> On Mon, 2008-01-28 at 23:13 +0000, Heikki Linnakangas wrote:
>> It's a good point that we don't want pg_dump to screw up the cluster 
>> order, but that's the only use case I've seen this far for disabling 
>> sync scans. Even that wouldn't matter much if our estimate for 
>> "clusteredness" didn't get screwed up by a table that looks like this: 
>> "5 6 7 8 9 1 2 3 4"
> 
> It doesn't seem like there is any reason for the estimate to get
> confused, but it apparently does. I loaded a test table with a similar
> distribution to your example, and it shows a correlation of about -0.5,
> but it should be as good as something near -1 or +1.
> 
> I am not a statistics expert, but it seems like a better measurement
> would be: "what is the chance that, if the tuples are close together in
> index order, the corresponding heap tuples are close together?".
> 
> The answer to that question in your example is "very likely", so there
> would be no problem.
> 
> Is there a reason we don't do this?

It has been discussed before, but no-one has come up with a good 
measurement for that.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
"Zeugswetter Andreas ADI SD"
Дата:
> It's a good point that we don't want pg_dump to screw up the cluster
> order, but that's the only use case I've seen this far for disabling
> sync scans. Even that wouldn't matter much if our estimate for
> "clusteredness" didn't get screwed up by a table that looks
> like this:
> "5 6 7 8 9 1 2 3 4"

I do think the guc to turn it off is useful, only I don't understand the
reasoning that pg_dump needs it to maintain the basic clustered
property.

Sorry, but I don't grok this at all.
Why the heck would we care if we have 2 parts of the table perfectly
clustered,
because we started in the middle ? Surely our stats collector should
recognize
such a table as perfectly clustered. Does it not ? We are talking about
one
breakage in the readahead logic here, this should only bring the
clustered property
from 100% to some 99.99% depending on table size vs readahead window.

Andreas


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Gregory Stark
Дата:
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:

> Sorry, but I don't grok this at all. Why the heck would we care if we have 2
> parts of the table perfectly clustered, because we started in the middle ?
> Surely our stats collector should recognize such a table as perfectly
> clustered. Does it not ? We are talking about one breakage in the readahead
> logic here, this should only bring the clustered property from 100% to some
> 99.99% depending on table size vs readahead window.

Well clusteredness is used or could be used for a few different heuristics,
not all of which this would be quite as well satisfied as readahead. But for
the most common application, namely trying to figure out whether index probes
for sequential ids will be sequential i/o or random i/o you're right.

Currently the statistic we use to estimate this is the correlation of the
column value with the physical location on disk. That's not a perfect metric
for estimating how much random i/o would be needed to scan the table in index
order though.

It would be great if Postgres picked up a serious statistics geek who could
pipe up in discussions like this with "how about using the Euler-Jacobian
Centroid" or some such thing. If you have any suggestions of what metric to
use and how to calculate the info we need from it that would be great.

One suggestion from a long way back was scanning the index and counting how
many times the item pointer moves backward to an earlier block. That would
still require a full index scan though. And it doesn't help for columns which
aren't indexed though I'm not sure we need this info for columns which aren't
indexed. It's also not clear how to interpolate from that the amount of random
access a given query would perform.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Euler Taveira de Oliveira
Дата:
Simon Riggs wrote:

> And if you have a partitioned table with partitions inconveniently
> sized? You'd need to *reduce* shared_buffers specifically to get synch
> scans and BAS to kick in. Or increase partition size. Both of which
> reduce the impact of the benefits we've added.
> 
> I don't think the argument that "a table is smaller than shared buffers
> therefore it is already in shared buffers" holds true in all cases. I/O
> does matter.
> 
+1. If we go with 'enable_sync_seqcans' for 8.3, and in a future release 
cycle we do test the cases Simon described above and we agree we need to 
do a fine tune to benefit from this feature, we will need to deprecate 
'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). 
Looking at this perpective, IMHO we should go with the number (0.25) 
instead of the boolean.


--   Euler Taveira de Oliveira  http://www.timbira.com/


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Kenneth Marshall
Дата:
On Tue, Jan 29, 2008 at 10:40:40AM +0100, Zeugswetter Andreas ADI SD wrote:
> 
> > It's a good point that we don't want pg_dump to screw up the cluster 
> > order, but that's the only use case I've seen this far for disabling 
> > sync scans. Even that wouldn't matter much if our estimate for 
> > "clusteredness" didn't get screwed up by a table that looks 
> > like this: 
> > "5 6 7 8 9 1 2 3 4"
> 
> I do think the guc to turn it off is useful, only I don't understand the
> reasoning that pg_dump needs it to maintain the basic clustered
> property.
> 
> Sorry, but I don't grok this at all.
> Why the heck would we care if we have 2 parts of the table perfectly
> clustered,
> because we started in the middle ? Surely our stats collector should
> recognize
> such a table as perfectly clustered. Does it not ? We are talking about
> one
> breakage in the readahead logic here, this should only bring the
> clustered property
> from 100% to some 99.99% depending on table size vs readahead window.
> 
> Andreas
> 

Andreas,

I agree with your logic. If the process that PostgreSQL uses to determine
how clustered a table is that breaks with such a layout, we may need to
see what should be changed to make it work. Having had pg_dump cause a
database to grind to a halt, I would definitely like the option of using
the synchronized scans even for clustered tables.

Ken


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Tom Lane
Дата:
Euler Taveira de Oliveira <euler@timbira.com> writes:
> +1. If we go with 'enable_sync_seqcans' for 8.3, and in a future release 
> cycle we do test the cases Simon described above and we agree we need to 
> do a fine tune to benefit from this feature, we will need to deprecate 
> 'enable_sync_seqscans' and invent another one (sync_seqscans_threshold). 
> Looking at this perpective, IMHO we should go with the number (0.25) 
> instead of the boolean.

Surely the risk-of-needing-to-deprecate argument applies ten times more
strongly to a number than a boolean.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
"Zeugswetter Andreas ADI SD"
Дата:
> > +1. If we go with 'enable_sync_seqcans' for 8.3, and in a future
release
> > cycle we do test the cases Simon described above and we agree we
need to
> > do a fine tune to benefit from this feature, we will need to
deprecate
> > 'enable_sync_seqscans' and invent another one
(sync_seqscans_threshold).
> > Looking at this perpective, IMHO we should go with the number (0.25)

> > instead of the boolean.
>
> Surely the risk-of-needing-to-deprecate argument applies ten times
more
> strongly to a number than a boolean.

Yes, I would expect the tuning to be more system than user specific.
So imho a boolean userset would couple well with a tuning guc, that
may usefully not be userset (if we later discover a need for tuning at
all).

so +1 for the bool.

synchronize[d]_seqscan sounds a bit better in my ears than the plural
synchronize_seqscans.
To me the latter somehow suggests influece on the whole cluster,
probably not
worth further discussion though, so if someone says no, ok.

Andreas



Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Jeff Davis
Дата:
On Tue, 2008-01-29 at 10:55 +0000, Gregory Stark wrote:
> "Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:
> 
> > Sorry, but I don't grok this at all. Why the heck would we care if we have 2
> > parts of the table perfectly clustered, because we started in the middle ?
> > Surely our stats collector should recognize such a table as perfectly
> > clustered. Does it not ? We are talking about one breakage in the readahead
> > logic here, this should only bring the clustered property from 100% to some
> > 99.99% depending on table size vs readahead window.
> 
> Well clusteredness is used or could be used for a few different heuristics,
> not all of which this would be quite as well satisfied as readahead. But for

Can you give an example? Treating a file as a circular structure does
not impose any significant cost that I can see.

> It would be great if Postgres picked up a serious statistics geek who could
> pipe up in discussions like this with "how about using the Euler-Jacobian
> Centroid" or some such thing. If you have any suggestions of what metric to
> use and how to calculate the info we need from it that would be great.

Agreed.

> One suggestion from a long way back was scanning the index and counting how
> many times the item pointer moves backward to an earlier block. That would

An interesting metric. As you say, we really need a statistician to
definitively say what the correct metrics are, and what kind of sampling
we need to make good estimates.

> still require a full index scan though. And it doesn't help for columns which
> aren't indexed though I'm not sure we need this info for columns which aren't
> indexed. It's also not clear how to interpolate from that the amount of random
> access a given query would perform.

I don't think "clusteredness" has any meaning at all in postgres for an
unindexed column. I suppose a table could be clustered without an index,
but currently there's no way to do that in postgresql.

Regards,Jeff Davis





Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Tom Lane
Дата:
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:
> synchronize[d]_seqscan sounds a bit better in my ears than the plural
> synchronize_seqscans.

The plural seems better to me; there's no such thing as a solitary
synchronized scan, no?  The whole point of the feature is to affect
the behavior of multiple scans.

BTW, so far as the rest of the thread goes, I'm not necessarily opposed
to exposing the switchover threshold as a tunable.  But I think it needs
more thought to design than we can give it in time for 8.3 (because of
the interaction with the buffer access strategy stuff).  Also I don't
like having pg_dump manipulating a tuning parameter.  I don't see
anything wrong with having both an on/off feature switch and a tunable
in future releases.  The feature switch can be justified on grounds
of backwards compatibility quite independently of whether pg_dump uses
it.  Or is someone prepared to argue that there are no applications out
there that will be broken if the same query, against the same unchanging
table, yields different results from one trial to the next?
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
"Kevin Grittner"
Дата:
>>> On Tue, Jan 29, 2008 at  1:09 PM, in message <24107.1201633753@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Or is someone prepared to argue that there are no applications out
> there that will be broken if the same query, against the same unchanging
> table, yields different results from one trial to the next?
If geqo kicks in, we're already there, aren't we?
Isn't an application which counts on the order of result rows
without specifying ORDER BY fundamentally broken?
-Kevin




Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Tom Lane
Дата:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
>> Or is someone prepared to argue that there are no applications out
>> there that will be broken if the same query, against the same unchanging
>> table, yields different results from one trial to the next?
> If geqo kicks in, we're already there, aren't we?

Yup, and that's one of the reasons we have a way to turn geqo off.
(geqo is actually a good precedent for this --- notice that it has
an on/off switch that's separate from its tuning knobs.)

> Isn't an application which counts on the order of result rows
> without specifying ORDER BY fundamentally broken?

No doubt, but if it's always worked before, people are going to be
unhappy anyway.

Also, it's not just ordering that's at stake.  Try

regression=# create table foo as select x from generate_series(1,1000000) x;
SELECT
regression=# select * from foo limit 10000;  x   
-------    1    2    3    4....
regression=# select * from foo limit 10000;  x   
------- 7233 7234 7235 7236 ....
regression=# select * from foo limit 10000;  x   
-------14465144661446714468....

Now admittedly we've never promised LIMIT without ORDER BY to be
well-defined either, but not everybody reads the fine print.
This case is particularly nasty because at smaller LIMIT values
the result *is* consistent, so you might never notice the problem
while testing.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Ron Mayer
Дата:
Tom Lane wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>   
>> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
>>     
>>> Or is someone prepared to argue that there are no applications out
>>> there that will be broken if the same query, against the same unchanging
>>> table, yields different results from one trial to the next? 
>>>       
Won't even autovacuum "analyze" cause this too if the
new stats changes the plan?



Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Tom Lane
Дата:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
> Tom Lane wrote:
> Or is someone prepared to argue that there are no applications out
> there that will be broken if the same query, against the same unchanging
> table, yields different results from one trial to the next? 
> 
> Won't even autovacuum "analyze" cause this too if the
> new stats changes the plan?

Given that the table is unchanging, that's moderately unlikely to happen
(especially for "select * from foo" ;-))
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanningGUCvariable

От
"Stephen Denne"
Дата:
Jeff Davis wrote
> > Well clusteredness is used or could be used for a few
> different heuristics,
> > not all of which this would be quite as well satisfied as
> readahead. But for
>
> Can you give an example? Treating a file as a circular structure does
> not impose any significant cost that I can see.

(Pure speculation follows... if you prefer facts, skip this noise)

The data used to create pg_stats.correlation is involved in estimating the cost of an index scan.
It could also be used in estimating the cost of a sequential scan, if the query includes a limit.

Consider:
select * from huge_table_clustered_by_A where A<most_As limit 1000

If the correlation for A is close to 1, a sequential scan should be cheaper than an index scan.

(If the query also included an order by clause, the sequential scan would have to read the entire table to ensure it
hadfound the top 1000, instead of any old 1000 returned in order) 

If A is a circular structure, you would have to know where it started, and include this info in the dump/restore (or
loseA's correlation). 

Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any
attachmentsis confidential and may be subject to legal privilege.  If it is not intended for you please advise by reply
immediately,destroy it and do not copy, disclose or use it in any way. 

__________________________________________________________________ This email has been scanned by the DMZGlobal
BusinessQuality              Electronic Messaging Suite. 
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________



Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
"Guillaume Smet"
Дата:
On Jan 29, 2008 8:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:
> > synchronize[d]_seqscan sounds a bit better in my ears than the plural
> > synchronize_seqscans.
>
> The plural seems better to me; there's no such thing as a solitary
> synchronized scan, no?  The whole point of the feature is to affect
> the behavior of multiple scans.

+1. The plural is important IMHO.

> BTW, so far as the rest of the thread goes, I'm not necessarily opposed
> to exposing the switchover threshold as a tunable.  But I think it needs
> more thought to design than we can give it in time for 8.3 (because of
> the interaction with the buffer access strategy stuff).

+1. The current patch is simple and so far in the cycle, I really
think we should keep it that way.

> The feature switch can be justified on grounds
> of backwards compatibility quite independently of whether pg_dump uses
> it.  Or is someone prepared to argue that there are no applications out
> there that will be broken if the same query, against the same unchanging
> table, yields different results from one trial to the next?

As I stated earlier, I don't really like this argument (we already
broke badly designed applications a few times in the past) but we
really need a way to guarantee that the execution of a query is stable
and doesn't depend on external factors. And the original problem was
to guarantee that pg_dump builds a dump as identical as possible to
the existing data by ignoring external factors. It's now the case with
your patch.
The fact that it allows us not to break existing applications relying
too much on physical ordering is a nice side effect though :).

--
Guillaume


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
"Zeugswetter Andreas ADI SD"
Дата:
> > The plural seems better to me; there's no such thing as a solitary
> > synchronized scan, no?  The whole point of the feature is to affect
> > the behavior of multiple scans.
>
> +1. The plural is important IMHO.

ok, good.

> As I stated earlier, I don't really like this argument (we already
> broke badly designed applications a few times in the past) but we
> really need a way to guarantee that the execution of a query is stable
> and doesn't depend on external factors. And the original problem was
> to guarantee that pg_dump builds a dump as identical as possible to
> the existing data by ignoring external factors. It's now the case with
> your patch.
> The fact that it allows us not to break existing applications relying
> too much on physical ordering is a nice side effect though :).

One more question. It would be possible that a session that turned off
the synchronized_seqscans still be a pack leader for other later
sessions.
Do/should we consider that ?

The procedure would be:
start from page 0
iff no other pack is present fill the current scan position for others

Andreas


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Kenneth Marshall
Дата:
On Wed, Jan 30, 2008 at 10:56:47AM +0100, Zeugswetter Andreas ADI SD wrote:
> 
> > > The plural seems better to me; there's no such thing as a solitary
> > > synchronized scan, no?  The whole point of the feature is to affect
> > > the behavior of multiple scans.
> > 
> > +1. The plural is important IMHO.
> 
> ok, good.
> 
> > As I stated earlier, I don't really like this argument (we already
> > broke badly designed applications a few times in the past) but we
> > really need a way to guarantee that the execution of a query is stable
> > and doesn't depend on external factors. And the original problem was
> > to guarantee that pg_dump builds a dump as identical as possible to
> > the existing data by ignoring external factors. It's now the case with
> > your patch.
> > The fact that it allows us not to break existing applications relying
> > too much on physical ordering is a nice side effect though :).
> 
> One more question. It would be possible that a session that turned off
> the synchronized_seqscans still be a pack leader for other later
> sessions.
> Do/should we consider that ?
> 
> The procedure would be:
> start from page 0
> iff no other pack is present fill the current scan position for others
> 

I think that allowing other scans to use the scan started by a query that
disabled the sync scans would have value. It would prevent these types
of queries from completely tanking the I/O.

+1

Ken


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Alvaro Herrera
Дата:
Zeugswetter Andreas ADI SD escribió:
> 
> > > The plural seems better to me; there's no such thing as a solitary
> > > synchronized scan, no?  The whole point of the feature is to affect
> > > the behavior of multiple scans.
> > 
> > +1. The plural is important IMHO.
> 
> ok, good.

Hmm, if you guys are going to add another GUC variable, please hurry
because we have to translate the description text.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Simon Riggs
Дата:
On Mon, 2008-01-28 at 16:21 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > Rather than having a boolean GUC, we should have a number and make the
> > parameter "synchronised_scan_threshold".
> 
> This would open up a can of worms I'd prefer not to touch, having to do
> with whether the buffer-access-strategy behavior should track that or
> not.  As the note in heapam.c says,
> 
>      * If the table is large relative to NBuffers, use a bulk-read access
>      * strategy and enable synchronized scanning (see syncscan.c).  Although
>      * the thresholds for these features could be different, we make them the
>      * same so that there are only two behaviors to tune rather than four.
> 
> It's a bit late in the cycle to be revisiting that choice.  Now we do
> already have three behaviors to worry about (BAS on and syncscan off)
> but throwing in a randomly settable knob will take it back to four,
> and we have no idea how that fourth case will behave.  The other tack we
> could take (having the one GUC variable control both thresholds) is
> not good since it will result in pg_dump trashing the buffer cache.

I'm still not very happy with any of the options here.

BAS is great if you didn't want to trash the cache, but its also
annoying to people that really did want to load a large table into
cache. However we set it, we're going to have problems because not
everybody has the same database.

We're trying to guess which data is in memory and which is on disk and
then act accordinly. The answer to that question cannot be answered
solely by how big shared_buffers is. It really ought to be a combination
of (at least) shared_buffers and total database size. I think we must
either put some more intelligence into the setting of the threshold, or
give it to the user as a parameter, possibly as a parameter not
mentioned in the sample .conf.

If we set the threshold unintelligently or in a way that cannot be
overridden, we will still get weird bug reports from people who set
shared_buffers higher and got a performance drop.

We need to make a final decision on this quickly, so I'll say no more on
this for 8.3 to help that process.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Hmm, if you guys are going to add another GUC variable, please hurry
> because we have to translate the description text.

Yeah, I'm going to put it in today --- just the on/off switch.
Any discussions of exposing threshold parameters will have to wait
for 8.4.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Tom Lane
Дата:
Simon Riggs <simon@2ndquadrant.com> writes:
> I'm still not very happy with any of the options here.

> BAS is great if you didn't want to trash the cache, but its also
> annoying to people that really did want to load a large table into
> cache. However we set it, we're going to have problems because not
> everybody has the same database.

That argument leads immediately to the conclusion that you need
per-table control over the behavior.  Which maybe you do, but it's
far too late to be proposing it for 8.3.  We should put this whole
area of more-control-over-BAS-and-syncscan on the TODO agenda.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Tom Lane
Дата:
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:
> One more question. It would be possible that a session that turned off
> the synchronized_seqscans still be a pack leader for other later
> sessions.
> Do/should we consider that ?

Seems like a reasonable thing to consider ... for 8.4.  I'm not willing
to go poking the syncscan code that much at this late point in the 8.3
cycle.  (I'm not sure if it's been mentioned yet on -hackers, but the
current plan is to freeze 8.3.0 tomorrow evening.)
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
Simon Riggs
Дата:
On Wed, 2008-01-30 at 13:07 -0500, Tom Lane wrote:
> "Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:
> > One more question. It would be possible that a session that turned off
> > the synchronized_seqscans still be a pack leader for other later
> > sessions.
> > Do/should we consider that ?
> 
> Seems like a reasonable thing to consider ... for 8.4. 

Definitely. I thought about this the other day and decided it had some
strange behaviour in some circumstances, so wouldn't be desirable
overall.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
"Heikki Linnakangas"
Дата:
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> I'm still not very happy with any of the options here.
> 
>> BAS is great if you didn't want to trash the cache, but its also
>> annoying to people that really did want to load a large table into
>> cache. However we set it, we're going to have problems because not
>> everybody has the same database.
> 
> That argument leads immediately to the conclusion that you need
> per-table control over the behavior.

It's even worse than that. Elsewhere in this thread Simon mentioned a 
partitioned table, where each partition on its own is smaller than the 
threshold, but you're seq scanning several partitions and the total size 
of the seq scans is larger than memory size. In that scenario, you would 
want BAS and synchronized scans, but even a per-table setting wouldn't 
cut it.

One idea would be to look at the access plan and estimate the total size 
of all scans in the plan. Another idea would be to switch to a more seq 
scan resistant cache replacement algorithm, and forget about the threshold.

For synchronized scans to help in the partitioned situation, I guess 
you'd want to synchronize across partitions. If someone is already 
scanning partition 5, you'd want to start from that partition and join 
the pack, instead of starting from partition 1.

I think we'll be wiser after we see some real world use of what we have 
there now..

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Simon Riggs
Дата:
On Wed, 2008-01-30 at 18:42 +0000, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> >> I'm still not very happy with any of the options here.
> > 
> >> BAS is great if you didn't want to trash the cache, but its also
> >> annoying to people that really did want to load a large table into
> >> cache. However we set it, we're going to have problems because not
> >> everybody has the same database.
> > 
> > That argument leads immediately to the conclusion that you need
> > per-table control over the behavior.
> 
> It's even worse than that. Elsewhere in this thread Simon mentioned a 
> partitioned table, where each partition on its own is smaller than the 
> threshold, but you're seq scanning several partitions and the total size 
> of the seq scans is larger than memory size. In that scenario, you would 
> want BAS and synchronized scans, but even a per-table setting wouldn't 
> cut it.

> For synchronized scans to help in the partitioned situation, I guess 
> you'd want to synchronize across partitions. If someone is already 
> scanning partition 5, you'd want to start from that partition and join 
> the pack, instead of starting from partition 1.

You're right, but in practice its not quite that bad with the
multi-table route. When you have partitions you generally exclude most
of them, with typically 1-2 per query, usually different ones.

If you were scanning lots of partitions in sequence so frequently that
you'd get benefit from synch scans then your partitioning scheme isn't
working for you - and that is the worst problem by far.

But yes, it does need to be addressed.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Re: [PATCHES] Proposed patch: synchronized_scanning GUCvariable

От
"Heikki Linnakangas"
Дата:
Simon Riggs wrote:
> On Wed, 2008-01-30 at 18:42 +0000, Heikki Linnakangas wrote:
>> It's even worse than that. Elsewhere in this thread Simon mentioned a 
>> partitioned table, where each partition on its own is smaller than the 
>> threshold, but you're seq scanning several partitions and the total size 
>> of the seq scans is larger than memory size. In that scenario, you would 
>> want BAS and synchronized scans, but even a per-table setting wouldn't 
>> cut it.
> 
>> For synchronized scans to help in the partitioned situation, I guess 
>> you'd want to synchronize across partitions. If someone is already 
>> scanning partition 5, you'd want to start from that partition and join 
>> the pack, instead of starting from partition 1.
> 
> You're right, but in practice its not quite that bad with the
> multi-table route. When you have partitions you generally exclude most
> of them, with typically 1-2 per query, usually different ones.

Yep. And in that case, you *don't'* want BAS or sync scans to kick in, 
because you're only accessing a relatively small chunk of data, and it's 
worthwhile to cache it.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > I'm still not very happy with any of the options here.
> 
> > BAS is great if you didn't want to trash the cache, but its also
> > annoying to people that really did want to load a large table into
> > cache. However we set it, we're going to have problems because not
> > everybody has the same database.
> 
> That argument leads immediately to the conclusion that you need
> per-table control over the behavior.  Which maybe you do, but it's
> far too late to be proposing it for 8.3.  We should put this whole
> area of more-control-over-BAS-and-syncscan on the TODO agenda.

Another question --- why don't we just turn off synchronized_seqscans
when we do COPY TO?  That would fix pg_dump and be transparent.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Bruce Momjian
Дата:
Bruce Momjian wrote:
> Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > I'm still not very happy with any of the options here.
> > 
> > > BAS is great if you didn't want to trash the cache, but its also
> > > annoying to people that really did want to load a large table into
> > > cache. However we set it, we're going to have problems because not
> > > everybody has the same database.
> > 
> > That argument leads immediately to the conclusion that you need
> > per-table control over the behavior.  Which maybe you do, but it's
> > far too late to be proposing it for 8.3.  We should put this whole
> > area of more-control-over-BAS-and-syncscan on the TODO agenda.
> 
> Another question --- why don't we just turn off synchronized_seqscans
> when we do COPY TO?  That would fix pg_dump and be transparent.

Sorry, I was unclear. I meant don't have a GUC at all but just set an
internal variable to turn off synchronized sequential scans when we do
COPY TO.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> Another question --- why don't we just turn off synchronized_seqscans
> when we do COPY TO?  That would fix pg_dump and be transparent.

Enforcing this from the server side seems a pretty bad idea.  Note that
there were squawks about having pg_dump behave this way at all; if the
control is on the pg_dump side then at least we have the chance to make
it a user option later.

Also, you forgot about pg_dump -d.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Another question --- why don't we just turn off synchronized_seqscans
> > when we do COPY TO?  That would fix pg_dump and be transparent.
> 
> Enforcing this from the server side seems a pretty bad idea.  Note that
> there were squawks about having pg_dump behave this way at all; if the
> control is on the pg_dump side then at least we have the chance to make
> it a user option later.
> 
> Also, you forgot about pg_dump -d.

OK, but keep in mind if we use synchronized_seqscans in pg_dump we will
have to recognize that GUC forever.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> OK, but keep in mind if we use synchronized_seqscans in pg_dump we will
> have to recognize that GUC forever.

No, because it's being used on the query side, not in the emitted dump.
We have *never* promised that pg_dump version N could dump from server
version N+1 .., in fact, personally I'd like to make that case be a hard
error, rather than something people could override with -i.
        regards, tom lane


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > OK, but keep in mind if we use synchronized_seqscans in pg_dump we will
> > have to recognize that GUC forever.
> 
> No, because it's being used on the query side, not in the emitted dump.
> We have *never* promised that pg_dump version N could dump from server
> version N+1 .., in fact, personally I'd like to make that case be a hard
> error, rather than something people could override with -i.

Oh, yea, interesting, it is part of the connection.  That will help.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Alvaro Herrera
Дата:
> Tom Lane wrote:

> > in fact, personally I'd like to make that case be a hard error,
> > rather than something people could override with -i.

+1 to this idea.  TODO for 8.4?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Simon Riggs
Дата:
On Thu, 2008-01-31 at 11:20 -0300, Alvaro Herrera wrote:
> > Tom Lane wrote:
> 
> > > in fact, personally I'd like to make that case be a hard error,
> > > rather than something people could override with -i.
> 
> +1 to this idea.  TODO for 8.4?

-1 without some more planning about the effects and implications.

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 



Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
Alvaro Herrera
Дата:
Simon Riggs escribió:
> On Thu, 2008-01-31 at 11:20 -0300, Alvaro Herrera wrote:
> > > Tom Lane wrote:
> > 
> > > > in fact, personally I'd like to make that case be a hard error,
> > > > rather than something people could override with -i.
> > 
> > +1 to this idea.  TODO for 8.4?
> 
> -1 without some more planning about the effects and implications.

Effect: we would stop receiving complaints that an old pg_dump can talk
to a server that most likely is incompatible with it.  People would
learn on the spot that they must install the newer pg_dump.

Implication: you cannot dump a newer database and expect it to load on
an older server (it would work for certain versions, but not all).  So
if people have a bleeding-edge test server, they cannot migrate stuff
from it to their older production server.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Dimitri Fontaine
Дата:
Hi,

Le jeudi 31 janvier 2008, Tom Lane a écrit :
> We have *never* promised that pg_dump version N could dump from server
> version N+1 .., in fact, personally I'd like to make that case be a hard
> error, rather than something people could override with -i.

Are you thinking about next major or minor version ? All the same?
Is there some real good reason not to dump say 8.2.6 server with the pg_dump
from an 8.2.5 installation?

--
dim

Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
Peter Eisentraut
Дата:
Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
> Effect: we would stop receiving complaints that an old pg_dump can talk
> to a server that most likely is incompatible with it.  People would
> learn on the spot that they must install the newer pg_dump.

I think a more moderate measure might be to clarify the error message

"aborting because of version mismatch  (Use the -i option to proceed 
anyway.)\n"

I'm not sure how many of the mentioned complaints we are getting, but the 
error message might make people think that it's complaining because the 
versions are not equal rather than that the versions are known not 
compatible.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Bruce Momjian
Дата:
Dimitri Fontaine wrote:
-- Start of PGP signed section.
> Hi,
> 
> Le jeudi 31 janvier 2008, Tom Lane a ?crit?:
> > We have *never* promised that pg_dump version N could dump from server
> > version N+1 .., in fact, personally I'd like to make that case be a hard
> > error, rather than something people could override with -i.
> 
> Are you thinking about next major or minor version ? All the same?
> Is there some real good reason not to dump say 8.2.6 server with the pg_dump 
> from an 8.2.5 installation?

They are talking about cross-major dumps, 8.2 pg_dump dumping an 8.3
database.  They are saying that should be disallowed.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
Bruce Momjian
Дата:
Peter Eisentraut wrote:
> Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
> > Effect: we would stop receiving complaints that an old pg_dump can talk
> > to a server that most likely is incompatible with it.  People would
> > learn on the spot that they must install the newer pg_dump.
> 
> I think a more moderate measure might be to clarify the error message
> 
> "aborting because of version mismatch  (Use the -i option to proceed 
> anyway.)\n"
> 
> I'm not sure how many of the mentioned complaints we are getting, but the 
> error message might make people think that it's complaining because the 
> versions are not equal rather than that the versions are known not 
> compatible.

Agreed --- that error message is just prompting people to try something
they shouldn't and not explaining why.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Tom Lane
Дата:
Dimitri Fontaine <dfontaine@hi-media.com> writes:
> Le jeudi 31 janvier 2008, Tom Lane a écrit :
>> We have *never* promised that pg_dump version N could dump from server
>> version N+1 .., in fact, personally I'd like to make that case be a hard
>> error, rather than something people could override with -i.

> Are you thinking about next major or minor version ? All the same?
> Is there some real good reason not to dump say 8.2.6 server with the pg_dump 
> from an 8.2.5 installation?

I'm thinking next major.  In principle there could be cases where a
minor update could break pg_dump, but it seems unlikely enough that
it's not reasonable to embed such a policy in the code.  As for
next major, though, the mere existence of the -i switch is a foot-gun
with no significant value.
        regards, tom lane


Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
>> Effect: we would stop receiving complaints that an old pg_dump can talk
>> to a server that most likely is incompatible with it.  People would
>> learn on the spot that they must install the newer pg_dump.

> I think a more moderate measure might be to clarify the error message
> "aborting because of version mismatch  (Use the -i option to proceed 
> anyway.)\n"

I would be satisfied with that if I thought people would actually read
the message.  My complaint is really directed at certain admin packages
(and they know who they are) that invoke pg_dump *by default*, behind
the user's back, with -i.
        regards, tom lane


Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Dimitri Fontaine
Дата:
Le jeudi 31 janvier 2008, Tom Lane a écrit :
> I'm thinking next major.  In principle there could be cases where a
> minor update could break pg_dump, but it seems unlikely enough that
> it's not reasonable to embed such a policy in the code.  As for
> next major, though, the mere existence of the -i switch is a foot-gun
> with no significant value.

+2 then :)

1. Current behavior is to issue the « -i warning » even when having minor   version mismatch, getting rid of this would
begreat... even more so now   we know there's almost no risk here. 

2. Major version mismatch seems to be one of the major migration pitfalls out   there. The famous "You have to dump
withthe newer pg_dump version against   the current production setup" cookbook entry will certainly be better
embeddedinto the code. 

Regards,
--
dim

Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
> >> Effect: we would stop receiving complaints that an old pg_dump can talk
> >> to a server that most likely is incompatible with it.  People would
> >> learn on the spot that they must install the newer pg_dump.
> 
> > I think a more moderate measure might be to clarify the error message
> > "aborting because of version mismatch  (Use the -i option to proceed 
> > anyway.)\n"
> 
> I would be satisfied with that if I thought people would actually read
> the message.  My complaint is really directed at certain admin packages
> (and they know who they are) that invoke pg_dump *by default*, behind
> the user's back, with -i.

Oh?  That isn't good.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: {**Spam**} Re: [PATCHES] Proposed patch: synchronized_scanning GUC variable

От
Gregory Stark
Дата:
"Bruce Momjian" <bruce@momjian.us> writes:

> Dimitri Fontaine wrote:
> -- Start of PGP signed section.
>> Hi,
>> 
>> Le jeudi 31 janvier 2008, Tom Lane a ?crit?:
>> > We have *never* promised that pg_dump version N could dump from server
>> > version N+1 .., in fact, personally I'd like to make that case be a hard
>> > error, rather than something people could override with -i.
>> 
>> Are you thinking about next major or minor version ? All the same?
>> Is there some real good reason not to dump say 8.2.6 server with the pg_dump 
>> from an 8.2.5 installation?
>
> They are talking about cross-major dumps, 8.2 pg_dump dumping an 8.3
> database.  They are saying that should be disallowed.

And just to be clearly *only* cross-major dumps with an *older* pg_dump than
the database. Dumping with a newer pg_dump than the database is the
recommended way to do cross-major dumps.

Perhaps we can have something like --force-unsupported-incompatible-connections

1/2 :)

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!


Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
Magnus Hagander
Дата:
On Thu, Jan 31, 2008 at 11:02:03AM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> > > Am Donnerstag, 31. Januar 2008 schrieb Alvaro Herrera:
> > >> Effect: we would stop receiving complaints that an old pg_dump can talk
> > >> to a server that most likely is incompatible with it.  People would
> > >> learn on the spot that they must install the newer pg_dump.
> > 
> > > I think a more moderate measure might be to clarify the error message
> > > "aborting because of version mismatch  (Use the -i option to proceed 
> > > anyway.)\n"
> > 
> > I would be satisfied with that if I thought people would actually read
> > the message.  My complaint is really directed at certain admin packages
> > (and they know who they are) that invoke pg_dump *by default*, behind
> > the user's back, with -i.
> 
> Oh?  That isn't good.

Right. Dave - why do we do that? ;-)

//Magnus


Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
"Dave Page"
Дата:
On Feb 5, 2008 3:27 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Thu, Jan 31, 2008 at 11:02:03AM -0500, Bruce Momjian wrote:
> > Tom Lane wrote:
> > > I would be satisfied with that if I thought people would actually read
> > > the message.  My complaint is really directed at certain admin packages
> > > (and they know who they are) that invoke pg_dump *by default*, behind
> > > the user's back, with -i.
> >
> > Oh?  That isn't good.
>
> Right. Dave - why do we do that? ;-)

I didn't realise we did until Tom mentioned it - I didn't write that code.

Please go ahead and remove the -i - it's not like users cannot cannot
specify which set of pg utilities to use if they need a specific
version.

/D


Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
Magnus Hagander
Дата:
Dave Page wrote:
> On Feb 5, 2008 3:27 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> On Thu, Jan 31, 2008 at 11:02:03AM -0500, Bruce Momjian wrote:
>>> Tom Lane wrote:
>>>> I would be satisfied with that if I thought people would actually read
>>>> the message.  My complaint is really directed at certain admin packages
>>>> (and they know who they are) that invoke pg_dump *by default*, behind
>>>> the user's back, with -i.
>>> Oh?  That isn't good.
>> Right. Dave - why do we do that? ;-)
> 
> I didn't realise we did until Tom mentioned it - I didn't write that code.
> 
> Please go ahead and remove the -i - it's not like users cannot cannot
> specify which set of pg utilities to use if they need a specific
> version.

Ok, done!

//Magnus


Re: Remove pg_dump -i option (was Re: Proposed patch: synchronized_scanning GUC variable)

От
"Dave Page"
Дата:
On Feb 5, 2008 6:11 PM, Magnus Hagander <magnus@hagander.net> wrote:

> > Please go ahead and remove the -i - it's not like users cannot cannot
> > specify which set of pg utilities to use if they need a specific
> > version.
>
> Ok, done!

Thanks.

/D