Thread: Re: [PERFORM] Hash Anti Join performance degradation


Re: [PERFORM] Hash Anti Join performance degradation

From:
Cédric Villemain
Date:
2011/5/31 Robert Haas <robertmhaas@gmail.com>:
> On Thu, May 26, 2011 at 8:33 AM, panam <panam@gmx.net> wrote:
>> Any third party confirmation?
>
> Yeah, it definitely looks like there is some kind of bug here.  Or if
> not a bug, then a very surprising feature.  EXPLAIN ANALYZE outputs
> from your proposed test attached.  Here's a unified diff of the two
> outputs:
>
>
>  QUERY PLAN
>
 ----------------------------------------------------------------------------------------------------------------------------------------------------------
> - Seq Scan on box b  (cost=0.00..3669095.76 rows=128 width=8) (actual
> time=0.147..431517.693 rows=128 loops=1)
> + Seq Scan on box b  (cost=0.00..3669095.76 rows=128 width=8) (actual
> time=0.047..6938.165 rows=128 loops=1)
>    SubPlan 1
> -     ->  Hash Anti Join  (cost=14742.77..28664.79 rows=19239 width=8)
> (actual time=2960.176..3370.425 rows=1 loops=128)
> +     ->  Hash Anti Join  (cost=14742.77..28664.79 rows=19239 width=8)
> (actual time=48.385..53.361 rows=1 loops=128)
>            Hash Cond: (m1.box_id = m2.box_id)
>            Join Filter: (m1.id < m2.id)
> -           ->  Bitmap Heap Scan on message m1  (cost=544.16..13696.88
> rows=28858 width=16) (actual time=2.320..6.204 rows=18487 loops=128)
> +           ->  Bitmap Heap Scan on message m1  (cost=544.16..13696.88
> rows=28858 width=16) (actual time=1.928..5.502 rows=17875 loops=128)
>                  Recheck Cond: (box_id = b.id)
> -                 ->  Bitmap Index Scan on "message_box_Idx"
> (cost=0.00..536.94 rows=28858 width=0) (actual time=2.251..2.251
> rows=18487 loops=128)
> +                 ->  Bitmap Index Scan on "message_box_Idx"
> (cost=0.00..536.94 rows=28858 width=0) (actual time=1.797..1.797
> rows=18487 loops=128)
>                        Index Cond: (box_id = b.id)
> -           ->  Hash  (cost=13696.88..13696.88 rows=28858 width=16)
> (actual time=12.632..12.632 rows=19720 loops=120)
> -                 Buckets: 4096  Batches: 4 (originally 2)  Memory Usage: 1787kB
> -                 ->  Bitmap Heap Scan on message m2
> (cost=544.16..13696.88 rows=28858 width=16) (actual time=1.668..6.619
> rows=19720 loops=120)
> +           ->  Hash  (cost=13696.88..13696.88 rows=28858 width=16)
> (actual time=11.603..11.603 rows=20248 loops=113)
> +                 Buckets: 4096  Batches: 4 (originally 2)  Memory Usage: 1423kB
> +                 ->  Bitmap Heap Scan on message m2
> (cost=544.16..13696.88 rows=28858 width=16) (actual time=1.838..6.886
> rows=20248 loops=113)
>                        Recheck Cond: (box_id = b.id)
> -                       ->  Bitmap Index Scan on "message_box_Idx"
> (cost=0.00..536.94 rows=28858 width=0) (actual time=1.602..1.602
> rows=19720 loops=120)
> +                       ->  Bitmap Index Scan on "message_box_Idx"
> (cost=0.00..536.94 rows=28858 width=0) (actual time=1.743..1.743
> rows=20903 loops=113)
>                              Index Cond: (box_id = b.id)
> - Total runtime: 431520.186 ms
> + Total runtime: 6940.369 ms
>
> That's pretty odd.

Yes, and while looking at this I noticed that the query took a long time to be killed.
I added a CHECK_FOR_INTERRUPTS() in the for(;;) loop in nodeHashjoin.c.
It fixes the delay when trying to kill the query, but I don't know what
performance impact this may have in this place of the code.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Attachments

Re: [PERFORM] Hash Anti Join performance degradation

From:
Robert Haas
Date:
On Tue, May 31, 2011 at 8:43 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
> Yes, and while looking at this I noticed that the query took a long time to be killed.
> I added a CHECK_FOR_INTERRUPTS() in the for(;;) loop in nodeHashjoin.c.
> It fixes the delay when trying to kill the query, but I don't know what
> performance impact this may have in this place of the code.

Well, seems easy enough to find out: just test the query with and
without your patch (and without casserts).  If there's no measurable
difference on this query, there probably won't be one anywhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Hash Anti Join performance degradation

From:
Cédric Villemain
Date:
2011/6/1 Robert Haas <robertmhaas@gmail.com>:
> On Tue, May 31, 2011 at 8:43 PM, Cédric Villemain
> <cedric.villemain.debian@gmail.com> wrote:
>> Yes, and while looking at this I noticed that the query took a long time to be killed.
>> I added a CHECK_FOR_INTERRUPTS() in the for(;;) loop in nodeHashjoin.c.
>> It fixes the delay when trying to kill the query, but I don't know what
>> performance impact this may have in this place of the code.
>
> Well, seems easy enough to find out: just test the query with and
> without your patch (and without casserts).  If there's no measurable
> difference on this query, there probably won't be one anywhere.

Oh damn, I am currently on an eeepc, so I'll need 2 days to benchmark that :-D
I'll see tomorrow.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


Re: [PERFORM] Hash Anti Join performance degradation

From:
Robert Haas
Date:
On Tue, May 31, 2011 at 9:11 PM, Cédric Villemain
<cedric.villemain.debian@gmail.com> wrote:
> Oh damn, I am currently on an eeepc, so I'll need 2 days to benchmark that :-D
> I'll see tomorrow.

LOL.

With respect to the root of the issue (why does the anti-join take so
long?), my first thought was that perhaps the OP was very unlucky and
had a lot of values that hashed to the same bucket.  But that doesn't
appear to be the case.  There are 120 different box_id values in the
message table, and running hashint8(box_id) % 16384 (which I think is
the right calculation: 4096 buckets * 4 batches) yields 120 different
values.
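
(For anyone wanting to repeat that check, a query along these lines -- assuming
box_id is a bigint, so that hashint8() is the applicable hash function -- counts
how many of the 16384 bucket/batch slots the box_id values land in; the column
names are only illustrative:)

    -- sketch of the bucket-distribution check described above
    -- (4096 buckets * 4 batches = 16384 slots)
    SELECT count(DISTINCT hashint8(box_id) % 16384) AS distinct_slots,
           count(DISTINCT box_id)                   AS distinct_box_ids
    FROM   message;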

I think that part of the problem here is that the planner has no
particularly efficient way of executing a non-equijoin.  Each probe
finds the appropriate hash bucket and must then probe the entire
chain, all of which pass the hash condition and some of which fail the
join qual.  So if there are n_1 instances of value v_1, n_2 instances
of value v_2, etc. then the total effort is proportional to n_1^2 +
n_2^2 + ...

But that doesn't really explain the problem because removing the last
few rows only changes that sum by a small amount.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Hash Anti Join performance degradation

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> With respect to the root of the issue (why does the anti-join take so
> long?), my first thought was that perhaps the OP was very unlucky and
> had a lot of values that hashed to the same bucket.  But that doesn't
> appear to be the case.

Well, yes it is.  Notice what the subquery is doing: for each row in
"box", it's pulling all matching "box_id"s from message and running a
self-join across those rows.  The hash join condition is a complete
no-op.  And some of the box_ids have hundreds of thousands of rows.
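
(For reference, the subquery being described has roughly the following shape.
This is a reconstruction from the plans quoted earlier in the thread, not
necessarily the OP's exact SQL:)

    -- approximate shape of the query under discussion: for each box, find
    -- the message id that has no larger id in the same box (an anti-join)
    SELECT b.id,
           (SELECT m1.id
              FROM message m1
              LEFT JOIN message m2
                ON m1.box_id = m2.box_id AND m1.id < m2.id
             WHERE m2.id IS NULL
               AND m1.box_id = b.id) AS last_message_id
      FROM box b;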

I'd just write it off as being a particularly stupid way to find the
max(), except I'm not sure why deleting just a few thousand rows
improves things so much.  It looks like it ought to be an O(N^2)
situation, so the improvement should be noticeable but not amazing.

And if you force it to not use a hashjoin, suddenly things are better.
Nestloop should also be O(N^2) in this situation, but seemingly it
avoids whatever weird corner case we are hitting here.

As Cedric says, the lack of any CHECK_FOR_INTERRUPTS in this loop is
also problematic.  I'm not sure that right there is an ideal place
to put it, but we need one somewhere in the loop.
        regards, tom lane


Re: [PERFORM] Hash Anti Join performance degradation

From:
Robert Haas
Date:
On Tue, May 31, 2011 at 11:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> With respect to the root of the issue (why does the anti-join take so
>> long?), my first thought was that perhaps the OP was very unlucky and
>> had a lot of values that hashed to the same bucket.  But that doesn't
>> appear to be the case.
>
> Well, yes it is.  Notice what the subquery is doing: for each row in
> "box", it's pulling all matching "box_id"s from message and running a
> self-join across those rows.  The hash join condition is a complete
> no-op.  And some of the box_ids have hundreds of thousands of rows.
>
> I'd just write it off as being a particularly stupid way to find the
> max(), except I'm not sure why deleting just a few thousand rows
> improves things so much.  It looks like it ought to be an O(N^2)
> situation, so the improvement should be noticeable but not amazing.

Yeah, this is what I was getting at, though perhaps I didn't say it
well.  If the last 78K rows were particularly pathological in some
way, that might explain something, but as far as one can see they are
not a whole heck of a lot different from the rest of the data.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Hash Anti Join performance degradation

From:
panam
Date:
Tom Lane-2 wrote:
> 
> It looks like it ought to be an O(N^2)
> situation, so the improvement should be noticeable but not amazing.
> 

Hm, the performance was reasonable again after doing a CLUSTER...
So I believe this is more a technical than an algorithmic/complexity issue.
Maybe it is the way the hash table is built, and that ordering makes a
difference in this case? In short: why is clustered data not affected?

Regards,
panam

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Re-PERFORM-Hash-Anti-Join-performance-degradation-tp4443803p4445123.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


Re: [PERFORM] Hash Anti Join performance degradation

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, May 31, 2011 at 11:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'd just write it off as being a particularly stupid way to find the
>> max(), except I'm not sure why deleting just a few thousand rows
>> improves things so much.  It looks like it ought to be an O(N^2)
>> situation, so the improvement should be noticeable but not amazing.

> Yeah, this is what I was getting at, though perhaps I didn't say it
> well.  If the last 78K rows were particularly pathological in some
> way, that might explain something, but as far as one can see they are
> not a whole heck of a lot different from the rest of the data.

Well, I wasted a lot more time on this than I should have, but the
answer is: it's a pathological data ordering.

The given data file loads the message rows approximately in "id" order,
and in fact once you lop off the ones with id > 2550000, it's
sufficiently in order that the highest id is also physically last in
the table, at least for all of the box_ids that have large numbers of
entries.  Now, the tested query plan loads the hash table like this:
   ->  Hash  (cost=13685.86..13685.86 rows=28511 width=16) (actual time=176.286..176.286 rows=211210 loops=1)
         Buckets: 4096  Batches: 1  Memory Usage: 9901kB
         ->  Bitmap Heap Scan on message m2  (cost=537.47..13685.86 rows=28511 width=16) (actual time=23.204..124.624 rows=211210 loops=1)
               Recheck Cond: (box_id = $1)
               ->  Bitmap Index Scan on "message_box_Idx"  (cost=0.00..530.34 rows=28511 width=0) (actual time=21.974..21.974 rows=211210 loops=1)
                     Index Cond: (box_id = $1)

Because of the way that a bitmap heap scan works, the rows are
guaranteed to be loaded into the hash table in physical order, which
means (in the fast case) that the row with the largest "id" value gets
loaded last.  And because ExecHashTableInsert pushes each row onto the
front of its hash chain, that row ends up at the front of the hash
chain.  Net result: for all the outer rows that aren't the one with
maximum id, we get a joinqual match at the very first entry in the hash
chain.  Since it's an antijoin, we then reject that outer row and go
on to the next.  The join thus ends up costing us only O(N) time.
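
(The insertion behavior being described is essentially this fragment of
ExecHashTableInsert in nodeHash.c -- a simplified sketch, not verbatim source:
each new tuple is pushed onto the head of its bucket's chain, so the tuples
loaded last are the ones probed first.)

    /* sketch of the chain handling in ExecHashTableInsert (nodeHash.c) */
    hashTuple->next = hashtable->buckets[bucketno];   /* link in front of existing chain */
    hashtable->buckets[bucketno] = hashTuple;         /* new tuple becomes the chain head */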

However, with the additional rows in place, there are a significant
number of outer rows that don't get a match at the first hash chain
entry, and we're spending more like O(N^2) time.  I instrumented things
for the specific case of box_id = 69440, which is the most common
box_id, and got these results:
  2389 got match of join quals at probe 208077
     1 got match of join quals at probe 1
   175 got match of join quals at probe 208077
   273 got match of join quals at probe 1
    21 got match of join quals at probe 208077
     1 got match of join quals at probe 1
    24 got match of join quals at probe 208077
     6 got match of join quals at probe 1
   157 got match of join quals at probe 208077
     1 got match of join quals at probe 1
    67 got match of join quals at probe 208077
    18 got match of join quals at probe 1
     1 generate null-extended tuple after 211211 probes
208075 got match of join quals at probe 1
     1 got match of join quals at probe 208077

(This is a "uniq -c" summary of a lot of printfs, so the first column
is the number of consecutive occurrences of the same printout.)  Even
though a large majority of the outer rows still manage to find a match
at the first probe, there remain about 2800 that don't match there,
because they've got pretty big ids, and so they traipse through the hash
chain until they find the genuinely largest id, which is unfortunately
way down there -- the 208077'th chain entry in fact.  That results in
about half a billion more ExecScanHashBucket and ExecQual calls than
occur in the easy case (and that's just for this one box_id).

So it's not that the full data set is pathologically bad, it's that the
reduced set is pathologically good.  O(N^2) performance is what you
should expect for this query, and that's what you're actually getting
with the full data set.

Also, I noted earlier that performance seemed a good deal better with a
NestLoop plan.  The reason for that is that NestLoop doesn't have the
reversal of inner row ordering that's caused by prepending entries to
the hash chain, so the very largest row id isn't 208077 entries into the
list for it, but only 211211-208077 = 3134 entries in.  But it still
manages to eliminate most outer rows at the first probe, because there's
a fairly large value at that end of the dataset too.

I don't see anything much that we could or should do about this.  It
just depends on the order in which things appear in the hash chain,
and even if we fooled around with that ordering, we'd only be moving
the improbably-lucky behavior from one case to some other case.

We do need to look into putting a CHECK_FOR_INTERRUPTS call in here
somewhere, though.  I'm inclined to think that right before the
ExecScanHashBucket is the best place.  The reason that nest and merge
joins don't show a comparable non-responsiveness to cancels is that they
always call a child plan node at the equivalent place, and ExecProcNode
has got a CHECK_FOR_INTERRUPTS.  So we ought to check for interrupts
at the point of "fetching a tuple from the inner child plan", and
ExecScanHashBucket is the equivalent thing in this logic.  Cedric's
suggestion of putting it before the switch would get the job done, but
it would result in wasting cycles during unimportant transitions from
one state machine state to another.
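
(A minimal sketch of that placement, assuming the HJ_SCAN_BUCKET state of the
ExecHashJoin state machine in nodeHashjoin.c; the actual committed change may
differ in details:)

            case HJ_SCAN_BUCKET:

                /*
                 * Scanning a long hash chain is the hash join's equivalent of
                 * fetching a tuple from an inner child plan, where
                 * ExecProcNode would do this check for us, so check for
                 * interrupts here.
                 */
                CHECK_FOR_INTERRUPTS();

                /*
                 * Scan the selected hash bucket for matches to current outer.
                 */
                if (!ExecScanHashBucket(node, econtext))
                {
                    /* out of matches in this bucket */
                    node->hj_JoinState = HJ_FILL_OUTER_TUPLE;
                    continue;
                }

                /* ... the join quals are evaluated from this point on ... */
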
        regards, tom lane


Re: [PERFORM] Hash Anti Join performance degradation

From:
Robert Haas
Date:
On Wed, Jun 1, 2011 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Because of the way that a bitmap heap scan works, the rows are
> guaranteed to be loaded into the hash table in physical order, which
> means (in the fast case) that the row with the largest "id" value gets
> loaded last.  And because ExecHashTableInsert pushes each row onto the
> front of its hash chain, that row ends up at the front of the hash
> chain.  Net result: for all the outer rows that aren't the one with
> maximum id, we get a joinqual match at the very first entry in the hash
> chain.  Since it's an antijoin, we then reject that outer row and go
> on to the next.  The join thus ends up costing us only O(N) time.

Ah!  Makes sense.  If I'm reading your explanation right, this means
that we could have hit a similar pathological case on a nestloop as
well, just with a data ordering that is the reverse of what we have
here?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Hash Anti Join performance degradation

From:
Cédric Villemain
Date:
2011/6/1 Tom Lane <tgl@sss.pgh.pa.us>:
> We do need to look into putting a CHECK_FOR_INTERRUPTS call in here
> somewhere, though.  I'm inclined to think that right before the
> ExecScanHashBucket is the best place.  The reason that nest and merge
> joins don't show a comparable non-responsiveness to cancels is that they
> always call a child plan node at the equivalent place, and ExecProcNode
> has got a CHECK_FOR_INTERRUPTS.  So we ought to check for interrupts
> at the point of "fetching a tuple from the inner child plan", and
> ExecScanHashBucket is the equivalent thing in this logic.  Cedric's
> suggestion of putting it before the switch would get the job done, but
> it would result in wasting cycles during unimportant transitions from
> one state machine state to another.


Exactly; thanks to your last email I read more of the code, reached the same
conclusion, and put it in a more appropriate place: right before
ExecScanHashBucket.

I was about to send it, so it is attached.



>
>                        regards, tom lane
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Attachments

Re: [PERFORM] Hash Anti Join performance degradation

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jun 1, 2011 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Because of the way that a bitmap heap scan works, the rows are
>> guaranteed to be loaded into the hash table in physical order, which
>> means (in the fast case) that the row with the largest "id" value gets
>> loaded last.  And because ExecHashTableInsert pushes each row onto the
>> front of its hash chain, that row ends up at the front of the hash
>> chain.  Net result: for all the outer rows that aren't the one with
>> maximum id, we get a joinqual match at the very first entry in the hash
>> chain.  Since it's an antijoin, we then reject that outer row and go
>> on to the next.  The join thus ends up costing us only O(N) time.

> Ah!  Makes sense.  If I'm reading your explanation right, this means
> that we could have hit a similar pathological case on a nestloop as
> well, just with a data ordering that is the reverse of what we have
> here?

Yeah.  It's just chance that this particular data set, with this
particular ordering, happens to work well with a nestloop version
of the query.  On average I'd expect nestloop to suck even more,
because of more per-inner-tuple overhead.
        regards, tom lane


Re: [PERFORM] Hash Anti Join performance degradation

From:
Robert Haas
Date:
On Wed, Jun 1, 2011 at 4:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Jun 1, 2011 at 4:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Because of the way that a bitmap heap scan works, the rows are
>>> guaranteed to be loaded into the hash table in physical order, which
>>> means (in the fast case) that the row with the largest "id" value gets
>>> loaded last.  And because ExecHashTableInsert pushes each row onto the
>>> front of its hash chain, that row ends up at the front of the hash
>>> chain.  Net result: for all the outer rows that aren't the one with
>>> maximum id, we get a joinqual match at the very first entry in the hash
>>> chain.  Since it's an antijoin, we then reject that outer row and go
>>> on to the next.  The join thus ends up costing us only O(N) time.
>
>> Ah!  Makes sense.  If I'm reading your explanation right, this means
>> that we could have hit a similar pathological case on a nestloop as
>> well, just with a data ordering that is the reverse of what we have
>> here?
>
> Yeah.  It's just chance that this particular data set, with this
> particular ordering, happens to work well with a nestloop version
> of the query.  On average I'd expect nestloop to suck even more,
> because of more per-inner-tuple overhead.

I guess the real issue here is that m1.id < m2.id has to be evaluated
as a filter condition rather than a join qual.  That tends to perform
poorly in general, which is why rewriting this using min() or max() or
ORDER BY .. LIMIT 1 was elsewhere recommended.  I've occasionally had
cause to join on something other than equality in cases not
susceptible to such rewriting, so it would be neat to improve this
case, but it's not likely to make it to the top of my list any time
soon.
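
(For the record, the kind of rewrite being referred to looks something like the
following, using the table and column names from the plans above and assuming
an index on message (box_id, id):)

    -- get the newest message id per box without the self anti-join
    SELECT b.id,
           (SELECT max(m.id) FROM message m WHERE m.box_id = b.id) AS last_message_id
      FROM box b;

    -- or, phrased to exploit an index on message (box_id, id)
    SELECT b.id,
           (SELECT m.id
              FROM message m
             WHERE m.box_id = b.id
             ORDER BY m.id DESC
             LIMIT 1) AS last_message_id
      FROM box b;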

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Hash Anti Join performance degradation

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I guess the real issue here is that m1.id < m2.id has to be evaluated
> as a filter condition rather than a join qual.

Well, if you can invent an optimized join technique that works for
inequalities, go for it ... but I think you should get at least a
PhD thesis out of that.
        regards, tom lane


Re: [PERFORM] Hash Anti Join performance degradation

From:
Robert Haas
Date:
On Wed, Jun 1, 2011 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I guess the real issue here is that m1.id < m2.id has to be evaluated
>> as a filter condition rather than a join qual.
>
> Well, if you can invent an optimized join technique that works for
> inequalities, go for it ... but I think you should get at least a
> PhD thesis out of that.

Sounds good, except that so far NOT getting a PhD seems like a much
better financial prospect.  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [PERFORM] Hash Anti Join performance degradation

From:
Tom Lane
Date:
Cédric Villemain <cedric.villemain.debian@gmail.com> writes:
> Exactly; thanks to your last email I read more of the code, reached the same
> conclusion, and put it in a more appropriate place: right before
> ExecScanHashBucket.

> I was about to send it, so it is attached.

Applied with cosmetic adjustments.
        regards, tom lane


Re: [PERFORM] Hash Anti Join performance degradation

From:
"Ross J. Reedstrom"
Date:
On Wed, Jun 01, 2011 at 04:58:36PM -0400, Robert Haas wrote:
> On Wed, Jun 1, 2011 at 4:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> I guess the real issue here is that m1.id < m2.id has to be evaluated
> >> as a filter condition rather than a join qual.
> >
> > Well, if you can invent an optimized join technique that works for
> > inequalities, go for it ... but I think you should get at least a
> > PhD thesis out of that.
> 
> Sounds good, except that so far NOT getting a PhD seems like a much
> better financial prospect.  :-)

Yeah, last time I heard of some Uni being so impressed by independent
work that they just sort of handed out a Ph.D., it involved a Swiss
patent clerk ...

Ross
-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE


Re: [PERFORM] Hash Anti Join performance degradation

From:
panam
Date:
I'd like to thank you all for getting this analyzed, especially Tom!
Your rigor is pretty impressive; it seems that without it, maintaining a DBMS
would be impossible.
In the end, I know a lot more about Postgres internals now, and I know that
this idiosyncrasy (from a user perspective) could happen again. I guess this
is the first time I have actually encountered an unexpected worst-case
scenario like this...
It seems it is up to me now to be a bit more creative with query optimization,
and in the end it will probably turn out to require an architectural change...
As the only thing to achieve is in fact to obtain the last id (currently
still with the constraint that it has to happen in an isolated subquery), I
wonder whether this requirement (obtaining the last id) deserves a special
technique/instrumentation/strategy (lacking a good word here), given that
this data has a full logical ordering (in this case even a total one) and
that the use case is, I guess, quite common.

Some ideas from an earlier post:

panam wrote:
> 
> ...
> This also made me wonder how the internal plan is carried out. Is the
> engine able to leverage the fact that a part/range of the rows ["/index
> entries"] is totally or partially ordered on disk, e.g. using some kind of
> binary search or even "nearest neighbor" search in that section (i.e. a
> special "micro-plan" or algorithm)? Or is the speed-up "just" because
> related data is usually nearby and most of the standard algorithms work
> best with clustered data?
> If the first is not the case, would that be a potential point for
> improvement? Maybe it would even be more efficient if there were some
> sort of constraint that guaranteed "ordered row" sections on disk,
> i.e. preventing the addition of a row that had an index value in between
> two row values of an already ordered/clustered section. In the simplest
> case, it would start with the "first" row and end with the "last" row (at
> the time of doing the equivalent of CLUSTER). So there would be a small
> list saying that rows with id x through rows with id y are guaranteed to
> be ordered on disk (by id, for example), now and for all time.
> 
Maybe I am completely off the mark, but what's your conclusion? Too much
effort for small scenarios? Nothing that should be handled at the DB level? An
attempt to battle the laws of thermodynamics with a small technical dodge?

Thanks again
panam

--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Re-PERFORM-Hash-Anti-Join-performance-degradation-tp4443803p4446629.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.