Thread: inheritance: planning time vs children number vs column number


inheritance: planning time vs children number vs column number

From
Marc Cousin
Date:

Hi,


I've been facing a very large (more than 15 seconds) planning time in a partitioned configuration. The amount of partitions wasn't completely crazy, around 500, not in the thousands. The problem was that there were nearly 1000 columns in the parent table (very special use case, there is a reason for this application for having these many columns). The check constraint was extremely simple (for each child, 1 column = 1 constant, always the same column).


As I was surprised by this very large planning time, I have been trying to study the variation of planning time against several parameters:

- number of columns

- number of children tables

- constraint exclusion's value (partition or off)


What (I think) I measured is that the planning time seems to be O(n^2) for the number of columns, and O(n^2) for the number of children tables.


Constraint exclusion had a limited impact on planning time (it added between 20% and 100% planning time when there were many columns).


I'd like to know if this is a known behavior ? And if I could mitigate it somehow ?




Attached is a zipped csv file containing the result of the tests for constraint_exclusion=partition, for children from 100 to 1000 in steps of 100, and for columns from 10 to 1590 in steps of 20.


A few values are a bit off-chart as this was done on my personal computer, and it was sometimes used for other things at the same time.


The tests were done with a parent table made of only integer columns, and every child having a check (col0=id_of_child) constraint (I can also provide the perl script, of course).


The test query was "SELECT * FROM parent_table WHERE col0=id_of_child_0". Replacing it with "SELECT col0 FROM parent_table WHERE col0=id_of_child_0" didn't change the planning time significantly: it was around 5% lower, but still O(n^2). This query returned nothing (every partition is empty).
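For anyone who wants to reproduce this kind of timing without the attached perl script, a minimal sketch in psql (table and column names as described above; since every partition is empty, the elapsed time of the query — or of a plain EXPLAIN — is essentially all planning time):

\timing
EXPLAIN SELECT * FROM parent_table WHERE col0 = 0;   -- 0 standing in for id_of_child_0; no rows are returned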


I've also done an openoffice spreadsheet graphing all this, but as it's 130kB I won't send it before being told to do so :)


The computer running the tests was an Intel core i7 870. Postgresql was 9.0.3.


Anything else I could add ?


Cheers

Attachments

Re: inheritance: planning time vs children number vs column number

From
Heikki Linnakangas
Date:
On 28.02.2011 11:38, Marc Cousin wrote:
> I've been facing a very large (more than 15 seconds) planning time in a
> partitioned configuration. The amount of partitions wasn't completely crazy,
> around 500, not in the thousands. The problem was that there were nearly 1000
> columns in the parent table (very special use case, there is a reason for this
> application for having these many columns). The check constraint was extremely
> simple (for each child, 1 column = 1 constant, always the same column).
>
> As I was surprised by this very large planning time, I have been trying to
> study the variation of planning time against several parameters:
> - number of columns
> - number of children tables
> - constraint exclusion's value (partition or off)
>
> What (I think) I measured is that the planning time seems to be O(n^2) for the
> number of columns, and O(n^2) for the number of children tables.
>
> Constraint exclusion had a limited impact on planning time (it added between
> 20% and 100% planning time when there were many columns).

Testing here with a table with 1000 columns and 100 partitions, about
80% of the planning time is looking up the statistics on attribute
width, to calculate average tuple width. I don't see O(n^2) behavior,
though, it seems linear.

> I'd like to know if this is a known behavior ? And if I could mitigate it
> somehow ?

I'm out of ideas on how to make it faster, I'm afraid.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: inheritance: planning time vs children number vs column number

From
Marc Cousin
Date:

The Monday 28 February 2011 13:57:45, Heikki Linnakangas wrote :
> On 28.02.2011 11:38, Marc Cousin wrote:
> > I've been facing a very large (more than 15 seconds) planning time in a
> > partitioned configuration. The amount of partitions wasn't completely
> > crazy, around 500, not in the thousands. The problem was that there were
> > nearly 1000 columns in the parent table (very special use case, there is
> > a reason for this application for having these many columns). The check
> > constraint was extremely simple (for each child, 1 column = 1 constant,
> > always the same column).
> >
> > As I was surprised by this very large planning time, I have been trying
> > to study the variation of planning time against several parameters:
> > - number of columns
> > - number of children tables
> > - constraint exclusion's value (partition or off)
> >
> > What (I think) I measured is that the planning time seems to be O(n^2)
> > for the number of columns, and O(n^2) for the number of children tables.
> >
> > Constraint exclusion had a limited impact on planning time (it added
> > between 20% and 100% planning time when there were many columns).
>
> Testing here with a table with 1000 columns and 100 partitions, about
> 80% of the planning time is looking up the statistics on attribute
> width, to calculate average tuple width. I don't see O(n^2) behavior,
> though, it seems linear.


It is only based on experimentation, for my part, of course…


If you measure the planning time, modifying either the columns or the partitions number, the square root of the planning time is almost perfectly proportional with the parameter you're playing with.
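Spelled out, that observation is exactly the quadratic claim: if sqrt(planning_time) ≈ k * N, with N the number of columns (or of children) and k some constant, then planning_time ≈ k^2 * N^2 — i.e. doubling N roughly quadruples the planning time.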


Re: inheritance: planning time vs children number vs column number

From
Tom Lane
Date:
Marc Cousin <cousinmarc@gmail.com> writes:
> The Monday 28 February 2011 13:57:45, Heikki Linnakangas wrote :
>> Testing here with a table with 1000 columns and 100 partitions, about
>> 80% of the planning time is looking up the statistics on attribute
>> width, to calculate average tuple width. I don't see O(n^2) behavior,
>> though, it seems linear.

> It is only based on experimentation, for my part, of course…

> If you measure the planning time, modifying either the columns or the
> partitions number, the square root of the planning time is almost perfectly
> proportional with the parameter you're playing with.

Could we see a concrete example demonstrating that?  I agree with Heikki
that it's not obvious what you are testing that would have such behavior.
I can think of places that would have O(N^2) behavior in the length of
the targetlist, but it seems unlikely that they'd come to dominate
runtime at a mere 1000 columns.

            regards, tom lane

Query on view radically slower than query on underlying table

From
Craig James
Date:
We have a medium-sized catalog (about 5 million rows), but some of our customers only want to see portions of it.  I've been experimenting with a customer-specific schema that contains nothing but a "join table" -- just the primary keys of that portion of the data that each customer wants to see, which is used to create a view that looks like the original table.  But the most important query, the one that customers use to scan page-by-page through search results, turns out to be far too slow (65 seconds versus 55 milliseconds).

Below are the results of two explain/analyze statements.  The first one uses the view, the second one goes directly to the original tables.  I thought this would be a slam-dunk, that it would return results in a flash because the view is created from two tables with the same primary keys.

My guess (and it's just a wild guess) is that the "left join" is forcing a sequence scan or something.  But we need the left join, because it's on a "hitlist" that recorded all the matches to a customer's earlier query, and if rows have been removed from the tables, the customer needs to see a blank row.

Here is the "bad" query, which is run on the view:

em=> explain analyze
select version.version_id, version.isosmiles
from hitlist_rows_reset_140
left join version on (hitlist_rows_reset_140.objectid = version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.sortorder;
                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop Left Join  (cost=23687.51..215315.74 rows=1 width=54) (actual time=2682.662..63680.076 rows=10 loops=1)
    Join Filter: (hitlist_rows_reset_140.objectid = v.version_id)
    ->  Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140  (cost=0.00..8.36 rows=1 width=8) (actual time=0.015..0.049 rows=10 loops=1)
          Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
    ->  Hash Join  (cost=23687.51..204666.54 rows=851267 width=50) (actual time=31.829..6263.403 rows=851267 loops=10)
          Hash Cond: (v.version_id = mv.version_id)
          ->  Seq Scan on version v  (cost=0.00..116146.68 rows=5631968 width=50) (actual time=0.006..859.758 rows=5632191 loops=10)
          ->  Hash  (cost=13046.67..13046.67 rows=851267 width=4) (actual time=317.488..317.488 rows=851267 loops=1)
                ->  Seq Scan on my_version mv  (cost=0.00..13046.67 rows=851267 width=4) (actual time=2.888..115.166 rows=851267 loops=1)
  Total runtime: 63680.162 ms

Here is the "good" query, which is run directly on the data tables.

em=> explain analyze
select registry.version.version_id, registry.version.isosmiles
from hitlist_rows_reset_140
left join registry.version on (hitlist_rows_reset_140.objectid = registry.version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.SortOrder;
                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop Left Join  (cost=0.00..17.73 rows=1 width=54) (actual time=36.022..55.558 rows=10 loops=1)
    ->  Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140  (cost=0.00..8.36 rows=1 width=8) (actual time=0.021..0.025 rows=10 loops=1)
          Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
    ->  Index Scan using version_pkey on version  (cost=0.00..9.35 rows=1 width=50) (actual time=5.551..5.552 rows=1 loops=10)
          Index Cond: (hitlist_rows_reset_140.objectid = version.version_id)
  Total runtime: 55.608 ms
(6 rows)


The view is defined like this:

em=> \d my_version
Table "test_schema.my_version"
    Column   |  Type   | Modifiers
------------+---------+-----------
  version_id | integer | not null
Indexes:
     "my_version_pkey" PRIMARY KEY, btree (version_id)

em=> \d version
  View "test_schema.version"
    Column   |  Type   | Modifiers
------------+---------+-----------
  version_id | integer |
  parent_id  | integer |
  isosmiles  | text    |
  coord_2d   | text    |
View definition:
  SELECT v.version_id, v.parent_id, v.isosmiles, v.coord_2d
    FROM registry.version v
    JOIN my_version mv ON mv.version_id = v.version_id;

This is:

   Postgres 8.4.4
   Ubuntu Linux 2.6.32-27
   Database: 8x7200 RAID 10, LSI RAID controller with BBU
   WAL: 2x7200 RAID1

Non-default config parameters:

max_connections = 500
shared_buffers = 1000MB
work_mem = 128MB
synchronous_commit = off
full_page_writes = off
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB
track_activities = on
track_counts = off
track_functions = none
escape_string_warning = off

Thanks,
Craig


Re: inheritance: planning time vs children number vs column number

From
Marc Cousin
Date:

The Monday 28 February 2011 16:35:37, Tom Lane wrote :
> Marc Cousin <cousinmarc@gmail.com> writes:
> > The Monday 28 February 2011 13:57:45, Heikki Linnakangas wrote :
> >> Testing here with a table with 1000 columns and 100 partitions, about
> >> 80% of the planning time is looking up the statistics on attribute
> >> width, to calculate average tuple width. I don't see O(n^2) behavior,
> >> though, it seems linear.
> >
> > It is only based on experimentation, for my part, of course…
> >
> > If you measure the planning time, modifying either the columns or the
> > partitions number, the square root of the planning time is almost
> > perfectly proportional with the parameter you're playing with.
>
> Could we see a concrete example demonstrating that? I agree with Heikki
> that it's not obvious what you are testing that would have such behavior.
> I can think of places that would have O(N^2) behavior in the length of
> the targetlist, but it seems unlikely that they'd come to dominate
> runtime at a mere 1000 columns.
>
> regards, tom lane


I feel a little silly not having provided a test case from the start…


A script doing a complete test is attached to this email.


It's doing a simple


CREATE TABLE test_father (col0 int,col1 int,col2 int,col3 int,col4 int,col5 int,col6 int,col7 int,col8 int,col9 int,col10 int,col11 int,col12 int,col13 int,col14 int,col15 int,col16 int,col17 int,col18 int,col19 int,col20 int,col21 int,col22 int,col23 int,…)


Followed by 600

CREATE TABLE test_child_0 (CHECK (col0=0)) INHERITS (test_father);


And a single


SELECT col0 FROM test_father WHERE col0=0;
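For reference, the same kind of schema can also be generated directly from psql instead of from the attached perl script — a rough sketch only, with illustrative column and child counts:

DO $$
DECLARE
    cols text;
    i    int;
BEGIN
    -- build "col0 int,col1 int,..." for the parent table
    SELECT string_agg('col' || g || ' int', ',') INTO cols
      FROM generate_series(0, 999) g;              -- 1000 columns, adjust as needed
    EXECUTE 'CREATE TABLE test_father (' || cols || ')';

    -- one empty child per partition, each constrained on col0
    FOR i IN 0..599 LOOP                           -- 600 children
        EXECUTE 'CREATE TABLE test_child_' || i
             || ' (CHECK (col0=' || i || ')) INHERITS (test_father)';
    END LOOP;
END
$$;

-- the statement being timed:
EXPLAIN SELECT col0 FROM test_father WHERE col0=0;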



Here are my results (from the same machine). I've done it with 600 partitions, to have big planning times. If you need a smaller one (this one takes nearly ten minutes to run) tell me.


COLS:100 PARTITIONS:600
Time : 513,764 ms (sqrt : 22.6)
COLS:200 PARTITIONS:600
Time : 906,214 ms (sqrt : 30.1)
COLS:300 PARTITIONS:600
Time : 2255,390 ms (sqrt : 47.48)
COLS:400 PARTITIONS:600
Time : 4816,820 ms (sqrt : 69.4)
COLS:500 PARTITIONS:600
Time : 5736,602 ms (sqrt : 75.73)
COLS:600 PARTITIONS:600
Time : 7659,617 ms (sqrt : 87.51)
COLS:700 PARTITIONS:600
Time : 9313,260 ms (sqrt : 96.5)
COLS:800 PARTITIONS:600
Time : 13700,353 ms (sqrt : 117.04)
COLS:900 PARTITIONS:600
Time : 13914,765 ms (sqrt : 117.95)
COLS:1000 PARTITIONS:600
Time : 20335,750 ms (sqrt : 142.6)
COLS:1100 PARTITIONS:600
Time : 21048,958 ms (sqrt : 145.08)
COLS:1200 PARTITIONS:600
Time : 27619,559 ms (sqrt : 166.18)
COLS:1300 PARTITIONS:600
Time : 31357,353 ms (sqrt : 177.08)
COLS:1400 PARTITIONS:600
Time : 34435,711 ms (sqrt : 185.57)
COLS:1500 PARTITIONS:600
Time : 38954,676 ms (sqrt : 197.37)



As for my previous results, these ones are on a machine doing a bit of other work, so some values may be a bit offset, and it's only one measure each time anyway.


The CSV file I sent from the first email is obtained running the exact same commands, but playing on both columns and partitions, and averaged over 3 measures.


Regards.

Attachments

Re: Query on view radically slower than query on underlying table

From
Tom Lane
Date:
Craig James <craig_james@emolecules.com> writes:
> My guess (and it's just a wild guess) is that the "left join" is
> forcing a sequence scan or something.

No, that's forcing the other join to be done in toto because it can't
reorder the left join and regular join.

            regards, tom lane

Re: Query on view radically slower than query on underlying table

From
Craig James
Date:
> Craig James<craig_james@emolecules.com>  writes:
>> Here is the "bad" query, which is run on the view:
>>
>> em=> explain analyze
>> select version.version_id, version.isosmiles
>> from hitlist_rows_reset_140
>> left join version on (hitlist_rows_reset_140.objectid = version.version_id)
>> where hitlist_rows_reset_140.sortorder >= 1
>> and hitlist_rows_reset_140.sortorder <= 10
>> order by hitlist_rows_reset_140.sortorder;
>> QUERY PLAN
>> ------------------------------------------------------------------------------------------------------------------------------
>> Nested Loop Left Join (cost=23687.51..215315.74 rows=1 width=54) (actual time=2682.662..63680.076 rows=10 loops=1)
>> Join Filter: (hitlist_rows_reset_140.objectid = v.version_id)
>> -> Index Scan using hitlist_rows_reset_140_pkey on hitlist_rows_reset_140 (cost=0.00..8.36 rows=1 width=8) (actual time=0.015..0.049 rows=10 loops=1)
>> Index Cond: ((sortorder >= 1) AND (sortorder <= 10))
>> -> Hash Join (cost=23687.51..204666.54 rows=851267 width=50) (actual time=31.829..6263.403 rows=851267 loops=10)
>> Hash Cond: (v.version_id = mv.version_id)
>> -> Seq Scan on version v (cost=0.00..116146.68 rows=5631968 width=50) (actual time=0.006..859.758 rows=5632191 loops=10)
>> -> Hash (cost=13046.67..13046.67 rows=851267 width=4) (actual time=317.488..317.488 rows=851267 loops=1)
>> -> Seq Scan on my_version mv (cost=0.00..13046.67 rows=851267 width=4) (actual time=2.888..115.166 rows=851267 loops=1)
>> Total runtime: 63680.162 ms

On 2/28/11 10:57 AM, Tom Lane wrote:
>> My guess (and it's just a wild guess) is that the "left join" is
>> forcing a sequence scan or something.
>
> No, that's forcing the other join to be done in toto because it can't
> reorder the left join and regular join.

I changed the "left join" to just "join" and confirmed that it's fast -- the join on the view drops from 65 seconds back down to a few milliseconds.
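For clarity, the fast rewrite is literally the same statement with the join type changed:

select version.version_id, version.isosmiles
from hitlist_rows_reset_140
join version on (hitlist_rows_reset_140.objectid = version.version_id)
where hitlist_rows_reset_140.sortorder >= 1
and hitlist_rows_reset_140.sortorder <= 10
order by hitlist_rows_reset_140.sortorder;

Of course that no longer produces a blank row when a version has been removed, which is why the left join was wanted in the first place.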

Then I thought maybe putting a foreign-key constraint on table "my_version" would solve the problem:

   alter table my_version add constraint fk_my_view foreign key(version_id)
   references registry.version(version_id) on delete cascade;

That way, the planner would know that every key in table "my_version" has to also be in table "version", thus avoiding that part about "forcing the other join to be done in toto".  But the foreign-key constraint makes no difference, it still does the full join and takes 65 seconds.

So here's how I see it:

   - The select can only return ten rows from table "hitlist_rows_reset_140"
   - The left join could be applied to table "my_version"
   - The results of that could be joined to table "version"

It seems to me that with the foreign-key constraint, it shouldn't have to examine more than ten rows from any of the three tables.  Or have I overlooked something?

Thanks,
Craig

Re: Query on view radically slower than query on underlying table

From
Tom Lane
Date:
Craig James <craig_james@emolecules.com> writes:
> Then I thought maybe putting a foreign-key constraint on table "my_version" would solve the problem:

>    alter table my_version add constraint fk_my_view foreign key(version_id)
>    references registry.version(version_id) on delete cascade;

> That way, the planner would know that every key in table "my_version" has to also be in table "version", thus
> avoiding that part about "forcing the other join to be done in toto".  But the foreign-key constraint makes no
> difference, it still does the full join and takes 65 seconds.

That's just wishful thinking I'm afraid.  The planner doesn't currently
make any deductions whatsoever from the presence of a foreign key
constraint; and even if it did, I'm not sure that this would help it
decide that a join order constraint could safely be dropped.

            regards, tom lane

Re: inheritance: planning time vs children number vs column number

From
Tom Lane
Date:
Marc Cousin <cousinmarc@gmail.com> writes:
> The Monday 28 February 2011 16:35:37, Tom Lane wrote :
>> Could we see a concrete example demonstrating that?  I agree with Heikki
>> that it's not obvious what you are testing that would have such behavior.
>> I can think of places that would have O(N^2) behavior in the length of
>> the targetlist, but it seems unlikely that they'd come to dominate
>> runtime at a mere 1000 columns.

> I feel a little silly not having provided a test case from the start…

> A script doing a complete test is attached to this email.

I did some oprofile analysis of this test case.  It's spending
essentially all its time in SearchCatCache, on failed searches of
pg_statistic.  The cache accumulates negative entries for each probed
column, and then the searches take time proportional to the number of
entries, so indeed there is an O(N^2) behavior --- but N is the number
of columns times number of tables in your test case, not just the number
of columns.

The cache is a hash table, so ideally the search time would be more or
less constant as the table grows, but to make that happen we'd need to
reallocate with more buckets as the table grows, and catcache.c doesn't
do that yet.  We've seen a few cases that make that look worth doing,
but they tend to be pretty extreme, like this one.

It's worth pointing out that the only reason this effect is dominating
the runtime is that you don't have any statistics for these toy test
tables.  If you did, the cycles spent using those entries would dwarf
the lookup costs, I think.  So it's hard to get excited about doing
anything based on this test case --- it's likely the bottleneck would be
somewhere else entirely if you'd bothered to load up some data.
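A back-of-the-envelope sketch of the arithmetic (illustrative numbers only): with 1000 columns and 600 children the plan involves roughly 1000 x 600 = 600,000 failed pg_statistic probes; since each miss leaves a negative entry behind and later probes have to walk the entries already piled up in their bucket, the total work grows on the order of (columns x children)^2 rather than columns^2 or children^2 alone — which is consistent with the sqrt-linear timings reported earlier in the thread.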

            regards, tom lane

Re: inheritance: planning time vs children number vs column number

From
Marc Cousin
Date:
On Tuesday 01 March 2011 07:20:19, Tom Lane wrote:
> Marc Cousin <cousinmarc@gmail.com> writes:
> > The Monday 28 February 2011 16:35:37, Tom Lane wrote :
> >> Could we see a concrete example demonstrating that?  I agree with Heikki
> >> that it's not obvious what you are testing that would have such
> >> behavior. I can think of places that would have O(N^2) behavior in the
> >> length of the targetlist, but it seems unlikely that they'd come to
> >> dominate runtime at a mere 1000 columns.
> >
> > I feel a little silly not having provided a test case from the start…
> >
> > A script doing a complete test is attached to this email.
>
> I did some oprofile analysis of this test case.  It's spending
> essentially all its time in SearchCatCache, on failed searches of
> pg_statistic.  The cache accumulates negative entries for each probed
> column, and then the searches take time proportional to the number of
> entries, so indeed there is an O(N^2) behavior --- but N is the number
> of columns times number of tables in your test case, not just the number
> of columns.
>
> The cache is a hash table, so ideally the search time would be more or
> less constant as the table grows, but to make that happen we'd need to
> reallocate with more buckets as the table grows, and catcache.c doesn't
> do that yet.  We've seen a few cases that make that look worth doing,
> but they tend to be pretty extreme, like this one.
>
> It's worth pointing out that the only reason this effect is dominating
> the runtime is that you don't have any statistics for these toy test
> tables.  If you did, the cycles spent using those entries would dwarf
> the lookup costs, I think.  So it's hard to get excited about doing
> anything based on this test case --- it's likely the bottleneck would be
> somewhere else entirely if you'd bothered to load up some data.
>
>             regards, tom lane

Yes, for the same test case, with a bit of data in every partition and
statistics up to date, planning time goes from 20 seconds to 125ms for the 600
children/1000 columns case. Which is of course more than acceptable.

Now I've got to check it's the same problem on the real environment. I think
it has quite a few empty partitions, so no statistics for them…

Thanks a lot.

Marc

Re: inheritance: planning time vs children number vs column number

From
Tom Lane
Date:
Marc Cousin <cousinmarc@gmail.com> writes:
> On Tuesday 01 March 2011 07:20:19, Tom Lane wrote:
>> It's worth pointing out that the only reason this effect is dominating
>> the runtime is that you don't have any statistics for these toy test
>> tables.  If you did, the cycles spent using those entries would dwarf
>> the lookup costs, I think.  So it's hard to get excited about doing
>> anything based on this test case --- it's likely the bottleneck would be
>> somewhere else entirely if you'd bothered to load up some data.

> Yes, for the same test case, with a bit of data in every partition and
> statistics up to date, planning time goes from 20 seconds to 125ms for the 600
> children/1000 columns case. Which is of course more than acceptable.

[ scratches head ... ]  Actually, I was expecting the runtime to go up
not down.  Maybe there's something else strange going on here.

            regards, tom lane

Re: inheritance: planning time vs children number vs column number

From
Marc Cousin
Date:

The Tuesday 01 March 2011 16:33:51, Tom Lane wrote :
> Marc Cousin <cousinmarc@gmail.com> writes:
> > On Tuesday 01 March 2011 07:20:19, Tom Lane wrote:
> >> It's worth pointing out that the only reason this effect is dominating
> >> the runtime is that you don't have any statistics for these toy test
> >> tables. If you did, the cycles spent using those entries would dwarf
> >> the lookup costs, I think. So it's hard to get excited about doing
> >> anything based on this test case --- it's likely the bottleneck would be
> >> somewhere else entirely if you'd bothered to load up some data.
> >
> > Yes, for the same test case, with a bit of data in every partition and
> > statistics up to date, planning time goes from 20 seconds to 125ms for
> > the 600 children/1000 columns case. Which is of course more than
> > acceptable.
>
> [ scratches head ... ] Actually, I was expecting the runtime to go up
> not down. Maybe there's something else strange going on here.
>
> regards, tom lane


Then, what can I do to help ?

Re: inheritance: planning time vs children number vs column number

From
Tom Lane
Date:
I wrote:
> Marc Cousin <cousinmarc@gmail.com> writes:
>> Yes, for the same test case, with a bit of data in every partition and
>> statistics up to date, planning time goes from 20 seconds to 125ms for the 600
>> children/1000 columns case. Which is of course more than acceptable.

> [ scratches head ... ]  Actually, I was expecting the runtime to go up
> not down.  Maybe there's something else strange going on here.

Oh, doh: the failing pg_statistic lookups are all coming from the part
of estimate_rel_size() where it tries to induce a reasonable tuple width
estimate for an empty table (see get_rel_data_width).  Probably not a
case we need to get really tense about.  Of course, you could also argue
that this code is stupid because it's very unlikely that there will be
any pg_statistic entries either.  Maybe we should just have it go
directly to the datatype-based estimate instead of making a boatload
of useless pg_statistic probes.
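In practice the workaround is what Marc already did upthread: make sure the children aren't both empty and unanalyzed.  For example (child names as in the test script):

-- load a token row into each partition, then refresh statistics
INSERT INTO test_child_0 (col0) VALUES (0);
INSERT INTO test_child_1 (col0) VALUES (1);
-- ... and so on for the remaining children ...
ANALYZE;

which is what took the 600 children / 1000 columns case from ~20 seconds of planning down to ~125 ms.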

            regards, tom lane

Re: Query on view radically slower than query on underlying table

From
Robert Haas
Date:
On Mon, Feb 28, 2011 at 2:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Craig James <craig_james@emolecules.com> writes:
>> Then I thought maybe putting a foreign-key constraint on table "my_version" would solve the problem:
>
>>    alter table my_version add constraint fk_my_view foreign key(version_id)
>>    references registry.version(version_id) on delete cascade;
>
>> That way, the planner would know that every key in table "my_version" has to also be in table "version", thus
>> avoiding that part about "forcing the other join to be done in toto".  But the foreign-key constraint makes no
>> difference, it still does the full join and takes 65 seconds.
>
> That's just wishful thinking I'm afraid.  The planner doesn't currently
> make any deductions whatsoever from the presence of a foreign key
> constraint; and even if it did, I'm not sure that this would help it
> decide that a join order constraint could safely be dropped.

I've previously mused on -hackers about teaching the planner the
concept of an inner-or-left-join; that is, a join that's guaranteed to
return the same results whichever way we choose to implement it.
Proving that an inner join is actually inner-or-left would allow the
join removal logic to consider removing it altogether, and would allow
reordering in cases that aren't otherwise known to be safe.  Proving
that a left join is actually inner-or-left doesn't help with join
removal, but it might allow the join to be reordered.  Maybe
"non-row-reducing-join" is better terminology than
"inner-or-left-join", but in any case I have a suspicion that inner
join removal will end up being implemented as a special case of
noticing that an inner join falls into this class.
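As a concrete illustration using the tables from this thread (assuming the fk_my_view constraint above, the NOT NULL on my_version.version_id, and version_id being the primary key of registry.version): the left join below can never produce a NULL-extended row, because every my_version.version_id is guaranteed to have exactly one match in registry.version, so it is "inner-or-left" in the sense described — the planner could implement it as either join type and get identical results.

SELECT mv.version_id, v.isosmiles
FROM my_version mv
LEFT JOIN registry.version v ON mv.version_id = v.version_id;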

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company