Обсуждение: UNION causes horrible plan on JOIN

Поиск
Список
Период
Сортировка

UNION causes horrible plan on JOIN

От
Craig James
Дата:
On Postgres 9.6 (config below), I have a case I don't understand: three tables that can be separately queried in milliseconds, but when put together into one view using UNION, take 150 seconds to query. Here's the rough idea (actual details below):

create view thesaurus as
            (select id, name from A)
  union (select id, name from B)
  union (select id, name from C);

create table h(i integer);
insert into h values(12345);
select * from thesaurus join h on (thesaurus.id = h.id);

On the other hand, if you do this, it's a millisecond plan:

select * from thesaurus where id in (12345);

Notice that it's effectively the same query since h above contains just this one value.

Here are the actual details. The view being queried:

create view thesaurus2 as
select
  rt.thesaurus_id,
  rt.version_id,
  rt.normalized,
  rt.identifier,
  rt.typecode
  from local_sample s
  join thesaurus_master rt using (sample_id)
union
select c.id as thesaurus_id,
  c.id as version_id,
  c.cas_number as normalized,
  c.cas_number as identifier,
  3 as typecode
  from cas_number c
  join sample s on c.id = s.version_id
union
select m.id as thesaurus_id,
  m.id as version_id,
  lower(m.mfcd) as normalized,
  m.mfcd as identifier,
  4 as typecode
  from mfcd m
  join sample s on m.id = s.version_id;

The bad sort (147 seconds to execute). Note that the "hitlist" table contains exactly one row.
 explain analyze select c.version_id
   from thesaurus2 c
   join hitlist_rows_103710241 h on (c.thesaurus_id = h.objectid);

If I instead just query directly for that value, the answer is almost instant (1.2 msec):

explain analyze select c.version_id
from thesaurus2 c
where c.version_id in (1324511991);

Now if I take any one of the three tables in the UNION view, the query is really fast on each one. For example:

select distinct c.version_id
from (
select distinct c.id as thesaurus_id,
  c.id as version_id,
  c.cas_number as normalized,
  c.cas_number as identifier,
  3 as typecode
  from cas_number c
  join sample s on c.id = s.version_id
) c
join hitlist_rows_103710241 h on (c.thesaurus_id = h.objectid);


The other two subqueries are similarly fast.

This is Postgres9.6 running on Ubuntu 16.04, 64GB memory 16 CPUs. Non-default config values:

max_connections = 2000
shared_buffers = 12073MB
work_mem = 256MB
maintenance_work_mem = 512MB
synchronous_commit = off
effective_cache_size = 32GB
wal_level = logical
wal_keep_segments = 1000
max_wal_senders = 10
hot_standby = on
archive_mode = on
archive_command = '/bin/true'

Thanks!
Craig


--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------

Re: UNION causes horrible plan on JOIN

От
Justin Pryzby
Дата:
On Mon, Oct 28, 2019 at 03:40:58PM -0700, Craig James wrote:
> On Postgres 9.6 (config below), I have a case I don't understand: three
> tables that can be separately queried in milliseconds, but when put
> together into one view using UNION, take 150 seconds to query. Here's the
> rough idea (actual details below):

Do you want UNION ALL ?

UNION without ALL distintifies the output.
https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION

Justin



Re: UNION causes horrible plan on JOIN

От
Craig James
Дата:
On Mon, Oct 28, 2019 at 3:45 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Oct 28, 2019 at 03:40:58PM -0700, Craig James wrote:
> On Postgres 9.6 (config below), I have a case I don't understand: three
> tables that can be separately queried in milliseconds, but when put
> together into one view using UNION, take 150 seconds to query. Here's the
> rough idea (actual details below):

Do you want UNION ALL ?

UNION without ALL distintifies the output.
https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION

Interesting idea, thanks. But it makes no difference. Tried it and got the same bad performance.

Craig
 


Justin


--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------

Re: UNION causes horrible plan on JOIN

От
Craig James
Дата:
On Mon, Oct 28, 2019 at 4:31 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Oct 28, 2019 at 04:30:24PM -0700, Craig James wrote:
> On Mon, Oct 28, 2019 at 3:45 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> > On Mon, Oct 28, 2019 at 03:40:58PM -0700, Craig James wrote:
> > > On Postgres 9.6 (config below), I have a case I don't understand: three
> > > tables that can be separately queried in milliseconds, but when put
> > > together into one view using UNION, take 150 seconds to query. Here's the
> > > rough idea (actual details below):
> >
> > Do you want UNION ALL ?
> >
> > UNION without ALL distintifies the output.
> > https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION
>
>
> Interesting idea, thanks. But it makes no difference. Tried it and got the
> same bad performance.

Could you mail the list the plan with union ALL ?

Here it is. It is indeed different, but takes 104 seconds instead of 140 seconds.


--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------

Re: UNION causes horrible plan on JOIN

От
Kyotaro Horiguchi
Дата:
At Mon, 28 Oct 2019 16:30:24 -0700, Craig James <cjames@emolecules.com> wrote in 
> On Mon, Oct 28, 2019 at 3:45 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> 
> > On Mon, Oct 28, 2019 at 03:40:58PM -0700, Craig James wrote:
> > > On Postgres 9.6 (config below), I have a case I don't understand: three
> > > tables that can be separately queried in milliseconds, but when put
> > > together into one view using UNION, take 150 seconds to query. Here's the
> > > rough idea (actual details below):
> >
> > Do you want UNION ALL ?
> >
> > UNION without ALL distintifies the output.
> > https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION
> 
> 
> Interesting idea, thanks. But it makes no difference. Tried it and got the
> same bad performance.

The join clauses in the view also prevent the query from getting
faster plans. So if you somehow could move the join clauses out of the
UNION leafs in the view in addtion to using UNION ALL, you would get
better performance.

Or if hitlist_rows is known to highly narrow the result from the
element tables, using a function instead of the view might work.

create or replace function the_view(int)
returns table(thesaurus_id int, version_id int, normalized int,
              identifier int, typecode int) as $$
select
  rt.thesaurus_id,
  rt.version_id,
  rt.normalized,
  rt.identifier,
  rt.typecode
  from local_sample s
  join thesaurus_master rt using (sample_id)
  where rt.thesaurus_id = $1
union
...
$$ language sql;

explain analyze select c.version_id
from hitlist_rows_103710241 h,
lateral the_view(h.objectid) as c;

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center