Обсуждение: UNION causes horrible plan on JOIN
On Postgres 9.6 (config below), I have a case I don't understand: three tables that can be separately queried in milliseconds, but when put together into one view using UNION, take 150 seconds to query. Here's the rough idea (actual details below):
create view thesaurus as(select id, name from A)union (select id, name from B)union (select id, name from C);
create table h(i integer);insert into h values(12345);select * from thesaurus join h on (thesaurus.id = h.id);
On the other hand, if you do this, it's a millisecond plan:
select * from thesaurus where id in (12345);
Notice that it's effectively the same query since h above contains just this one value.
Here are the actual details. The view being queried:
create view thesaurus2 as
selectrt.thesaurus_id,rt.version_id,rt.normalized,rt.identifier,rt.typecodefrom local_sample sjoin thesaurus_master rt using (sample_id)unionselect c.id as thesaurus_id,c.id as version_id,c.cas_number as normalized,c.cas_number as identifier,3 as typecodefrom cas_number cjoin sample s on c.id = s.version_idunionselect m.id as thesaurus_id,m.id as version_id,lower(m.mfcd) as normalized,m.mfcd as identifier,4 as typecodefrom mfcd mjoin sample s on m.id = s.version_id;
The bad sort (147 seconds to execute). Note that the "hitlist" table contains exactly one row.
explain analyze select c.version_idfrom thesaurus2 cjoin hitlist_rows_103710241 h on (c.thesaurus_id = h.objectid);
If I instead just query directly for that value, the answer is almost instant (1.2 msec):
explain analyze select c.version_idfrom thesaurus2 cwhere c.version_id in (1324511991);
Now if I take any one of the three tables in the UNION view, the query is really fast on each one. For example:
select distinct c.version_id
from (select distinct c.id as thesaurus_id,c.id as version_id,c.cas_number as normalized,c.cas_number as identifier,3 as typecodefrom cas_number cjoin sample s on c.id = s.version_id) cjoin hitlist_rows_103710241 h on (c.thesaurus_id = h.objectid);
The other two subqueries are similarly fast.
This is Postgres9.6 running on Ubuntu 16.04, 64GB memory 16 CPUs. Non-default config values:
max_connections = 2000shared_buffers = 12073MBwork_mem = 256MBmaintenance_work_mem = 512MBsynchronous_commit = offeffective_cache_size = 32GBwal_level = logicalwal_keep_segments = 1000max_wal_senders = 10hot_standby = onarchive_mode = onarchive_command = '/bin/true'
Thanks!
Craig
---------------------------------
Craig A. James
Chief Technology OfficerCraig A. James
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------On Mon, Oct 28, 2019 at 03:40:58PM -0700, Craig James wrote: > On Postgres 9.6 (config below), I have a case I don't understand: three > tables that can be separately queried in milliseconds, but when put > together into one view using UNION, take 150 seconds to query. Here's the > rough idea (actual details below): Do you want UNION ALL ? UNION without ALL distintifies the output. https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION Justin
On Mon, Oct 28, 2019 at 3:45 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Oct 28, 2019 at 03:40:58PM -0700, Craig James wrote:
> On Postgres 9.6 (config below), I have a case I don't understand: three
> tables that can be separately queried in milliseconds, but when put
> together into one view using UNION, take 150 seconds to query. Here's the
> rough idea (actual details below):
Do you want UNION ALL ?
UNION without ALL distintifies the output.
https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION
Interesting idea, thanks. But it makes no difference. Tried it and got the same bad performance.
Craig
Justin
---------------------------------
Craig A. James
Chief Technology OfficerCraig A. James
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------On Mon, Oct 28, 2019 at 4:31 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Mon, Oct 28, 2019 at 04:30:24PM -0700, Craig James wrote:
> On Mon, Oct 28, 2019 at 3:45 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> > On Mon, Oct 28, 2019 at 03:40:58PM -0700, Craig James wrote:
> > > On Postgres 9.6 (config below), I have a case I don't understand: three
> > > tables that can be separately queried in milliseconds, but when put
> > > together into one view using UNION, take 150 seconds to query. Here's the
> > > rough idea (actual details below):
> >
> > Do you want UNION ALL ?
> >
> > UNION without ALL distintifies the output.
> > https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION
>
>
> Interesting idea, thanks. But it makes no difference. Tried it and got the
> same bad performance.
Could you mail the list the plan with union ALL ?
Here it is. It is indeed different, but takes 104 seconds instead of 140 seconds.
---------------------------------
Craig A. James
Chief Technology OfficerCraig A. James
3430 Carmel Mountain Road, Suite 250
San Diego, CA 92121
---------------------------------At Mon, 28 Oct 2019 16:30:24 -0700, Craig James <cjames@emolecules.com> wrote in > On Mon, Oct 28, 2019 at 3:45 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > On Mon, Oct 28, 2019 at 03:40:58PM -0700, Craig James wrote: > > > On Postgres 9.6 (config below), I have a case I don't understand: three > > > tables that can be separately queried in milliseconds, but when put > > > together into one view using UNION, take 150 seconds to query. Here's the > > > rough idea (actual details below): > > > > Do you want UNION ALL ? > > > > UNION without ALL distintifies the output. > > https://www.postgresql.org/docs/current/sql-select.html#SQL-UNION > > > Interesting idea, thanks. But it makes no difference. Tried it and got the > same bad performance. The join clauses in the view also prevent the query from getting faster plans. So if you somehow could move the join clauses out of the UNION leafs in the view in addtion to using UNION ALL, you would get better performance. Or if hitlist_rows is known to highly narrow the result from the element tables, using a function instead of the view might work. create or replace function the_view(int) returns table(thesaurus_id int, version_id int, normalized int, identifier int, typecode int) as $$ select rt.thesaurus_id, rt.version_id, rt.normalized, rt.identifier, rt.typecode from local_sample s join thesaurus_master rt using (sample_id) where rt.thesaurus_id = $1 union ... $$ language sql; explain analyze select c.version_id from hitlist_rows_103710241 h, lateral the_view(h.objectid) as c; regards. -- Kyotaro Horiguchi NTT Open Source Software Center