Обсуждение: Problem with slow query with WHERE conditions with OR clause on primary keys

Поиск
Список
Период
Сортировка

Problem with slow query with WHERE conditions with OR clause on primary keys

От
Krzysztof Olszewski
Дата:
Hi,

my sql is very simple, 
returns one row,
where conditions are assigned to primary keys 


select g.gd_index, gd.full_name 
from gd g join gd_data gd on (g.id_gd = gd.id_gd)
where gd.id_gd_data = 1111 OR g.id_gd = 1111;


but generates "crazy" plan with Merge Join on big amount of rows (both tables contains 500000 rows)
because Index scans ignore conditions, conditions are processed after index sacans on Merge Join

Merge Join  (cost=0.00..46399.80 rows=2 width=115) (actual time=3.881..644.409 rows=1 loops=1)
  Merge Cond: (g.id_gd = gd.id_gd)
  Join Filter: ((gd.id_gd_data = 1111) OR (g.id_gd = 1111))
  ->  Index Scan using pk_gd on gd g  (cost=0.00..14117.79 rows=500001 width=40) (actual time=0.019..146.521 rows=500001 loops=1)
  ->  Index Scan using fki_gd on gd_data gd  (cost=0.00..22282.04 rows=500001 width=99) (actual time=0.016..157.384 rows=500001 loops=1)
Total runtime: 644.460 ms


model is very simple


CREATE TABLE gd (
  id_gd bigint NOT NULL,
  gd_index character varying(60) NOT NULL,
  notes text,
  notes_exists integer NOT NULL DEFAULT 0,
  CONSTRAINT pk_gd PRIMARY KEY (id_gd )
)


CREATE TABLE gd_data (
  id_gd_data bigint NOT NULL,
  id_gd bigint NOT NULL,
  short_name character varying(120) NOT NULL,
  full_name character varying(512) NOT NULL,
  notes text,
  notes_exists integer NOT NULL DEFAULT 0,
  CONSTRAINT pk_gd_data PRIMARY KEY (id_gd_data ),
  CONSTRAINT fk_gd FOREIGN KEY (id_gd)
      REFERENCES gd (id_gd) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

CREATE INDEX fki_gd
  ON gd_data
  USING btree
  (id_gd );



my configuration from (select * from pg_settings):

"server_version";"9.1.10"
"block_size";"8192"
"cpu_index_tuple_cost";"0.005"
"cpu_operator_cost";"0.0025"
"cpu_tuple_cost";"0.01"
"cursor_tuple_fraction";"0.1"
"default_statistics_target";"1000"
"enable_bitmapscan";"on"
"enable_hashagg";"on"
"enable_hashjoin";"on"
"enable_indexscan";"on"
"enable_material";"on"
"enable_mergejoin";"on"
"enable_nestloop";"on"
"enable_seqscan";"on"
"enable_sort";"on"
"enable_tidscan";"on"
"maintenance_work_mem";"262144"
"max_connections";"10"
"max_files_per_process";"1000"
"max_locks_per_transaction";"64"
"max_pred_locks_per_transaction";"64"
"max_prepared_transactions";"10"
"random_page_cost";"1.5"
"seq_page_cost";"1"
"shared_buffers";"65536"
"temp_buffers";"1024"
"work_mem";"131072"





Thank you for your help.
 

Kris Olszewski


Re: Problem with slow query with WHERE conditions with OR clause on primary keys

От
David Johnston
Дата:
Krzysztof Olszewski wrote
> Hi,
>
> my sql is very simple,
> returns one row,
> where conditions are assigned to primary keys
>
>
> */select g.gd_index, gd.full_name/**/
> /**/from gd g join gd_data gd on (g.id_gd = gd.id_gd)/**/
> /**/where gd.id_gd_data = 1111 OR g.id_gd = 1111;/*
>
>
> but generates "crazy" plan with Merge Join on big amount of rows (both
> tables contains 500000 rows)
> because Index scans ignore conditions, conditions are processed after
> index sacans on Merge Join
>
> */Merge Join  (cost=0.00..46399.80 rows=2 width=115) (actual
> time=3.881..644.409 rows=1 loops=1)/**/
> /**/   Merge Cond: (g.id_gd = gd.id_gd)/**/
> /**/   Join Filter: ((gd.id_gd_data = 1111) OR (g.id_gd = 1111))/**/
> /**/   ->  Index Scan using pk_gd on gd g  (cost=0.00..14117.79
> rows=500001 width=40) (actual time=0.019..146.521 rows=500001 loops=1)/**/
> /**/   ->  Index Scan using fki_gd on gd_data gd  (cost=0.00..22282.04
> rows=500001 width=99) (actual time=0.016..157.384 rows=500001 loops=1)/**/
> /**/Total runtime: 644.460 ms/*
>
>
> model is very simple
>
>
> /CREATE TABLE gd (//
> //   id_gd bigint NOT NULL,//
> //   gd_index character varying(60) NOT NULL,//
> //   notes text,//
> //   notes_exists integer NOT NULL DEFAULT 0,//
> //   CONSTRAINT pk_gd PRIMARY KEY (id_gd )//
> //)//
> //
> //
> //CREATE TABLE gd_data (//
> //   id_gd_data bigint NOT NULL,//
> //   id_gd bigint NOT NULL,//
> //   short_name character varying(120) NOT NULL,//
> //   full_name character varying(512) NOT NULL,//
> //   notes text,//
> //   notes_exists integer NOT NULL DEFAULT 0,//
> //   CONSTRAINT pk_gd_data PRIMARY KEY (id_gd_data ),//
> //   CONSTRAINT fk_gd FOREIGN KEY (id_gd)//
> //       REFERENCES gd (id_gd) MATCH SIMPLE//
> //       ON UPDATE NO ACTION ON DELETE NO ACTION//
> //)//
> //
> //CREATE INDEX fki_gd//
> //   ON gd_data//
> //   USING btree//
> //   (id_gd );//
> /
>
>
> my configuration from (select * from pg_settings):
>
> "server_version";"9.1.10"
>
> Thank you for your help.
>
>
> Kris Olszewski

It cannot do any better since it cannot pre-filter either table using the
where condition without risking removing rows that would meet the other
table's condition post-join.

The query you are executing makes no sense to me: I don't understand why you
would ever filter on gd.id_gd_data given the model you are showing.

I believe your understanding of your model - or the model itself - is flawed
but as you have only provided code it is impossible to pinpoint where
exactly the disconnect resides.  You can either fix the model or the query -
the later by implementing sub-selects with where clauses manually - which
then encodes an assumption about your data that the current query cannot
make.

Your model implies that a single gd record can have multiple gd_data records
associated with it.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Problem-with-slow-query-with-WHERE-conditions-with-OR-clause-on-primary-keys-tp5782803p5782822.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Problem with slow query with WHERE conditions with OR clause on primary keys

От
David Johnston
Дата:
kolszew73@gmail.com wrote
> Thanx for your answer
>
> My example is trivial because i want to show strange (for me) postgres
> behavior with dealing with primary keys (extreme example), in real
> situation user put search condition e.g.  "Panas" and this generates query
> ...
> where gd.other_code like 'Panas%' OR g.code like 'Panas%'
> ..
>
> both columns has very good indexes and selectivity for "like 'Panas%'" ...
>
> I have experience from Oracle with this type of queries, and Oracle have
> no problem with it,
> executes select on index on other_code from gd and join g
> in next step executes select on index on code from g and join gd
> and this two results are connected in last step (like union)
> very fast on minimal cost
>
> and in my opinion read whole huge tables only for 10 rows in result where
> conditions are very good  ... is strange

I suppose the equivalent query that you'd want would be:

SELECT ... FROM gd JOIN gd_data USING (id_gd)
WHERE id_gd IN (

SELECT id_gd FROM gd WHERE ...
UNION ALL -distinct not required in this situation
SELECT id_gd FROM gd_data WHERE ...

) --ignoring NULL implications

It does make sense conceptually...

David J.






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Problem-with-slow-query-with-WHERE-conditions-with-OR-clause-on-primary-keys-tp5782803p5783942.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Problem with slow query with WHERE conditions with OR clause on primary keys

От
Andreas Karlsson
Дата:
On 12/11/2013 12:30 AM, Krzysztof Olszewski wrote:
> select g.gd_index, gd.full_name
> from gd g join gd_data gd on (g.id_gd = gd.id_gd)
> where gd.id_gd_data = 1111 OR g.id_gd = 1111;

Have you tried writing the query to filter on gd.id_gd rather than
g.id_gd? I am not sure if the query planner will realize that it can
replace g.id_gd with gd.id_gd in the where clause.

select g.gd_index, gd.full_name
from gd g join gd_data gd on (g.id_gd = gd.id_gd)
where gd.id_gd_data = 1111 OR gd.id_gd = 1111;

--
Andreas Karlsson


Re: Problem with slow query with WHERE conditions with OR clause on primary keys

От
"kolszew73@gmail.com"
Дата:
Thanx for your answer

My example is trivial because i want to show strange (for me) postgres
behavior with dealing with primary keys (extreme example), in real situation
user put search condition e.g.  "Panas" and this generates query
...
where gd.other_code like 'Panas%' OR g.code like 'Panas%'
..

both columns has very good indexes and selectivity for "like 'Panas%'" ...

I have experience from Oracle with this type of queries, and Oracle have no
problem with it,
executes select on index on other_code from gd and join g
in next step executes select on index on code from g and join gd
and this two results are connected in last step (like union)
very fast on minimal cost

and in my opinion read whole huge tables only for 10 rows in result where
conditions are very good  ... is strange









--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Problem-with-slow-query-with-WHERE-conditions-with-OR-clause-on-primary-keys-tp5782803p5783927.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


Re: Re: Problem with slow query with WHERE conditions with OR clause on primary keys

От
Pavel Stehule
Дата:



2013/12/18 kolszew73@gmail.com <kolszew73@gmail.com>
Thanx for your answer

My example is trivial because i want to show strange (for me) postgres
behavior with dealing with primary keys (extreme example), in real situation
user put search condition e.g.  "Panas" and this generates query
...
where gd.other_code like 'Panas%' OR g.code like 'Panas%'
..

both columns has very good indexes and selectivity for "like 'Panas%'" ...

I have experience from Oracle with this type of queries, and Oracle have no
problem with it,
executes select on index on other_code from gd and join g
in next step executes select on index on code from g and join gd
and this two results are connected in last step (like union)
very fast on minimal cost

and in my opinion read whole huge tables only for 10 rows in result where
conditions are very good  ... is strange


Maybe index is not in good form

try to build index with varchar_pattern_ops flag

http://postgres.cz/wiki/PostgreSQL_SQL_Tricks_I#LIKE_optimalization

CREATE INDEX like_index ON people(surname varchar_pattern_ops);

Regards

Pavel Stehule
 








--
View this message in context: http://postgresql.1045698.n5.nabble.com/Problem-with-slow-query-with-WHERE-conditions-with-OR-clause-on-primary-keys-tp5782803p5783927.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance