Обсуждение: Query too slow with "not in" condition

Поиск
Список
Период
Сортировка

Query too slow with "not in" condition

От
"சிவகுமார் மா"
Дата:
Dear Friends,

I have loaded the backup from a live database in a test system. Both run 8.3.5 versions. The plan for a query varies in these systems.

Test System
A. PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20061115 (prerelease) (SUSE Linux)

B. explain select * from stock_transaction_detail_106 where transaction_id not in (select transaction_id from transaction_value);
Seq Scan on stock_transaction_detail_106  (cost=1829.78..2867.74 rows=16478 width=128)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on transaction_value  (cost=0.00..1598.02 rows=92702 width=4)

The query takes about 300 ms to run.

Production System

1. PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.1 (SUSE Linux)

2. explain select * from stock_transaction_detail_106 where transaction_id not in (select transaction_id from transaction_value);
  Seq Scan on stock_transaction_detail_106  (cost=2153.95..25245478.39 rows=17064 width=122)
   Filter: (NOT (subplan))
   SubPlan
     ->  Materialize  (cost=2153.95..3401.01 rows=92905 width=4)
           ->  Seq Scan on transaction_value  (cost=0.00..1743.05 rows=92905 width=4)

Here the query did not return any results after 1hour.

In both the computers same query with in condition runs fast (520 ms and 290 ms respectively)

Please help me to resolve this issue. (One configuration difference  between these machines are pg_hba.conf file. In production machine it is password enabled. In test machine it is trust mode.)

Thanks and regards,

Ma Sivakumar

மா சிவகுமார்
எல்லோரும் எல்லாமும் பெற வேண்டும்
http://masivakumar.blogspot.com

Re: Query too slow with "not in" condition

От
"David Rowley"
Дата:
> I have loaded the backup from a live database in a test system. Both run
> 8.3.5 versions. The plan for a query varies in these systems.

> Test System
> A. PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2
> 20061115 (prerelease) (SUSE Linux)

> B. explain select * from stock_transaction_detail_106 where transaction_id
> not in (select transaction_id from transaction_value);
> Seq Scan on stock_transaction_detail_106  (cost=1829.78..2867.74
rows=16478 width=128)
>   Filter: (NOT (hashed subplan))
>   SubPlan
>     ->  Seq Scan on transaction_value  (cost=0.00..1598.02 rows=92702
width=4)

> The query takes about 300 ms to run.

> Production System

> 1. PostgreSQL 8.3.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.1
(SUSE Linux)

> 2. explain select * from stock_transaction_detail_106 where transaction_id
not in (select transaction_id from transaction_value);
>  Seq Scan on stock_transaction_detail_106  (cost=2153.95..25245478.39
rows=17064 width=122)
>   Filter: (NOT (subplan))
>   SubPlan
>     ->  Materialize  (cost=2153.95..3401.01 rows=92905 width=4)
>           ->  Seq Scan on transaction_value  (cost=0.00..1743.05
rows=92905 width=4)

> Here the query did not return any results after 1hour.

> In both the computers same query with in condition runs fast (520 ms and
290 ms respectively)

> Please help me to resolve this issue. (One configuration difference
> between these machines are pg_hba.conf file. In production machine it is
> password enabled. In test machine it is trust mode.)


You might find this page interesting:
http://www.depesz.com/index.php/2008/08/13/nulls-vs-not-in/

I assume workmem, effective_cache_size and random_page_cost are all the same
in the 2 postgresql.conf?

Do you get the same after you ANALYZE the database?

If you read the page above you might realise NOT IN is not really what you
want. Maybe NOT EXISTS or a LEFT OUTER JOIN ... WHERE transaction_id IS NULL
?

David.



Re: Query too slow with "not in" condition

От
Tom Lane
Дата:
"David Rowley" <dgrowley@gmail.com> writes:
> I assume workmem, effective_cache_size and random_page_cost are all the same
> in the 2 postgresql.conf?

Indeed, work_mem is probably the problem.  The critical difference
between the two plans seems to be that the first one is using a
"hashed subplan" and the second one isn't.  Assuming the same datatypes
in both databases, the only reason not to use a hashed subplan is if
the hashtable is estimated not to fit in work_mem.

            regards, tom lane

Re: Query too slow with "not in" condition

От
"சிவகுமார் மா"
Дата:
On Sun, Nov 30, 2008 at 3:28 PM, David Rowley <dgrowley@gmail.com> wrote:
You might find this page interesting:
http://www.depesz.com/index.php/2008/08/13/nulls-vs-not-in/


Thanks David. Another issue I was faced with was exactly what the link you provided discusses.

Best regards,

Ma Sivakumar

மா சிவகுமார்
எல்லோரும் எல்லாமும் பெற வேண்டும்
http://masivakumar.blogspot.com