Re: Differences in identical queries

Поиск
Список
Период
Сортировка
От Richard Huxton
Тема Re: Differences in identical queries
Дата
Msg-id 45E844FE.6050802@archonet.com
обсуждение исходный текст
Ответ на Re: Differences in identical queries  (Rob Schall <rschall@callone.net>)
Ответы Re: Differences in identical queries
Список pgsql-general
Rob Schall wrote:
> When I reanalyzed the anitmp table with just the 4 entries (2 with
> istf=true and 2 with istf=false), both queries then ran the same way/time.
> So it would appear, if you want to do a join or a subselect (IN), then
> the number of items if will be comparing it to must be less than a
> certain number.

I'm still not sure how you're concluding that.

 > In my case, the full amount of false's that fail is
> actually 40. So if you have a table of 2 million items (in current) and
> want to join it to a table of 40 items, then it must do the longer
> hashed join version and not the nested loop.
>
> Below are the results with the anitmp table with 42 items. 40 set as
> false, and 2 set as true. Is there a way to rewrite my query to have it
> run as quick as the others?

> calldetail=> EXPLAIN ANALYZE SELECT current.* FROM current JOIN anitmp
> ON current.orignum=anitmp.ani AND istf=false;
>                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=1.63..1031833.26 rows=256563 width=108) (actual
> time=1889.469..155380.749 rows=653 loops=1)

OK - so here's the root of the problem. The planner thinks it'll get
back 256,563 rows but actually gets 653. If you actually got more than
200,000 rows back then a seq-scan on current might well make sense.

>    Hash Cond: ("outer".orignum = "inner".ani)
>    ->  Seq Scan on current  (cost=0.00..920527.00 rows=10873900
> width=108) (actual time=670.402..136192.991 rows=10681150 loops=1)
>    ->  Hash  (cost=1.52..1.52 rows=41 width=8) (actual time=0.187..0.187
> rows=0 loops=1)
>          ->  Seq Scan on anitmp  (cost=0.00..1.52 rows=41 width=8)
> (actual time=0.014..0.108 rows=40 loops=1)
>                Filter: (istf = false)

Hmm - what sort of distribution of values do you have in "orignum" -
telephone numbers, so presumably they're quite distinct.

I'd be tempted to up the statistics on that column, reanalyse both
tables and see what happens.
   ALTER TABLE current ALTER COLUMN orignum SET STATISTICS=100;
You can set values up to 1000, start at 100 and step up. Not checked the
syntax on that last statement btw.

You can see the before and after effects by looking at:
SELECT * FROM pg_stats WHERE tablename='current' AND attname='orignum';

--
   Richard Huxton
   Archonet Ltd

В списке pgsql-general по дате отправления:

Предыдущее
От: Rob Schall
Дата:
Сообщение: Re: Differences in identical queries
Следующее
От: Vincenzo Romano
Дата:
Сообщение: psql client quits after 1st command