Re: Differences in identical queries

Поиск

Список

Период

Сортировка

От	Richard Huxton
Тема	Re: Differences in identical queries
Дата	2 марта 2007 г. 11:40:00
Msg-id	45E844FE.6050802@archonet.com обсуждение исходный текст
Ответ на	Re: Differences in identical queries (Rob Schall <rschall@callone.net>)
Ответы	Re: Differences in identical queries
Список	pgsql-general

Дерево обсуждения

Rob Schall wrote:
> When I reanalyzed the anitmp table with just the 4 entries (2 with
> istf=true and 2 with istf=false), both queries then ran the same way/time.
> So it would appear, if you want to do a join or a subselect (IN), then
> the number of items if will be comparing it to must be less than a
> certain number.

I'm still not sure how you're concluding that.

 > In my case, the full amount of false's that fail is
> actually 40. So if you have a table of 2 million items (in current) and
> want to join it to a table of 40 items, then it must do the longer
> hashed join version and not the nested loop.
>
> Below are the results with the anitmp table with 42 items. 40 set as
> false, and 2 set as true. Is there a way to rewrite my query to have it
> run as quick as the others?

> calldetail=> EXPLAIN ANALYZE SELECT current.* FROM current JOIN anitmp
> ON current.orignum=anitmp.ani AND istf=false;
>                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=1.63..1031833.26 rows=256563 width=108) (actual
> time=1889.469..155380.749 rows=653 loops=1)

OK - so here's the root of the problem. The planner thinks it'll get
back 256,563 rows but actually gets 653. If you actually got more than
200,000 rows back then a seq-scan on current might well make sense.

>    Hash Cond: ("outer".orignum = "inner".ani)
>    ->  Seq Scan on current  (cost=0.00..920527.00 rows=10873900
> width=108) (actual time=670.402..136192.991 rows=10681150 loops=1)
>    ->  Hash  (cost=1.52..1.52 rows=41 width=8) (actual time=0.187..0.187
> rows=0 loops=1)
>          ->  Seq Scan on anitmp  (cost=0.00..1.52 rows=41 width=8)
> (actual time=0.014..0.108 rows=40 loops=1)
>                Filter: (istf = false)

Hmm - what sort of distribution of values do you have in "orignum" -
telephone numbers, so presumably they're quite distinct.

I'd be tempted to up the statistics on that column, reanalyse both
tables and see what happens.
   ALTER TABLE current ALTER COLUMN orignum SET STATISTICS=100;
You can set values up to 1000, start at 100 and step up. Not checked the
syntax on that last statement btw.

You can see the before and after effects by looking at:
SELECT * FROM pg_stats WHERE tablename='current' AND attname='orignum';

--
   Richard Huxton
   Archonet Ltd

В списке pgsql-general по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Differences in identical queries