Re: Merge join vs merge semi join against primary key

Поиск

Список

Период

Сортировка

От	Sean Rhea
Тема	Re: Merge join vs merge semi join against primary key
Дата	13 октября 2015 г. 21:08:27
Msg-id	CACZYdDjP95eM6xtu8R85_mXcKO0VdHchEQvQfL5tTdkru4Yc5w@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Merge join vs merge semi join against primary key (David Rowley <david.rowley@2ndquadrant.com>)
Ответы	Re: Merge join vs merge semi join against primary key
Список	pgsql-general

Дерево обсуждения

All,

No, the customers table is not 100% the same. This is a live production system, so the data is (unfortunately) changing under us a bit here. That said, there are still some strange things going on. I just reran everything. The query plan time hasn't changed, but as Jeremy, Igor, and David all pointed out, there's something funky going on with the apparent size of the customers table. These queries were all run within 5 minutes of each other:

production=> explain analyze SELECT ac.* FROM balances ac JOIN customers o ON (o.id= ac.customer_id AND o.group_id = 45);

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------

Merge Join (cost=2475.89..20223.08 rows=7 width=80) (actual time=157.437..243670.853 rows=7318 loops=1)

Merge Cond: (ac.customer_id = o.id) -> Index Scan using balances_customer_id_index on balances ac (cost=0.00..727.42 rows=16876 width=80) (actual time=0.489..30.573 rows=16876 loops=1)

-> Index Scan using customers_pkey on customers o (cost=0.00..65080.01 rows=184 width=8) (actual time=127.266..243582.767 rows=7672 loops=1)

Filter: (group_id = 45)

Rows Removed by Filter: 212699113

Total runtime: 243674.288 ms

(7 rows)

production=> select count(*) from customers where group_id = 45;

count

-------

430

(1 row)

production=> select count(*) from customers;

count

--------

476645

(1 row)

Is it possible for explain analyze to somehow produce bad stats? I can't figure out where that 212699113 number is coming from at all.

Sean

On Mon, Oct 12, 2015 at 5:43 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:

On 10 October 2015 at 08:52, Sean Rhea <sean.c.rhea@gmail.com> wrote:

1. When I join two tables with "WHERE id IN (...)" versus with an explicit
join, and the join column for the inner table is a primary key, I would expect
the same behavior in both cases, but the optimizer is choosing a merge join in
one case and a merge semi join in the other. There's at most one customer
with a given id. Why not do a semi join?

Unfortunately the 9.2 planner does not make any checks to verify that customers.id is unique to perform a semi join. There is a pending patch in the 9.6 cycle to add this optimisation.

production=> select count(*) from customers;
count
--------
473733
(1 row)

...

-> Index Scan using customers_pkey on customers o (cost=0.00..63836.61 rows=836 width=8) (actual time=159.840..201915.765 rows=7672 loops=1)
Filter: (group_id = 45)
Rows Removed by Filter: 212699113

Rows Removed by Filter: 212699113 seems to indicate that your 473733 row count for "customers" is incorrect.

If you're doing lots of filtering on group_id, then perhaps you should think about adding an index on customers (group_id,id)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-general по дате отправления:

Предыдущее

От: "David G. Johnston"
Дата: 13 октября 2015 г., 20:58:11
Сообщение: Re: ID column naming convention

Следующее

От: Tom Lane
Дата: 13 октября 2015 г., 21:21:00
Сообщение: Re: Merge join vs merge semi join against primary key

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Merge join vs merge semi join against primary key

Предыдущее

Следующее