Re: same plan, add 1 condition, 1900x slower

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: same plan, add 1 condition, 1900x slower
Дата
Msg-id 19095.1131724394@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: same plan, add 1 condition, 1900x slower  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: same plan, add 1 condition, 1900x slower  (Mitchell Skinner <mitch@egcrc.net>)
Список pgsql-performance
I wrote:
> No, the visibility check happens first.  The timing does seem consistent
> with the idea that the comparison is being done at all 15M rows, but
> your other EXPLAIN shows that only 2K rows are actually retrieved, which
> presumably is because the merge doesn't need the rest.  (Merge will stop
> scanning either input when it runs out of rows on the other side; so
> this sort of plan is very fast if the range of keys on one side is
> smaller than the range on the other.  The numbers from the no-comparison
> EXPLAIN ANALYZE indicate that that is happening for your case.)  So the
> comparison should happen for at most 2K rows too.

After re-reading your explanation of what you're doing with the data,
I thought of a possible explanation.  Is the "source" value exactly
correlated with the external_id_map primary key?  What could be
happening is this:

1. We can see from the EXPLAIN ANALYZE for the no-comparison case that
the merge join stops after fetching only 2175 rows from external_id_map.
This implies that the subject table joins to the first couple thousand
entries in external_id_map and nothing beyond that.  In particular, the
merge join must have observed that the join key in the 2175'th row (in
index order) of external_id_map was larger than the last (largest) join
key in subject.

2. Let's suppose that source = 'SCH' is false for the 2175'th row of
external_id_map and every one after that.  Then what will happen is that
the index scan will vainly seek through the entire external_id_map,
looking for a row that its filter allows it to return, not knowing that
the merge join has no use for any of those rows.

If this is the story, and you need to make this sort of query fast,
then what you need to do is incorporate the "source" value into the
external_id_map index key somehow.  Then the index scan would be able to
realize that there is no possibility of finding another row with source
= 'SCH'.  The simplest way is just to make a 2-column index, but I
wonder whether the source isn't actually redundant with the
external_id_map primary key already ...

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: same plan, add 1 condition, 1900x slower
Следующее
От: Mitchell Skinner
Дата:
Сообщение: Re: same plan, add 1 condition, 1900x slower