Re: Merge Join chooses very slow index scan

Поиск
Список
Период
Сортировка
От Jake Magner
Тема Re: Merge Join chooses very slow index scan
Дата
Msg-id 1426783249710-5842603.post@n5.nabble.com
обсуждение исходный текст
Ответ на Re: Merge Join chooses very slow index scan  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Merge Join chooses very slow index scan  (Jake Magner <jakemagner90@gmail.com>)
Список pgsql-performance
Thanks Tom, that sounds like what is happening. Some additional
comments/questions inline.


Tom Lane-2 wrote
> I think what must be happening is that the planner notes the maximum
> possible value of v.id and supposes that the mergejoin will stop far short
> of completion because v.id spans just a small part of the range of
> usagestats.tid.  Which it does, when you have only the nonselective filter
> condition on usagestats.  However, the executor cannot stop until it's
> fetched a usagestats row that has a tid value larger than the last v.id
> value; otherwise it can't be sure it's emitted all the required join rows.

Ok, that makes a lot of sense. It is scanning the tid index though, so once
it gets past the last value in v.id isn't it guaranteed that there can be no
more required join rows? Even if it sees tid = 5000 and type = 'aircraft'
then it can know there are no more tids less than 5000. It must be that it
waits to do this check until it gets a row that matches the filter, maybe
this is an optimization in most cases? Seems like the cost of the check
would be small enough compared to the cost of looking up the next row to do
it every time.


Tom Lane-2 wrote
> I'm guessing that the "type = 'vehicle'" condition eliminates all such
> rows, or at least enough of them that a very large part of the usagestats
> table has to be scanned to find the first can't-possibly-match row.

Yes, you are exactly right.


Tom Lane-2 wrote
> I'm not sure there's anything much we can do to improve this situation
> in Postgres.  It seems like a sufficiently bizarre corner case that it
> wouldn't be appropriate to spend planner cycles checking for it, and
> I'm not sure how we'd check for it even if we were willing to spend those
> cycles.  You might consider altering the query, or inserting some kind of
> dummy sentinel row in the data, or changing the schema (is it really
> sensible to keep vehicle usagestats in the same table as other
> usagestats?).  A brute-force fix would be "enable_mergejoin = off", but
> that would prevent selecting this plan type even when it actually is
> a significant win.

I agree it may make sense to change the schema, although there are some good
reasons to have it this way (I obfuscated the table names). If we
partitioned the table on "type" then would the planner be able to stop after
finishing the 'vehicle' type partition?


Tom Lane-2 wrote
> Actually, an easy fix might be to create a 2-column index on
> usagestats(type, tid).  I think the planner should be able to
> use that to produce sorted output for the mergejoin, and you'd
> get the best of both worlds, because the indexscan will stop
> immediately when it's exhausted the rows with type = 'vehicle'.

Actually there is an index on (type, tid) and it doesn't help. I just tried
adding an index on (tid, type) and it partially fixed the issue, judging by
the page hits, it looks like it is still scanning all the rows of the
compound index, but no longer needs to go to the actual table. This takes
600ms instead of the 11,500ms of the original query, but still much more
than the 60ms when you change the type='vehicle' condition to freq > -1. So
it isn't a perfect solution. We could also switch to enum values for the
type field which may reduce the (tid, type) index size enough to make the
performance adequate, but it would be best if we can just get it to quit the
scan early, so the performance doesn't degrade if the table grows
significantly.

Best,
Jake



--
View this message in context:
http://postgresql.nabble.com/Merge-Join-chooses-very-slow-index-scan-tp5842523p5842603.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Merge Join chooses very slow index scan
Следующее
От: "Carson, Leonard"
Дата:
Сообщение: Re: views much slower in 9.3 than 8.4