Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes

Поиск
Список
Период
Сортировка
От Marti Raudsepp
Тема Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes
Дата
Msg-id CABRT9RBP92ep4vzr3MHMOoMFP4GDmkTQ3fTmFnq-82X2o12rfA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [PATCH] Fix ScalarArrayOpExpr estimation for GIN indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Tue, Dec 20, 2011 at 07:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> it'd likely be better if this code ignored unrecognized qual expression
> types rather than Assert'ing they're not there.

The patch replaced that Assert with an elog(ERROR)

> Hmm.  I am reminded of how utterly unreadable "diff -u" format is for
> anything longer than single-line changes :-( ...

Sorry, the new patch is in context (-C) diff format proper. I also
moved around code a bit and removed an unused variable that was left
around from the refactoring.

> but I think I don't
> like this refactoring much.  Will take a closer look tomorrow.

I was afraid you'd say that, especially for a change that should be
backpatched. But I couldn't think of alternative ways to do it that
give non-bogus estimates.

----

While writing this patch, the largest dilemma was where to account for
the multiple array scans. Given that this code is mostly a heuristic
and I lack a deep understanding of GIN indexes, it's likely that I got
this part wrong.

Currently I'm doing this:
    partialEntriesInQuals *= array_scans;
    exactEntriesInQuals   *= array_scans;
    searchEntriesInQuals  *= array_scans;

Which seems to be the right thing as far as random disk accesses are
concerned (successive scans are more likely to hit the cache) and also
works well with queries that don't touch most of the index. But this
fails spectacularly when multiple full scans are performed e.g. LIKE
ANY ('{%,%,%}'). Because index_pages_fetched() ends up removing all of
the rescan costs.

Another approach is multiplying the total cost from the number of
scans. This overestimates random accesses from rescans, but fixes the
above case:
    *indexTotalCost = (*indexStartupCost + dataPagesFetched *
spc_random_page_cost) * array_scans;

Regards,
Marti

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: Replication timeout units
Следующее
От: Noah Misch
Дата:
Сообщение: Re: ALTER TABLE lock strength reduction patch is unsafe