Re: Bloom index cost model seems to be wrong

From Jeff Janes
Subject Re: Bloom index cost model seems to be wrong
Date
Msg-id CAMkU=1yQRri4-uiFDE=HoBPDDR4PWXB9coS5_kN=7f+vWVL9GQ@mail.gmail.com
Responses Re: Bloom index cost model seems to be wrong  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
I've moved this to the hackers list, and added Teodor and Alexander of the bloom extension, as I would like to hear their opinions on the costing.

On Tue, Feb 12, 2019 at 4:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

It's possible that a good cost model for bloom is so far outside
genericcostestimate's ideas that trying to use it is not a good
idea anyway.


I'm attaching a crude patch built on top of your refactored header files.

I just copied genericcostestimate into bloom, and made a few changes.

I think one change should be conceptually uncontroversial, which is to change the IO cost from random_page_cost to seq_page_cost.  Bloom indexes are always scanned in their entirety.
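
To make that change concrete, here is a minimal standalone sketch of the I/O term under that assumption; the function and argument names are placeholders for illustration, not the variables the patch actually touches:

/*
 * Illustration only: every page of a bloom index is read exactly once,
 * in physical order, so the per-page charge is seq_page_cost rather
 * than random_page_cost.
 */
double
bloom_index_io_cost(double num_index_pages, double seq_page_cost)
{
    return num_index_pages * seq_page_cost;
}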

The other change is not to charge any cpu_operator_cost per tuple.  Bloom doesn't go through the ADT machinery, it just does very fast bit-twiddling.  I could assign a fraction of a cpu_operator_cost, multiplied by bloomLength rather than list_length(indexQuals), to this bit-twiddling. But I think that that fraction would need to be quite close to zero, so I just removed it.
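
Sketched the same way, the per-tuple CPU charge I'm proposing comes down to just cpu_index_tuple_cost, with no operator term at all (again, the names below are placeholders, not the planner's real variables):

/*
 * Illustration only: cpu_index_tuple_cost per index tuple visited, and
 * no cpu_operator_cost component, since the signature test is plain
 * bit-twiddling rather than a call through the ADT machinery.
 */
double
bloom_index_cpu_cost(double num_index_tuples, double cpu_index_tuple_cost)
{
    return num_index_tuples * cpu_index_tuple_cost;
}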

When everything is in memory, Bloom still gets way overcharged for CPU usage even without the cpu_operator_cost.  This patch doesn't do anything about that.  I don't know whether the default value of cpu_index_tuple_cost is way too high, or whether Bloom just shouldn't be charging the full value of it in the first place, given the way it accesses index tuples.  For comparison, when using a Btree as an equality filter on a non-leading column, most of the time goes to index_getattr.  Should the time spent there be charged to cpu_operator_cost or to cpu_index_tuple_cost?  It is not strictly spent in the operator, but fetching the parameter to be used in an operator is more nearly a per-operator problem than a per-tuple problem.

Most of genericcostestimate still applies, for example the ScalarArrayOpExpr handling and the Mackert-Lohman formula.  It is a shame that all of that has to be copied.
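
For anyone following along, the Mackert-Lohman page-fetch estimate (as described in the costsize.c comments and implemented by index_pages_fetched) is roughly the following.  This is only an illustrative transcription of the formula, not the planner's actual code: T is the number of pages in the relation, b the number of cache pages available, and tuples_fetched the estimated number of tuple fetches.

#include <math.h>

/*
 * Illustration-only transcription of the Mackert-Lohman approximation
 * from the costsize.c comments.
 */
double
mackert_lohman_pages_fetched(double tuples_fetched, double T, double b)
{
    double      pages_fetched;

    if (T <= b)
    {
        /* the whole relation fits in cache */
        pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
        if (pages_fetched > T)
            pages_fetched = T;
    }
    else
    {
        /* cache thrashing sets in once we pass the cache-fill point */
        double      lim = (2.0 * T * b) / (2.0 * T - b);

        if (tuples_fetched <= lim)
            pages_fetched = (2.0 * T * tuples_fetched) / (2.0 * T + tuples_fetched);
        else
            pages_fetched = b + (tuples_fetched - lim) * (T - b) / T;
    }
    return ceil(pages_fetched);
}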

There are some other parts of genericcostestimate that probably don't apply (OrderBy, for example) but I left them untouched for now to make it easier to reconcile changes to the real genericcostestimate with the copy.

For ScalarArrayOpExpr, it would be nice to add all branches of the =ANY to the bitmap in a single index scan, but I don't see the machinery to do that.  It would be a matter for another patch anyway, other than the way it would change the cost estimate.

Cheers,

Jeff
Attachments
