Re: Hash-based MCV matching for large IN-lists
| От | David Geier |
|---|---|
| Тема | Re: Hash-based MCV matching for large IN-lists |
| Дата | |
| Msg-id | ff7d491d-f084-4935-8417-c0b7285cdd89@gmail.com обсуждение исходный текст |
| Ответ на | Re: Hash-based MCV matching for large IN-lists (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>) |
| Список | pgsql-hackers |
Hi!
> Attached v4 patch with above fixes.
Good progress!
I did another pass over the code, focusing on structure:
- MCVHasContext and MCVInHashContext are identical. MCVHashEntry and
MCVInHashEntry only differ by the count member. I would, as said before,
merge them and simply not use the count member for the join case.
- hash_mcv_in() and mcvs_in_equal() are identical to hash_mcv() and
mcvs_equal(). Let's remove the new functions and use the existing ones
instead, in the spirit of the previous point.
- The threshold constants are also identical. I would merge them into a
single, e.g. MCV_HASH_THRESHOLD, in the spirit of the previous two points.
- MCVHashTable_hash will then be interchangable with
MCVInHashTable_hash. So let's remove MCVInHashTable_hash, in the spirit
of the previous three points.
- Use palloc_array() instead of palloc() when allocating arrays.
- We can avoid allocating the all-true elem_const array by passing NULL
for elem_const to scalararray_mcv_hash_match(), and considering a NULL
pointer to mean "all elements are constant".
- The following comment got copy&pasted from eqsel_internal() twice. It
reads a little strange now because we're not punting here by immediately
returning like in eqsel_internal() but instead fallback to the original
code path. Maybe say instead "... falling back to default code path to
compute default selectivity" or something like that.
/*
* If expression is not variable = something or something =
* variable, then punt and return a default estimate.
*/
- The call to fmgr_info(opfuncoid, &eqproc) is currently under have_mcvs
but can be moved into the next if.
- elem_nulls and elem_const does have to be 0-initialized via palloc0().
All elements are set in the subsequent for-loop. I believe elem_values
also doesn't have to be 0-initialized via palloc0().
- Have you checked there there's test coverage for the special cases
(nvalues_non_mcv > 0, nvalues_nonconst > 0, IN contains NULL,
isEnequality==true, etc.)? If not let's add tests for these.
I'll do a 2nd iteration, focusing on correctness, once these comments
are addressed and I've got the SQL from you so that I can test the
corner cases manually.
--
David Geier
В списке pgsql-hackers по дате отправления: