Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

From: Dilip Kumar
Subject: Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages
Date:
Msg-id: CAFiTN-vrRu7EcsKEC9b5urvWXDRY_tWy0jFrgK11bFAiyirrqg@mail.gmail.com
In reply to: Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Wed, Jul 26, 2017 at 10:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jun 8, 2017 at 10:44 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> On Thu, May 18, 2017 at 8:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>

Thanks for the feedback.  I haven't yet worked on the optimizer-side
feedback, but I have done some work on improving the executor part;
please find my comments inline.

> There are two main considerations here.  One, it's better to lossify
> pages with many bits set than with few bits set, because the
> additional work we thereby incur is less.  Two, it's better to lossify
> pages that are in the same chunk as other pages which we are also
> going to lossify, because that's how we actually save memory.  The
> current code will cheerfully lossify a chunk that contains no other
> pages, or will lossify one page from a chunk but not the others,
> saving no memory but hurting performance.
>
> Maybe the simplest change here would be to make it so that when we
> decide to lossify a chunk, we lossify all pages in the chunk, but only
> if there's more than one.  In other words, given a proposed page P to
> lossify, probe the hash table for all keys in the same chunk as P and
> build up a words[] array for the proposed chunk.  If that words[]
> array will end up with only 1 bit set, then forget the whole thing;
> otherwise, delete all of the entries for the individual pages and
> insert the new chunk instead.

I have attempted a very simple POC with this approach, just to see how
many lossy pages we can save if we lossify all the pages falling in
the same chunk first, before moving on to the next page.  I collected
some data on TPC-H scale factor 20 with different work_mem settings
(I only counted lossy pages, I did not measure performance).  I did
not see a significant reduction in lossy pages.  (The POC patch is
attached in case someone wants to test it or experiment further.)
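The core of the POC is shaped roughly like this (a simplified sketch of
the idea, not the patch itself; foreach_exact_page(), tbm_pages_in_chunk()
and tbm_mark_chunk_lossy() are illustrative names, not real tidbitmap.c
functions):

    /*
     * Chunk-at-a-time lossification (simplified).  Walk the exact page
     * entries; for each one, look at the whole chunk it belongs to.  If that
     * chunk covers more than one exact page, replace all of those page
     * entries with a single lossy chunk entry; otherwise leave it alone,
     * since lossifying a lone page saves no memory and only adds rechecks.
     * Stop once enough entries have been freed.
     */
    foreach_exact_page(tbm, pageno)                     /* illustrative iterator */
    {
        BlockNumber chunk_start = pageno - (pageno % PAGES_PER_CHUNK);

        if (tbm_pages_in_chunk(tbm, chunk_start) > 1)   /* illustrative */
            tbm_mark_chunk_lossy(tbm, chunk_start);     /* illustrative */

        if (tbm->nentries <= tbm->maxentries / 2)
            break;                                      /* freed enough memory */
    }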

work_mem = 64MB

TPCH Query   Head lossy_pages   Patch lossy_pages   Lossy pages reduced
Q6                    534984              529745                  5239
Q15                   535072              529785                  5287
Q20                  1586933             1584731                  2202

work_mem = 40MB

TPCH Query   Head lossy_pages   Patch lossy_pages   Lossy pages reduced
Q6                    995223              993490                  1733
Q14                   337894              332890                  5004
Q15                   995417              993511                  1906
Q20                  1654016             1652982                  1034

work_mem = 20MB

TPCH Query   Head lossy_pages   Patch lossy_pages   Lossy pages reduced
Q4                    166551              165280                  1271
Q5                    330679              330028                   651
Q6                   1160339             1159937                   402
Q14                   666897              666032                   865
Q15                  1160518             1160017                   501
Q20                  1982981             1982828                   153


> As far as the patch itself is concerned, tbm_calculate_exact_pages()
> is computing the number of "exact pages" which will fit into the
> TIDBitmap, but I think that instead of tbm_calculate_exact_pages() you
> should have something like tbm_calculate_entries() that just returns
> nbuckets, and then let the caller work out how many entries are going
> to be exact and how many are going to be inexact.  An advantage of
> that approach is that the new function could be used by tbm_create()
> instead of duplicating the logic.

Ok
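Something along these lines, I think, mirroring the sizing arithmetic
that tbm_create() already does (just a sketch of what I have in mind,
not final code):

    /*
     * Sketch: how many hashtable entries can we expect to fit in maxbytes?
     * Uses the same per-entry cost tbm_create() assumes: one PagetableEntry
     * plus two Pointers for the arrays built during iteration readout.
     */
    long
    tbm_calculate_entries(double maxbytes)
    {
        long        nbuckets;

        nbuckets = maxbytes /
            (sizeof(PagetableEntry) + sizeof(Pointer) + sizeof(Pointer));
        nbuckets = Min(nbuckets, INT_MAX - 1);  /* safety limit */
        nbuckets = Max(nbuckets, 16);           /* sanity limit */

        return nbuckets;
    }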
> I'm not sure that the way you are
> doing the rest of the calculation is wrong, but I've got no confidence
> that it's right, either: the way WORDS_PER_CHUNK is used looks pretty
> random, and the comments aren't enough for me to figure it out.

+ * Eq1: nbuckets = exact_bucket + lossy_buckets
+ * Eq2: total_pages = exact_bucket + (lossy_buckets * WORDS_PER_CHUNK)

I derived my formula from these two equations.  But this assumes that
every lossy bucket (chunk) will hold WORDS_PER_CHUNK pages, which
seems very optimistic.
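For reference, solving Eq1 and Eq2 for the bucket counts gives:

    lossy_buckets = (total_pages - nbuckets) / (WORDS_PER_CHUNK - 1);
    exact_buckets = nbuckets - lossy_buckets;

so whenever a chunk absorbs fewer than WORDS_PER_CHUNK pages in
practice, the real lossy fraction will be higher than this estimate.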

>
> It's unclear what assumptions we should make while trying to estimate
> the number of lossy pages.  The effectiveness of lossification depends
> on the average number of pages that get folded into a chunk; but how
> many will that be?  If we made some of the improvements proposed
> above, it would probably be higher than it is now, but either way it's
> not clear what number to use.  One possible assumption is that the
> pages that get lossified are exactly average, so:
>
> double entries_saved_per_lossy_page = Max(heap_pages_fetched /
> tbm_max_entries - 1, 1.0);
> lossy_pages = (heap_pages_fetched - tbm_max_entries) /
> pages_saved_per_lossy_page;
> exact_pages = heap_pages_fetched - lossy_pages;

This seems fine as long as entries_saved_per_lossy_page is 2 or less,
but once it goes above 2 the calculation seems problematic.  Consider
the examples below:

heap_pages_fetched = 100 and tbm_max_entries = 25, with the above formula:
lossy_pages = (100 - 25) / 3 = 25, exact_pages = 75

heap_pages_fetched = 100 and tbm_max_entries = 10:
lossy_pages = (100 - 10) / 9 = 10, exact_pages = 90

So by reducing tbm_max_entries I get more exact pages, which seems
wrong.  It seems to me that when entries_saved_per_lossy_page is > 2,
it would be more accurate to calculate exact_pages the same way we
calculate lossy_pages, i.e.

exact_pages = (heap_pages_fetched - tbm_max_entries) / pages_saved_per_lossy_page;
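
Spelled out as a quick standalone check (throwaway C, just to reproduce
the numbers above, not anything from the patch):

    #include <stdio.h>

    /*
     * Plug toy numbers into the proposed estimate:
     *   saved = Max(heap_pages_fetched / tbm_max_entries - 1, 1.0)
     *   lossy = (heap_pages_fetched - tbm_max_entries) / saved
     *   exact = heap_pages_fetched - lossy
     */
    static void
    estimate(double heap_pages_fetched, double tbm_max_entries)
    {
        double saved = heap_pages_fetched / tbm_max_entries - 1;
        double lossy, exact;

        if (saved < 1.0)
            saved = 1.0;

        lossy = (heap_pages_fetched - tbm_max_entries) / saved;
        exact = heap_pages_fetched - lossy;

        printf("fetched=%.0f max_entries=%.0f -> lossy=%.0f exact=%.0f\n",
               heap_pages_fetched, tbm_max_entries, lossy, exact);
    }

    int
    main(void)
    {
        estimate(100, 25);      /* lossy=25 exact=75 */
        estimate(100, 10);      /* lossy=10 exact=90: less memory, more exact pages */
        return 0;
    }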

>
> If the TBM fits into work_mem, heap_pages_fetched / tbm_max_entries is
> the average number of entries per chunk, so one less than that value
> is the number of pages we expect to save by lossifying an average
> chunk and all of its entries.  This might even be too optimistic given
> the way tbm_lossify() works today, since there's currently no
> guarantee we'd save anything at all; we might lossify a bunch of extra
> stuff just for fun.
>
> Another possible assumption is that the pages that get lossified are
> particularly good candidates for lossification -- they are, say, twice
> as dense as the typical page.  To reflect such an assumption, you'd
> just make entries_saved_per_lossy_page bigger e.g. by inserting "2 *"
> at the front of the formula.
>
> There could be other ways of computing this, too -- you've got one! --
> but I'm not sure that WORDS_PER_CHUNK should be involved at all.  The
> number of entries saved per lossy page will only be WORDS_PER_CHUNK -
> 1 in the really fortunate case where not only does the algorithm
> always pick the chunk with the most pages as the next one to lossify,
> but also that chunk always has the maximum number of possible pages in
> it.  That isn't likely on real data distributions.
>
> Curious to hear more of your (or anyone's) thoughts on this.  This is
> a tricky problem and the performance gains you've gotten seem to show
> this area is clearly worth some effort.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
