Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)

From: Yura Sokolov
Subject: Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)
Date:
Msg-id: edd13f14-2b28-cd06-5d70-4f4be3586f97@gmail.com
Whole thread Raw
In response to: Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses: Re: [HACKERS] Lazy hash table for XidInMVCCSnapshot (helps Zipfian abit)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers
16.03.2018 04:23, Tomas Vondra wrote:
> 
> 
> On 03/10/2018 03:11 AM, Yura Sokolov wrote:
>> 08.03.2018 03:42, Tomas Vondra wrote:
>>> On 03/06/2018 06:23 AM, Yura Sokolov wrote:
>>>> 05.03.2018 18:00, Tom Lane wrote:
>>>>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>>>>>> Snapshots are static (we don't really add new XIDs into existing ones,
>>>>>> right?), so why don't we simply sort the XIDs and then use bsearch to
>>>>>> lookup values? That should fix the linear search, without need for any
>>>>>> local hash table.
>>>>>
>>>>> +1 for looking into that, since it would avoid adding any complication
>>>>> to snapshot copying.  In principle, with enough XIDs in the snap, an
>>>>> O(1) hash probe would be quicker than an O(log N) bsearch ... but I'm
>>>>> dubious that we are often in the range where that would matter.
>>>>> We do need to worry about the cost of snapshot copying, too.
>>>>
>>>> Sorting is the first thing I tried. Unfortunately, sorting
>>>> itself eats too much CPU. Filling the hash table is much faster.
>>>>
>>>
>>> I've been interested in the sort-based approach, so I've spent a bit of
>>> time hacking on it (patch attached). It's much less invasive compared to
>>> the hash-table, but Yura is right the hashtable gives better results.
>>>
>>> I've tried to measure the benefits using the same script I shared on
>>> Tuesday, but I kept getting really strange numbers. In fact, I've been
>>> unable to even reproduce the results I shared back then. And after a bit
>>> of head-scratching I think the script is useless - it can't possibly
>>> generate snapshots with many XIDs because all the update threads sleep
>>> for exactly the same time. And first one to sleep is first one to wake
>>> up and commit, so most of the other XIDs are above xmax (and thus not
>>> included in the snapshot). I have no idea why I got the numbers :-/
>>>
>>> But with this insight, I quickly modified the script to also consume
>>> XIDs by another thread (which simply calls txid_current). With that I'm
>>> getting snapshots with as many XIDs as there are UPDATE clients (this
>>> time I actually checked that using gdb).
>>>
>>> And for a 60-second run the tps results look like this (see the attached
>>> chart, showing the same data):
>>>
>>>     writers     master      hash       sort
>>>    -----------------------------------------
>>>     16           1068       1057       1070
>>>     32           1005        995       1033
>>>     64           1064       1042       1117
>>>     128          1058       1021       1051
>>>     256           977       1004        928
>>>     512           773        935        808
>>>     768           576        815        670
>>>     1000          413        752        573
>>>
>>> The sort certainly does improve performance compared to master, but it's
>>> only about half of the hashtable improvement.
>>>
>>> I don't know how much this simple workload resembles the YCSB benchmark,
>>> but I seem to recall it's touching individual rows. In which case it's
>>> likely worse due to the pg_qsort being more expensive than building the
>>> hash table.
>>>
>>> So I agree with Yura the sort is not a viable alternative to the hash
>>> table, in this case.
>>>
>>>> Can I rely on snapshots being static? Maybe then there is no need
>>>> for a separate raw representation and hash table. I may try to fill
>>>> the hash table directly in GetSnapshotData instead of lazy filling.
>>>> Though it will increase time under the lock, so it is debatable and
>>>> should be carefully measured.
>>>>
>>>
>>> Yes, I think you can rely on snapshots not being modified later. That's
>>> pretty much the definition of a snapshot.
>>>
>>> You may do that in GetSnapshotData, but you certainly can't do it in
>>> the for loop there. Not only do we not want to add more work there,
>>> but you don't know the number of XIDs in that loop. And we don't want
>>> to build hash tables for a small number of XIDs.
>>>
>>> One reason against building the hash table in GetSnapshotData is that
>>> we'd build it even when the snapshot is never queried. Or when it is
>>> queried, but we only need to check xmin/xmax.
>>
>> Thank you for the analysis, Tomas.
>>
>> Stephen is right about the bug in snapmgr.c.
>> The attached version fixes the bug and also simplifies XidInXip a bit.
>>
> 
> OK, a few more comments.
> 
> 1) The code in ExtendXipSizeForHash seems somewhat redundant with
> my_log2 (that is, we could just call the existing function).

Yes, I could call my_log2 from ExtendXipSizeForHash. But wouldn't one
more function call be more expensive than the loop itself?
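For context, the loop in question computes the smallest power-of-two exponent covering the count, essentially what my_log2 in dynahash.c does. A standalone sketch (the name ceil_log2 is illustrative, not the patch's actual code):

```c
/* Smallest i such that (1L << i) >= num -- mirrors my_log2's loop. */
static int
ceil_log2(long num)
{
    int     i;
    long    limit;

    for (i = 0, limit = 1; limit < num; i++, limit <<= 1)
        ;
    return i;
}
```

Whether an extra non-inlined call outweighs a loop of a handful of iterations is exactly the kind of thing that's cheap to measure and hard to guess.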

> 2) Apparently xhlog/subxhlog fields are used for two purposes - to store
> log2 of the two XID counts, and to remember if the hash table is built.
> That's a bit confusing (at least it should be explained in a comment)
> but most importantly it seems a bit unnecessary.

OK, I'll add a comment.

> I assume it was done to save space, but I very much doubt that makes any
> difference. So we could easily keep a separate flag. I'm pretty sure
> there are holes in the SnapshotData struct, so we could squeeze the
> flags into one of those.

There's a hole right after xhlog. But it would be a bit strange to
move fields around.

> But do we even need a separate flag? We could just as easily keep the
> log2 fields set to 0 and only set them after actually building the hash
> table.

There is a need to signal that there is space for the hash, since it is
not always allocated. IIRC, I didn't cover the case where the snapshot
is restored from a file, and one or two other places.
Only if all places where a snapshot is allocated are properly modified
to allocate space for the hash could the flag be omitted and the log2
value itself used as the flag.
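To make the lazy-filling idea concrete for readers following the thread, here is a simplified, self-contained model: linear search below a threshold, and an open-addressing table built over the xip array on first lookup rather than in GetSnapshotData. All names, the threshold, and the hash function are illustrative, not the patch's actual code:

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t TransactionId;

typedef struct SnapshotXip
{
    TransactionId *xip;     /* XID array captured by the snapshot */
    uint32_t       xcnt;    /* number of XIDs in xip */
    TransactionId *hash;    /* lazily built table; 0 marks an empty slot
                             * (safe: 0 is InvalidTransactionId) */
    uint32_t       mask;    /* table size - 1; size is a power of two */
} SnapshotXip;

#define XIP_HASH_THRESHOLD 16   /* below this, linear search wins */

static void
build_xip_hash(SnapshotXip *s)
{
    uint32_t size = 2;

    while (size < s->xcnt * 2)          /* keep load factor <= 50% */
        size <<= 1;
    s->hash = calloc(size, sizeof(TransactionId));
    s->mask = size - 1;
    for (uint32_t i = 0; i < s->xcnt; i++)
    {
        uint32_t h = (s->xip[i] * 2654435761u) & s->mask;

        while (s->hash[h] != 0)         /* linear probing */
            h = (h + 1) & s->mask;
        s->hash[h] = s->xip[i];
    }
}

static int
xid_in_xip(SnapshotXip *s, TransactionId xid)
{
    if (s->xcnt < XIP_HASH_THRESHOLD)
    {
        for (uint32_t i = 0; i < s->xcnt; i++)
            if (s->xip[i] == xid)
                return 1;
        return 0;
    }
    if (s->hash == NULL)
        build_xip_hash(s);              /* built on first lookup only */
    for (uint32_t h = (xid * 2654435761u) & s->mask;
         s->hash[h] != 0;
         h = (h + 1) & s->mask)
        if (s->hash[h] == xid)
            return 1;
    return 0;
}
```

The sketch shows why a "space is reserved" signal matters: if some allocation path didn't reserve room for the table, the lazy build above must not run for that snapshot.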

> Or even better, why not store the mask so that XidInXip does
> not need to compute it over and over (granted, that's uint32 instead of
> just uint8, but I don't think SnapshotData is particularly sensitive to
> this, especially considering how much larger the xid hash table is).

I don't like unnecessarily increasing the struct size. And I doubt the
computation matters, though I could be mistaken, because it is a hot path.
Do you think it would significantly improve readability?
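The computation at stake is small either way: reconstructing the mask from the stored log2 is one shift and one subtraction per probe. A sketch (field and function names here are illustrative):

```c
#include <stdint.h>

/* Deriving the probe index from a stored log2 of the table size,
 * as opposed to storing the uint32 mask itself in SnapshotData. */
static inline uint32_t
xip_hash_index(uint32_t hashval, uint8_t xhlog)
{
    uint32_t mask = (1u << xhlog) - 1;  /* table size is 1 << xhlog */

    return hashval & mask;
}
```

Storing the mask trades three bytes of struct space for removing those two ALU ops from the probe loop; it is mostly a readability question.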

With regards,
Sokolov Yura aka funny_falcon.

