Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

From: Andrey Borodin
Subject: Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
Date:
Msg-id: 88B43B73-FB99-4969-8138-4D2C8263283F@yandex-team.ru
In response to: Re: [HACKERS] GSoC 2017: weekly progress reports (week 6) (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-hackers


On 9 Apr 2018, at 23:04, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> On 09/04/18 18:21, Andrey Borodin wrote:
>> On 9 Apr 2018, at 19:50, Teodor Sigaev <teodor@sigaev.ru> wrote:
>>>> 3. Why do we *not* lock the entry leaf page, if there is no
>>>> match? We still need a lock to remember that we probed for that
>>>> value and there was no match, so that we conflict with a tuple
>>>> that might be inserted later. At least #3 is a bug. The attached
>>>> patch adds an isolation test that demonstrates it. #1 and #2 are
>>>> weird and cause unnecessary locking, so I think we should fix
>>>> those too, even if they don't lead to incorrect results.
>>> I can't find a hole here. Agree.
>> Please correct me if I'm wrong. Let's say we have posting trees for
>> word A and word B. We are looking for a document that contains both.
>> We will read through the whole posting tree of A, but only through
>> some segments of B. If we do not find anything in B, we have to lock
>> only the segments where we actually searched, not the whole posting
>> tree of B.
>
> True, that works. It was not clear from the code or comments that that
> was intended. I'm not sure it's worthwhile, compared to locking just
> the posting tree root block.
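
To illustrate what fixing #3 means in code, here is a minimal sketch
(not the committed patch: PredicateLockPage() is the real PostgreSQL
primitive, but the surrounding scan code is simplified and
findEntryOnLeaf() is a hypothetical stand-in for the actual entry-leaf
search):

#include "postgres.h"
#include "access/gin_private.h"
#include "storage/predicate.h"

/*
 * Probe the entry tree and predicate-lock the leaf page we land on,
 * whether or not the key is found.  Locking on the no-match path is
 * the point of #3: it records "we looked here and found nothing", so
 * a later insert of this key will conflict with a serializable reader.
 */
static bool
entryProbeWithLock(GinBtree btree, Snapshot snapshot)
{
    GinBtreeStack *stack = ginFindLeafPage(btree, true, snapshot);
    bool        found = findEntryOnLeaf(BufferGetPage(stack->buffer));

    /* Take the lock even when nothing matched on this leaf. */
    PredicateLockPage(btree->index,
                      BufferGetBlockNumber(stack->buffer),
                      snapshot);

    freeGinBtreeStack(stack);
    return found;
}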
From the text search POV, locking the whole posting tree is rather coarse granularity: if you have frequent words like "the", "a", or "in", conflicts are inevitable.
I'm not sure we have the means to pick the optimal granularity: should it be ranges of postings, ranges of posting-tree pages, entries, pages of entries, or the whole index?
Technically, [time for locking] should be less than [time of transaction retry] * [probability of conflict]. Under this constraint, we should minimize [time for locking] + [time of transaction retry] * [probability of conflict].
I suspect that [time for locking] is some orders of magnitude smaller than the time of a transaction retry, so efforts should be skewed towards smaller granularity to reduce [probability of conflict].
But all this is not real math and has no scientific rigor.
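
To make the trade-off concrete, a back-of-the-envelope comparison
(every number below is invented for illustration; only the cost
formula comes from the reasoning above):

#include <stdio.h>

int
main(void)
{
    /* All figures are invented for illustration. */
    double retry = 100000.0;    /* cost of one serialization retry, us */

    /* Coarse lock (whole posting tree): cheap to take, conflicts often. */
    double coarse = 0.5 + retry * 0.01;

    /* Fine lock (only the probed pages): dearer, conflicts rarely. */
    double fine = 5.0 + retry * 0.0001;

    /* Prints "coarse: 1000.5 us, fine: 15.0 us" */
    printf("coarse: %.1f us, fine: %.1f us\n", coarse, fine);
    return 0;
}

With retries several orders of magnitude costlier than locks, the
fine-grained variant wins here even though each lock is ten times more
expensive to take.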

> I'll let Teodor decide..
+1. I believe this is very close to an optimal solution :)

Best regards, Andrey Borodin.
