Re: mdnblocks() sabotages error checking in _mdfd_getseg()

From Robert Haas
Subject Re: mdnblocks() sabotages error checking in _mdfd_getseg()
Date
Msg-id CA+TgmoZY0U+XCMzs+iBw8PnrNi7E4+uD4Fnxbr9YmFk+P-KFYA@mail.gmail.com
In reply to Re: mdnblocks() sabotages error checking in _mdfd_getseg()  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: mdnblocks() sabotages error checking in _mdfd_getseg()  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Dec 10, 2015 at 1:22 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 10 December 2015 at 16:47, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Thu, Dec 10, 2015 at 11:36 AM, Andres Freund <andres@anarazel.de>
>> wrote:
>> >> In fact, having no way to get the relation length other than scanning
>> >> 1000 files doesn't seem like an especially good choice even if we used
>> >> a better data structure.  Putting a header page in the heap would make
>> >> getting the length of a relation O(1) instead of O(segments), and for
>> >> a bonus, we'd be able to reliably detect it if a relation file
>> >> disappeared out from under us.  That's a difficult project and
>> >> definitely not my top priority, but this code is old and crufty all
>> >> the same.)
>> >
>> > The md layer doesn't really know whether it's dealing with an index, or
>> > with an index, or ... So handling this via a metapage doesn't seem
>> > particularly straightforward.
>>
>> It's not straightforward, but I don't think that's the reason.  What
>> we could do is look at the call sites that use
>> RelationGetNumberOfBlocks() and change some of them to get the
>> information some other way instead.  I believe get_relation_info() and
>> initscan() are the primary culprits, accounting for some enormous
>> percentage of the system calls we do on a read-only pgbench workload.
>> Those functions certainly know enough to consult a metapage if we had
>> such a thing.
>
> It looks pretty straightforward to me...
>
> The number of relations with >1 file is likely to be fairly small, so we can
> just have an in-memory array to record that. 8 bytes per relation >1 GB
> isn't going to take much shmem, but we can extend using dynshmem as needed.
> We can seq scan the array at relcache build time and invalidate relcache
> when we extend. WAL log any extension to a new segment and write the table
> to disk at checkpoint.

Invalidating the relcache when we extend would be extremely expensive,
but we could probably come up with some variant of this that would
work.  I'm not very excited about this design, though; I think
actually putting a metapage on each relation would be better.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
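
To make the O(segments) versus O(1) point in this thread concrete, here is a minimal sketch in C. It is a toy model, not PostgreSQL code: `ToyRelation`, `nblocks_by_scan()`, `nblocks_from_metapage()` and the `metapage_nblocks` field are hypothetical names invented for illustration, and the segment "files" are just an in-memory array of per-segment block counts. The only value taken from PostgreSQL is the default segment size of 131072 blocks (1 GB of 8 kB blocks).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default PostgreSQL segment size: 1 GB / 8 kB blocks. */
#define BLOCKS_PER_SEGMENT 131072u

/* Hypothetical toy model of a relation split into segment files. */
typedef struct
{
    const uint32_t *segment_blocks;   /* blocks stored in each segment */
    size_t          nsegments;        /* number of segment files */
    uint32_t        metapage_nblocks; /* hypothetical length kept in a metapage */
} ToyRelation;

/*
 * O(segments): probe each segment in turn and add up its length.
 * A segment shorter than BLOCKS_PER_SEGMENT is taken to be the last one.
 * *probes counts how many segments had to be examined.
 */
uint32_t
nblocks_by_scan(const ToyRelation *rel, size_t *probes)
{
    uint32_t total = 0;

    *probes = 0;
    for (size_t i = 0; i < rel->nsegments; i++)
    {
        uint32_t n = rel->segment_blocks[i];

        assert(n <= BLOCKS_PER_SEGMENT);
        (*probes)++;
        total += n;
        if (n < BLOCKS_PER_SEGMENT)
            break;              /* partial segment ends the relation */
    }
    return total;
}

/*
 * O(1): trust a length recorded in a (hypothetical) metapage.
 * Exactly one lookup regardless of how many segments exist.
 */
uint32_t
nblocks_from_metapage(const ToyRelation *rel, size_t *probes)
{
    *probes = 1;
    return rel->metapage_nblocks;
}
```

With a relation of two full segments plus a 100-block tail, the scan examines three segments while the metapage lookup examines one, and both report the same length as long as the metapage is kept up to date. The sketch says nothing about how such a metapage would be maintained or WAL-logged, which is the hard part discussed above.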


