Re: Unhappy about API changes in the no-fsm-for-small-rels patch

Поиск

Список

Период

Сортировка

От	Amit Kapila
Тема	Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Дата	17 апреля 2019 г. 13:19:29
Msg-id	CAA4eK1+6adaFUZ_7SpMpYPMiqzUH6FgQKonm845TP0KxeXgtuw@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Unhappy about API changes in the no-fsm-for-small-rels patch (John Naylor <john.naylor@2ndquadrant.com>)
Ответы	Re: Unhappy about API changes in the no-fsm-for-small-rels patch
Список	pgsql-hackers

Дерево обсуждения

On Wed, Apr 17, 2019 at 10:39 AM John Naylor
<john.naylor@2ndquadrant.com> wrote:
> On Wed, Apr 17, 2019 at 2:04 AM Andres Freund <andres@anarazel.de> wrote:
>
> > +/* Only create the FSM if the heap has greater than this many blocks */
> > +#define HEAP_FSM_CREATION_THRESHOLD 4
> >
> > Hm, this seems to be tying freespace.c closer to heap than I think is
> > great - think of new AMs like zheap, that also want to use it.
>
> Amit and I kept zheap in mind when working on the patch. You'd have to
> work around the metapage, but everything else should work the same.
>

I think we also need to take care of TPD pages along with meta page.
This might be less effective if we encounter TPD pages as well in
small relation which shouldn't be a common scenario, but it won't
hurt, otherwise.  Those pages are anyway temporary and will be
removed.

BTW there is one other thing which is tied to heap in FSM for which we
might want some handling.
#define MaxFSMRequestSize MaxHeapTupleSize

In general, it will be good if we find some pluggable way for both the
defines, otherwise, also, it shouldn't cause a big problem.

>
> > I'm kinda thinking that this is the wrong architecture.
> >
> > 1) Unless I miss something, this will trigger a
> >    RelationGetNumberOfBlocks(), which in turn ends up doing an lseek(),
> >    once for each page we add to the relation.
>
> That was true previously anyway if the FSM returned InvalidBlockNumber.
>
> >    That strikes me as pretty
> >    suboptimal. I think it's even worse if multiple backends do the
> >    insertion, because the RelationGetTargetBlock(relation) logic will
> >    succeed less often.
>
> Could you explain why it would succeed less often?
>
> > 2) We'll repeatedly re-encounter the first few pages, because we clear
> >    the local map after each successful RelationGetBufferForTuple().
>
> Not exactly sure what you mean? We only set the map if
> RelationGetTargetBlock() returns InvalidBlockNumber, or if it returned
> a valid block, and inserting there already failed. So, not terribly
> often, I imagine.
>
> > 3) The global variable based approach means that we cannot easily do
> >    better. Even if we had a cache that lives across
> >    RelationGetBufferForTuple() calls.
> >
> > 4) fsm_allow_writes() does a smgrexists() for the FSM in some
> >    cases. That's pretty darn expensive if it's already open.
>
> (from earlier)
> > Isn't this like really expensive? mdexists() closes the relations and
> > reopens it from scratch. Shouldn't we at the very least check
> > smgr_fsm_nblocks beforehand, so this is only done once?
>
> Hmm, I can look into that.
>
>
> I think if we want to keep something like this feature, we'd need to
> cache the relation size in a variable similar to how we cache the FSM
> size (like SMgrRelationData.smgr_fsm_nblocks)
>

makes sense. I think we should do this unless we face any problem with it.

> *and* stash the bitmap of
> pages that we think are used/free as a bitmap somewhere below the
> relcache.

I think maintaining at relcache level will be tricky when there are
insertions and deletions happening in the small relation.  We have
considered such a case during development wherein we don't want the
FSM to be created if there are insertions and deletions in small
relation.  The current mechanism addresses both this and normal case
where there are not many deletions.  Sure there is some overhead of
building the map again and rechecking each page.  The first one is a
memory operation and takes a few cycles and for the second we
optimized by checking the pages alternatively which means we won't
check more than two pages at-a-time.  This cost is paid by not
checking FSM and it could be somewhat better in some cases [1].

>  If we cleared that variable at truncations, I think we should
> be able to make that work reasonably well?

Not only that, I think it needs to be cleared whenever we create the
FSM as well which could be tricky as it can be created by the vacuum.

OTOH, if we want to extend it later for whatever reason to a relation
level cache, it shouldn't be that difficult as the implementation is
mostly contained in freespace.c (fsm* functions) and I think the
relation is accessible in most places.  We might need to rip out some
calls to clearlocalmap.

>
> On Wed, Apr 17, 2019 at 3:16 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2019-04-16 14:31:25 -0400, Tom Lane wrote:
>   <snip>
> > > and in any case it seems pretty inefficient for workloads that insert
> > > into multiple tables.
> >
> > As is, it's inefficient for insertions into a *single* relation. The
> > RelationGetTargetBlock() makes it not crazily expensive, but it's still
> > plenty expensive.
>

During development, we were also worried about the performance
regression that can happen due to this patch and we have done many
rounds of performance tests where such a cache could be accessed
pretty frequently.  In the original version, we do see a small
regression as a result of which we came up with an alternate strategy
of not checking every page.  If you want, I can share the links of
emails for performance testing.

> Performance testing didn't reveal any performance regression. If you
> have a realistic benchmark in mind that stresses this logic more
> heavily, I'd be happy to be convinced otherwise.
>

In fact, we have seen some wins.  See the performance testing done [1]
with various approaches during development.

Added this as an open item.

[1] - https://www.postgresql.org/message-id/CAD__Oui5%2BqiVxJSJqiXq2jA60QV8PKxrZA8_W%2BcCxROGAFJMWA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Masahiko Sawada
Дата: 17 апреля 2019 г., 13:17:35
Сообщение: Re: [HACKERS] Transactions involving multiple postgres foreignservers, take 2

Следующее

От: John Naylor
Дата: 17 апреля 2019 г., 13:24:56
Сообщение: Re: jsonpath

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

Предыдущее

Следующее