Discussion: Unhappy about API changes in the no-fsm-for-small-rels patch

Unhappy about API changes in the no-fsm-for-small-rels patch

From: Andres Freund
Date:
Hi,

I'm somewhat unhappy about how much complexity the no-fsm-for-small-rels
patch exposed to callers -- complexity that looks like it should be
purely in freespacemap.c.


 extern Size GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk);
-extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded);
+extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded,
+                    bool check_fsm_only);

So now freespace.c has an argument that says we should only check the
fsm. That's confusing. And it's not explained to callers what that
argument means, and when it should be set.


@@ -176,20 +269,44 @@ RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage,
  * Note that if the new spaceAvail value is higher than the old value stored
  * in the FSM, the space might not become visible to searchers until the next
  * FreeSpaceMapVacuum call, which updates the upper level pages.
+ *
+ * Callers have no need for a local map.
  */
 void
-RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk, Size spaceAvail)
+RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
+                       Size spaceAvail, BlockNumber nblocks)

There's no explanation as to what that "nblocks" argument is. One
basically has to search other callers to figure it out. It's not even
clear which fork it relates to, nor that one can set it to
InvalidBlockNumber if one doesn't have the relation size conveniently
reachable.  But it's not exposed to RecordAndGetPageWithFreeSpace(), for
a basically unexplained reason.  There's a comment above
fsm_allow_writes() - but that's a file-local function that external
callers basically have no need to know about.

I can't figure out what "Callers have no need for a local map." is
supposed to mean.


+/*
+ * Clear the local map.  We must call this when we have found a block with
+ * enough free space, when we extend the relation, or on transaction abort.
+ */
+void
+FSMClearLocalMap(void)
+{
+   if (FSM_LOCAL_MAP_EXISTS)
+   {
+       fsm_local_map.nblocks = 0;
+       memset(&fsm_local_map.map, FSM_LOCAL_NOT_AVAIL,
+              sizeof(fsm_local_map.map));
+   }
+}
+

So now there's a new function one needs to call after successfully using
the block returned by [RecordAnd]GetPageWithFreeSpace().  But it's not
referenced from those functions, so basically one has to just know that.
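
To illustrate the implied contract, here's a minimal caller sketch (the
wrapper function is hypothetical; only the freespace.c calls are from the
patch's API):

#include "postgres.h"

#include "storage/block.h"
#include "storage/freespace.h"
#include "utils/rel.h"

/* Hypothetical wrapper, showing the caller-side obligations. */
static BlockNumber
find_block_for_tuple(Relation rel, Size needed)
{
    BlockNumber blk;

    /* May consult either the real FSM or the hidden local map. */
    blk = GetPageWithFreeSpace(rel, needed, false);

    if (blk != InvalidBlockNumber)
    {
        /*
         * The caller is now responsible for resetting freespace.c's
         * global state -- easy to forget, since nothing in the API
         * points here.
         */
        FSMClearLocalMap();
    }

    return blk;
}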


+/* Only create the FSM if the heap has greater than this many blocks */
+#define HEAP_FSM_CREATION_THRESHOLD 4

Hm, this seems to be tying freespace.c closer to heap than I think is
great - think of new AMs like zheap, that also want to use it.


I think this is mostly fallout from the prime issue I'm unhappy
about: there's now a global variable in freespacemap.c that code
using freespace.c has to know about and maintain.


+static void
+fsm_local_set(Relation rel, BlockNumber cur_nblocks)
+{
+   BlockNumber blkno,
+               cached_target_block;
+
+   /* The local map must not be set already. */
+   Assert(!FSM_LOCAL_MAP_EXISTS);
+
+   /*
+    * Starting at the current last block in the relation and working
+    * backwards, mark alternating blocks as available.
+    */
+   blkno = cur_nblocks - 1;

That comment explains very little about why this is done, and why it's a
good idea.

+/* Status codes for the local map. */
+
+/* Either already tried, or beyond the end of the relation */
+#define FSM_LOCAL_NOT_AVAIL 0x00
+
+/* Available to try */
+#define FSM_LOCAL_AVAIL        0x01

+/* Local map of block numbers for small heaps with no FSM. */
+typedef struct
+{
+   BlockNumber nblocks;
+   uint8       map[HEAP_FSM_CREATION_THRESHOLD];
+}          FSMLocalMap;
+

Hm, given realistic HEAP_FSM_CREATION_THRESHOLD values, and the fact that
we really only need one bit per block, it seems like the map should really
be just a uint32 with one bit per page.
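
Roughly, something like this (a sketch; the names are invented, not a
concrete proposal):

#include "postgres.h"

#include "storage/block.h"

/* One bit per candidate block; covers thresholds of up to 32 blocks. */
typedef struct
{
    BlockNumber nblocks;        /* blocks covered by the map */
    uint32      map;            /* bit N set => block N still worth trying */
} FSMLocalBitmap;

#define FSM_LOCAL_BIT(blk)          ((uint32) 1 << (blk))
#define fsm_local_avail(m, blk)     (((m)->map & FSM_LOCAL_BIT(blk)) != 0)
#define fsm_local_set_avail(m, blk) ((m)->map |= FSM_LOCAL_BIT(blk))
#define fsm_local_set_used(m, blk)  ((m)->map &= ~FSM_LOCAL_BIT(blk))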


+static bool
+fsm_allow_writes(Relation rel, BlockNumber heapblk,
+                BlockNumber nblocks, BlockNumber *get_nblocks)

+       if (rel->rd_rel->relpages != InvalidBlockNumber &&
+           rel->rd_rel->relpages > HEAP_FSM_CREATION_THRESHOLD)
+           return true;
+       else
+           skip_get_nblocks = false;
+   }

This badly needs a comment explaining that these values can be basically
arbitrarily out of date, and explaining why it's correct to rely on them
anyway (presumably because creating an FSM unnecessarily is ok, it just
avoids using this optimization).


+static bool
+fsm_allow_writes(Relation rel, BlockNumber heapblk,
+                BlockNumber nblocks, BlockNumber *get_nblocks)

+   RelationOpenSmgr(rel);
+   if (smgrexists(rel->rd_smgr, FSM_FORKNUM))
+       return true;

Isn't this like really expensive? mdexists() closes the relation and
reopens it from scratch. Shouldn't we at the very least check
smgr_fsm_nblocks beforehand, so this is only done once?


I'm kinda thinking that this is the wrong architecture.

1) Unless I miss something, this will trigger a
   RelationGetNumberOfBlocks(), which in turn ends up doing an lseek(),
   once for each page we add to the relation. That strikes me as pretty
   suboptimal. I think it's even worse if multiple backends do the
   insertion, because the RelationGetTargetBlock(relation) logic will
   succeed less often.

2) We'll repeatedly re-encounter the first few pages, because we clear
   the local map after each successful RelationGetBufferForTuple().

3) The global variable based approach means that we cannot easily do
   better. Even if we had a cache that lives across
   RelationGetBufferForTuple() calls.

4) fsm_allow_writes() does a smgrexists() for the FSM in some
   cases. That's pretty darn expensive if it's already open.


I think if we want to keep something like this feature, we'd need to
cache the relation size in a variable similar to how we cache the FSM
size (like SMgrRelationData.smgr_fsm_nblocks) *and* stash the bitmap of
pages that we think are used/free as a bitmap somewhere below the
relcache.  If we cleared that variable at truncations, I think we should
be able to make that work reasonably well?
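
As a rough sketch of what I mean (field and struct names invented for
illustration):

#include "postgres.h"

#include "storage/block.h"

/*
 * Invented names, sketch only: per-relation state that could live in (or
 * under) the relcache entry instead of a file-global variable, and be
 * reset by ordinary relcache invalidation at truncation.
 */
typedef struct LocalFreeSpaceState
{
    BlockNumber cached_nblocks; /* last known heap size, akin to smgr_fsm_nblocks */
    uint32      tried_map;      /* bit per page we already found to be full */
} LocalFreeSpaceState;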

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I'm kinda thinking that this is the wrong architecture.

The bits of that patch that I've looked at seemed like a mess
to me too.  AFAICT, it's trying to use a single global "map"
for all relations (strike 1) without any clear tracking of
which relation the map currently describes (strike 2).
This can only work at all if an inaccurate map is very fail-soft,
which I'm not convinced it is, and in any case it seems pretty
inefficient for workloads that insert into multiple tables.

I'd have expected any such map to be per-table and be stored in
the relcache.

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Andres Freund
Date:
Hi,

On 2019-04-16 14:31:25 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I'm kinda thinking that this is the wrong architecture.
> 
> The bits of that patch that I've looked at seemed like a mess
> to me too.  AFAICT, it's trying to use a single global "map"
> for all relations (strike 1) without any clear tracking of
> which relation the map currently describes (strike 2).

Well, strike 2 basically is not a problem right now, because the map is
cleared whenever a search for a target buffer succeeded. But that has
pretty obvious efficiency issues...


> This can only work at all if an inaccurate map is very fail-soft,
> which I'm not convinced it is

I think it needs to be fail-soft independent of the no-fsm patch.
Because the FSM is not WAL-logged etc., it's pretty easy to end up with a
fairly corrupted version. And we'd better deal with that.


> and in any case it seems pretty inefficient for workloads that insert
> into multiple tables.

As is, it's inefficient for insertions into a *single* relation. The
RelationGetTargetBlock() makes it not crazily expensive, but it's still
plenty expensive.


> I'd have expected any such map to be per-table and be stored in
> the relcache.

Same.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-16 14:31:25 -0400, Tom Lane wrote:
>> This can only work at all if an inaccurate map is very fail-soft,
>> which I'm not convinced it is

> I think it better needs to be fail-soft independent of this the no-fsm
> patch. Because the fsm is not WAL logged etc, it's pretty easy to get a
> pretty corrupted version. And we better deal with that.

Yes, FSM has to be fail-soft from a *correctness* viewpoint; but it's
not fail-soft from a *performance* viewpoint.  It can take awhile for
us to self-heal a busted map.  And this fake map spends almost all its
time busted and in need of (expensive) corrections.  I think this may
actually be the same performance complaint you're making, in different
words.

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: John Naylor
Date:
On Wed, Apr 17, 2019 at 2:04 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I'm somewhat unhappy in how much the no-fsm-for-small-rels exposed
> complexity that looks like it should be purely in freespacemap.c to
> callers.
>
>
>  extern Size GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk);
> -extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded);
> +extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded,
> +                    bool check_fsm_only);
>
> So now freespace.c has an argument that says we should only check the
> fsm. That's confusing. And it's not explained to callers what that
> argument means, and when it should be set.

When first looking for free space, it's "false": Within
GetPageWithFreeSpace(), we call RelationGetNumberOfBlocks() if the FSM
returns invalid.

If we have to extend, after acquiring the lock to extend the relation,
we call GetPageWithFreeSpace() again to see if another backend already
extended while waiting on the lock. If there's no FSM, the thinking
is, it's not worth it to get the number of blocks again.
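
Simplified, the two call patterns look roughly like this (a sketch; the
real hio.c code does considerably more):

#include "postgres.h"

#include "storage/block.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
#include "utils/rel.h"

/* Hypothetical, condensed version of the flow described above. */
static BlockNumber
choose_target_block(Relation relation, Size len)
{
    BlockNumber targetBlock;

    /* Initial search: may fall back to RelationGetNumberOfBlocks(). */
    targetBlock = GetPageWithFreeSpace(relation, len, false);
    if (targetBlock != InvalidBlockNumber)
        return targetBlock;

    /*
     * Nothing found; before extending, re-check in case another backend
     * extended the relation while we waited for the extension lock.
     * Only the FSM itself is consulted here, hence check_fsm_only = true.
     */
    LockRelationForExtension(relation, ExclusiveLock);
    targetBlock = GetPageWithFreeSpace(relation, len, true);

    /* Caller extends the relation (and releases the lock) if still invalid. */
    return targetBlock;
}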

> @@ -176,20 +269,44 @@ RecordAndGetPageWithFreeSpace(Relation rel, BlockNumber oldPage,
>   * Note that if the new spaceAvail value is higher than the old value stored
>   * in the FSM, the space might not become visible to searchers until the next
>   * FreeSpaceMapVacuum call, which updates the upper level pages.
> + *
> + * Callers have no need for a local map.
>   */
>  void
> -RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk, Size spaceAvail)
> +RecordPageWithFreeSpace(Relation rel, BlockNumber heapBlk,
> +                       Size spaceAvail, BlockNumber nblocks)
>
> There's no explanation as to what that "nblocks" argument is. One
> basically has to search other callers to figure it out. It's not even
> clear to which fork it relates to. Nor that one can set it to
> InvalidBlockNumber if one doesn't have the relation size conveniently
> reachable.  But it's not exposed to RecordAndGetPageWithFreeSpace(), for
> a basically unexplained reason.  There's a comment above
> fsm_allow_writes() - but that's  file-local function that external
> callers basically have need to know about.

Okay.

> I can't figure out what "Callers have no need for a local map." is
> supposed to mean.

It was meant to contrast with [RecordAnd]GetPageWithFreeSpace(), but I
see how it's confusing.

> +/*
> + * Clear the local map.  We must call this when we have found a block with
> + * enough free space, when we extend the relation, or on transaction abort.
> + */
> +void
> +FSMClearLocalMap(void)
> +{
> +   if (FSM_LOCAL_MAP_EXISTS)
> +   {
> +       fsm_local_map.nblocks = 0;
> +       memset(&fsm_local_map.map, FSM_LOCAL_NOT_AVAIL,
> +              sizeof(fsm_local_map.map));
> +   }
> +}
> +
>
> So now there's a new function one needs to call after successfully using
> the block returned by [RecordAnd]GetPageWithFreeSpace().  But it's not
> referenced from those functions, so basically one has to just know that.

Right.

> +/* Only create the FSM if the heap has greater than this many blocks */
> +#define HEAP_FSM_CREATION_THRESHOLD 4
>
> Hm, this seems to be tying freespace.c closer to heap than I think is
> great - think of new AMs like zheap, that also want to use it.

Amit and I kept zheap in mind when working on the patch. You'd have to
work around the metapage, but everything else should work the same.

> I think this is mostly fallout about the prime issue I'm unhappy
> about. There's now some global variable in freespacemap.c that code
> using freespace.c has to know about and maintain.
>
>
> +static void
> +fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> +{
> +   BlockNumber blkno,
> +               cached_target_block;
> +
> +   /* The local map must not be set already. */
> +   Assert(!FSM_LOCAL_MAP_EXISTS);
> +
> +   /*
> +    * Starting at the current last block in the relation and working
> +    * backwards, mark alternating blocks as available.
> +    */
> +   blkno = cur_nblocks - 1;
>
> That comment explains very little about why this is done, and why it's a
> good idea.

Short answer: performance -- it's too expensive to try every block.
The explanation is in storage/freespace/README -- maybe that should be
referenced here?

> +/* Status codes for the local map. */
> +
> +/* Either already tried, or beyond the end of the relation */
> +#define FSM_LOCAL_NOT_AVAIL 0x00
> +
> +/* Available to try */
> +#define FSM_LOCAL_AVAIL        0x01
>
> +/* Local map of block numbers for small heaps with no FSM. */
> +typedef struct
> +{
> +   BlockNumber nblocks;
> +   uint8       map[HEAP_FSM_CREATION_THRESHOLD];
> +}          FSMLocalMap;
> +
>
> Hm, given realistic HEAP_FSM_CREATION_THRESHOLD, and the fact that we
> really only need one bit per relation, it seems like map should really
> be just a uint32 with one bit per page.

I fail to see the advantage of that.

> +static bool
> +fsm_allow_writes(Relation rel, BlockNumber heapblk,
> +                BlockNumber nblocks, BlockNumber *get_nblocks)
>
> +       if (rel->rd_rel->relpages != InvalidBlockNumber &&
> +           rel->rd_rel->relpages > HEAP_FSM_CREATION_THRESHOLD)
> +           return true;
> +       else
> +           skip_get_nblocks = false;
> +   }
>
> This badly needs a comment explaining that these values can be basically
> arbitrarily out of date. Explaining why it's correct to rely on them
> anyway (Presumably because creating an fsm unnecessarily is ok, it just
> avoid susing this optimization).

Agreed, and yes, your presumption is what I had in mind.

> I'm kinda thinking that this is the wrong architecture.
>
> 1) Unless I miss something, this will trigger a
>    RelationGetNumberOfBlocks(), which in turn ends up doing an lseek(),
>    once for each page we add to the relation.

That was true previously anyway if the FSM returned InvalidBlockNumber.

>    That strikes me as pretty
>    suboptimal. I think it's even worse if multiple backends do the
>    insertion, because the RelationGetTargetBlock(relation) logic will
>    succeed less often.

Could you explain why it would succeed less often?

> 2) We'll repeatedly re-encounter the first few pages, because we clear
>    the local map after each successful RelationGetBufferForTuple().

Not exactly sure what you mean? We only set the map if
RelationGetTargetBlock() returns InvalidBlockNumber, or if it returned
a valid block, and inserting there already failed. So, not terribly
often, I imagine.

> 3) The global variable based approach means that we cannot easily do
>    better. Even if we had a cache that lives across
>    RelationGetBufferForTuple() calls.
>
> 4) fsm_allow_writes() does a smgrexists() for the FSM in some
>    cases. That's pretty darn expensive if it's already open.

(from earlier)
> Isn't this like really expensive? mdexists() closes the relations and
> reopens it from scratch. Shouldn't we at the very least check
> smgr_fsm_nblocks beforehand, so this is only done once?

Hmm, I can look into that.


On Wed, Apr 17, 2019 at 3:16 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-04-16 14:31:25 -0400, Tom Lane wrote:
  <snip>
> > and in any case it seems pretty inefficient for workloads that insert
> > into multiple tables.
>
> As is, it's inefficient for insertions into a *single* relation. The
> RelationGetTargetBlock() makes it not crazily expensive, but it's still
> plenty expensive.

Performance testing didn't reveal any performance regression. If you
have a realistic benchmark in mind that stresses this logic more
heavily, I'd be happy to be convinced otherwise.


-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Amit Kapila
Date:
On Wed, Apr 17, 2019 at 10:39 AM John Naylor
<john.naylor@2ndquadrant.com> wrote:
> On Wed, Apr 17, 2019 at 2:04 AM Andres Freund <andres@anarazel.de> wrote:
>
> > +/* Only create the FSM if the heap has greater than this many blocks */
> > +#define HEAP_FSM_CREATION_THRESHOLD 4
> >
> > Hm, this seems to be tying freespace.c closer to heap than I think is
> > great - think of new AMs like zheap, that also want to use it.
>
> Amit and I kept zheap in mind when working on the patch. You'd have to
> work around the metapage, but everything else should work the same.
>

I think we also need to take care of TPD pages along with the metapage.
This might be less effective if we also encounter TPD pages in a small
relation, which shouldn't be a common scenario; otherwise, it won't hurt.
Those pages are temporary anyway and will be removed.

BTW, there is one other thing in the FSM that is tied to heap, for which
we might want some handling:
#define MaxFSMRequestSize MaxHeapTupleSize

In general, it would be good to find some pluggable way to handle both of
these defines; otherwise, it shouldn't cause a big problem either.

>
> > I'm kinda thinking that this is the wrong architecture.
> >
> > 1) Unless I miss something, this will trigger a
> >    RelationGetNumberOfBlocks(), which in turn ends up doing an lseek(),
> >    once for each page we add to the relation.
>
> That was true previously anyway if the FSM returned InvalidBlockNumber.
>
> >    That strikes me as pretty
> >    suboptimal. I think it's even worse if multiple backends do the
> >    insertion, because the RelationGetTargetBlock(relation) logic will
> >    succeed less often.
>
> Could you explain why it would succeed less often?
>
> > 2) We'll repeatedly re-encounter the first few pages, because we clear
> >    the local map after each successful RelationGetBufferForTuple().
>
> Not exactly sure what you mean? We only set the map if
> RelationGetTargetBlock() returns InvalidBlockNumber, or if it returned
> a valid block, and inserting there already failed. So, not terribly
> often, I imagine.
>
> > 3) The global variable based approach means that we cannot easily do
> >    better. Even if we had a cache that lives across
> >    RelationGetBufferForTuple() calls.
> >
> > 4) fsm_allow_writes() does a smgrexists() for the FSM in some
> >    cases. That's pretty darn expensive if it's already open.
>
> (from earlier)
> > Isn't this like really expensive? mdexists() closes the relations and
> > reopens it from scratch. Shouldn't we at the very least check
> > smgr_fsm_nblocks beforehand, so this is only done once?
>
> Hmm, I can look into that.
>
>
> I think if we want to keep something like this feature, we'd need to
> cache the relation size in a variable similar to how we cache the FSM
> size (like SMgrRelationData.smgr_fsm_nblocks)
>

makes sense. I think we should do this unless we face any problem with it.

> *and* stash the bitmap of
> pages that we think are used/free as a bitmap somewhere below the
> relcache.

I think maintaining the map at the relcache level will be tricky when
there are insertions and deletions happening in a small relation.  We
considered such a case during development, wherein we don't want the
FSM to be created if there are insertions and deletions in a small
relation.  The current mechanism addresses both this and the normal case
where there are not many deletions.  Sure, there is some overhead of
building the map again and rechecking each page.  The first is a memory
operation and takes a few cycles, and the second we optimized by checking
alternating pages, which means we won't check more than two pages at a
time.  This cost is paid back by not checking the FSM, and it can even be
somewhat better in some cases [1].


>  If we cleared that variable at truncations, I think we should
> be able to make that work reasonably well?

Not only that, I think it needs to be cleared whenever we create the
FSM as well, which could be tricky since the FSM can be created by vacuum.

OTOH, if we want to extend it later for whatever reason to a relation
level cache, it shouldn't be that difficult as the implementation is
mostly contained in freespace.c (fsm* functions) and I think the
relation is accessible in most places.  We might need to rip out some
calls to clearlocalmap.

>
> On Wed, Apr 17, 2019 at 3:16 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2019-04-16 14:31:25 -0400, Tom Lane wrote:
>   <snip>
> > > and in any case it seems pretty inefficient for workloads that insert
> > > into multiple tables.
> >
> > As is, it's inefficient for insertions into a *single* relation. The
> > RelationGetTargetBlock() makes it not crazily expensive, but it's still
> > plenty expensive.
>

During development, we were also worried about a performance
regression due to this patch, and we did many rounds of performance
tests in which such a cache could be accessed pretty frequently.  In
the original version, we did see a small regression, as a result of
which we came up with the alternate strategy of not checking every
page.  If you want, I can share links to the performance-testing emails.

> Performance testing didn't reveal any performance regression. If you
> have a realistic benchmark in mind that stresses this logic more
> heavily, I'd be happy to be convinced otherwise.
>

In fact, we have seen some wins.  See the performance testing done [1]
with various approaches during development.

Added this as an open item.

[1] - https://www.postgresql.org/message-id/CAD__Oui5%2BqiVxJSJqiXq2jA60QV8PKxrZA8_W%2BcCxROGAFJMWA%40mail.gmail.com


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Andres Freund
Date:
Hi,

On 2019-04-17 13:09:05 +0800, John Naylor wrote:
> On Wed, Apr 17, 2019 at 2:04 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > I'm somewhat unhappy in how much the no-fsm-for-small-rels exposed
> > complexity that looks like it should be purely in freespacemap.c to
> > callers.
> >
> >
> >  extern Size GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk);
> > -extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded);
> > +extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded,
> > +                    bool check_fsm_only);
> >
> > So now freespace.c has an argument that says we should only check the
> > fsm. That's confusing. And it's not explained to callers what that
> > argument means, and when it should be set.
> 
> When first looking for free space, it's "false": Within
> GetPageWithFreeSpace(), we call RelationGetNumberOfBlocks() if the FSM
> returns invalid.
> 
> If we have to extend, after acquiring the lock to extend the relation,
> we call GetPageWithFreeSpace() again to see if another backend already
> extended while waiting on the lock. If there's no FSM, the thinking
> is, it's not worth it to get the number of blocks again.

I can get that (after reading through the code, grepping through all
callers, etc), but it means that every callsite needs to understand
that. That's making the API more complicated than needed, especially
when we're going to grow more callers.



> > +/* Only create the FSM if the heap has greater than this many blocks */
> > +#define HEAP_FSM_CREATION_THRESHOLD 4
> >
> > Hm, this seems to be tying freespace.c closer to heap than I think is
> > great - think of new AMs like zheap, that also want to use it.
> 
> Amit and I kept zheap in mind when working on the patch. You'd have to
> work around the metapage, but everything else should work the same.

My complaint is basically that it's apparently AM specific (we don't use
the logic for e.g. indexes), and that the name suggests it's specific to
heap. And it's not controllable from the outside, which means it can't be
tuned for the specific use case.


> > I think this is mostly fallout about the prime issue I'm unhappy
> > about. There's now some global variable in freespacemap.c that code
> > using freespace.c has to know about and maintain.
> >
> >
> > +static void
> > +fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > +{
> > +   BlockNumber blkno,
> > +               cached_target_block;
> > +
> > +   /* The local map must not be set already. */
> > +   Assert(!FSM_LOCAL_MAP_EXISTS);
> > +
> > +   /*
> > +    * Starting at the current last block in the relation and working
> > +    * backwards, mark alternating blocks as available.
> > +    */
> > +   blkno = cur_nblocks - 1;
> >
> > That comment explains very little about why this is done, and why it's a
> > good idea.
> 
> Short answer: performance -- it's too expensive to try every block.
> The explanation is in storage/freespace/README -- maybe that should be
> referenced here?

Yes. Even just adding "for performance reasons, only try every second
block. See also the README" would be good.

But I'll note that the need to do this - and to potentially waste space -
counters your claim that there are no performance considerations with this
patch.


> > +/* Status codes for the local map. */
> > +
> > +/* Either already tried, or beyond the end of the relation */
> > +#define FSM_LOCAL_NOT_AVAIL 0x00
> > +
> > +/* Available to try */
> > +#define FSM_LOCAL_AVAIL        0x01
> >
> > +/* Local map of block numbers for small heaps with no FSM. */
> > +typedef struct
> > +{
> > +   BlockNumber nblocks;
> > +   uint8       map[HEAP_FSM_CREATION_THRESHOLD];
> > +}          FSMLocalMap;
> > +
> >
> > Hm, given realistic HEAP_FSM_CREATION_THRESHOLD, and the fact that we
> > really only need one bit per relation, it seems like map should really
> > be just a uint32 with one bit per page.
> 
> I fail to see the advantage of that.

It'd allow different AMs to have different dont-create-fsm thresholds
without needing additional memory (up to 32 blocks).


> > I'm kinda thinking that this is the wrong architecture.
> >
> > 1) Unless I miss something, this will trigger a
> >    RelationGetNumberOfBlocks(), which in turn ends up doing an lseek(),
> >    once for each page we add to the relation.
> 
> That was true previously anyway if the FSM returned InvalidBlockNumber.

True. That was already pretty annoying though.


> >    That strikes me as pretty
> >    suboptimal. I think it's even worse if multiple backends do the
> >    insertion, because the RelationGetTargetBlock(relation) logic will
> >    succeed less often.
> 
> Could you explain why it would succeed less often?

Two aspects: 1) If more backends access a table, there'll be a higher
chance the target page is full. 2) There are more backends that don't
have a target page.


> > 2) We'll repeatedly re-encounter the first few pages, because we clear
> >    the local map after each successful RelationGetBufferForTuple().
> 
> Not exactly sure what you mean? We only set the map if
> RelationGetTargetBlock() returns InvalidBlockNumber, or if it returned
> a valid block, and inserting there already failed. So, not terribly
> often, I imagine.

It's pretty common to have small tables that are modified by a number of
backends. A typical case is tables that implement locks for external
processes and such.


> On Wed, Apr 17, 2019 at 3:16 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2019-04-16 14:31:25 -0400, Tom Lane wrote:
>   <snip>
> > > and in any case it seems pretty inefficient for workloads that insert
> > > into multiple tables.
> >
> > As is, it's inefficient for insertions into a *single* relation. The
> > RelationGetTargetBlock() makes it not crazily expensive, but it's still
> > plenty expensive.
> 
> Performance testing didn't reveal any performance regression. If you
> have a realistic benchmark in mind that stresses this logic more
> heavily, I'd be happy to be convinced otherwise.

Well, try a few hundred relations on nfs (where stat is much more
expensive). Or just pgbench a concurrent workload with a few tables with
one live row each, updated by backends (to simulate lock tables and
such).

But also, my concern here is to a significant degree architectural,
rather than already measurable performance regressions. We ought to work
towards eliminating unnecessary syscalls, not the opposite.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Andres Freund
Date:
Hi,

On 2019-04-17 15:49:29 +0530, Amit Kapila wrote:
> > *and* stash the bitmap of
> > pages that we think are used/free as a bitmap somewhere below the
> > relcache.
> 
> I think maintaining at relcache level will be tricky when there are
> insertions and deletions happening in the small relation.  We have
> considered such a case during development wherein we don't want the
> FSM to be created if there are insertions and deletions in small
> relation.  The current mechanism addresses both this and normal case
> where there are not many deletions.  Sure there is some overhead of
> building the map again and rechecking each page.  The first one is a
> memory operation and takes a few cycles

Yea, I think creating / resetting the map is basically free.

I'm not sure I buy the concurrency issue - ISTM it'd be perfectly
reasonable to cache the local map (in the relcache) and use it for local
FSM queries, and rebuild it from scratch once no space is found. That'd
avoid a lot of repeated checking of relation size for small, but
commonly changed relations.  Add a pre-check of smgr_fsm_nblocks (if >
0, there has to be an fsm), and there should be fewer syscalls.
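
Something along these lines (sketch only; it assumes the existing
smgr_fsm_nblocks field and the helper name is invented, not part of the
patch):

#include "postgres.h"

#include "storage/smgr.h"
#include "utils/rel.h"

static bool
fsm_fork_known_to_exist(Relation rel)
{
    RelationOpenSmgr(rel);

    /* If we have already seen a positive FSM size, the fork must exist. */
    if (rel->rd_smgr->smgr_fsm_nblocks != InvalidBlockNumber &&
        rel->rd_smgr->smgr_fsm_nblocks > 0)
        return true;

    /* Otherwise fall back to the (expensive) on-disk existence check. */
    return smgrexists(rel->rd_smgr, FSM_FORKNUM);
}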


> and for the second we optimized by checking the pages alternatively
> which means we won't check more than two pages at-a-time.  This cost
> is paid by not checking FSM and it could be somewhat better in some
> cases [1].

Well, it's also paid by potentially higher bloat, because the
intermediate pages aren't tested.


> >  If we cleared that variable at truncations, I think we should
> > be able to make that work reasonably well?
> 
> Not only that, I think it needs to be cleared whenever we create the
> FSM as well which could be tricky as it can be created by the vacuum.

ISTM normal invalidation logic should just take care of that kind of thing.


> OTOH, if we want to extend it later for whatever reason to a relation
> level cache, it shouldn't be that difficult as the implementation is
> mostly contained in freespace.c (fsm* functions) and I think the
> relation is accessible in most places.  We might need to rip out some
> calls to clearlocalmap.

But it really isn't contained to freespace.c. That's my primary
concern. You added new parameters (undocumented ones!),
FSMClearLocalMap() needs to be called by callers and xlog, etc.


Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-17 15:49:29 +0530, Amit Kapila wrote:
>> OTOH, if we want to extend it later for whatever reason to a relation
>> level cache, it shouldn't be that difficult as the implementation is
>> mostly contained in freespace.c (fsm* functions) and I think the
>> relation is accessible in most places.  We might need to rip out some
>> calls to clearlocalmap.

> But it really isn't contained to freespace.c. That's my primary
> concern. You added new parameters (undocumented ones!),
> FSMClearLocalMap() needs to be called by callers and xlog, etc.

Given where we are in the release cycle, and the major architectural
concerns that have been raised about this patch, should we just
revert it and try again in v13, rather than trying to fix it under
time pressure?  It's not like there's not anything else on our
plates to fix before beta.

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Amit Kapila
Date:
On Wed, Apr 17, 2019 at 9:46 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-04-17 15:49:29 +0530, Amit Kapila wrote:
> > > *and* stash the bitmap of
> > > pages that we think are used/free as a bitmap somewhere below the
> > > relcache.
> >
> > I think maintaining at relcache level will be tricky when there are
> > insertions and deletions happening in the small relation.  We have
> > considered such a case during development wherein we don't want the
> > FSM to be created if there are insertions and deletions in small
> > relation.  The current mechanism addresses both this and normal case
> > where there are not many deletions.  Sure there is some overhead of
> > building the map again and rechecking each page.  The first one is a
> > memory operation and takes a few cycles
>
> Yea, I think creating / resetting the map is basically free.
>
> I'm not sure I buy the concurrency issue - ISTM it'd be perfectly
> reasonable to cache the local map (in the relcache) and use it for local
> FSM queries, and rebuild it from scratch once no space is found. That'd
> avoid a lot of repeated checking of relation size for small, but
> commonly changed relations.
>

Okay, so you mean to say that we need to perform the additional system
call (to get the number of blocks) only when no space is found in the
existing set of pages?  I think that is a fair point, but can't we
achieve that by updating relpages in the relation after a call to
RelationGetNumberOfBlocks?

>  Add a pre-check of smgr_fsm_nblocks (if >
> 0, there has to be an fsm), and there should be fewer syscalls.
>

Yes, that check is a good one and I see that we already do this check
in fsm code before calling smgrexists.

>
> > and for the second we optimized by checking the pages alternatively
> > which means we won't check more than two pages at-a-time.  This cost
> > is paid by not checking FSM and it could be somewhat better in some
> > cases [1].
>
> Well, it's also paid by potentially higher bloat, because the
> intermediate pages aren't tested.
>
>
> > >  If we cleared that variable at truncations, I think we should
> > > be able to make that work reasonably well?
> >
> > Not only that, I think it needs to be cleared whenever we create the
> > FSM as well which could be tricky as it can be created by the vacuum.
>
> ISTM normal invalidation logic should just take of that kind of thing.
>

Do you mean to say that we don't need to add any new invalidation call,
and the existing invalidation calls will automatically take care of it?

>
> > OTOH, if we want to extend it later for whatever reason to a relation
> > level cache, it shouldn't be that difficult as the implementation is
> > mostly contained in freespace.c (fsm* functions) and I think the
> > relation is accessible in most places.  We might need to rip out some
> > calls to clearlocalmap.
>
> But it really isn't contained to freespace.c. That's my primary
> concern.

Okay, I get that point.  I think among those, the need to call
FSMClearLocalMap seems to be your main worry, which is fair, but OTOH,
the number of places where it needs to be called shouldn't be huge.

> You added new parameters (undocumented ones!),
>

I think this is mostly for compatibility with the old code.  I agree
that it is a wart, but without much input during development, it didn't
seem advisable to change the old behavior; that is why we added a new
parameter to GetPageWithFreeSpace.  However, if we want, we can remove
that parameter or document it better.

> FSMClearLocalMap() needs to be called by callers and xlog, etc.
>

Agreed that this is an additional requirement, but we have documented,
atop that function, the cases where it needs to be called.  We might
have missed something, but we tried to cover all the cases we are aware
of.  Can we make it clearer by adding comments atop the freespace.c APIs
where this map is used?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Amit Kapila
Date:
On Wed, Apr 17, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-17 15:49:29 +0530, Amit Kapila wrote:
> >> OTOH, if we want to extend it later for whatever reason to a relation
> >> level cache, it shouldn't be that difficult as the implementation is
> >> mostly contained in freespace.c (fsm* functions) and I think the
> >> relation is accessible in most places.  We might need to rip out some
> >> calls to clearlocalmap.
>
> > But it really isn't contained to freespace.c. That's my primary
> > concern. You added new parameters (undocumented ones!),
> > FSMClearLocalMap() needs to be called by callers and xlog, etc.
>
> Given where we are in the release cycle, and the major architectural
> concerns that have been raised about this patch, should we just
> revert it and try again in v13, rather than trying to fix it under
> time pressure?
>

I respect and will follow whatever the consensus is after
discussion.  However, I request that you wait for some time to let the
discussion conclude.  If we can't reach an agreement, or John and I
can't implement what is decided, then we can revert it anyway.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: John Naylor
Date:
On Thu, Apr 18, 2019 at 2:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I respect and will follow whatever will be the consensus after
> discussion.  However, I request you to wait for some time to let the
> discussion conclude.  If we can't get to an
> agreement or one of John or me can't implement what is decided, then
> we can anyway revert it.

Agreed. I suspect the most realistic way to address most of the
objections in a short amount of time would be to:

1. rip out the local map
2. restore hio.c to only checking the last block in the relation if
there is no FSM (and lower the threshold to reduce wasted space)
3. reduce calls to smgr_exists()

Thoughts?

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Andres Freund
Date:
Hi,

On 2019-04-17 12:20:29 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-17 15:49:29 +0530, Amit Kapila wrote:
> >> OTOH, if we want to extend it later for whatever reason to a relation
> >> level cache, it shouldn't be that difficult as the implementation is
> >> mostly contained in freespace.c (fsm* functions) and I think the
> >> relation is accessible in most places.  We might need to rip out some
> >> calls to clearlocalmap.
> 
> > But it really isn't contained to freespace.c. That's my primary
> > concern. You added new parameters (undocumented ones!),
> > FSMClearLocalMap() needs to be called by callers and xlog, etc.
> 
> Given where we are in the release cycle, and the major architectural
> concerns that have been raised about this patch, should we just
> revert it and try again in v13, rather than trying to fix it under
> time pressure?  It's not like there's not anything else on our
> plates to fix before beta.

Hm. I'm of split mind here:

It's a nice improvement, and the fixes probably wouldn't be that
hard. And we could have piped up a bit earlier about these concerns (I
only noticed this when rebasing zheap onto the newest version of
postgres).

But as you say, it's also late, and there's other stuff to do. Although I
think neither Amit nor John is heavily involved in any...

My compromise suggestion would be to try to give John and Amit ~2 weeks
to come up with a cleanup proposal, and then decide whether to 1) revert
2) apply the new patch, 3) decide to live with the warts for 12, and
apply the patch in 13. As we would already have a patch, 3) seems like
it'd be more tenable than without.

Regards,

Andres



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> My compromise suggestion would be to try to give John and Amit ~2 weeks
> to come up with a cleanup proposal, and then decide whether to 1) revert
> 2) apply the new patch, 3) decide to live with the warts for 12, and
> apply the patch in 13. As we would already have a patch, 3) seems like
> it'd be more tenable than without.

Seems reasonable.  I think we should shoot to have this resolved before
the end of the month, but it doesn't have to be done immediately.

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Amit Kapila
Date:
On Thu, Apr 18, 2019 at 2:10 PM John Naylor <john.naylor@2ndquadrant.com> wrote:
>
> On Thu, Apr 18, 2019 at 2:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I respect and will follow whatever will be the consensus after
> > discussion.  However, I request you to wait for some time to let the
> > discussion conclude.  If we can't get to an
> > agreement or one of John or me can't implement what is decided, then
> > we can anyway revert it.
>
> Agreed. I suspect the most realistic way to address most of the
> objections in a short amount of time would be to:
>
> 1. rip out the local map
> 2. restore hio.c to only checking the last block in the relation if
> there is no FSM (and lower the threshold to reduce wasted space)
> 3. reduce calls to smgr_exists()
>

Won't you need an extra call to RelationGetNumberOfBlocks to find the
last block?  Also, won't it be less efficient in terms of dealing with
bloat compared to the current patch?  I think if we go this route, then
we might need to revisit it in the future to optimize it, but maybe
that is the best alternative as of now.

I am thinking that we should at least give it a try to move the map to
rel cache level to see how easy or difficult it is and also let's wait
for a day or two to see if Andres/Tom has to say anything about this
or on the response by me above to improve the current patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes:
> I am thinking that we should at least give it a try to move the map to
> rel cache level to see how easy or difficult it is and also let's wait
> for a day or two to see if Andres/Tom has to say anything about this
> or on the response by me above to improve the current patch.

FWIW, it's hard for me to see how moving the map to the relcache isn't
the right thing to do.  You will lose state during a relcache flush,
but that's still miles better than how often the state gets lost now.

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Andres Freund
Date:

On April 18, 2019 7:53:58 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>Amit Kapila <amit.kapila16@gmail.com> writes:
>> I am thinking that we should at least give it a try to move the map
>to
>> rel cache level to see how easy or difficult it is and also let's
>wait
>> for a day or two to see if Andres/Tom has to say anything about this
>> or on the response by me above to improve the current patch.
>
>FWIW, it's hard for me to see how moving the map to the relcache isn't
>the right thing to do.  You will lose state during a relcache flush,
>but that's still miles better than how often the state gets lost now.

+1
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: John Naylor
Date:
On Fri, Apr 19, 2019 at 10:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 18, 2019 at 2:10 PM John Naylor <john.naylor@2ndquadrant.com> wrote:
> > Agreed. I suspect the most realistic way to address most of the
> > objections in a short amount of time would be to:
> >
> > 1. rip out the local map
> > 2. restore hio.c to only checking the last block in the relation if
> > there is no FSM (and lower the threshold to reduce wasted space)
> > 3. reduce calls to smgr_exists()
> >
>
> Won't you need an extra call to RelationGetNumberofBlocks to find the
> last block?

If I understand you correctly, no, the call now in
GetPageWithFreeSpace() just moves it back to where it was in v11. In
the corner case where we just measured the table size and the last
block is full, we can pass nblocks to RecordAndGetPageWithFreeSpace().
There might be further optimizations available if we're not creating a
local map.

> Also won't it be less efficient in terms of dealing with
> bloat as compare to current patch?

Yes. The threshold would have to be 2 or 3 blocks, and it would stay
bloated until it passed the threshold. Not great, but perhaps not bad
either.

> I think if we go this route, then
> we might need to revisit it in the future to optimize it, but maybe
> that is the best alternative as of now.

It's a much lighter-weight API, which has that much going for it.
I have a draft implementation, which I can share if it comes to that
-- it needs some more thought and polish first.

> I am thinking that we should at least give it a try to move the map to
> rel cache level to see how easy or difficult it is and also let's wait
> for a day or two to see if Andres/Tom has to say anything about this
> or on the response by me above to improve the current patch.

Since we have a definite timeline, I'm okay with that, although I'm
afraid I'm not quite knowledgeable enough to help much with the
relcache piece.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Amit Kapila
Date:
On Fri, Apr 19, 2019 at 1:17 PM John Naylor <john.naylor@2ndquadrant.com> wrote:
> On Fri, Apr 19, 2019 at 10:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I think if we go this route, then
> > we might need to revisit it in the future to optimize it, but maybe
> > that is the best alternative as of now.
>
> It's a much lighter-weight API, which has that much going for it.
> I have a draft implementation, which I can share if it comes to that
> -- it needs some more thought and polish first.
>

I understand that it is a lighter-weight API, but OTOH, it will be less
efficient as well.  Also, the consensus seems to be that we should
move this to the relcache.

> > I am thinking that we should at least give it a try to move the map to
> > rel cache level to see how easy or difficult it is and also let's wait
> > for a day or two to see if Andres/Tom has to say anything about this
> > or on the response by me above to improve the current patch.
>
> Since we have a definite timeline, I'm okay with that, although I'm
> afraid I'm not quite knowledgeable enough to help much with the
> relcache piece.
>

Okay, I can try to help.  I think you can start by looking at the
RelationData members (for example, see how we cache an index's metapage
in rd_amcache) and study the routines in relcache.h a bit.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Amit Kapila
Date:
On Fri, Apr 19, 2019 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> > > I am thinking that we should at least give it a try to move the map to
> > > rel cache level to see how easy or difficult it is and also let's wait
> > > for a day or two to see if Andres/Tom has to say anything about this
> > > or on the response by me above to improve the current patch.
> >
> > Since we have a definite timeline, I'm okay with that, although I'm
> > afraid I'm not quite knowledgeable enough to help much with the
> > relcache piece.
> >
>
> Okay, I can try to help.  I think you can start by looking at
> RelationData members (for ex. see how we cache index's metapage in
> rd_amcache) and study a bit about routines in relcache.h.
>

Attached is a hacky, work-in-progress patch to move the fsm map to the
relcache.  This will give you some idea.  I think we need to see whether
there is a need to invalidate the relcache due to this patch.  I think
some other comments from Andres also need to be addressed; see if you
can attempt to fix some of them.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Andres Freund
Date:
Hi,

On 2019-04-22 18:49:44 +0530, Amit Kapila wrote:
> Attached is a hacky and work-in-progress patch to move fsm map to
> relcache.  This will give you some idea.  I think we need to see if
> there is a need to invalidate the relcache due to this patch.  I think
> some other comments of Andres also need to be addressed, see if you
> can attempt to fix some of them.



>  /*
> @@ -1132,9 +1110,6 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
>      BlockNumber blkno,
>                  cached_target_block;
>  
> -    /* The local map must not be set already. */
> -    Assert(!FSM_LOCAL_MAP_EXISTS);
> -
>      /*
>       * Starting at the current last block in the relation and working
>       * backwards, mark alternating blocks as available.
> @@ -1142,7 +1117,7 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
>      blkno = cur_nblocks - 1;
>      while (true)
>      {
> -        fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
> +        rel->fsm_local_map->map[blkno] = FSM_LOCAL_AVAIL;
>          if (blkno >= 2)
>              blkno -= 2;
>          else
> @@ -1150,13 +1125,13 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
>      }
>  
>      /* Cache the number of blocks. */
> -    fsm_local_map.nblocks = cur_nblocks;
> +    rel->fsm_local_map->nblocks = cur_nblocks;
>  
>      /* Set the status of the cached target block to 'unavailable'. */
>      cached_target_block = RelationGetTargetBlock(rel);
>      if (cached_target_block != InvalidBlockNumber &&
>          cached_target_block < cur_nblocks)
> -        fsm_local_map.map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> +        rel->fsm_local_map->map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
>  }

I think there shouldn't be any need for this anymore. After this change
we shouldn't invalidate the map until there's no space on it - thereby
addressing the cost overhead, and greatly reducing the likelihood that
the local FSM can lead to increased bloat.


>  /*
> @@ -1168,18 +1143,18 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
>   * This function is used when there is no FSM.
>   */
>  static BlockNumber
> -fsm_local_search(void)
> +fsm_local_search(Relation rel)
>  {
>      BlockNumber target_block;
>  
>      /* Local map must be set by now. */
> -    Assert(FSM_LOCAL_MAP_EXISTS);
> +    Assert(FSM_LOCAL_MAP_EXISTS(rel));
>  
> -    target_block = fsm_local_map.nblocks;
> +    target_block = rel->fsm_local_map->nblocks;
>      do
>      {
>          target_block--;
> -        if (fsm_local_map.map[target_block] == FSM_LOCAL_AVAIL)
> +        if (rel->fsm_local_map->map[target_block] == FSM_LOCAL_AVAIL)
>              return target_block;
>      } while (target_block > 0);
>  
> @@ -1189,7 +1164,22 @@ fsm_local_search(void)
>       * first, which would otherwise lead to the same conclusion again and
>       * again.
>       */
> -    FSMClearLocalMap();
> +    fsm_clear_local_map(rel);

I'm not sure I like this. My inclination would be that we should be able
to check the local fsm repeatedly even if there's no space in the
in-memory representation - otherwise the use of the local FSM increases
bloat.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From: Amit Kapila
Date:
On Mon, Apr 22, 2019 at 10:34 PM Andres Freund <andres@anarazel.de> wrote:
> On 2019-04-22 18:49:44 +0530, Amit Kapila wrote:
> >  /*
> > @@ -1132,9 +1110,6 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> >       BlockNumber blkno,
> >                               cached_target_block;
> >
> > -     /* The local map must not be set already. */
> > -     Assert(!FSM_LOCAL_MAP_EXISTS);
> > -
> >       /*
> >        * Starting at the current last block in the relation and working
> >        * backwards, mark alternating blocks as available.
> > @@ -1142,7 +1117,7 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> >       blkno = cur_nblocks - 1;
> >       while (true)
> >       {
> > -             fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
> > +             rel->fsm_local_map->map[blkno] = FSM_LOCAL_AVAIL;
> >               if (blkno >= 2)
> >                       blkno -= 2;
> >               else
> > @@ -1150,13 +1125,13 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> >       }
> >
> >       /* Cache the number of blocks. */
> > -     fsm_local_map.nblocks = cur_nblocks;
> > +     rel->fsm_local_map->nblocks = cur_nblocks;
> >
> >       /* Set the status of the cached target block to 'unavailable'. */
> >       cached_target_block = RelationGetTargetBlock(rel);
> >       if (cached_target_block != InvalidBlockNumber &&
> >               cached_target_block < cur_nblocks)
> > -             fsm_local_map.map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> > +             rel->fsm_local_map->map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> >  }
>
> I think there shouldn't be any need for this anymore. After this change
> we shouldn't invalidate the map until there's no space on it - thereby
> addressing the cost overhead, and greatly reducing the likelihood that
> the local FSM can lead to increased bloat.
>

If we invalidate it only when there's no space on the page, then when
should we set it back to available?  If we don't do that, we might miss
space freed by concurrent deletes.

>
> >  /*
> > @@ -1168,18 +1143,18 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> >   * This function is used when there is no FSM.
> >   */
> >  static BlockNumber
> > -fsm_local_search(void)
> > +fsm_local_search(Relation rel)
> >  {
> >       BlockNumber target_block;
> >
> >       /* Local map must be set by now. */
> > -     Assert(FSM_LOCAL_MAP_EXISTS);
> > +     Assert(FSM_LOCAL_MAP_EXISTS(rel));
> >
> > -     target_block = fsm_local_map.nblocks;
> > +     target_block = rel->fsm_local_map->nblocks;
> >       do
> >       {
> >               target_block--;
> > -             if (fsm_local_map.map[target_block] == FSM_LOCAL_AVAIL)
> > +             if (rel->fsm_local_map->map[target_block] == FSM_LOCAL_AVAIL)
> >                       return target_block;
> >       } while (target_block > 0);
> >
> > @@ -1189,7 +1164,22 @@ fsm_local_search(void)
> >        * first, which would otherwise lead to the same conclusion again and
> >        * again.
> >        */
> > -     FSMClearLocalMap();
> > +     fsm_clear_local_map(rel);
>
> I'm not sure I like this. My inclination would be that we should be able
> to check the local fsm repeatedly even if there's no space in the
> in-memory representation - otherwise the use of the local FSM increases
> bloat.
>

Do you mean to say that we always check all the pages (say 4)
irrespective of their state in the local map?

I think we should first try to work out, in this new scheme, (a) when to
set the map, (b) when to clear it, and (c) how to use it.  I have tried
to summarize my thoughts about this below; let me know what you think.

When to set the map?
At the beginning (the first time the relation is used in the backend),
the map will be clear.  The first time in the backend that we find the
FSM doesn't exist and the number of blocks is less than
HEAP_FSM_CREATION_THRESHOLD, we set the map for the blocks that exist at
that time and mark all or alternating blocks as available.

Also, when we find that none of the blocks in the map is available
(basically they are all marked invalid, which means we have previously
checked that there is no space in them), we should get the number of
blocks and, if it is less than the threshold, add them to the map.


When to clear the map?
Once we find out that a particular page doesn't have space, we can mark
the corresponding entry in the map as invalid (i.e. not worth checking
again).  After relation extension, we can check whether the new number
of blocks exceeds the threshold and, if so, clear the map.  At truncate
or other similar times, when the relcache entry is invalidated, the map
will be cleared automatically.

How to use the map?
Now, whenever we find the map exists, we can check the blocks that are
marked as available in it.
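
To make this concrete, here is a rough pseudo-C sketch of that
lifecycle.  The helper names come from the current patch, but the
control flow is only meant to illustrate (a), (b) and (c), not to be
the final code:

/*
 * Illustrative only: how set / use / clear would fit together when a
 * backend looks for a page in a small, FSM-less relation.
 */
static BlockNumber
local_map_get_candidate(Relation rel)   /* hypothetical helper */
{
    BlockNumber nblocks;
    BlockNumber blkno;

    /* (a) set: build the map if it doesn't exist and the rel is small */
    if (!FSM_LOCAL_MAP_EXISTS(rel))
    {
        nblocks = RelationGetNumberOfBlocks(rel);
        if (nblocks > HEAP_FSM_CREATION_THRESHOLD)
            return InvalidBlockNumber;  /* caller must use the real FSM */
        fsm_local_set(rel, nblocks);    /* mark existing blocks available */
    }

    /* (c) use: try only the blocks still marked available */
    blkno = rel->fsm_local_map->nblocks;
    while (blkno > 0)
    {
        blkno--;
        if (rel->fsm_local_map->map[blkno] == FSM_LOCAL_AVAIL)
            return blkno;
    }

    /* (b) clear: nothing left, reset so it can be rebuilt with fresh nblocks */
    fsm_clear_local_map(rel);
    return InvalidBlockNumber;
}

(The other half of (b) -- marking a single block unavailable once it
turns out to be full, and clearing on extension past the threshold --
would happen in the callers.)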

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-04-23 15:46:17 +0530, Amit Kapila wrote:
> On Mon, Apr 22, 2019 at 10:34 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2019-04-22 18:49:44 +0530, Amit Kapila wrote:
> > >  /*
> > > @@ -1132,9 +1110,6 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > >       BlockNumber blkno,
> > >                               cached_target_block;
> > >
> > > -     /* The local map must not be set already. */
> > > -     Assert(!FSM_LOCAL_MAP_EXISTS);
> > > -
> > >       /*
> > >        * Starting at the current last block in the relation and working
> > >        * backwards, mark alternating blocks as available.
> > > @@ -1142,7 +1117,7 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > >       blkno = cur_nblocks - 1;
> > >       while (true)
> > >       {
> > > -             fsm_local_map.map[blkno] = FSM_LOCAL_AVAIL;
> > > +             rel->fsm_local_map->map[blkno] = FSM_LOCAL_AVAIL;
> > >               if (blkno >= 2)
> > >                       blkno -= 2;
> > >               else
> > > @@ -1150,13 +1125,13 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > >       }
> > >
> > >       /* Cache the number of blocks. */
> > > -     fsm_local_map.nblocks = cur_nblocks;
> > > +     rel->fsm_local_map->nblocks = cur_nblocks;
> > >
> > >       /* Set the status of the cached target block to 'unavailable'. */
> > >       cached_target_block = RelationGetTargetBlock(rel);
> > >       if (cached_target_block != InvalidBlockNumber &&
> > >               cached_target_block < cur_nblocks)
> > > -             fsm_local_map.map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> > > +             rel->fsm_local_map->map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> > >  }
> >
> > I think there shouldn't be any need for this anymore. After this change
> > we shouldn't invalidate the map until there's no space on it - thereby
> > addressing the cost overhead, and greatly reducing the likelihood that
> > the local FSM can lead to increased bloat.

> If we invalidate it only when there's no space on the page, then when
> should we set it back to available, because if we don't do that, then
> we might miss the space due to concurrent deletes.

Well, deletes don't traditionally (i.e. with an actual FSM) mark free
space as available (for heap). I think RecordPageWithFreeSpace() should
issue a invalidation if there's no FSM, and the block goes from full to
empty (as there's no half-full IIUC).
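
Roughly this shape, just to illustrate the idea (the "no FSM" test and
the "page is now empty" test are hand-waved here and would need the
real checks):

    /* sketch: inside RecordPageWithFreeSpace(), no-FSM case */
    if (!fsm_exists_on_disk)        /* hypothetical, computed elsewhere */
    {
        /*
         * The page went from "no usable space" straight to empty, so make
         * other backends rebuild whatever local map they have cached.
         */
        if (spaceAvail >= BLCKSZ - SizeOfPageHeaderData)
            CacheInvalidateRelcache(rel);
        return;
    }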

> > >  /*
> > > @@ -1168,18 +1143,18 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > >   * This function is used when there is no FSM.
> > >   */
> > >  static BlockNumber
> > > -fsm_local_search(void)
> > > +fsm_local_search(Relation rel)
> > >  {
> > >       BlockNumber target_block;
> > >
> > >       /* Local map must be set by now. */
> > > -     Assert(FSM_LOCAL_MAP_EXISTS);
> > > +     Assert(FSM_LOCAL_MAP_EXISTS(rel));
> > >
> > > -     target_block = fsm_local_map.nblocks;
> > > +     target_block = rel->fsm_local_map->nblocks;
> > >       do
> > >       {
> > >               target_block--;
> > > -             if (fsm_local_map.map[target_block] == FSM_LOCAL_AVAIL)
> > > +             if (rel->fsm_local_map->map[target_block] == FSM_LOCAL_AVAIL)
> > >                       return target_block;
> > >       } while (target_block > 0);
> > >
> > > @@ -1189,7 +1164,22 @@ fsm_local_search(void)
> > >        * first, which would otherwise lead to the same conclusion again and
> > >        * again.
> > >        */
> > > -     FSMClearLocalMap();
> > > +     fsm_clear_local_map(rel);
> >
> > I'm not sure I like this. My inclination would be that we should be able
> > to check the local fsm repeatedly even if there's no space in the
> > in-memory representation - otherwise the use of the local FSM increases
> > bloat.
> >
> 
> Do you mean to say that we always check all the pages (say 4)
> irrespective of their state in the local map?

I was wondering that, yes. But I think just issuing invalidations is the
right approach instead, see above.


> I think we should first try to see in this new scheme (a) when to set
> the map, (b) when to clear it, (c) how to use.  I have tried to
> summarize my thoughts about it, let me know what do you think about
> the same?
> 
> When to set the map.
> At the beginning (the first time relation is used in the backend), the
> map will be clear.  When the first time in the backend, we find that
> FSM doesn't exist and the number of blocks is lesser than
> HEAP_FSM_CREATION_THRESHOLD, we set the map for the total blocks that
> exist at that time and mark all or alternate blocks as available.

I think the alternate blocks scheme has to go. It's not defensible.


Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-23 15:46:17 +0530, Amit Kapila wrote:
>> If we invalidate it only when there's no space on the page, then when
>> should we set it back to available, because if we don't do that, then
>> we might miss the space due to concurrent deletes.

> Well, deletes don't traditionally (i.e. with an actual FSM) mark free
> space as available (for heap). I think RecordPageWithFreeSpace() should
> issue a invalidation if there's no FSM, and the block goes from full to
> empty (as there's no half-full IIUC).

Why wouldn't we implement this just as a mini four-entry FSM stored in
the relcache, and update the entries according to the same rules we'd
use for regular FSM entries?

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-04-23 13:31:25 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-04-23 15:46:17 +0530, Amit Kapila wrote:
> >> If we invalidate it only when there's no space on the page, then when
> >> should we set it back to available, because if we don't do that, then
> >> we might miss the space due to concurrent deletes.
> 
> > Well, deletes don't traditionally (i.e. with an actual FSM) mark free
> > space as available (for heap). I think RecordPageWithFreeSpace() should
> > issue a invalidation if there's no FSM, and the block goes from full to
> > empty (as there's no half-full IIUC).
> 
> Why wouldn't we implement this just as a mini four-entry FSM stored in
> the relcache, and update the entries according to the same rules we'd
> use for regular FSM entries?

I mean the big difference is that it's not shared memory. So there needs
to be some difference. My suggestion to handle that is to just issue an
invalidation when *increasing* the amount of space.

And sure, leaving that aside we could store one byte per block - it's
just not what the patch has done so far (or rather, it used one byte per
block, but only utilized one bit of it).  It's possible that'd come with
some overhead - I've not thought sufficiently about it: I assume we'd
still start out in each backend assuming each page is empty, and we'd
then rely on RelationGetBufferForTuple() to update that. What I wonder
is if we'd need to check if an on-disk FSM has been created every time
the space on a page is reduced?  I think not, if we use invalidations to
notify others of the existance of an on-disk FSM. There's a small race,
but that seems ok.
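
FWIW, the rough shape I'm thinking of (purely illustrative, not from
any posted patch):

    typedef struct
    {
        BlockNumber nblocks;    /* blocks covered by the map */
        uint8       cat[HEAP_FSM_CREATION_THRESHOLD];   /* per-block free
                                                         * space category,
                                                         * same encoding the
                                                         * FSM uses */
    } LocalFreeSpaceMap;

    /* e.g. in RelationGetBufferForTuple(), after a page didn't have room,
     * freeSpace being the page's remaining free space: */
    rel->fsm_local_map->cat[blkno] = fsm_space_avail_to_cat(freeSpace);

where fsm_space_avail_to_cat() is the existing category conversion in
freespace.c.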

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Tue, Apr 23, 2019 at 10:59 PM Andres Freund <andres@anarazel.de> wrote:
> > > On 2019-04-22 18:49:44 +0530, Amit Kapila wrote:
> > > >  /*
> > > > @@ -1132,9 +1110,6 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > > >       /* Set the status of the cached target block to 'unavailable'. */
> > > >       cached_target_block = RelationGetTargetBlock(rel);
> > > >       if (cached_target_block != InvalidBlockNumber &&
> > > >               cached_target_block < cur_nblocks)
> > > > -             fsm_local_map.map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> > > > +             rel->fsm_local_map->map[cached_target_block] = FSM_LOCAL_NOT_AVAIL;
> > > >  }
> > >
> > > I think there shouldn't be any need for this anymore. After this change
> > > we shouldn't invalidate the map until there's no space on it - thereby
> > > addressing the cost overhead, and greatly reducing the likelihood that
> > > the local FSM can lead to increased bloat.
>

I have removed the code that was invalidating cached target block from
the above function.

> > If we invalidate it only when there's no space on the page, then when
> > should we set it back to available, because if we don't do that, then
> > we might miss the space due to concurrent deletes.
>
> Well, deletes don't traditionally (i.e. with an actual FSM) mark free
> space as available (for heap). I think RecordPageWithFreeSpace() should
> issue a invalidation if there's no FSM, and the block goes from full to
> empty (as there's no half-full IIUC).
>

Sure, we can do that.

> > > >  /*
> > > > @@ -1168,18 +1143,18 @@ fsm_local_set(Relation rel, BlockNumber cur_nblocks)
> > > >   * This function is used when there is no FSM.
> > > >   */
> > > >  static BlockNumber
> > > > -fsm_local_search(void)
> > > > +fsm_local_search(Relation rel)
> > > >  {
> > > >       BlockNumber target_block;
> > > >
> > > >       /* Local map must be set by now. */
> > > > -     Assert(FSM_LOCAL_MAP_EXISTS);
> > > > +     Assert(FSM_LOCAL_MAP_EXISTS(rel));
> > > >
> > > > -     target_block = fsm_local_map.nblocks;
> > > > +     target_block = rel->fsm_local_map->nblocks;
> > > >       do
> > > >       {
> > > >               target_block--;
> > > > -             if (fsm_local_map.map[target_block] == FSM_LOCAL_AVAIL)
> > > > +             if (rel->fsm_local_map->map[target_block] == FSM_LOCAL_AVAIL)
> > > >                       return target_block;
> > > >       } while (target_block > 0);
> > > >
> > > > @@ -1189,7 +1164,22 @@ fsm_local_search(void)
> > > >        * first, which would otherwise lead to the same conclusion again and
> > > >        * again.
> > > >        */
> > > > -     FSMClearLocalMap();
> > > > +     fsm_clear_local_map(rel);
> > >
> > > I'm not sure I like this. My inclination would be that we should be able
> > > to check the local fsm repeatedly even if there's no space in the
> > > in-memory representation - otherwise the use of the local FSM increases
> > > bloat.
> > >
> >
> > Do you mean to say that we always check all the pages (say 4)
> > irrespective of their state in the local map?
>
> I was wondering that, yes. But I think just issuing invalidations is the
> right approach instead, see above.
>

Right, issuing invalidations can help with that.

>
> > I think we should first try to see in this new scheme (a) when to set
> > the map, (b) when to clear it, (c) how to use.  I have tried to
> > summarize my thoughts about it, let me know what do you think about
> > the same?
> >
> > When to set the map.
> > At the beginning (the first time relation is used in the backend), the
> > map will be clear.  When the first time in the backend, we find that
> > FSM doesn't exist and the number of blocks is lesser than
> > HEAP_FSM_CREATION_THRESHOLD, we set the map for the total blocks that
> > exist at that time and mark all or alternate blocks as available.
>
> I think the alternate blocks scheme has to go. It's not defensible.
>

Fair enough, I have changed it in the attached patch.  However, I
think we should test it once the patch is ready as we have seen a
small performance regression due to that.

> And sure, leaving that aside we could store one byte per block

Hmm, I think you mean to say one-bit per block, right?

> - it's
> just not what the patch has done so far (or rather, it used one byte per
> block, but only utilized one bit of it).

Right, I think this is an independently useful improvement, provided
it doesn't have any additional overhead or complexity.

>  It's possible that'd come with
> some overhead - I've not thought sufficiently about it: I assume we'd
> still start out in each backend assuming each page is empty, and we'd
> then rely on RelationGetBufferForTuple() to update that. What I wonder
> is if we'd need to check if an on-disk FSM has been created every time
> the space on a page is reduced?  I think not, if we use invalidations to
> notify others of the existance of an on-disk FSM. There's a small race,
> but that seems ok.
>

Do you mean to say that vacuum or some backend should invalidate in
case it first time creates the FSM for a relation? I think it is quite
possible that the vacuum takes time to trigger such invalidation if
there are fewer deletions.  And we won't be able to use newly added
page/s.

IIUC, you are suggesting to issue invalidations when the (a) vacuum
finds there is no FSM and page has more space now, (b) invalidation to
notify the existence of FSM.  It seems to me that issuing more
invalidations for the purpose of FSM can lead to an increased number
of relation builds in the overall system.  I think this can have an
impact when there are many small relations in the system which in some
scenarios might not be uncommon.

The two improvements in this code which are discussed in this thread
and can be done independently to this patch are:
a. use one bit to represent each block in the map.  This gives us the
flexibility to use the map for the different threshold for some other
storage.
b. improve the usage of smgrexists by checking smgr_fsm_nblocks.

John, can you implement these two improvements either on HEAD or on
top of this patch?
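
For (a), roughly what I have in mind (sketch only, names are
illustrative):

    /*
     * One bit per block: a uint32 covers up to 32 blocks, so different AMs
     * can use different creation thresholds without needing more memory.
     */
    typedef struct
    {
        BlockNumber nblocks;    /* blocks covered by the map */
        uint32      avail;      /* bit i set => block i is worth trying */
    } LocalMapBits;

    #define LOCAL_MAP_SET(m, blk)    ((m)->avail |= ((uint32) 1 << (blk)))
    #define LOCAL_MAP_CLEAR(m, blk)  ((m)->avail &= ~((uint32) 1 << (blk)))
    #define LOCAL_MAP_AVAIL(m, blk)  (((m)->avail & ((uint32) 1 << (blk))) != 0)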

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-04-24 11:28:32 +0530, Amit Kapila wrote:
> On Tue, Apr 23, 2019 at 10:59 PM Andres Freund <andres@anarazel.de> wrote:
> > > > On 2019-04-22 18:49:44 +0530, Amit Kapila wrote:
> > > I think we should first try to see in this new scheme (a) when to set
> > > the map, (b) when to clear it, (c) how to use.  I have tried to
> > > summarize my thoughts about it, let me know what do you think about
> > > the same?
> > >
> > > When to set the map.
> > > At the beginning (the first time relation is used in the backend), the
> > > map will be clear.  When the first time in the backend, we find that
> > > FSM doesn't exist and the number of blocks is lesser than
> > > HEAP_FSM_CREATION_THRESHOLD, we set the map for the total blocks that
> > > exist at that time and mark all or alternate blocks as available.
> >
> > I think the alternate blocks scheme has to go. It's not defensible.
> >
> 
> Fair enough, I have changed it in the attached patch.  However, I
> think we should test it once the patch is ready as we have seen a
> small performance regression due to that.

Sure, but that was because you re-scanned from scratch after every
insertion, no?


> > And sure, leaving that aside we could store one byte per block
> 
> Hmm, I think you mean to say one-bit per block, right?

No, I meant byte. The normal FSM saves how much space there is for a
block, but the current local fsm doesn't. That means pages are marked as
unavailble even though other tuples would possibly fit.


> >  It's possible that'd come with
> > some overhead - I've not thought sufficiently about it: I assume we'd
> > still start out in each backend assuming each page is empty, and we'd
> > then rely on RelationGetBufferForTuple() to update that. What I wonder
> > is if we'd need to check if an on-disk FSM has been created every time
> > the space on a page is reduced?  I think not, if we use invalidations to
> > notify others of the existance of an on-disk FSM. There's a small race,
> > but that seems ok.

> Do you mean to say that vacuum or some backend should invalidate in
> case it first time creates the FSM for a relation?

Right.


> I think it is quite possible that the vacuum takes time to trigger
> such invalidation if there are fewer deletions.  And we won't be able
> to use newly added page/s.

I'm not sure I understand what you mean by that? If the backend that
ends up creating the FSM - because it extended the relation beyond 4
pages - issues an invalidation, the time till other backends pick that
up should be minimal?


> IIUC, you are suggesting to issue invalidations when the (a) vacuum
> finds there is no FSM and page has more space now, (b) invalidation to
> notify the existence of FSM.  It seems to me that issuing more
> invalidations for the purpose of FSM can lead to an increased number
> of relation builds in the overall system.  I think this can have an
> impact when there are many small relations in the system which in some
> scenarios might not be uncommon.

If this becomes an issue I'd just create a separate type of
invalidation, one that just signals that the FSM is being invalidated.


Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Wed, Apr 24, 2019 at 9:49 PM Andres Freund <andres@anarazel.de> wrote:
> On 2019-04-24 11:28:32 +0530, Amit Kapila wrote:
> > On Tue, Apr 23, 2019 at 10:59 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > On 2019-04-22 18:49:44 +0530, Amit Kapila wrote:
> > > > I think we should first try to see in this new scheme (a) when to set
> > > > the map, (b) when to clear it, (c) how to use.  I have tried to
> > > > summarize my thoughts about it, let me know what do you think about
> > > > the same?
> > > >
> > > > When to set the map.
> > > > At the beginning (the first time relation is used in the backend), the
> > > > map will be clear.  When the first time in the backend, we find that
> > > > FSM doesn't exist and the number of blocks is lesser than
> > > > HEAP_FSM_CREATION_THRESHOLD, we set the map for the total blocks that
> > > > exist at that time and mark all or alternate blocks as available.
> > >
> > > I think the alternate blocks scheme has to go. It's not defensible.
> > >
> >
> > Fair enough, I have changed it in the attached patch.  However, I
> > think we should test it once the patch is ready as we have seen a
> > small performance regression due to that.
>
> Sure, but that was because you re-scanned from scratch after every
> insertion, no?
>

Possible.

>
> > > And sure, leaving that aside we could store one byte per block
> >
> > Hmm, I think you mean to say one-bit per block, right?
>
> No, I meant byte.
>

Upthread you have said: "Hm, given realistic
HEAP_FSM_CREATION_THRESHOLD, and the fact that we really only need one
bit per relation, it seems like map should really be just a uint32
with one bit per page.   It'd allow different AMs to have different
numbers of dont-create-fsm thresholds without needing additional
memory (up to 32 blocks)."

I can understand the advantage of the one-bit-per-page suggestion, but
now you are suggesting one byte per page.  I am confused between those
two options.  Am I missing something?

> The normal FSM saves how much space there is for a
> block, but the current local fsm doesn't. That means pages are marked as
> unavailble even though other tuples would possibly fit.
>

Sure, in regular FSM, the vacuum can update the available space, but
we don't have such a provision for local map unless we decide to keep
it in shared memory.

>
> > >  It's possible that'd come with
> > > some overhead - I've not thought sufficiently about it: I assume we'd
> > > still start out in each backend assuming each page is empty, and we'd
> > > then rely on RelationGetBufferForTuple() to update that. What I wonder
> > > is if we'd need to check if an on-disk FSM has been created every time
> > > the space on a page is reduced?  I think not, if we use invalidations to
> > > notify others of the existance of an on-disk FSM. There's a small race,
> > > but that seems ok.
>
> > Do you mean to say that vacuum or some backend should invalidate in
> > case it first time creates the FSM for a relation?
>
> Right.
>
>
> > I think it is quite possible that the vacuum takes time to trigger
> > such invalidation if there are fewer deletions.  And we won't be able
> > to use newly added page/s.
>
> I'm not sure I understand what you mean by that? If the backend that
> ends up creating the FSM - because it extended the relation beyond 4
> pages - issues an invalidation, the time till other backends pick that
> up should be minimal?
>

Consider that backend-1 starts inserting into a relation that has just
two pages, so we create a local map covering those two pages.  Now
backend-2 extends the relation by one page; how and when will backend-1
come to know about that?  One possibility is that once all the pages
present in backend-1's local map become invalid (used up), we again
check the number of blocks and update the local map.

>
> > IIUC, you are suggesting to issue invalidations when the (a) vacuum
> > finds there is no FSM and page has more space now, (b) invalidation to
> > notify the existence of FSM.  It seems to me that issuing more
> > invalidations for the purpose of FSM can lead to an increased number
> > of relation builds in the overall system.  I think this can have an
> > impact when there are many small relations in the system which in some
> > scenarios might not be uncommon.
>
> If this becomes an issue I'd just create a separate type of
> invalidation, one that just signals that the FSM is being invalidated.
>

Oh, clever idea, but I guess that will be some work unless we already
do something similar elsewhere.  Anyway, we can look into it later.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Wed, Apr 24, 2019 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> The two improvements in this code which are discussed in this thread
> and can be done independently to this patch are:
> a. use one bit to represent each block in the map.  This gives us the
> flexibility to use the map for the different threshold for some other
> storage.
> b. improve the usage of smgrexists by checking smgr_fsm_nblocks.
>
> John, can you implement these two improvements either on HEAD or on
> top of this patch?

I've done B in the attached. There is a more recent idea of using the
byte to store the actual free space in the same format as the FSM.
That might be v13 material, but in any case, I'll hold off on A for
now.


-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Wed, Apr 24, 2019 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> <v2 patch>

Sorry for not noticing earlier, but this patch causes a regression
test failure for me (attached)

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Thu, Apr 25, 2019 at 12:39 PM John Naylor
<john.naylor@2ndquadrant.com> wrote:
>
> On Wed, Apr 24, 2019 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > <v2 patch>
>
> Sorry for not noticing earlier, but this patch causes a regression
> test failure for me (attached)
>

Can you please try to finish the remaining work of the patch (I am bit
tied up with some other things)?  I think the main thing apart from
representation of map as one-byte or one-bit per block is to implement
invalidation.   Also, try to see if there is anything pending which I
might have missed.  As discussed above, we need to issue an
invalidation for following points:  (a) when vacuum finds there is no
FSM and page has more space now, I think you can detect this in
RecordPageWithFreeSpace (b) invalidation to notify the existence of
FSM, this can be done both by vacuum and backend.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Wed, Apr 17, 2019 at 2:04 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I'm somewhat unhappy in how much the no-fsm-for-small-rels exposed
> complexity that looks like it should be purely in freespacemap.c to
> callers.

I took a stab at untying the free space code from any knowledge about
heaps, and made it the responsibility of each access method that calls
these free space routines to specify their own threshold (possibly
zero). The attached applies on top of HEAD, because it's independent
of the relcache logic being discussed. If I can get that working, then
I'll rebase it on top of this API, if you like.

>  extern Size GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk);
> -extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded);
> +extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded,
> +                    bool check_fsm_only);
>
> So now freespace.c has an argument that says we should only check the
> fsm. That's confusing. And it's not explained to callers what that
> argument means, and when it should be set.

I split this up into 2 routines: GetPageWithFreeSpace() is now exactly
like it is in v11, and GetAlternatePage() is available for access
methods that can use it.

> +/* Only create the FSM if the heap has greater than this many blocks */
> +#define HEAP_FSM_CREATION_THRESHOLD 4
>
> Hm, this seems to be tying freespace.c closer to heap than I think is
> great - think of new AMs like zheap, that also want to use it.

This was a bit harder than expected. Because of the pg_upgrade
optimization, it was impossible to put this symbol in hio.h or
heapam.h, because they include things unsafe for frontend code. I
decided to create heap_fe.h, which is a hack. Also, because they have
freespace.c callers, I put other thresholds in

src/backend/storage/freespace/indexfsm.c
src/include/access/brin_pageops.h

Putting the thresholds in 3 files with completely different purposes
is a mess, and serves no example for future access methods, but I
don't have a better idea.

On the upside, untying free space from heap allowed me to remove most
of the checks for

(rel->rd_rel->relkind == RELKIND_RELATION ||
 rel->rd_rel->relkind == RELKIND_TOASTVALUE)

except for the one in pg_upgrade.c, which is again a bit of a hack, but not bad.

> Hm, given realistic HEAP_FSM_CREATION_THRESHOLD, and the fact that we
> really only need one bit per relation, it seems like map should really
> be just a uint32 with one bit per page.

Done. Regarding the idea upthread about using bytes to store ordinary
freespace values, I think that's better for correctness, but also
makes it more difficult to use different thresholds per access method.

> +static bool
> +fsm_allow_writes(Relation rel, BlockNumber heapblk,
> +                BlockNumber nblocks, BlockNumber *get_nblocks)
>
> +   RelationOpenSmgr(rel);
> +   if (smgrexists(rel->rd_smgr, FSM_FORKNUM))
> +       return true;
>
> Isn't this like really expensive? mdexists() closes the relations and
> reopens it from scratch. Shouldn't we at the very least check
> smgr_fsm_nblocks beforehand, so this is only done once?

I did this in an earlier patch above -- do you have an opinion about that?
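
To give the idea, the check is along these lines (not quoting the
actual patch; the exact conditions may differ):

    /* Cheap check first: if the FSM fork's size is already cached in the
     * SMgrRelation, the fork exists and we can skip the smgrexists() call. */
    RelationOpenSmgr(rel);
    if (rel->rd_smgr->smgr_fsm_nblocks != InvalidBlockNumber &&
        rel->rd_smgr->smgr_fsm_nblocks > 0)
        return true;
    if (smgrexists(rel->rd_smgr, FSM_FORKNUM))
        return true;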

I also removed the call to smgrnblocks(smgr, MAIN_FORKNUM) from
XLogRecordPageWithFreeSpace() because I don't think it's actually
needed. There's a window where a table could have 5 blocks, but trying
to record space on, say, block 2 won't actually create the FSM on the
standby. When block 5 fills up enough, then the xlog call will cause
the FSM to be created. Not sure if this is best, but it saves another
syscall, and this function is only called when freespace is less than
20%, IIRC.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Fri, Apr 26, 2019 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 25, 2019 at 12:39 PM John Naylor
> <john.naylor@2ndquadrant.com> wrote:
> >
> > On Wed, Apr 24, 2019 at 1:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > <v2 patch>
> >
> > Sorry for not noticing earlier, but this patch causes a regression
> > test failure for me (attached)
> >
>
> Can you please try to finish the remaining work of the patch (I am bit
> tied up with some other things)?  I think the main thing apart from
> representation of map as one-byte or one-bit per block is to implement
> invalidation.   Also, try to see if there is anything pending which I
> might have missed.  As discussed above, we need to issue an
> invalidation for following points:  (a) when vacuum finds there is no
> FSM and page has more space now, I think you can detect this in
> RecordPageWithFreeSpace (b) invalidation to notify the existence of
> FSM, this can be done both by vacuum and backend.

Yes, I'll work on it.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Fri, Apr 26, 2019 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> As discussed above, we need to issue an
> invalidation for following points:  (a) when vacuum finds there is no
> FSM and page has more space now, I think you can detect this in
> RecordPageWithFreeSpace

I took a brief look and we'd have to know how much space was there
before. That doesn't seem possible without first implementing the idea
to save free space locally in the same way the FSM does. Even if we
have consensus on that, there's no code for it, and we're running out
of time.

> (b) invalidation to notify the existence of
> FSM, this can be done both by vacuum and backend.

I still don't claim to be anything but naive in this area, but does
the attached get us any closer?

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Alvaro Herrera
Date:
On 2019-Apr-30, John Naylor wrote:

> On Fri, Apr 26, 2019 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > As discussed above, we need to issue an
> > invalidation for following points:  (a) when vacuum finds there is no
> > FSM and page has more space now, I think you can detect this in
> > RecordPageWithFreeSpace
> 
> I took a brief look and we'd have to know how much space was there
> before. That doesn't seem possible without first implementing the idea
> to save free space locally in the same way the FSM does. Even if we
> have consensus on that, there's no code for it, and we're running out
> of time.

Hmm ... so, if vacuum runs and frees up any space from any of the pages,
then it should send out an invalidation -- it doesn't matter what the
FSM had, just that there is more free space now.  That means every other
process will need to determine a fresh FSM, but that seems correct.
Sounds better than keeping outdated entries indicating no-space-available.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Tue, Apr 30, 2019 at 7:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Apr-30, John Naylor wrote:
>
> > On Fri, Apr 26, 2019 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > As discussed above, we need to issue an
> > > invalidation for following points:  (a) when vacuum finds there is no
> > > FSM and page has more space now, I think you can detect this in
> > > RecordPageWithFreeSpace
> >
> > I took a brief look and we'd have to know how much space was there
> > before. That doesn't seem possible without first implementing the idea
> > to save free space locally in the same way the FSM does. Even if we
> > have consensus on that, there's no code for it, and we're running out
> > of time.
>
> Hmm ... so, if vacuum runs and frees up any space from any of the pages,
> then it should send out an invalidation -- it doesn't matter what the
> FSM had, just that there is more free space now.  That means every other
> process will need to determine a fresh FSM,
>

I think you intend to say the local space map because once FSM is
created we will send invalidation and we won't further build relcache
entry having local space map.

> but that seems correct.
> Sounds better than keeping outdated entries indicating no-space-available.
>

Agreed, but as mentioned in one of the above emails, I am also a bit
worried that it could lead to many invalidation messages for small
relations, so maybe we should send the invalidation message only when
the entire page is empty.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Tue, Apr 30, 2019 at 6:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 30, 2019 at 2:24 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> >  insert into atacc1 values (21, 22, 23);
> > +ERROR:  could not read block 0 in file "base/16384/31379": read only
> > 0 of 8192 bytes
> >
> > I have analysed this failure.  Seems that we have not reset the
> > rel->fsm_local_map while truncating the relation pages by vacuum
> > (lazy_truncate_heap). So when next time while accessing it we are
> > getting the error.   I think we need a mechanism to invalidate this
> > when we truncate the relation pages.   I am not sure whether we should
> > invalidate the relcache entry here or just reset the
> > rel->fsm_local_map?
> >
>
> Thanks, this appears to be the missing case where we need to
> invalidate the cache.  So, as discussed above if we issue invalidation
> call (in RecordPageWithFreeSpace) when the page becomes empty, then we
> shouldn't encounter this.  John, can we try this out and see if the
> failure goes away?

I added a clear/inval call in RecordPageWithFreeSpace and the failure
goes away. Thanks for the analysis, Dilip!

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Tue, Apr 30, 2019 at 12:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 26, 2019 at 10:46 AM John Naylor
> <john.naylor@2ndquadrant.com> wrote:
> >
> > On Wed, Apr 17, 2019 at 2:04 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > I'm somewhat unhappy in how much the no-fsm-for-small-rels exposed
> > > complexity that looks like it should be purely in freespacemap.c to
> > > callers.
> >
> > I took a stab at untying the free space code from any knowledge about
> > heaps, and made it the responsibility of each access method that calls
> > these free space routines to specify their own threshold (possibly
> > zero). The attached applies on top of HEAD, because it's independent
> > of the relcache logic being discussed. If I can get that working, then
> > I'll rebase it on top of this API, if you like.
> >
> > >  extern Size GetRecordedFreeSpace(Relation rel, BlockNumber heapBlk);
> > > -extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded);
> > > +extern BlockNumber GetPageWithFreeSpace(Relation rel, Size spaceNeeded,
> > > +                    bool check_fsm_only);
> > >
> > > So now freespace.c has an argument that says we should only check the
> > > fsm. That's confusing. And it's not explained to callers what that
> > > argument means, and when it should be set.
> >
> > I split this up into 2 routines: GetPageWithFreeSpace() is now exactly
> > like it is in v11, and GetAlternatePage() is available for access
> > methods that can use it.
> >
>
> I don't much like the new function name GetAlternatePage, may be
> GetPageFromLocalFSM or something like that.  OTOH, I am not sure if we
> should go that far to address this concern of Andres's, maybe just
> adding a proper comment is sufficient.

That's a clearer name. I think 2 functions is easier to follow than
the boolean parameter.

> > Putting the thresholds in 3 files with completely different purposes
> > is a mess, and serves no example for future access methods, but I
> > don't have a better idea.
> >
>
> Yeah, I am also not sure if it is a good idea because it still won't
> be easy for pluggable storage especially the pg_upgrade part.  I think
> if we really want to make it easy for pluggable storage to define
> this, then we might need to build something along the lines of how to
> estimate relation size works.
>
> See how table_relation_estimate_size is defined and used
> and TableAmRoutine heapam_methods
> {
> ..
> relation_estimate_size
> }

That might be the best way for table ams, but I guess we'd still need
to keep the hard-coding for indexes to always have a FSM. That might
not be too bad.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Wed, May 1, 2019 at 9:57 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
>
> On Tue, Apr 30, 2019 at 12:48 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 26, 2019 at 10:46 AM John Naylor
> > <john.naylor@2ndquadrant.com> wrote:
> > I don't much like the new function name GetAlternatePage, may be
> > GetPageFromLocalFSM or something like that.  OTOH, I am not sure if we
> > should go that far to address this concern of Andres's, maybe just
> > adding a proper comment is sufficient.
>
> That's a clearer name. I think 2 functions is easier to follow than
> the boolean parameter.
>

Okay, but then add a few comments where you are calling that function.

> > > Putting the thresholds in 3 files with completely different purposes
> > > is a mess, and serves no example for future access methods, but I
> > > don't have a better idea.
> > >
> >
> > Yeah, I am also not sure if it is a good idea because it still won't
> > be easy for pluggable storage especially the pg_upgrade part.  I think
> > if we really want to make it easy for pluggable storage to define
> > this, then we might need to build something along the lines of how to
> > estimate relation size works.
> >
> > See how table_relation_estimate_size is defined and used
> > and TableAmRoutine heapam_methods
> > {
> > ..
> > relation_estimate_size
> > }
>
> That might be the best way for table ams, but I guess we'd still need
> to keep the hard-coding for indexes to always have a FSM. That might
> not be too bad.
>

I also think so.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Wed, May 1, 2019 at 11:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 30, 2019 at 7:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > but that seems correct.
> > Sounds better than keeping outdated entries indicating no-space-available.
>
> Agreed, but as mentioned in one of the above emails, I am also bit
> scared that it should not lead to many invalidation messages for small
> relations, so may be we should send the invalidation message only when
> the entire page is empty.

One way would be to send the inval if the new free space is greater
than some percentage of BLCKSZ.
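
Something like this, say (the cutoff is made up, just to illustrate):

    /* In the no-FSM path of RecordPageWithFreeSpace(): only poke other
     * backends when vacuum freed a meaningful chunk of the page. */
    if (spaceAvail > BLCKSZ / 5)    /* roughly 20% of a block */
        CacheInvalidateRelcache(rel);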

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-04-18 14:10:29 -0700, Andres Freund wrote:
> My compromise suggestion would be to try to give John and Amit ~2 weeks
> to come up with a cleanup proposal, and then decide whether to 1) revert
> 2) apply the new patch, 3) decide to live with the warts for 12, and
> apply the patch in 13. As we would already have a patch, 3) seems like
> it'd be more tenable than without.

I think decision time has come. My tentative impression is that we're
not there yet, and should revert the improvements in v12, and apply the
improved version early in v13. As a second choice, we should live with
the current approach, if John and Amit "promise" further effort to clean
this up for v13.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Bruce Momjian
Date:
On Wed, May  1, 2019 at 08:24:25AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2019-04-18 14:10:29 -0700, Andres Freund wrote:
> > My compromise suggestion would be to try to give John and Amit ~2 weeks
> > to come up with a cleanup proposal, and then decide whether to 1) revert
> > 2) apply the new patch, 3) decide to live with the warts for 12, and
> > apply the patch in 13. As we would already have a patch, 3) seems like
> > it'd be more tenable than without.
> 
> I think decision time has come. My tentative impression is that we're
> not there yet, and should revert the improvements in v12, and apply the
> improved version early in v13. As a second choice, we should live with
> the current approach, if John and Amit "promise" further effort to clean
> this up for v13.

My ignorant opinion is that I have been surprised by the churn caused by
this change, and have therefore questioned the value of it.  Frankly,
there has been so much churn I am unclear if it can be easily reverted.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-05-01 11:28:11 -0400, Bruce Momjian wrote:
> On Wed, May  1, 2019 at 08:24:25AM -0700, Andres Freund wrote:
> > Hi,
> >
> > On 2019-04-18 14:10:29 -0700, Andres Freund wrote:
> > > My compromise suggestion would be to try to give John and Amit ~2 weeks
> > > to come up with a cleanup proposal, and then decide whether to 1) revert
> > > 2) apply the new patch, 3) decide to live with the warts for 12, and
> > > apply the patch in 13. As we would already have a patch, 3) seems like
> > > it'd be more tenable than without.
> >
> > I think decision time has come. My tentative impression is that we're
> > not there yet, and should revert the improvements in v12, and apply the
> > improved version early in v13. As a second choice, we should live with
> > the current approach, if John and Amit "promise" further effort to clean
> > this up for v13.
>
> My ignorant opinion is that I have been surprised by the churn caused by
> this change, and have therefore questioned the value of it.

Hm, I don't think there has been that much churn? Sure, there was a
revert to figure out a regression test instability, but that doesn't
seem that bad. Relevant commits in date order are:


andres-classification: cleanup
commit 06c8a5090ed9ec188557a86d4de11384f5128ec0
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-03-16 06:55:56 +0530

    Improve code comments in b0eaa4c51b.

    Author: John Naylor
    Discussion: https://postgr.es/m/CACPNZCswjyGJxTT=mxHgK=Z=mJ9uJ4WEx_UO=bNwpR_i0EaHHg@mail.gmail.com


andres-classification: incremental improvement
commit 13e8643bfc29d3c1455c0946281cdfc24758ffb6
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-03-15 08:25:57 +0530

    During pg_upgrade, conditionally skip transfer of FSMs.


andres-classification: additional tests
commit 6f918159a97acf76ee2512e44f5ed5dcaaa0d923
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-03-12 08:14:28 +0530

    Add more tests for FSM.


andres-classification: cleanup
commit a6e48da08844eeb5a72c8b59dad3aaab6e891fac
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-03-11 08:16:14 +0530

    Fix typos in commit 8586bf7ed8.


andres-classification: bugfix
commit 9c32e4c35026bd52aaf340bfe7594abc653e42f0
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-03-01 07:38:47 +0530

    Clear the local map when not used.


andres-classification: docs addition
commit 29d108cdecbe918452e70041d802cc515b2d56b8
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-02-20 17:37:39 +0530

    Doc: Update the documentation for FSM behavior for small tables.


andres-classification: regression test stability
commit 08ecdfe7e5e0a31efbe1d58fefbe085b53bc79ca
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-02-04 10:08:29 +0530

    Make FSM test portable.


andres-classification: feature
commit b0eaa4c51bbff3e3c600b11e5d104d6feb9ca77f
Author: Amit Kapila <akapila@postgresql.org>
Date:   2019-02-04 07:49:15 +0530

    Avoid creation of the free space map for small heap relations, take 2.


So sure, there's a few typo fixes, one bugfix, and one buildfarm test
stability issue. Doesn't seem crazy for a nontrivial improvement.


> Frankly, there has been so much churn I am unclear if it can be easily reverted.

Doesn't seem that hard? There's some minor conflicts, but nothing bad?

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Bruce Momjian
Date:
On Wed, May  1, 2019 at 09:08:54AM -0700, Andres Freund wrote:
> So sure, there's a few typo fixes, one bugfix, and one buildfarm test
> stability issue. Doesn't seem crazy for a nontrivial improvement.

OK, my ignorant opinion was just based on the length of discussion
threads.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Alvaro Herrera
Date:
On 2019-May-01, Amit Kapila wrote:

> On Tue, Apr 30, 2019 at 7:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> > Hmm ... so, if vacuum runs and frees up any space from any of the pages,
> > then it should send out an invalidation -- it doesn't matter what the
> > FSM had, just that there is more free space now.  That means every other
> > process will need to determine a fresh FSM,
> 
> I think you intend to say the local space map because once FSM is
> created we will send invalidation and we won't further build relcache
> entry having local space map.

Yeah, I mean the map that records free space.

> > but that seems correct.  Sounds better than keeping outdated entries
> > indicating no-space-available.
> 
> Agreed, but as mentioned in one of the above emails, I am also a bit
> worried that it could lead to many invalidation messages for small
> relations, so maybe we should send the invalidation message only when
> the entire page is empty.

I don't think that's a concern, is it?  You typically won't be running
multiple vacuums per second, or even multiple vacuums per minute.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Wed, May 1, 2019 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-04-18 14:10:29 -0700, Andres Freund wrote:
> > My compromise suggestion would be to try to give John and Amit ~2 weeks
> > to come up with a cleanup proposal, and then decide whether to 1) revert
> > 2) apply the new patch, 3) decide to live with the warts for 12, and
> > apply the patch in 13. As we would already have a patch, 3) seems like
> > it'd be more tenable than without.
>
> I think decision time has come. My tentative impression is that we're
> not there yet, and should revert the improvements in v12, and apply the
> improved version early in v13. As a second choice, we should live with
> the current approach, if John and Amit "promise" further effort to clean
> this up for v13.

Yes, the revised approach is not currently as mature as the one in
HEAD. It's not ready. Not wanting to attempt Promise Driven
Development, I'd rather revert, and only try again if there's enough
time and interest.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Thu, May 2, 2019 at 3:42 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-May-01, Amit Kapila wrote:
>
> > On Tue, Apr 30, 2019 at 7:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> > > Hmm ... so, if vacuum runs and frees up any space from any of the pages,
> > > then it should send out an invalidation -- it doesn't matter what the
> > > FSM had, just that there is more free space now.  That means every other
> > > process will need to determine a fresh FSM,
> >
> > I think you intend to say the local space map because once FSM is
> > created we will send invalidation and we won't further build relcache
> > entry having local space map.
>
> Yeah, I mean the map that records free space.
>
> > > but that seems correct.  Sounds better than keeping outdated entries
> > > indicating no-space-available.
> >
> > Agreed, but as mentioned in one of the above emails, I am also a bit
> > worried that it could lead to many invalidation messages for small
> > relations, so maybe we should send the invalidation message only when
> > the entire page is empty.
>
> I don't think that's a concern, is it?  You typically won't be running
> multiple vacuums per second, or even multiple vacuums per minute.
>

That's right.  So let's try adding an invalidation call whenever vacuum
frees up space.  Is there a good way to test whether the new
invalidation calls added by this patch have any significant impact?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Thu, May 2, 2019 at 7:36 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
>
> On Wed, May 1, 2019 at 11:24 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2019-04-18 14:10:29 -0700, Andres Freund wrote:
> > > My compromise suggestion would be to try to give John and Amit ~2 weeks
> > > to come up with a cleanup proposal, and then decide whether to 1) revert
> > > 2) apply the new patch, 3) decide to live with the warts for 12, and
> > > apply the patch in 13. As we would already have a patch, 3) seems like
> > > it'd be more tenable than without.
> >
> > I think decision time has come. My tentative impression is that we're
> > not there yet,

You are right that the patch is not in committable shape, but the patch
to move the map to the relcache has been posted, and the main work left
there is to review/test it and add the invalidation calls as per the
discussion.  It is just that I don't want to do that in haste and end up
causing some other problems.  So that patch should not take too much
time and will resolve the main complaint.  Basically, I was planning to
re-post it once the discussion between me and Alvaro concludes, and then
you can also look at it to see whether it addresses the main complaint.
There are a few other points for which John has prepared a patch, and
those might need some work based on your inputs.

>> and should revert the improvements in v12, and apply the
> > improved version early in v13. As a second choice, we should live with
> > the current approach, if John and Amit "promise" further effort to clean
> > this up for v13.
>
> Yes, the revised approach is not currently as mature as the one in
> HEAD. It's not ready. Not wanting to attempt Promise Driven
> Development, I'd rather revert, and only try again if there's enough
> time and interest.
>

I can certainly help with moving the cleanup patch forward.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Tue, Apr 30, 2019 at 11:42 AM John Naylor
<john.naylor@2ndquadrant.com> wrote:
>
> On Fri, Apr 26, 2019 at 11:52 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > As discussed above, we need to issue an
> > invalidation for following points:  (a) when vacuum finds there is no
> > FSM and page has more space now, I think you can detect this in
> > RecordPageWithFreeSpace
>
> I took a brief look and we'd have to know how much space was there
> before. That doesn't seem possible without first implementing the idea
> to save free space locally in the same way the FSM does. Even if we
> have consensus on that, there's no code for it, and we're running out
> of time.
>
> > (b) invalidation to notify the existence of
> > FSM, this can be done both by vacuum and backend.
>
> I still don't claim to be anything but naive in this area, but does
> the attached get us any closer?
>

@@ -776,7 +776,10 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
  if ((rel->rd_smgr->smgr_fsm_nblocks == 0 ||
  rel->rd_smgr->smgr_fsm_nblocks == InvalidBlockNumber) &&
  !smgrexists(rel->rd_smgr, FSM_FORKNUM))
+ {
  smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);
+ fsm_clear_local_map(rel);
+ }

I think this won't be correct because when we call fsm_extend via
vacuum the local map won't be already existing, so it won't issue an
invalidation call.  Isn't it better to directly call
CacheInvalidateRelcache here to notify other backends that their local
maps are invalid now?
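
In other words, roughly:

    smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);

    /*
     * The on-disk FSM exists now, so make other backends discard any
     * relcache-local map they may have built for this relation.
     */
    CacheInvalidateRelcache(rel);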

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Thu, May 2, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> @@ -776,7 +776,10 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
>   if ((rel->rd_smgr->smgr_fsm_nblocks == 0 ||
>   rel->rd_smgr->smgr_fsm_nblocks == InvalidBlockNumber) &&
>   !smgrexists(rel->rd_smgr, FSM_FORKNUM))
> + {
>   smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);
> + fsm_clear_local_map(rel);
> + }
>
> I think this won't be correct because when we call fsm_extend via
> vacuum the local map won't be already existing, so it won't issue an
> invalidation call.  Isn't it better to directly call
> CacheInvalidateRelcache here to notify other backends that their local
> maps are invalid now?

Yes, you're quite correct.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Thu, May 2, 2019 at 12:39 PM John Naylor <john.naylor@2ndquadrant.com> wrote:
>
> On Thu, May 2, 2019 at 2:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > @@ -776,7 +776,10 @@ fsm_extend(Relation rel, BlockNumber fsm_nblocks)
> >   if ((rel->rd_smgr->smgr_fsm_nblocks == 0 ||
> >   rel->rd_smgr->smgr_fsm_nblocks == InvalidBlockNumber) &&
> >   !smgrexists(rel->rd_smgr, FSM_FORKNUM))
> > + {
> >   smgrcreate(rel->rd_smgr, FSM_FORKNUM, false);
> > + fsm_clear_local_map(rel);
> > + }
> >
> > I think this won't be correct because when we call fsm_extend via
> > vacuum the local map won't be already existing, so it won't issue an
> > invalidation call.  Isn't it better to directly call
> > CacheInvalidateRelcache here to notify other backends that their local
> > maps are invalid now?
>
> Yes, you're quite correct.
>

Okay, I have updated the patch to incorporate your changes and call
relcache invalidation at the required places. I have also updated
comments in a few places.  To summarize, this patch (a) moves the
local map to the relation cache, (b) performs cache invalidation
whenever we create the FSM (either via a backend or vacuum), when
some space in a page is freed by vacuum (provided the FSM doesn't
exist), or whenever the local map is cleared, and (c) additionally
clears the local map when we find a block from the FSM, when we have
already tried all the blocks present in the cache, or when we are
allowed to create the FSM.
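
To make that concrete, the shape of the state kept in the relcache
entry is roughly as below (the names here are only illustrative, not
necessarily what the attached patch uses):

typedef struct FSMLocalMap
{
    BlockNumber nblocks;        /* number of heap blocks tracked so far */
    uint8       map[HEAP_FSM_CREATION_THRESHOLD];  /* tried/available per block */
} FSMLocalMap;

/*
 * A pointer to this would hang off RelationData (say rd_fsm_local_map,
 * an invented name) and be thrown away on relcache invalidation.
 */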

If we agree on this, then we can update the README accordingly.

Can you please test/review?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Thu, May 2, 2019 at 4:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, May 2, 2019 at 12:39 PM John Naylor <john.naylor@2ndquadrant.com> wrote:
> >
> Okay, I have updated the patch to incorporate your changes and call
> relcache invalidation at the required places. I have also updated
> comments in a few places.  To summarize, this patch (a) moves the
> local map to the relation cache, (b) performs cache invalidation
> whenever we create the FSM (either via a backend or vacuum), when
> some space in a page is freed by vacuum (provided the FSM doesn't
> exist), or whenever the local map is cleared, and (c) additionally
> clears the local map when we find a block from the FSM, when we have
> already tried all the blocks present in the cache, or when we are
> allowed to create the FSM.
>
> If we agree on this, then we can update the README accordingly.
>
> Can you please test/review?

There isn't enough time. But since I had already written some
debugging calls earlier (attached), I gave it a brief spin and found
that this patch isn't as careful as HEAD about making sure we don't
try the same block twice in a row. If you insert enough tuples into
an empty table such that we need to extend, you get something like this:

DEBUG:  Not enough space on block 0
DEBUG:  Now trying block 0
DEBUG:  Not enough space on block 0
DEBUG:  Updating local map for block 0

At this point, I'm sorry to say, but I'm in favor of reverting. There
just wasn't enough time to redesign and debug a feature like this
during feature freeze.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Fri, May 3, 2019 at 11:43 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
> On Thu, May 2, 2019 at 4:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, May 2, 2019 at 12:39 PM John Naylor <john.naylor@2ndquadrant.com> wrote:
> > >
> > Can you please test/review?
>
> There isn't enough time. But since I had already written some
> debugging calls earlier (attached), I gave it a brief spin and found
> that this patch isn't as careful as HEAD about making sure we don't
> try the same block twice in a row. If you insert enough tuples into
> an empty table such that we need to extend, you get something like this:
>
> DEBUG:  Not enough space on block 0
> DEBUG:  Now trying block 0
> DEBUG:  Not enough space on block 0
> DEBUG:  Updating local map for block 0
>
> At this point, I'm sorry to say, but I'm in favor of reverting.
>

Fair enough.  I think we have tried to come up with a patch for an
alternative approach, but it needs time.  I will revert this tomorrow.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Fri, May 3, 2019 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, May 3, 2019 at 11:43 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
>
> Fair enough.  I think we have tried to come up with a patch for an
> alternative approach, but it needs time.  I will revert this tomorrow.
>

Attached is a revert patch.  John, can you please once double-check to
ensure I have not missed anything?

To summarize for everyone:  This patch avoids FSM creation for small
relations (a small but useful improvement, as it saves space).  The
patch used a process-local map to track the first few blocks, which
was reset as soon as we got a block with enough free space.  It was
discussed in this thread that it would be better to track the local
map in the relcache and then invalidate it whenever vacuum frees up
space in a page or when the FSM is created.  There is a prototype
patch for that approach, but it is not 100% clear to me that the new
idea would be a win in all cases (in terms of code complexity or API
design), especially because resetting the map is not straightforward.
As there was not enough time, we couldn't complete the patch in all
respects to see if it is really better in all cases.

We have two options: (a) revert this patch and try the new approach
in the next release, or (b) keep the current patch and replace it
with the new approach if that turns out to be better in the next
release.

So, do we want to keep this feature for this release?

I am fine going with option (a), which is why I have prepared a
revert patch.  But I have a slight fear that the other approach might
not turn out to be better, and even if it does, we can replace the
current code with it anyway, as shown in the prototype, so going with
option (b) doesn't sound dumb either.

Anybody else want to weigh in?




--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
John Naylor
Date:
On Sat, May 4, 2019 at 5:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Attached is a revert patch.  John, can you please once double-check to
> ensure I have not missed anything?

Looks complete to me.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Sat, May 4, 2019 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, May 3, 2019 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, May 3, 2019 at 11:43 AM John Naylor <john.naylor@2ndquadrant.com> wrote:
>
> Attached is a revert patch.  John, can you please once double-check to
> ensure I have not missed anything?
>
> To summarize for everyone:  This patch avoids FSM creation for small
> relations (a small but useful improvement, as it saves space).  The
> patch used a process-local map to track the first few blocks, which
> was reset as soon as we got a block with enough free space.  It was
> discussed in this thread that it would be better to track the local
> map in the relcache and then invalidate it whenever vacuum frees up
> space in a page or when the FSM is created.  There is a prototype
> patch for that approach, but it is not 100% clear to me that the new
> idea would be a win in all cases (in terms of code complexity or API
> design), especially because resetting the map is not straightforward.
> As there was not enough time, we couldn't complete the patch in all
> respects to see if it is really better in all cases.
>
> We have two options: (a) revert this patch and try the new approach
> in the next release, or (b) keep the current patch and replace it
> with the new approach if that turns out to be better in the next
> release.
>
> So, do we want to keep this feature for this release?
>
> I am fine going with option (a), which is why I have prepared a
> revert patch.  But I have a slight fear that the other approach might
> not turn out to be better, and even if it does, we can replace the
> current code with it anyway, as shown in the prototype, so going with
> option (b) doesn't sound dumb either.
>
> Anybody else want to weigh in?
>

I understand that we have to take a call here shortly, but as it is
the weekend, I would like to wait for another day to see if anyone
else wants to share an opinion.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-05-05 18:55:30 +0530, Amit Kapila wrote:
> On Sat, May 4, 2019 at 2:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, May 3, 2019 at 2:14 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I am fine going with option (a), which is why I have prepared a
> > revert patch.  But I have a slight fear that the other approach might
> > not turn out to be better, and even if it does, we can replace the
> > current code with it anyway, as shown in the prototype, so going with
> > option (b) doesn't sound dumb either.

I don't think we realistically can "anyway replace it as shown in the
prototype" - especially not if we discover we'd need to do so after (or
even close) to 12's release.


> I understand that we have to take a call here shortly, but as it is
> the weekend, I would like to wait for another day to see if anyone
> else wants to share an opinion.

I still think that's the right course. I've previously stated that, so
I'm probably not fulfilling the "anyone else" criterion though.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-05 18:55:30 +0530, Amit Kapila wrote:
>> I understand that we have to take a call here shortly, but as it is
>> the weekend, I would like to wait for another day to see if anyone
>> else wants to share an opinion.

> I still think that's the right course. I've previously stated that, so
> I'm probably not fulfilling the "anyone else" criterion though.

I also prefer "revert and try again in v13", but I'm not "anyone else"
either ...

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Robert Haas
Date:
On Sun, May 5, 2019 at 9:25 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I understand that we have to take a call here shortly, but as it is
> the weekend, I would like to wait for another day to see if anyone
> else wants to share an opinion.

I haven't looked deeply into the issues with this patch, but it seems
to me that if two other committers are saying that this should be
reverted, and the original author of the patch is agreeing, and your
patch to try to fix it still has demonstrable bugs ... it's time to
give up.  We're well past feature freeze at this point.

Some other random comments:

I'm really surprised that the original design of this patch involved
storing state in global variables.  That seems like a pretty poor
decision.  This is properly per-relation information, and any approach
like that isn't going to work well when there are multiple relations
involved, unless the information is only being used for a single
attempt to find a free page, in which case it should use parameters
and function-local variables, not globals.

I think it's legitimate to question whether sending additional
invalidation messages as part of the design of this feature is a good
idea.  If it happens frequently, it could trigger expensive sinval
resets more often.  I don't understand the various proposals well
enough to know whether that's really a problem, but if you've got a
lot of relations for which this optimization is in use, I'm not sure I
see why it couldn't be.

I think at some point it was proposed that, since an FSM access
involves touching 3 blocks, it ought to be fine for any relation of 4
or fewer blocks to just check all the others.  I don't really
understand why we drifted off that design principle, because it seems
like a reasonable theory.  Such an approach doesn't require anything
in the relcache, any global variables, or an every-other-page
algorithm.
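
Purely to illustrate what I mean - the function name below is made up,
though the buffer-manager calls are the ordinary ones:

BlockNumber
GetPageWithFreeSpaceSmallRel(Relation rel, Size spaceNeeded)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);
    BlockNumber blkno;

    /* Just probe every existing block; for <= 4 blocks that's cheap. */
    for (blkno = 0; blkno < nblocks; blkno++)
    {
        Buffer      buf = ReadBuffer(rel, blkno);
        Size        freespace;

        LockBuffer(buf, BUFFER_LOCK_SHARE);
        freespace = PageGetHeapFreeSpace(BufferGetPage(buf));
        UnlockReleaseBuffer(buf);

        if (freespace >= spaceNeeded)
            return blkno;
    }

    return InvalidBlockNumber;  /* nothing fits; caller extends the rel */
}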

If we wanted to avoid creating the FSM for relations with >4 blocks, we
might want to take a step back and think about a whole different
approach.  For instance, we could have a cache in shared memory that
can store N entries, and not bother creating FSM forks until things no
longer fit in that cache.  Or, what might be better, we could put FSM
data for many relations into a single FSM file, instead of having a
separate fork for each relation.  I think that would get at what's
really driving this work: having a zillion tiny little FSM files
sucks.  Of course, those kinds of changes are far too big to
contemplate for v12, but they might be worth some thought in the
future.
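
Just to make the shape of that idea concrete - nothing like this exists
today, and every name below is invented:

#define SMALLREL_FSM_BLOCKS 4       /* heap blocks tracked per entry */

typedef struct SmallRelFSMEntry
{
    RelFileNode node;                       /* which relation */
    uint8       avail[SMALLREL_FSM_BLOCKS]; /* FSM category per block */
} SmallRelFSMEntry;

/*
 * A fixed-size array of these would live in shared memory, protected by
 * an LWLock; an FSM fork would only be created once a relation no longer
 * fits in the cache.
 */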

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-05-06 11:10:15 -0400, Robert Haas wrote:
> I'm really surprised that the original design of this patch involved
> storing state in global variables.  That seems like a pretty poor
> decision.  This is properly per-relation information, and any approach
> like that isn't going to work well when there are multiple relations
> involved, unless the information is only being used for a single
> attempt to find a free page, in which case it should use parameters
> and function-local variables, not globals.

I was too.


> I think it's legitimate to question whether sending additional
> invalidation messages as part of the design of this feature is a good
> idea.  If it happens frequently, it could trigger expensive sinval
> resets more often.  I don't understand the various proposals well
> enough to know whether that's really a problem, but if you've got a
> lot of relations for which this optimization is in use, I'm not sure I
> see why it couldn't be.

I don't think it's an actual problem. We'd only do so when creating an
FSM, or when freeing up additional space that'd otherwise not be visible
to other backends. The alternative to sinval would thus be a) not
discovering there's free space and extending the relation b) checking
disk state for a new FSM all the time. Which are much more expensive.


> I think at some point it was proposed that, since an FSM access
> involves touching 3 blocks, it ought to be fine for any relation of 4
> or fewer blocks to just check all the others.  I don't really
> understand why we drifted off that design principle, because it seems
> like a reasonable theory.  Such an approach doesn't require anything
> in the relcache, any global variables, or an every-other-page
> algorithm.

It's not that cheap to touch three heap blocks every time a new target
page is needed. Requires determining at least the target relation size
or the existence of the FSM fork.

We'll also commonly *not* end up touching 3 blocks in the FSM -
especially when there's actually no free space. And the FSM contents are
much less contended than the heap pages - the hot paths don't update the
FSM, and if so, the exclusive locks are held for a very short time only.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Robert Haas
Date:
On Mon, May 6, 2019 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> > I think it's legitimate to question whether sending additional
> > invalidation messages as part of the design of this feature is a good
> > idea.  If it happens frequently, it could trigger expensive sinval
> > resets more often.  I don't understand the various proposals well
> > enough to know whether that's really a problem, but if you've got a
> > lot of relations for which this optimization is in use, I'm not sure I
> > see why it couldn't be.
>
> I don't think it's an actual problem. We'd only do so when creating an
> FSM, or when freeing up additional space that'd otherwise not be visible
> to other backends. The alternative to sinval would thus be a) not
> discovering there's free space and extending the relation b) checking
> disk state for a new FSM all the time. Which are much more expensive.

None of that addresses the question of the distributed cost of sending
more sinval messages.  If you have a million little tiny relations and
VACUUM goes through and clears one tuple out of each one, it will be
spewing sinval messages really, really fast.  How can that fail to
threaten extra sinval resets?

> > I think at some point it was proposed that, since an FSM access
> > involves touching 3 blocks, it ought to be fine for any relation of 4
> > or fewer blocks to just check all the others.  I don't really
> > understand why we drifted off that design principle, because it seems
> > like a reasonable theory.  Such an approach doesn't require anything
> > in the relcache, any global variables, or an every-other-page
> > algorithm.
>
> It's not that cheap to touch three heap blocks every time a new target
> page is needed. Requires determining at least the target relation size
> > or the existence of the FSM fork.
>
> We'll also commonly *not* end up touching 3 blocks in the FSM -
> especially when there's actually no free space. And the FSM contents are
> much less contended than the heap pages - the hot paths don't update the
> FSM, and if so, the exclusive locks are held for a very short time only.

Well, that seems like an argument that we just shouldn't do this at
all.  If the FSM is worthless for small relations, then eliding it
makes sense.  But if having it is valuable even when the relation is
tiny, then eliding it is the wrong thing to do, isn't it?  The
underlying concerns that prompted this patch probably have to do with
either [1] not wanting to have so many FSM forks on disk or [2] not
wanting to consume 24kB of space to track free space for a relation
that may be only 8kB.  I think those goals are valid, but if we accept
your argument then this is the wrong way to achieve them.

I do find it a bit surprising that touching heap pages would be all
that much more expensive than touching FSM pages, but that doesn't
mean that it isn't the case.  I would also note that this algorithm
ought to beat the FSM algorithm in many cases where there IS space
available, because you'll often find some usable free space on the
very first page you try, which will never happen with the FSM.  The
case where the pages are all full doesn't seem very important, because
I don't see how you can stay in that situation for all that long.
Each time it happens, the relation grows by a block immediately
afterwards, and once it hits 5 blocks, it never happens again.  I
guess you could incur the overhead repeatedly if the relation starts
out at 1 block, grows to 4, is vacuumed back down to 1, lather, rinse,
repeat, but is that actually realistic?  It requires all the live
tuples to live in block 0 at the beginning of each vacuum cycle, which
seems like a fringe outcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> ... I guess you could incur the overhead repeatedly if the relation starts
> out at 1 block, grows to 4, is vacuumed back down to 1, lather, rinse,
> repeat, but is that actually realistic?

While I've not studied the patch, I assumed that once a relation has an
FSM it won't disappear.  Making it go away again if the relation gets
shorter seems both fairly useless and a promising source of bugs.

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Robert Haas
Date:
On Mon, May 6, 2019 at 12:05 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > ... I guess you could incur the overhead repeatedly if the relation starts
> > out at 1 block, grows to 4, is vacuumed back down to 1, lather, rinse,
> > repeat, but is that actually realistic?
>
> While I've not studied the patch, I assumed that once a relation has an
> FSM it won't disappear.  Making it go away again if the relation gets
> shorter seems both fairly useless and a promising source of bugs.

Right, I think so too.  That's not what I was going for, though.  I was
trying to discuss a scenario where the relation repeatedly grows,
never reaching the size at which the FSM would be created, and then is
repeatedly truncated again.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-05-06 11:52:12 -0400, Robert Haas wrote:
> On Mon, May 6, 2019 at 11:27 AM Andres Freund <andres@anarazel.de> wrote:
> > > I think it's legitimate to question whether sending additional
> > > invalidation messages as part of the design of this feature is a good
> > > idea.  If it happens frequently, it could trigger expensive sinval
> > > resets more often.  I don't understand the various proposals well
> > > enough to know whether that's really a problem, but if you've got a
> > > lot of relations for which this optimization is in use, I'm not sure I
> > > see why it couldn't be.
> >
> > I don't think it's an actual problem. We'd only do so when creating an
> > FSM, or when freeing up additional space that'd otherwise not be visible
> > to other backends. The alternative to sinval would thus be a) not
> > discovering there's free space and extending the relation b) checking
> > disk state for a new FSM all the time. Which are much more expensive.
> 
> None of that addresses the question of the distributed cost of sending
> more sinval messages.  If you have a million little tiny relations and
> VACUUM goes through and clears one tuple out of each one, it will be
> spewing sinval messages really, really fast.  How can that fail to
> threaten extra sinval resets?

Vacuum already triggers sinval messages (via the pg_class update); it
shouldn't be too hard to ensure that there are no duplicate ones in
this case.


> > > I think at some point it was proposed that, since an FSM access
> > > involves touching 3 blocks, it ought to be fine for any relation of 4
> > > or fewer blocks to just check all the others.  I don't really
> > > understand why we drifted off that design principle, because it seems
> > > like a reasonable theory.  Such an approach doesn't require anything
> > > in the relcache, any global variables, or an every-other-page
> > > algorithm.
> >
> > It's not that cheap to touch three heap blocks every time a new target
> > page is needed. Requires determining at least the target relation size
> > or the existence of the FSM fork.
> >
> > We'll also commonly *not* end up touching 3 blocks in the FSM -
> > especially when there's actually no free space. And the FSM contents are
> > much less contended than the heap pages - the hot paths don't update the
> > FSM, and if so, the exclusive locks are held for a very short time only.
> 
> Well, that seems like an argument that we just shouldn't do this at
> all.  If the FSM is worthless for small relations, then eliding it
> makes sense.  But if having it is valuable even when the relation is
> tiny, then eliding it is the wrong thing to do, isn't it?

Why? The problem with the entirely stateless proposal is just that we'd
do that every single time we need new space. If we amortize that cost
across multiple insertions, I don't think there's a problem?


> I do find it a bit surprising that touching heap pages would be all
> that much more expensive than touching FSM pages, but that doesn't
> mean that it isn't the case.  I would also note that this algorithm
> ought to beat the FSM algorithm in many cases where there IS space
> available, because you'll often find some usable free space on the
> very first page you try, which will never happen with the FSM.

Note that without additional state we do not *know* that the heap is 5
pages long; we have to do an smgrnblocks() - which is fairly
expensive. That's precisely why I want to keep state about a
non-existent FSM in the relcache, and why we'd need sinval messages to
clear that. So we don't incur unnecessary syscalls when there's free
space.
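
As a rough sketch of what I mean - both the relcache field and the
helper below are invented names, not existing code:

static BlockNumber
get_target_block(Relation rel, Size len)
{
    /*
     * rd_fsm_exists would be cached in the relcache entry and reset by a
     * relcache invalidation when another backend creates the FSM, so the
     * common insert path needs neither smgrexists() nor smgrnblocks().
     */
    if (rel->rd_fsm_exists)
        return GetPageWithFreeSpace(rel, len);

    /* no FSM yet: consult the per-relation map of the first few blocks */
    return local_map_search(rel, len);
}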

I completely agree that avoiding the FSM for the small-rels case has the
potential to be faster, if we're not too naive about it. I think that
means

1) no checking of on-disk state for relation fork existence/sizes every
   time we look up a page with free space
2) not re-scanning pages when we should know they're full (because we
   scanned them for the last target page, in a previous insert)
3) ability to recognize concurrently freed space


> The case where the pages are all full doesn't seem very important,
> because I don't see how you can stay in that situation for all that
> long. Each time it happens, the relation grows by a block immediately
> afterwards, and once it hits 5 blocks, it never happens again.

> I guess you could incur the overhead repeatedly if the relation starts
> out at 1 block, grows to 4, is vacuumed back down to 1, lather, rinse,
> repeat, but is that actually realistic?  It requires all the live
> tuples to live in block 0 at the beginning of each vacuum cycle, which
> seems like a fringe outcome.

I think it's much more likely to be encountered when there's a lot of
churn on a small table, but HOT pruning removes just about all the
superfluous space on a regular basis. Then the relation might actually
never get > 4 blocks.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Robert Haas
Date:
On Mon, May 6, 2019 at 12:18 PM Andres Freund <andres@anarazel.de> wrote:
> > None of that addresses the question of the distributed cost of sending
> > more sinval messages.  If you have a million little tiny relations and
> > VACUUM goes through and clears one tuple out of each one, it will be
> > spewing sinval messages really, really fast.  How can that fail to
> > threaten extra sinval resets?
>
> Vacuum already triggers sinval messages (via the pg_class update); it
> shouldn't be too hard to ensure that there are no duplicate ones in
> this case.

Yeah, if we can piggyback on the existing messages, then we can be
confident that we're not increasing the chances of sinval resets.

> > Well, that seems like an argument that we just shouldn't do this at
> > all.  If the FSM is worthless for small relations, then eliding it
> > makes sense.  But if having it is valuable even when the relation is
> > tiny, then eliding it is the wrong thing to do, isn't it?
>
> Why? The problem with the entirely stateless proposal is just that we'd
> do that every single time we need new space. If we amortize that cost
> across multiple insertions, I don't think there's a problem?

Hmm, I see.

> Note that without additional state we do not *know* that the heap is 5
> pages long; we have to do an smgrnblocks() - which is fairly
> expensive. That's precisely why I want to keep state about a
> non-existent FSM in the relcache, and why we'd need sinval messages to
> clear that. So we don't incur unnecessary syscalls when there's free
> space.

Makes sense.

> > I guess you could incur the overhead repeatedly if the relation starts
> > out at 1 block, grows to 4, is vacuumed back down to 1, lather, rinse,
> > repeat, but is that actually realistic?  It requires all the live
> > tuples to live in block 0 at the beginning of each vacuum cycle, which
> > seems like a fringe outcome.
>
> I think it's much more likely to be encountered when there's a lot of
> churn on a small table, but HOT pruning removes just about all the
> superfluous space on a regular basis. Then the relation might actually
> never get > 4 blocks.

Yeah, but if it leaves behind any tuples in block #3, the relation
will never be truncated.  You can't repeatedly hit the
all-blocks-are-full case without repeatedly extending the relation,
and you can't repeatedly extend the relation without getting beyond 4
blocks unless you are also truncating it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Mon, May 6, 2019 at 8:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-05-05 18:55:30 +0530, Amit Kapila wrote:
> >> I understand that we have to take a call here shortly, but as it is
> >> the weekend, I would like to wait for another day to see if anyone
> >> else wants to share an opinion.
>
> > I still think that's the right course. I've previously stated that, so
> > I'm probably not fulfilling the "anyone else" criterion though.
>
> I also prefer "revert and try again in v13", but I'm not "anyone else"
> either ...
>

Fine, I will take care of that today.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Amit Kapila
Date:
On Mon, May 6, 2019 at 8:57 PM Andres Freund <andres@anarazel.de> wrote:
> On 2019-05-06 11:10:15 -0400, Robert Haas wrote:
>
> > I think it's legitimate to question whether sending additional
> > invalidation messages as part of the design of this feature is a good
> > idea.  If it happens frequently, it could trigger expensive sinval
> > resets more often.  I don't understand the various proposals well
> > enough to know whether that's really a problem, but if you've got a
> > lot of relations for which this optimization is in use, I'm not sure I
> > see why it couldn't be.
>
> I don't think it's an actual problem. We'd only do so when creating an
> FSM, or when freeing up additional space that'd otherwise not be visible
> to other backends.
>

The other place we need to consider for this is when one of the
backends updates its map (due to unavailability of space in the
existing set of pages).  We can choose not to send an invalidation in
this case, but then different backends need to identify the same thing
themselves and reconstruct the map again.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes:
> On Mon, May 6, 2019 at 8:57 PM Andres Freund <andres@anarazel.de> wrote:
>> On 2019-05-06 11:10:15 -0400, Robert Haas wrote:
>>> I think it's legitimate to question whether sending additional
>>> invalidation messages as part of the design of this feature is a good
>>> idea.

>> I don't think it's an actual problem. We'd only do so when creating an
>> FSM, or when freeing up additional space that'd otherwise not be visible
>> to other backends.

> The other place we need to consider for this is when one of the
> backends updates its map (due to unavailability of space in the
> existing set of pages).  We can choose not to send an invalidation in
> this case, but then different backends need to identify the same thing
> themselves and reconstruct the map again.

I'm inclined to wonder why bother with invals at all.  The odds are
quite good that no other backend will care (which, I imagine, is the
reasoning behind why the original patch was designed like it was).
A table that has a lot of concurrent write activity on it is unlikely
to stay small enough to not have an FSM for long.

The approach I'm imagining here is not too different from Robert's
"just search the table's pages every time" straw-man.  Backends would
cache the results of their own searches, but not communicate about it.

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-05-07 09:34:42 -0400, Tom Lane wrote:
> I'm inclined to wonder why bother with invals at all.  The odds are
> quite good that no other backend will care (which, I imagine, is the
> reasoning behind why the original patch was designed like it was).
> A table that has a lot of concurrent write activity on it is unlikely
> to stay small enough to not have an FSM for long.

But when updating the free space for the first four blocks, we're going
to either have to do an smgrexists() to check whether somebody
concurrently created the FSM, or we might not update an existing FSM. An
inval seems much better.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-07 09:34:42 -0400, Tom Lane wrote:
>> I'm inclined to wonder why bother with invals at all.

> But when updating the free space for the first four blocks, we're going
> to either have to do an smgrexists() to check whether somebody
> concurrently created the FSM, or we might not update an existing FSM. An
> inval seems much better.

I do not think sinval messaging is going to be sufficient to avoid that
problem.  sinval is only useful to tell you about changes if you first
take a lock strong enough to guarantee that no interesting change is
happening while you hold the lock.  We are certainly not going to let
writes take an exclusive lock, so I don't see how we could be certain
that we've seen an sinval message telling us about FSM status change.

This seems tied into the idea we've occasionally speculated about
of tracking relation sizes in shared memory to avoid lseek calls.
If we could solve the problems around that, it'd provide a cheap(er)
way to detect whether an FSM should exist or not.

A different way of looking at it is that the FSM data is imprecise
by definition, therefore it doesn't matter that much if some backend
is slow to realize that the FSM exists.  That still doesn't lead me
to think that sinval messaging is a component of the solution though.

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-05-07 12:04:11 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-05-07 09:34:42 -0400, Tom Lane wrote:
> >> I'm inclined to wonder why bother with invals at all.
> 
> > But when updating the free space for the first four blocks, we're going
> > to either have to do an smgrexists() to check whether somebody
> > concurrently created the FSM, or we might not update an existing FSM. An
> > inval seems much better.
> 
> I do not think sinval messaging is going to be sufficient to avoid that
> problem.  sinval is only useful to tell you about changes if you first
> take a lock strong enough to guarantee that no interesting change is
> happening while you hold the lock.  We are certainly not going to let
> writes take an exclusive lock, so I don't see how we could be certain
> that we've seen an sinval message telling us about FSM status change.

Sure, but it'll be pretty darn close, rather than there basically not
being any limit except backend lifetime to how long we might not notice
that we'd need to switch to the on-disk FSM.


> This seems tied into the idea we've occasionally speculated about
> of tracking relation sizes in shared memory to avoid lseek calls.
> If we could solve the problems around that, it'd provide a cheap(er)
> way to detect whether an FSM should exist or not.

I'm back to working on a patch that provides that, FWIW. Not yet really
spending time on it, but I've re-skimmed through the code I'd previously
written.

Greetings,

Andres Freund



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2019-05-07 12:04:11 -0400, Tom Lane wrote:
>> I do not think sinval messaging is going to be sufficient to avoid that
>> problem.  sinval is only useful to tell you about changes if you first
>> take a lock strong enough to guarantee that no interesting change is
>> happening while you hold the lock.  We are certainly not going to let
>> writes take an exclusive lock, so I don't see how we could be certain
>> that we've seen an sinval message telling us about FSM status change.

> Sure, but it'll be pretty darn close, rather than there basically not
> being any limit except backend lifetime to how long we might not notice
> that we'd need to switch to the on-disk FSM.

Why do you think there's no limit?  We ordinarily do
RelationGetNumberOfBlocks at least once per query on a table, and
I should think we could fix things so that a "free" side-effect of
that is to get the relcache entry updated with whether an FSM
ought to exist or not.  So I think at worst we'd be one query behind.
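
A sketch of that idea only - rd_fsm_should_exist is an invented
relcache field, not something that exists today:

static BlockNumber
get_rel_nblocks_updating_fsm_hint(Relation rel)
{
    BlockNumber nblocks = RelationGetNumberOfBlocks(rel);

    /*
     * Piggy-back on the size lookup we do anyway: remember whether the
     * heap is big enough that an FSM ought to exist, so later insertions
     * need not check the disk for the FSM fork.
     */
    rel->rd_fsm_should_exist = (nblocks > HEAP_FSM_CREATION_THRESHOLD);

    return nblocks;
}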

            regards, tom lane



Re: Unhappy about API changes in the no-fsm-for-small-rels patch

From
Andres Freund
Date:
Hi,

On 2019-05-07 12:12:37 -0400, Tom Lane wrote:
> Why do you think there's no limit?  We ordinarily do
> RelationGetNumberOfBlocks at least once per query on a table

Well, for the main fork. Which already could have shrunk below the size
that led the FSM to be created. And we only do
RelationGetNumberOfBlocks() when planning, right? Not when using
prepared statements.

Greetings,

Andres Freund