Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

Список
Период
Сортировка
От Andres Freund
Тема Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits
Дата
Msg-id 20190408010427.4l63qr7h2fjcyp77@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund)
Ответы Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund)
Список pgsql-hackers
Дерево обсуждения
COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
 Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Kuntal Ghosh, )
  Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Simon Riggs, )
   Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Kuntal Ghosh, )
 Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Jeff Janes, )
  Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
 Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Masahiko Sawada, )
  Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
   Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Masahiko Sawada, )
    Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
     Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Masahiko Sawada, )
      Re: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (David Steele, )
      Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
       Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Darafei Praliaskouski, )
       Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Masahiko Sawada, )
        Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
 Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
  Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tomas Vondra, )
  Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
   Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Alvaro Herrera, )
    Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
     Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
      Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Alvaro Herrera, )
       Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
        Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
       Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
        Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
         Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
        Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tom Lane, )
         Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
          Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
           Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
            Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
             Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
              Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Amit Kapila, )
               Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Ibrar Ahmed, )
         Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Darafei "Komяpa" Praliaskouski, )
          Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Andres Freund, )
       Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Ibrar Ahmed, )
        Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Ibrar Ahmed, )
         Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Daniel Gustafsson, )
          Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
           Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Robert Haas, )
            Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
             Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Ibrar Ahmed, )
              Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Hamid Akhtar, )
               Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Ibrar Ahmed, )
              Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Alvaro Herrera, )
               Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
                Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Ibrar Ahmed, )
                 Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
                  Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Ibrar Ahmed, )
                   Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
                    Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tatsuo Ishii, )
                     Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tomas Vondra, )
                      Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
 Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tomas Vondra, )
  Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
   Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tomas Vondra, )
    Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
     Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tomas Vondra, )
      Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Anastasia Lubennikova, )
       Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tomas Vondra, )
        Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tomas Vondra, )
         Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Tatsuo Ishii, )
         Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits  (Pavan Deolasee, )
Hi,

On 2019-04-04 21:04:49 -0700, Andres Freund wrote:
> On 2019-04-04 23:57:58 -0400, Tom Lane wrote:
> > Andres Freund <> writes:
> > > I think the right approach would be to do all of this in heap_insert and
> > > heap_multi_insert. Whenever starting to work on a page, if INSERT_FROZEN
> > > is specified, remember whether it is either currently empty, or is
> > > already marked as all-visible. If previously empty, mark it as all
> > > visible at the end. If already all visible, there's no need to change
> > > that. If not yet all-visible, no need to do anything, since it can't
> > > have been inserted with COPY FREEZE.  Do you see any problem doing it
> > > that way?
> > 
> > Do we want to add overhead to these hot-spot routines for this purpose?
> 
> For heap_multi_insert I can't see it being a problem - it's only used
> from copy.c, and the cost should be "smeared" over many tuples.  I'd
> assume that compared with locking a page, WAL logging, etc, it'd not
> even meaningfully show up for heap_insert. Especially because we already
> have codepaths for options & HEAP_INSERT_FROZEN in
> heap_prepare_insert(), and I'd assume those could be combined.
> 
> I think we should measure it, but I don't think that one or two
> additional, very well predictd, branches are going to be measurable in
> in those routines.
> 
> The patch, as implemented, has modifications in
> RelationGetBufferForTuple(), that seems like they'd be more expensive.

Here's a *prototype* patch for this.  It only implements what I
described for heap_multi_insert, not for plain inserts. I wanted to see
what others think before investing additional time.

I don't think it's possible to see the overhead of the changed code in
heap_multi_insert(), and probably - with less confidence - that it's
also going to be ok for heap_insert(). But we gotta measure that.


This avoids an extra WAL record for setting empty pages to all visible,
by adding XLH_INSERT_ALL_VISIBLE_SET & XLH_INSERT_ALL_FROZEN_SET, and
setting those when appropriate in heap_multi_insert.  Unfortunately
currently visibilitymap_set() doesn't really properly allow to do this,
as it has embedded WAL logging for heap.

I think we should remove the WAL logging from visibilitymap_set(), and
move it to a separate, heap specific, function. Right now different
tableams e.g. would have to reimplement visibilitymap_set(), so that's a
second need to separate that functionality.  Let me try to come up with
a proposal.


The patch currently does a vmbuffer_pin() while holding an exclusive
lwlock on the page. That's something we normally try to avoid - but I
think it's probably OK here, because INSERT_FROZEN can only be used to
insert into a new relfilenode (which thus no other session can see). I
think it's preferrable to have this logic in specific to the
INSERT_FROZEN path, rather than adding nontrivial complications to
RelationGetBufferForTuple().

I noticed that, before this patch, we do a
    if (vmbuffer != InvalidBuffer)
        ReleaseBuffer(vmbuffer);
after every filled page - that doesn't strike me as particularly smart -
it's pretty likely that the next heap page to be filled is going to be
on the same vm page as the previous iteration.


I noticed one small oddity that I think is common to all the approaches
presented in this thread so far. After

BEGIN;
TRUNCATE foo;
COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
COPY foo(i) FROM '/tmp/foo' WITH FREEZE;
COMMIT;

we currently end up with pages like:
┌───────┬───────────┬──────────┬───────┬───────┬───────┬─────────┬──────────┬─────────┬───────────┐
│ blkno │    lsn    │ checksum │ flags │ lower │ upper │ special │ pagesize │ version │ prune_xid │
├───────┼───────────┼──────────┼───────┼───────┼───────┼─────────┼──────────┼─────────┼───────────┤
│     0 │ 0/50B5488 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     1 │ 0/50B6360 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     2 │ 0/50B71B8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     3 │ 0/50B8028 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     4 │ 0/50B8660 │        0 │     4 │   408 │  5120 │    8192 │     8192 │       4 │         0 │
│     5 │ 0/50B94B8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     6 │ 0/50BA328 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     7 │ 0/50BB180 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     8 │ 0/50BBFD8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│     9 │ 0/50BCF88 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    10 │ 0/50BDDE0 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    11 │ 0/50BEC50 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    12 │ 0/50BFAA8 │        0 │     4 │   928 │   960 │    8192 │     8192 │       4 │         0 │
│    13 │ 0/50C06F8 │        0 │     4 │   792 │  2048 │    8192 │     8192 │       4 │         0 │
└───────┴───────────┴──────────┴───────┴───────┴───────┴─────────┴──────────┴─────────┴───────────┘
(14 rows)

Note how block 4 has more space available. That's because the
visibilitymap_pin() called in the first COPY has to vm_extend(), which
in turn does:

    /*
     * Send a shared-inval message to force other backends to close any smgr
     * references they may have for this rel, which we are about to change.
     * This is a useful optimization because it means that backends don't have
     * to keep checking for creation or extension of the file, which happens
     * infrequently.
     */
    CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);

which invalidates ->rd_smgr->smgr_targblock *after* the first COPY,
because that's when the pending smgr invalidations are sent out.  That's
far from great, but it doesn't seem to be this patch's fault.

It seems to me we need a separate invalidation that doesn't close the
whole smgr relation, but just invalidates the VM specific fields.

Greetings,

Andres Freund

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: reloption to prevent VACUUM from truncating empty pages at theend of relation
Следующее
От: Sergei Kornilov
Дата:
Сообщение: Re: change password_encryption default to scram-sha-256?