Обсуждение: GIN non-intrusive vacuum of posting tree

Поиск
Список
Период
Сортировка

GIN non-intrusive vacuum of posting tree

От
Andrew Borodin
Дата:
Hi hackers!

Here is a patch with more concurrency-friendly vacuum of GIN.

===Short problem statement===
Currently GIN vacuum during cleaning posting tree can lock this tree
for a long time without real need.

===Problem statement===
Vacuum of posting tree (function ginVacuumPostingTree() in ginvacuum.c
) is doing two passes through posting tree:

1. ginVacuumPostingTreeLeaves() takes LockBufferForCleanup,
effectively excluding all inserts. Then it traverses down trough tree
taking exclusive lock, effectively excluding all reads. On leaf level
it calls ginVacuumPostingTreeLeaf(), which deletes all dead tuples. If
there are any empty page, root lock is not relaesed, it passes to
stage two.

Interim: vacuum_delay_point(), which can hand for a while, holding
CleanupLock on root.

2. If there are any empty pages, ginScanToDelete() scans through tree,
deleting empty lead pages, then deleting empty inner pages, if they
emerged after leaf page deletion.


===Proposed algorithm===
ginVacuumPostingTreeLeaves() takes SHARED locks on inner pages and
EXCLUSIVE locks on leaf pages. On leaf pages it calls
ginVacuumPostingTreeLeaf().

If ginVacuumPostingTreeLeaves() encounters subtree with all leafs
empty, then it takes LockBufferForCleanup() on page P pointing to
empty subtree. After taking lock on P, ginScanToDelete() is called for
P. For every parent of P we can guarantee that this procedure will not
be necessary: if P was empty subtree itself, we would pass this
procedure to parent (unless P is root, than behavior is effectively
equal to before-patched).


===Testing===
This patch solved a problem encountered by Evgeniy Efimkin and
Vladimir Borodin from Yandex.Mail.
They implemented testbed with GIN index

CREATE TABLE BOX (uid bigint,
                  lids integer[] NOT NULL
                  CHECK (array_ndims(lids) = 1));

CREATE OR REPLACE FUNCTION ulids(
    i_uid bigint,
    i_lids integer[]
) RETURNS bigint[] AS $$
    SELECT array_agg((i_uid << 32) | lid)
      FROM unnest(i_lids) lid;
$$ LANGUAGE SQL IMMUTABLE STRICT;

CREATE INDEX i_box_uid_lids
    ON box USING gin (ulids(uid, lids)) WITH (FASTUPDATE=OFF);

Executing by a pgbench following query

\setrandom uid 1 1500000
\setrandom lid 1 1000
\setrandom lid2 1 1000
\setrandom lid3 1 1000
BEGIN;
insert into box values(:uid,'{:lid,:lid2,:lid3}');
insert into box values(:uid,'{}');
END;

and eventually deleting some of data. This testbed showed VACUUM
holding inserts for up to tenths of seconds. They claim that proposed
patch made vacuum invisible in this test.

Evgeniy, Vladimir, if I missed something or you have something to add,
please join discussion.

===Known issues===
1. Probably, there exists better algorithms. I could not adapt B-tree
vacuum wizardy to GIN, but that does not mean it is impossible.

2. There may be more CleanUp locks taken. Under write-dense scenarios,
this can lead to longer vacuum, but it should not consume more
resources, just wait.

3. I have changed locks sequence during page deleteion. I think that
it is safe, but comments there were in fear of inserts (excluded by
CleanupLock). More details in a code of ginDeletePage().

4. ginVacuumPostingTreeLeaves() traverses children pages after release
of parent lock (event SHARED). This is safe if there is only one
vacuum at a time. Is there a reason to be afraid of concurrent
vacuums?


I will be happy if someone points me were is a best place to read
about B-tree vacuum process. Or if someone will explain me how it
works in a few words.
Dev process is here
https://github.com/x4m/postgres_g/pull/2

Thank you for reading, I'm looking forward to hear your thought on the matter.


Best regards, Andrey Borodin.

Вложения

Re: GIN non-intrusive vacuum of posting tree

От
Vladimir Borodin
Дата:

28 нояб. 2016 г., в 20:31, Andrew Borodin <borodin@octonica.com> написал(а):

This patch solved a problem encountered by Evgeniy Efimkin and
Vladimir Borodin from Yandex.Mail.

and eventually deleting some of data. This testbed showed VACUUM
holding inserts for up to tenths of seconds. They claim that proposed
patch made vacuum invisible in this test.

Evgeniy, Vladimir, if I missed something or you have something to add,
please join discussion.

Yep, in our load environment the table is not so big (~ 50 GB), GIN index size is ~ 1 GB. And under our load profile we have seen 90 seconds of impossibility to do an insert into the table because of vacuuming this index. I confirm that with this patch we now don’t see any spikes of errors as it was previously.

--
May the force be with you…

Re: GIN non-intrusive vacuum of posting tree

От
Andrew Borodin
Дата:
Here is v1 of the patch. Now it has changes for README and contains
more comments clarifying changes of locking model.

Also I will elaborate some more on what is patched. Main portion of
changes is made to function ginVacuumPostingTreeLeaves(). Before the
patch it was traversing tree in depth-first fashion, acquiring
exclusive lock on each page and removing dead tuples from leafs. Also
this function used to hold cleanup lock. Now this function is doing
same DFS, but without cleanup lock, acquiring only read locks on inner
pages and exclusive lock on leafs before eliminating dead tuples. If
this function finds empty leafs, it computes minimal subtree,
containing only empty pages and start scan for empty pages from parent
page pointing to found subtree.

This scan acquires cleanup lock on root of scan (not necessarily root
of posting tree). Cleanup lock ensures no inserts are inside subtree.
Scan traverses subtree DF taking exclusive locks from left to right.
For any page being deleted all leftmost pages were locked and unlocked
before. New reads cannot enter subtree, all old readscans were
excluded by lock\unlock. Thus there shall not be deadlocks with
ginStepRight().

We get lock on page being deleted, then on a left page.
ginStepRight() takes lock on left page, than on right page. But it’s
presence is excluded by cleanup lock and DFS scan with locks of upper
and left parts of tree.


Thank you for reading this. Concurrency bothers me a lot. If you see
that anything is wrong or suspicious, please do not hesitate to post
your thoughts.


Best regards, Andrey Borodin.

Вложения

Re: GIN non-intrusive vacuum of posting tree

От
Robert Haas
Дата:
On Wed, Nov 30, 2016 at 11:38 AM, Andrew Borodin <borodin@octonica.com> wrote:
> This scan acquires cleanup lock on root of scan (not necessarily root
> of posting tree). Cleanup lock ensures no inserts are inside subtree.
> Scan traverses subtree DF taking exclusive locks from left to right.
> For any page being deleted all leftmost pages were locked and unlocked
> before. New reads cannot enter subtree, all old readscans were
> excluded by lock\unlock. Thus there shall not be deadlocks with
> ginStepRight().
>
> We get lock on page being deleted, then on a left page.
> ginStepRight() takes lock on left page, than on right page. But it’s
> presence is excluded by cleanup lock and DFS scan with locks of upper
> and left parts of tree.
>
> Thank you for reading this. Concurrency bothers me a lot. If you see
> that anything is wrong or suspicious, please do not hesitate to post
> your thoughts.

I don't know much about GIN, but this seems like an interesting
improvement.  I hope somebody who knows more about GIN will step up to
review it in depth.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company