Partitioning performance: cache stringToNode() of pg_constraint.ccbin

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Partitioning performance: cache stringToNode() of pg_constraint.ccbin
Дата
Msg-id 20130603190727.GA360354@tornado.leadboat.com
обсуждение исходный текст
Ответы Re: Partitioning performance: cache stringToNode() of pg_constraint.ccbin  (Amit Kapila <amit.kapila@huawei.com>)
Re: Partitioning performance: cache stringToNode() of pg_constraint.ccbin  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
A colleague, Korry Douglas, observed a table partitioning scenario where
deserializing pg_constraint.ccbin is a hot spot.  The following test case, a
simplification of a typical partitioning setup, spends 28% of its time in
stringToNode() and callees thereof:

\timing on
\set n 600000

BEGIN;

CREATE TABLE bench_check_constr_parent (c int);
CREATE TABLE bench_check_constr_child (
    CHECK (c > 0 AND c <= 100000000)
) INHERITS (bench_check_constr_parent);
CREATE FUNCTION trig() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO bench_check_constr_child VALUES (NEW.*);
    RETURN NULL;
END
$$;
CREATE TRIGGER redir BEFORE INSERT ON bench_check_constr_parent
    FOR EACH ROW EXECUTE PROCEDURE trig();

-- Main benchmark
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_parent SELECT * FROM generate_series(1, :n);
TRUNCATE bench_check_constr_parent;

-- Compare direct insert performance @ 10x volume
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;
INSERT INTO bench_check_constr_child SELECT * FROM generate_series(1, :n * 10);
TRUNCATE bench_check_constr_parent;

ROLLBACK;


The executor caches each CHECK constraint in ResultRelInfo as a planned
expression.  That cache is highly effectively for long-running statements, but
the trivial INSERTs effectively work without a cache.  Korry devised this
patch to cache the stringToNode() form of the constraint in the relcache.  It
improves the benchmark's partitioned scenario by 33%:

-- Timings (seconds) --
master, INSERT parent:       14.2, 14.4, 14.4
patched, INSERT parent:      9.6,  9.7,  9.7

master, INSERT*10 child:     9.9,  9.9,  10.2
patched, INSERT*10 child:    10.0, 10.2, 10.2

There's still not much to like about that tenfold overhead from use of the
partition routing trigger, but this patch makes a nice cut into that overhead
without doing anything aggressive.  The profile no longer shows low-hanging
fruit; running an entire SQL statement per row piles on the runtime from a
wide range of sources.  For anyone curious, I've attached output from "perf
report -s parent -g graph,1,caller" with the patch applied; I suggest browsing
under "less -S".

Some call sites need to modify the node tree, so the patch has them do
copyObject().  I ran a microbenchmark of copyObject() on the cached node tree
vs. redoing stringToNode(), and copyObject() still won by a factor of four.

Thanks,
nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: Optimising Foreign Key checks
Следующее
От: javadi
Дата:
Сообщение: Question about storage subsystem of PotgreSQL