Re: Assorted improvements in pg_dump

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Assorted improvements in pg_dump
Дата
Msg-id 20211025193024.ssesntsq25muk7py@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Assorted improvements in pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Assorted improvements in pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Hi,

On 2021-10-24 17:10:55 -0400, Tom Lane wrote:
> 0004 is not committable as-is, because it assumes that the source
> server has single-array unnest(), which is not true before 8.4.
> We could fix that by using "oid = ANY(array-constant)" conditions
> instead, but I'm unsure about the performance properties of that
> for large OID arrays on those old server versions.

It doesn't seem bad at all. 8.3 assert:

CREATE TABLE foo(oid oid primary key);
INSERT INTO foo SELECT generate_series(1, 1000000);
postgres[1164129][1]=# explain ANALYZE SELECT count(*) FROM foo WHERE oid = ANY(ARRAY(SELECT generate_series(1100000,
1,-1)));
 

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                             QUERY PLAN
             │
 

├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Aggregate  (cost=81.54..81.55 rows=1 width=0) (actual time=2433.656..2433.656 rows=1 loops=1)
             │
 
│   InitPlan
             │
 
│     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.004..149.425 rows=1100000 loops=1)
             │
 
│   ->  Bitmap Heap Scan on foo  (cost=42.70..81.50 rows=10 width=0) (actual time=2275.137..2369.478 rows=1000000
loops=1)           │
 
│         Recheck Cond: (oid = ANY (($0)::oid[]))
             │
 
│         ->  Bitmap Index Scan on foo_pkey  (cost=0.00..42.69 rows=10 width=0) (actual time=2274.077..2274.077
rows=1000000loops=1) │
 
│               Index Cond: (oid = ANY (($0)::oid[]))
             │
 
│ Total runtime: 2436.201 ms
             │
 

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)

Time: 2437.568 ms (00:02.438)


> Lastly, 0006 implements the other idea we'd discussed in the other
> thread: for queries that are issued repetitively but not within a
> single pg_dump function invocation, use PREPARE/EXECUTE to cut down
> the overhead.  This gets only diminishing returns after 0004, but
> it still brings "pg_dump -s regression" down from 0.39s to 0.33s,
> so maybe it's worth doing.

I think it's worth doing. There's things that the batch approach won't help
with and even if it doesn't help a lot with the regression test database, I'd
expect it to help plenty with other cases.

A test database I had around with lots of functions got drastically faster to
dump (7.4s to 2.5s), even though the number of queries didn't change
significantly. According to pg_stat_statements plan_time for the dumpFunc
query went from 2352ms to 0.4ms - interestingly execution time nearly halves
as well.


> I stopped after caching the plans for
> functions/aggregates/operators/types, though.  The remaining sorts
> of objects aren't likely to appear in typical databases enough times
> to be worth worrying over.  (This patch will be a net loss if there
> are more than zero but less than perhaps 10 instances of an object type,
> so there's definitely reasons beyond laziness for not doing more.)

Seems reasonable.


> @@ -7340,25 +7340,37 @@ getDomainConstraints(Archive *fout, TypeInfo *tyinfo)
>                  i_consrc;
>      int            ntups;
>  
> -    query = createPQExpBuffer();
> +    static bool query_prepared = false;
>  
> -    if (fout->remoteVersion >= 90100)
> -        appendPQExpBuffer(query, "SELECT tableoid, oid, conname, "
> -                          "pg_catalog.pg_get_constraintdef(oid) AS consrc, "
> -                          "convalidated "
> -                          "FROM pg_catalog.pg_constraint "
> -                          "WHERE contypid = '%u'::pg_catalog.oid "
> -                          "ORDER BY conname",
> -                          tyinfo->dobj.catId.oid);
> +    if (!query_prepared)
> +    {

I wonder if it'd be better to store this in Archive or such. The approach with
static variables might run into problems with parallel pg_dump at some
point. These objects aren't dumped in parallel yet, but still...

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)
Следующее
От: Mark Dilger
Дата:
Сообщение: Re: Delegating superuser tasks to new security roles (Was: Granting control of SUSET gucs to non-superusers)