Discussion: Speeding up query pulling comments from pg_catalog

Speeding up query pulling comments from pg_catalog

From:
Ken Tanzer
Date:
Hi.  I've got an app that queries pg_catalog to find any table columns that have comments.  After setting up PgBadger, it was #2 on my list of time consuming queries, with min/max/avg duration of 199/2351/385 ms (across ~12,000 executions logged).

I'm wondering if there are any ways to speed this query up, including if there are better options for what to query.

I'm running PostgreSQL 9.6.14 on CentOS 7.

I've copied the EXPLAIN below.  Let me know if additional info would be helpful.  Thanks in advance!

Ken

ag_reach=> EXPLAIN (ANALYZE, VERBOSE,BUFFERS,TIMING, COSTS)
SELECT c.relname AS table,a.attname AS column,pg_catalog.col_description(a.attrelid, a.attnum) AS comment
FROM pg_catalog.pg_attribute a, pg_class c
WHERE  a.attrelid = c.oid
AND pg_catalog.col_description(a.attrelid, a.attnum) IS NOT NULL;
                                                                                                                                              QUERY PLAN                                                                                                                                              
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=197.09..22533.42 rows=39858 width=160) (actual time=92.538..386.047 rows=8 loops=1)
   Output: c.relname, a.attname, col_description(a.attrelid, (a.attnum)::integer)
   Hash Cond: (a.attrelid = c.oid)
   Buffers: shared hit=81066
   ->  Seq Scan on pg_catalog.pg_attribute a  (cost=0.00..11718.81 rows=41278 width=70) (actual time=76.069..369.410 rows=8 loops=1)
         Output: a.attrelid, a.attname, a.atttypid, a.attstattarget, a.attlen, a.attnum, a.attndims, a.attcacheoff, a.atttypmod, a.attbyval, a.attstorage, a.attalign, a.attnotnull, a.atthasdef, a.attisdropped, a.attislocal, a.attinhcount, a.attcollation, a.attacl, a.attoptions, a.attfdwoptions
         Filter: (col_description(a.attrelid, (a.attnum)::integer) IS NOT NULL)
         Rows Removed by Filter: 40043
         Buffers: shared hit=80939
   ->  Hash  (cost=144.82..144.82 rows=4182 width=68) (actual time=15.932..15.934 rows=4183 loops=1)
         Output: c.relname, c.oid
         Buckets: 8192  Batches: 1  Memory Usage: 473kB
         Buffers: shared hit=103
         ->  Seq Scan on pg_catalog.pg_class c  (cost=0.00..144.82 rows=4182 width=68) (actual time=0.015..7.667 rows=4183 loops=1)
               Output: c.relname, c.oid
               Buffers: shared hit=103
 Planning time: 0.408 ms
 Execution time: 386.148 ms
(18 rows)



--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.

Re: Speeding up query pulling comments from pg_catalog

From:
Tom Lane
Date:
Ken Tanzer <ken.tanzer@gmail.com> writes:
> Hi.  I've got an app that queries pg_catalog to find any table columns that
> have comments.  After setting up PgBadger, it was #2 on my list of time
> consuming queries, with min/max/avg duration of 199/2351/385 ms (across
> ~12,000 executions logged).
> I'm wondering if there are any ways to speed this query up, including if
> there are better options for what to query.

> ag_reach=> EXPLAIN (ANALYZE, VERBOSE,BUFFERS,TIMING, COSTS)
> SELECT c.relname AS table,a.attname AS
> column,pg_catalog.col_description(a.attrelid, a.attnum) AS comment
> FROM pg_catalog.pg_attribute a, pg_class c
> WHERE  a.attrelid = c.oid
> AND pg_catalog.col_description(a.attrelid, a.attnum) IS NOT NULL;

Unfortunately, the planner isn't smart enough to inline the
col_description() function.  But if you do so manually you'll end up
with something like

SELECT c.relname AS table, a.attname AS column, d.description AS comment
FROM
  pg_catalog.pg_attribute a JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
  LEFT JOIN pg_catalog.pg_description d ON d.classoid = c.tableoid and d.objoid = c.oid and d.objsubid = a.attnum
WHERE d.description IS NOT NULL;

For me, that formulation is quite a bit faster than the original ---
what you wrote basically forces a nestloop join against pg_description,
and then to add insult to injury, has to search pg_description a second
time for each hit.
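
You can see why from the function itself: col_description() is just a
SQL-language wrapper that runs its own lookup against pg_description on
every call.  If you want to check the exact definition on your version,
something like this shows it (on 9.6 the body is essentially
"select description from pg_description where objoid = $1 and
classoid = 'pg_catalog.pg_class'::regclass and objsubid = $2"):

-- Show how col_description() is implemented on the running server
SELECT proname, prosrc
FROM pg_catalog.pg_proc
WHERE proname = 'col_description';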

            regards, tom lane



Re: Speeding up query pulling comments from pg_catalog

From:
Ken Tanzer
Date:
On Sat, Jul 20, 2019 at 7:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But if you do so manually you'll end up with something like
>
> SELECT c.relname AS table, a.attname AS column, d.description AS comment
> FROM
>   pg_catalog.pg_attribute a JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
>   LEFT JOIN pg_catalog.pg_description d ON d.classoid = c.tableoid and d.objoid = c.oid and d.objsubid = a.attnum
> WHERE d.description IS NOT NULL;
>
> For me, that formulation is quite a bit faster than the original ---

A lot faster for me too (~30-40 ms).  Thanks!
 

> and then to add insult to injury, has to search pg_description a second
> time for each hit.
 
Not sure if I'm understanding this correctly, but are you saying that because col_description() is specified in two places in the query, that it actually will get called twice?  I was under the impression that a function (at least a non-volatile one) specified multiple times, but with the same arguments, would only get called once. Is that just wishful thinking?
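
(For what it's worth, I suppose I could test that directly with a
throwaway STABLE function that raises a NOTICE on each call, something
like the hypothetical sketch below, and just count the notices.)

-- Hypothetical throwaway function, only to observe how often it runs
CREATE FUNCTION notice_upper(t text) RETURNS text
LANGUAGE plpgsql STABLE AS $$
BEGIN
    RAISE NOTICE 'notice_upper(%) called', t;
    RETURN upper(t);
END;
$$;

-- Two textually separate calls with identical constant arguments
SELECT notice_upper('abc')
WHERE notice_upper('abc') IS NOT NULL;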

Cheers,

Ken



Re: Speeding up query pulling comments from pg_catalog

From:
Tom Lane
Date:
Ken Tanzer <ken.tanzer@gmail.com> writes:
> On Sat, Jul 20, 2019 at 7:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> and then to add insult to injury, has to search pg_description a second
>> time for each hit.

> Not sure if I'm understanding this correctly, but are you saying that
> because col_description() is specified in two places in the query, that it
> actually will get called twice?

Yes.

> I was under the impression that a function
> (at least a non-volatile one) specified multiple times, but with the same
> arguments, would only get called once. Is that just wishful thinking?

Afraid so.  There's been assorted talk about various optimizations to
avoid unnecessary duplicate function calls, but I don't think that
merging textually-separate calls has even been on the radar.  The
discussions I can recall have been more about not calling stable functions
(with fixed arguments) more than once per query.
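
The concrete case that already behaves that way is a stable call with
fixed arguments used as a comparison value in an index scan condition:
it can be computed once when the scan starts instead of once per row.
A hypothetical illustration (table and index invented purely for the
example):

-- Hypothetical table, purely for illustration
CREATE TABLE events (id bigserial PRIMARY KEY, created_at timestamptz NOT NULL);
CREATE INDEX events_created_at_idx ON events (created_at);

-- now() is STABLE; for an index scan the planner can evaluate
-- now() - interval '1 hour' once at scan start and use it as the index
-- boundary, rather than recomputing it for every row.
SELECT count(*) FROM events WHERE created_at > now() - interval '1 hour';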

            regards, tom lane



Re: Speeding up query pulling comments from pg_catalog

From:
Ken Tanzer
Date:
On Sat, Jul 20, 2019 at 12:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ken Tanzer <ken.tanzer@gmail.com> writes:
>> On Sat, Jul 20, 2019 at 7:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> and then to add insult to injury, has to search pg_description a second
>>> time for each hit.
>
>> Not sure if I'm understanding this correctly, but are you saying that
>> because col_description() is specified in two places in the query, that it
>> actually will get called twice?
>
> Yes.
>
>> I was under the impression that a function
>> (at least a non-volatile one) specified multiple times, but with the same
>> arguments, would only get called once. Is that just wishful thinking?
>
> Afraid so.

That's good to know!  Just to help me understand:

 
> There's been assorted talk about various optimizations to
> avoid unnecessary duplicate function calls,

So I had read the sentence below to mean my functions would only get called once.  But is that sentence only supposed to apply to index scans?  Or does it mean the planner is allowed to optimize, but it just doesn't know how yet?

A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call. In particular, it is safe to use an expression containing such a function in an index scan condition. (Since an index scan will evaluate the comparison value only once, not once at each row, it is not valid to use a VOLATILE function in an index scan condition.)
 
Cheers,
Ken




Re: Speeding up query pulling comments from pg_catalog

From:
Tom Lane
Date:
Ken Tanzer <ken.tanzer@gmail.com> writes:
> On Sat, Jul 20, 2019 at 12:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's been assorted talk about various optimizations to
>> avoid unnecessary duplicate function calls,

> So I had read the sentence below to mean my functions would only get called
> once.  But is that sentence only supposed to apply to index scans?  Or does
> it mean the planner is allowed to optimize, but it just doesn't know how
> yet?

> A STABLE function cannot modify the database and is guaranteed to return
> the same results given the same arguments for all rows within a single
> statement. *This category allows the optimizer to optimize multiple calls
> of the function to a single call.*

It says "allows", not "requires".  But in particular, we've interpreted
that to mean trying to call a stable function (with constant or at least
stable arguments) once per query rather than once per row, as the naive
interpretation of SQL semantics would have us do.  Matching up textually
distinct calls has not been on the radar --- it seems fairly expensive
to do, with no return in typical queries, and relatively small return
even if we find a match.
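
If the goal were only to make sure col_description() runs once per row
rather than twice, one workaround on 9.6 is to push the call into a CTE,
since a CTE is an optimization fence there.  A sketch (note it still pays
one lookup per attribute, so the explicit join against pg_description is
still the better fix):

WITH cols AS (
    -- col_description() is evaluated once per attribute while the CTE
    -- is materialized; the outer WHERE filters without re-calling it.
    SELECT c.relname, a.attname,
           pg_catalog.col_description(a.attrelid, a.attnum) AS comment
    FROM pg_catalog.pg_attribute a
    JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
)
SELECT relname AS table, attname AS column, comment
FROM cols
WHERE comment IS NOT NULL;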

            regards, tom lane