Re: information_schema performance in Postgres 12

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: information_schema performance in Postgres 12
Дата
Msg-id 1991.1582760106@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: information_schema performance in Postgres 12  (Eric Gillum <eric@hazel.co>)
Список pgsql-general
Eric Gillum <eric@hazel.co> writes:
> Does not seem to improve if I turn off JIT.

AFAICT you're just getting sideswiped by the poor quality of our estimates
for messy sub-SELECTs :-(.  v12 is estimating that it will get only one
row out of the information_schema.table_constraints sub-select, which
causes it to decide that it'd be a cool idea to put that on the outside of
a nestloop with a moderately-expensive other arm.  But actually it gets
178 rows out, so the other arm gets iterated 178 times, and you lose.

         ->  Nested Loop  (cost=84.37..844.93 rows=1 width=232) (actual time=143.780..538.653 rows=9 loops=1)
               Join Filter: ((c_1.conname)::name = ("*SELECT* 1".constraint_name)::name)
               Rows Removed by Join Filter: 2127
               ->  Subquery Scan on "*SELECT* 1"  (cost=0.41..39.31 rows=1 width=128) (actual time=0.165..2.650
rows=178loops=1) 
...
               ->  Hash Join  (cost=83.96..805.59 rows=2 width=296) (actual time=0.448..3.007 rows=12 loops=178)

Now, v11 had the same poor estimate, but it happens not to make that
particular disastrous choice, in this particular case.  I think you
just got lucky on that before, though, and now are not lucky.
We did make some changes in v12 around the handling of conditions
involving information_schema.sql_identifier casts, which the info-schema
views are full of, so the idea that some other estimates might've
changed is not very surprising.  For the most part those changes
should've led to improved estimates, but in this case the big problem
is this unrelated bad estimate.

The crux of the misestimate seems to be here:

                                       ->  Seq Scan on pg_constraint c  (cost=0.00..20.98 rows=2 width=72) (actual
time=0.102..0.231rows=178 loops=1) 
                                             Filter: ((contype <> ALL ('{t,x}'::"char"[])) AND ((CASE contype WHEN
'c'::"char"THEN 'CHECK'::text WHEN 'f'::"char" THEN 'FOREIGN KEY'::text WHEN 'p'::"char" THEN 'PRIMARY KEY'::text WHEN
'u'::"char"THEN 'UNIQUE'::text ELSE NULL::text END)::text = 'FOREIGN KEY'::text)) 

The "contype <> ALL ..." condition is actually estimated pretty well,
AFAICT, but the condition on a CASE clause is not something that the
planner has any understanding of, so it falls back to a default
estimate that is pretty far off for this case.

I noticed along the way that you show v12 as doing an unnecessary sort
that v11 was smart enough to avoid:

 Sort  (cost=1370.57..1370.59 rows=8 width=128) (actual time=10.416..10.417 rows=9 loops=1)
   Sort Key: "*SELECT* 1".table_name, ((a.attname)::information_schema.sql_identifier), (("*SELECT*
1_1".attname)::information_schema.sql_identifier)

 Sort  (cost=845.27..845.27 rows=1 width=256) (actual time=538.828..538.829 rows=9 loops=1)
   Sort Key: "*SELECT* 1".table_name, ((a.attname)::information_schema.sql_identifier), (("*SELECT*
1_1".relname)::information_schema.sql_identifier),(("*SELECT* 1_1".attname)::information_schema.sql_identifier) 

It's not actually necessary to sort on the relname (third sort key)
because your WHERE clause constrains that to be a constant.  v11 saw
that but v12 fails to, which is a bug that I just pushed a fix for.
It's not terribly important for your example, but it might matter more
in other cases.

As far as solving your larger problem goes, I don't have any near-term
ideas other than to rewrite the query to look directly at the system
catalogs rather than going through information_schema views.  But that's
not a very nice answer.

            regards, tom lane



В списке pgsql-general по дате отправления:

Предыдущее
От: Casey Meijer
Дата:
Сообщение: Statistics and Index Usage Data
Следующее
От: "sivapostgres@yahoo.com"
Дата:
Сообщение: Re: Need to find the no. of connections for a database