Обсуждение: Query performance - normal on 9.0.4, slow from 9.0.5 onwards

Поиск
Список
Период
Сортировка

Query performance - normal on 9.0.4, slow from 9.0.5 onwards

От
Miguel Silva
Дата:
Hi all!

I've ran into a performance problem a few time ago and I've been trying
to figure out a solution until now. But since I've failed to come up
with anything conclusive, it's time to ask some help from people with
more understanding of how postgresql works.

Here's the big picture.
I work for a software company that has it's main program installed on
over 200 clients. This program uses a small local database in
postgresql. Always installed with the one-click installer and
postgresql.conf left on default settings. This structure allows us to
always install the latest version of postgresql both in new clients and
older clients (when they are updated). And all was well for over 7 years.
But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
noticed the program was taking longer to start. In fact, in some clients
that had older hardware, it could take around 20 minutes when it usually
takes only a few seconds. To make a long story short, the problem was
traced and narrowed down to a single auto generated query. Here it is:

"SELECT
     NULL::text AS PKTABLE_CAT,
     pkn.nspname AS PKTABLE_SCHEM,
     pkc.relname AS PKTABLE_NAME,
     pka.attname AS PKCOLUMN_NAME,
     NULL::text AS FKTABLE_CAT,
     fkn.nspname AS FKTABLE_SCHEM,
     fkc.relname AS FKTABLE_NAME,
     fka.attname AS FKCOLUMN_NAME,
     pos.n AS KEY_SEQ,
     CASE con.confupdtype  WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN
4 WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS UPDATE_RULE,
     CASE con.confdeltype  WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN
4 WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS DELETE_RULE,
     con.conname AS FK_NAME,
     pkic.relname AS PK_NAME,
     CASE  WHEN con.condeferrable AND con.condeferred THEN 5 WHEN
con.condeferrable THEN 6 ELSE 7 END AS DEFERRABILITY
FROM
     pg_catalog.pg_namespace pkn,
     pg_catalog.pg_class pkc,
     pg_catalog.pg_attribute pka,
     pg_catalog.pg_namespace fkn,
     pg_catalog.pg_class fkc,
     pg_catalog.pg_attribute fka,
     pg_catalog.pg_constraint con,
     pg_catalog.generate_series(1, 32) pos(n),
     pg_catalog.pg_depend dep,
     pg_catalog.pg_class pkic
WHERE pkn.oid = pkc.relnamespace
     AND pkc.oid = pka.attrelid
     AND pka.attnum = con.confkey[pos.n]
     AND con.confrelid = pkc.oid
     AND fkn.oid = fkc.relnamespace
     AND fkc.oid = fka.attrelid
     AND fka.attnum = con.conkey[pos.n]
     AND con.conrelid = fkc.oid
     AND con.contype = 'f'
     AND con.oid = dep.objid
     AND pkic.oid = dep.refobjid
     AND pkic.relkind = 'i'
     AND dep.classid = 'pg_constraint'::regclass::oid
     AND dep.refclassid = 'pg_class'::regclass::oid
     AND pkn.nspname = 'public'
     AND fkn.nspname = 'public'
ORDER BY
     pkn.nspname,
     pkc.relname,
     pos.n;"


 From this point on, in all the tests I did, I directly typed this query
on psql command line. I tried everything. Vaccuming and analyzing
(although this is already automatic on postgresql 9.0), updating
postgresql to version 9.1, tuning the database as explained on
postgresql.org documentation (with various values to every parameter,
different possible combinations), nothing worked, EXCEPT switching the
"enable_material" parameter to OFF. That reduces the query time from
around 25 seconds on my system (Intel Core2 Duo 2.93GHz 32bit running
Windows 7 Enterprise Service Pack 1) to around 5 seconds. Here are the
explain analyzes.

enable_material ON: http://explain.depesz.com/s/wen
enable_material OFF: http://explain.depesz.com/s/Zaa

Then, to narrow it down a bit further, I tried running the query on
another database. It ran much faster.
So I made a script that creates tables and foreign keys on a database,
to find out at which number of tables/foreign keys the query started to
slow down. I managed to get identically slow performance when I had 1000
tables and 5000 foreign keys. Which didn't help at all, since the
database in which the problem occurs has only 292 tables and 521 foreign
keys.

Of course, it is possible to change the code and use a (different)
manual query that does the same and runs perfectly fine, I've already
done that. But why does this happen, from 9.0.5 on? Is there any idea?
Is this situation already known?
I hope someone can enlighten me on this subject..

Thanks in advance! Best regards,

Miguel Silva

Re: Query performance - normal on 9.0.4, slow from 9.0.5 onwards

От
Tom Lane
Дата:
Miguel Silva <miguel.silva@tactis.pt> writes:
> But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
> noticed the program was taking longer to start. In fact, in some clients
> that had older hardware, it could take around 20 minutes when it usually
> takes only a few seconds. To make a long story short, the problem was
> traced and narrowed down to a single auto generated query. Here it is:

> "SELECT [ snip ]"

> ... Here are the explain analyzes.

> enable_material ON: http://explain.depesz.com/s/wen
> enable_material OFF: http://explain.depesz.com/s/Zaa

It doesn't really accomplish anything to post anonymized explain output
when you've already shown us the actual query, does it?  Especially when
said query involves only the system catalogs and by no stretch of the
imagination could be thought to contain anything proprietary?

            regards, tom lane

Re: Query performance - normal on 9.0.4, slow from 9.0.5 onwards

От
Miguel Silva
Дата:
On 30-12-2011 17:40, Tom Lane wrote:
> Miguel Silva<miguel.silva@tactis.pt>  writes:
>> But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
>> noticed the program was taking longer to start. In fact, in some clients
>> that had older hardware, it could take around 20 minutes when it usually
>> takes only a few seconds. To make a long story short, the problem was
>> traced and narrowed down to a single auto generated query. Here it is:
>> "SELECT [ snip ]"
>> ... Here are the explain analyzes.
>> enable_material ON: http://explain.depesz.com/s/wen
>> enable_material OFF: http://explain.depesz.com/s/Zaa
> It doesn't really accomplish anything to post anonymized explain output
> when you've already shown us the actual query, does it?  Especially when
> said query involves only the system catalogs and by no stretch of the
> imagination could be thought to contain anything proprietary?
>
>             regards, tom lane
>
Indeed you are right. Those are explains I created some time ago, when I
didn't really know what that webpage did. I just kept them, and used
them now, didn't even think about that. But the explains are still
there, still useful. Anyway, if it is really necessary, I can post new ones.

Best regards,

Miguel Silva

Re: Query performance - normal on 9.0.4, slow from 9.0.5 onwards

От
Merlin Moncure
Дата:
On Fri, Dec 30, 2011 at 10:39 AM, Miguel Silva <miguel.silva@tactis.pt> wrote:
> Hi all!
>
> I've ran into a performance problem a few time ago and I've been trying to
> figure out a solution until now. But since I've failed to come up with
> anything conclusive, it's time to ask some help from people with more
> understanding of how postgresql works.
>
> Here's the big picture.
> I work for a software company that has it's main program installed on over
> 200 clients. This program uses a small local database in postgresql. Always
> installed with the one-click installer and postgresql.conf left on default
> settings. This structure allows us to always install the latest version of
> postgresql both in new clients and older clients (when they are updated).
> And all was well for over 7 years.
> But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
> noticed the program was taking longer to start. In fact, in some clients
> that had older hardware, it could take around 20 minutes when it usually
> takes only a few seconds. To make a long story short, the problem was traced
> and narrowed down to a single auto generated query. Here it is:
>
> "SELECT
>    NULL::text AS PKTABLE_CAT,
>    pkn.nspname AS PKTABLE_SCHEM,
>    pkc.relname AS PKTABLE_NAME,
>    pka.attname AS PKCOLUMN_NAME,
>    NULL::text AS FKTABLE_CAT,
>    fkn.nspname AS FKTABLE_SCHEM,
>    fkc.relname AS FKTABLE_NAME,
>    fka.attname AS FKCOLUMN_NAME,
>    pos.n AS KEY_SEQ,
>    CASE con.confupdtype  WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN 4
> WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS UPDATE_RULE,
>    CASE con.confdeltype  WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN 4
> WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS DELETE_RULE,
>    con.conname AS FK_NAME,
>    pkic.relname AS PK_NAME,
>    CASE  WHEN con.condeferrable AND con.condeferred THEN 5 WHEN
> con.condeferrable THEN 6 ELSE 7 END AS DEFERRABILITY
> FROM
>    pg_catalog.pg_namespace pkn,
>    pg_catalog.pg_class pkc,
>    pg_catalog.pg_attribute pka,
>    pg_catalog.pg_namespace fkn,
>    pg_catalog.pg_class fkc,
>    pg_catalog.pg_attribute fka,
>    pg_catalog.pg_constraint con,
>    pg_catalog.generate_series(1, 32) pos(n),
>    pg_catalog.pg_depend dep,
>    pg_catalog.pg_class pkic
> WHERE pkn.oid = pkc.relnamespace
>    AND pkc.oid = pka.attrelid
>    AND pka.attnum = con.confkey[pos.n]
>    AND con.confrelid = pkc.oid
>    AND fkn.oid = fkc.relnamespace
>    AND fkc.oid = fka.attrelid
>    AND fka.attnum = con.conkey[pos.n]
>    AND con.conrelid = fkc.oid
>    AND con.contype = 'f'
>    AND con.oid = dep.objid
>    AND pkic.oid = dep.refobjid
>    AND pkic.relkind = 'i'
>    AND dep.classid = 'pg_constraint'::regclass::oid
>    AND dep.refclassid = 'pg_class'::regclass::oid
>    AND pkn.nspname = 'public'
>    AND fkn.nspname = 'public'
> ORDER BY
>    pkn.nspname,
>    pkc.relname,
>    pos.n;"
>
>
> From this point on, in all the tests I did, I directly typed this query on
> psql command line. I tried everything. Vaccuming and analyzing (although
> this is already automatic on postgresql 9.0), updating postgresql to version
> 9.1, tuning the database as explained on postgresql.org documentation (with
> various values to every parameter, different possible combinations), nothing
> worked, EXCEPT switching the "enable_material" parameter to OFF. That
> reduces the query time from around 25 seconds on my system (Intel Core2 Duo
> 2.93GHz 32bit running Windows 7 Enterprise Service Pack 1) to around 5
> seconds. Here are the explain analyzes.
>
> enable_material ON: http://explain.depesz.com/s/wen
> enable_material OFF: http://explain.depesz.com/s/Zaa
>
> Then, to narrow it down a bit further, I tried running the query on another
> database. It ran much faster.
> So I made a script that creates tables and foreign keys on a database, to
> find out at which number of tables/foreign keys the query started to slow
> down. I managed to get identically slow performance when I had 1000 tables
> and 5000 foreign keys. Which didn't help at all, since the database in which
> the problem occurs has only 292 tables and 521 foreign keys.
>
> Of course, it is possible to change the code and use a (different) manual
> query that does the same and runs perfectly fine, I've already done that.
> But why does this happen, from 9.0.5 on? Is there any idea? Is this
> situation already known?
> I hope someone can enlighten me on this subject..

try this (curious):
create table pos as select n from generate_series(1,32) n;

and swap that for the in-query generate series call.  your statistics
in the query are completely off (not 100% sure why), so I'm thinking
to replace that since it lies to the planner about the # rows
returned.  also the join on the array element probably isn't helping.

merlin

Re: Query performance - normal on 9.0.4, slow from 9.0.5 onwards

От
Tom Lane
Дата:
Miguel Silva <miguel.silva@tactis.pt> writes:
> I work for a software company that has it's main program installed on
> over 200 clients. This program uses a small local database in
> postgresql. Always installed with the one-click installer and
> postgresql.conf left on default settings. This structure allows us to
> always install the latest version of postgresql both in new clients and
> older clients (when they are updated). And all was well for over 7 years.
> But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
> noticed the program was taking longer to start.

I poked at this a little bit.  AFAICS the only potentially relevant
planner change between 9.0.4 and 9.0.5 was the removal of eqjoinsel's
ndistinct-clamping heuristic,
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3505862a8d3e3b389ab926346061b7135fa44f79

Now that's something we took out because it seemed to be making more
cases worse than better, but there were cases where it helped (for the
wrong reasons, but nonetheless it sometimes adjusted the estimates to be
closer to reality), and apparently you've got one such case.  However,
removing that logic just brought the behavior back to what it was
pre-8.4, so I'm a bit dubious of the claim that this query has worked
well for "over 7 years".  Perhaps you had lots fewer tables and/or FKs
back in pre-8.4 days?

I experimented with a toy database having 1000 tables of 30 columns
each, with one foreign key per table, all in the "public" schema, and
indeed this query is pretty slow on current releases.  A big part of the
problem is that the planner is unaware that the one row you're selecting
from pg_namespace will join to almost all the rows in pg_class; so it
underestimates the sizes of those join results, and that leads to
picking a nestloop plan style where it's not appropriate.

I tried removing these WHERE conditions:

>      AND pkn.nspname = 'public'
>      AND fkn.nspname = 'public'

and got a decently fast plan.  If those are, as I suspect, also no-ops
in your real database, perhaps that will do as a workaround.

            regards, tom lane

Re: Query performance - normal on 9.0.4, slow from 9.0.5 onwards

От
Miguel Silva
Дата:
On 30-12-2011 19:35, Merlin Moncure wrote:
> try this (curious):
> create table pos as select n from generate_series(1,32) n;
>
> and swap that for the in-query generate series call.  your statistics
> in the query are completely off (not 100% sure why), so I'm thinking
> to replace that since it lies to the planner about the # rows
> returned.  also the join on the array element probably isn't helping.
>
> merlin
>
Tried it. The query still takes around the same amount of time but, out
of curiosity, here's the explain analyze of it:
http://explain.depesz.com/s/MvE .

Re: Query performance - normal on 9.0.4, slow from 9.0.5 onwards

От
Miguel Silva
Дата:
On 30-12-2011 22:29, Tom Lane wrote:
> I poked at this a little bit.  AFAICS the only potentially relevant
> planner change between 9.0.4 and 9.0.5 was the removal of eqjoinsel's
> ndistinct-clamping heuristic,
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3505862a8d3e3b389ab926346061b7135fa44f79
>
> Now that's something we took out because it seemed to be making more
> cases worse than better, but there were cases where it helped (for the
> wrong reasons, but nonetheless it sometimes adjusted the estimates to be
> closer to reality), and apparently you've got one such case.  However,
> removing that logic just brought the behavior back to what it was
> pre-8.4, so I'm a bit dubious of the claim that this query has worked
> well for "over 7 years".  Perhaps you had lots fewer tables and/or FKs
> back in pre-8.4 days?
Well, thanks, that clarifies the reason why this happens!
Perhaps you are right. I mean, that's what I've been told, and I believe
it really worked well for all that time. But since this is an
auto-generated query, maybe it hasn't always been exactly like this. Or
maybe there really were fewer tables/FKs, back then.
>
> I experimented with a toy database having 1000 tables of 30 columns
> each, with one foreign key per table, all in the "public" schema, and
> indeed this query is pretty slow on current releases.  A big part of the
> problem is that the planner is unaware that the one row you're selecting
> from pg_namespace will join to almost all the rows in pg_class; so it
> underestimates the sizes of those join results, and that leads to
> picking a nestloop plan style where it's not appropriate.
>
> I tried removing these WHERE conditions:
>
>>       AND pkn.nspname = 'public'
>>       AND fkn.nspname = 'public'
> and got a decently fast plan.  If those are, as I suspect, also no-ops
> in your real database, perhaps that will do as a workaround.
>
>             regards, tom lane
>
I tried running the query with that change, but it still takes around 25
secs. What I did as a workaround, was use this query instead of an
auto-generated one:

SELECT
     tc.constraint_name AS FK_NAME,
     tc.table_name AS PKTABLE_NAME,
     kcu.column_name AS PKCOLUMN_NAME,
     ccu.table_name AS FKTABLE_NAME,
     ccu.column_name AS FKCOLUMN_NAME,
     CASE con.confupdtype  WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN
4 WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS UPDATE_RULE,
     CASE con.confdeltype  WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN
4 WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS DELETE_RULE

FROM information_schema.table_constraints AS tc
             JOIN information_schema.key_column_usage AS kcu ON
tc.constraint_name = kcu.constraint_name
             JOIN information_schema.constraint_column_usage AS ccu ON
ccu.constraint_name = tc.constraint_name
             JOIN pg_catalog.pg_constraint AS con ON con.conname =
tc.constraint_name

WHERE constraint_type = 'FOREIGN KEY';

Thanks for looking into this!

Best regards,

Miguel Silva