Discussion: Query performance - normal on 9.0.4, slow from 9.0.5 onwards
Hi all!

I ran into a performance problem a while ago and have been trying to figure out a solution ever since. But since I've failed to come up with anything conclusive, it's time to ask for help from people with more understanding of how PostgreSQL works.

Here's the big picture. I work for a software company that has its main program installed on over 200 clients. This program uses a small local PostgreSQL database, always installed with the one-click installer and with postgresql.conf left on default settings. This setup allows us to always install the latest version of PostgreSQL, both on new clients and on older clients (when they are updated). And all was well for over 7 years.

But with PostgreSQL version 9.0.5 (in version 9.0.4 all was fine), we noticed the program was taking longer to start. In fact, on some clients with older hardware, it could take around 20 minutes when it usually takes only a few seconds. To make a long story short, the problem was traced and narrowed down to a single auto-generated query.
Here it is:

"SELECT
    NULL::text AS PKTABLE_CAT,
    pkn.nspname AS PKTABLE_SCHEM,
    pkc.relname AS PKTABLE_NAME,
    pka.attname AS PKCOLUMN_NAME,
    NULL::text AS FKTABLE_CAT,
    fkn.nspname AS FKTABLE_SCHEM,
    fkc.relname AS FKTABLE_NAME,
    fka.attname AS FKCOLUMN_NAME,
    pos.n AS KEY_SEQ,
    CASE con.confupdtype WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN 4
        WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS UPDATE_RULE,
    CASE con.confdeltype WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN 4
        WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS DELETE_RULE,
    con.conname AS FK_NAME,
    pkic.relname AS PK_NAME,
    CASE WHEN con.condeferrable AND con.condeferred THEN 5 WHEN
        con.condeferrable THEN 6 ELSE 7 END AS DEFERRABILITY
FROM
    pg_catalog.pg_namespace pkn,
    pg_catalog.pg_class pkc,
    pg_catalog.pg_attribute pka,
    pg_catalog.pg_namespace fkn,
    pg_catalog.pg_class fkc,
    pg_catalog.pg_attribute fka,
    pg_catalog.pg_constraint con,
    pg_catalog.generate_series(1, 32) pos(n),
    pg_catalog.pg_depend dep,
    pg_catalog.pg_class pkic
WHERE pkn.oid = pkc.relnamespace
    AND pkc.oid = pka.attrelid
    AND pka.attnum = con.confkey[pos.n]
    AND con.confrelid = pkc.oid
    AND fkn.oid = fkc.relnamespace
    AND fkc.oid = fka.attrelid
    AND fka.attnum = con.conkey[pos.n]
    AND con.conrelid = fkc.oid
    AND con.contype = 'f'
    AND con.oid = dep.objid
    AND pkic.oid = dep.refobjid
    AND pkic.relkind = 'i'
    AND dep.classid = 'pg_constraint'::regclass::oid
    AND dep.refclassid = 'pg_class'::regclass::oid
    AND pkn.nspname = 'public'
    AND fkn.nspname = 'public'
ORDER BY
    pkn.nspname,
    pkc.relname,
    pos.n;"

From this point on, in all the tests I did, I typed this query directly on the psql command line. I tried everything: vacuuming and analyzing (although this is already automatic on PostgreSQL 9.0), updating PostgreSQL to version 9.1, tuning the database as explained in the postgresql.org documentation (with various values for every parameter, in different combinations). Nothing worked, EXCEPT switching the "enable_material" parameter to OFF.
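For anyone wanting to reproduce the comparison, enable_material can be toggled per session from psql without touching postgresql.conf; a minimal sketch (the timing commands are illustrative, not from the original report):

```sql
-- Disable materialize nodes for the current session only.
-- This is a diagnostic planner toggle, not a recommended permanent setting.
SET enable_material = off;

\timing on
EXPLAIN ANALYZE
SELECT NULL::text AS PKTABLE_CAT, pkn.nspname AS PKTABLE_SCHEM /* ... rest of
   the foreign-key metadata query above ... */;

-- Restore the server default when done.
RESET enable_material;
```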
That reduces the query time from around 25 seconds on my system (Intel Core2 Duo 2.93GHz, 32-bit, running Windows 7 Enterprise Service Pack 1) to around 5 seconds. Here are the EXPLAIN ANALYZE outputs.

enable_material ON: http://explain.depesz.com/s/wen
enable_material OFF: http://explain.depesz.com/s/Zaa

Then, to narrow it down a bit further, I tried running the query on another database. It ran much faster. So I made a script that creates tables and foreign keys in a database, to find out at what number of tables/foreign keys the query starts to slow down. I managed to get identically slow performance with 1000 tables and 5000 foreign keys. That didn't help at all, since the database in which the problem occurs has only 292 tables and 521 foreign keys.

Of course, it is possible to change the code and use a (different) manual query that does the same thing and runs perfectly fine; I've already done that. But why does this happen, from 9.0.5 on? Does anyone have an idea? Is this situation already known? I hope someone can enlighten me on this subject.

Thanks in advance!

Best regards,

Miguel Silva
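A script along those lines can be written as a PL/pgSQL DO block (requires 9.0+; the table names, column layout, and counts here are an illustrative sketch, not the original script):

```sql
-- Create many parent/child table pairs linked by foreign keys, to grow the
-- system catalogs to the point where the metadata query slows down.
-- The fk_test_parent_*/fk_test_child_* names are made up for this sketch.
DO $$
BEGIN
    FOR i IN 1..1000 LOOP
        EXECUTE 'CREATE TABLE fk_test_parent_' || i
             || ' (id int PRIMARY KEY)';
        EXECUTE 'CREATE TABLE fk_test_child_' || i
             || ' (id int PRIMARY KEY,'
             || '  parent_id int REFERENCES fk_test_parent_' || i || ' (id))';
    END LOOP;
END
$$;
```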
Miguel Silva <miguel.silva@tactis.pt> writes:
> But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
> noticed the program was taking longer to start. In fact, in some clients
> that had older hardware, it could take around 20 minutes when it usually
> takes only a few seconds. To make a long story short, the problem was
> traced and narrowed down to a single auto generated query. Here it is:
> "SELECT [ snip ]"
> ... Here are the explain analyzes.
> enable_material ON: http://explain.depesz.com/s/wen
> enable_material OFF: http://explain.depesz.com/s/Zaa

It doesn't really accomplish anything to post anonymized explain output when you've already shown us the actual query, does it? Especially when said query involves only the system catalogs and by no stretch of the imagination could be thought to contain anything proprietary?

regards, tom lane
On 30-12-2011 17:40, Tom Lane wrote:
> Miguel Silva <miguel.silva@tactis.pt> writes:
>> But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
>> noticed the program was taking longer to start. In fact, in some clients
>> that had older hardware, it could take around 20 minutes when it usually
>> takes only a few seconds. To make a long story short, the problem was
>> traced and narrowed down to a single auto generated query. Here it is:
>> "SELECT [ snip ]"
>> ... Here are the explain analyzes.
>> enable_material ON: http://explain.depesz.com/s/wen
>> enable_material OFF: http://explain.depesz.com/s/Zaa
>
> It doesn't really accomplish anything to post anonymized explain output
> when you've already shown us the actual query, does it? Especially when
> said query involves only the system catalogs and by no stretch of the
> imagination could be thought to contain anything proprietary?
>
> regards, tom lane

Indeed you are right. Those are explains I created some time ago, when I didn't really know what that webpage did. I just kept them and used them now; I didn't even think about that. But the explains are still there, still useful. Anyway, if it is really necessary, I can post new ones.

Best regards,

Miguel Silva
On Fri, Dec 30, 2011 at 10:39 AM, Miguel Silva <miguel.silva@tactis.pt> wrote:
> Hi all!
>
> I've ran into a performance problem a few time ago and I've been trying to
> figure out a solution until now. But since I've failed to come up with
> anything conclusive, it's time to ask some help from people with more
> understanding of how postgresql works.
>
> Here's the big picture.
> I work for a software company that has it's main program installed on over
> 200 clients. This program uses a small local database in postgresql. Always
> installed with the one-click installer and postgresql.conf left on default
> settings. This structure allows us to always install the latest version of
> postgresql both in new clients and older clients (when they are updated).
> And all was well for over 7 years.
> But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
> noticed the program was taking longer to start. In fact, in some clients
> that had older hardware, it could take around 20 minutes when it usually
> takes only a few seconds. To make a long story short, the problem was traced
> and narrowed down to a single auto generated query. Here it is:
>
> "SELECT [ snip ]"
>
> From this point on, in all the tests I did, I directly typed this query on
> psql command line. I tried everything. Vaccuming and analyzing (although
> this is already automatic on postgresql 9.0), updating postgresql to version
> 9.1, tuning the database as explained on postgresql.org documentation (with
> various values to every parameter, different possible combinations), nothing
> worked, EXCEPT switching the "enable_material" parameter to OFF. That
> reduces the query time from around 25 seconds on my system (Intel Core2 Duo
> 2.93GHz 32bit running Windows 7 Enterprise Service Pack 1) to around 5
> seconds. Here are the explain analyzes.
>
> enable_material ON: http://explain.depesz.com/s/wen
> enable_material OFF: http://explain.depesz.com/s/Zaa
>
> Then, to narrow it down a bit further, I tried running the query on another
> database. It ran much faster.
> So I made a script that creates tables and foreign keys on a database, to
> find out at which number of tables/foreign keys the query started to slow
> down. I managed to get identically slow performance when I had 1000 tables
> and 5000 foreign keys. Which didn't help at all, since the database in which
> the problem occurs has only 292 tables and 521 foreign keys.
>
> Of course, it is possible to change the code and use a (different) manual
> query that does the same and runs perfectly fine, I've already done that.
> But why does this happen, from 9.0.5 on? Is there any idea? Is this
> situation already known?
> I hope someone can enlighten me on this subject..

try this (curious):

create table pos as select n from generate_series(1,32) n;

and swap that for the in-query generate_series call. Your statistics in the query are completely off (not 100% sure why), so I'm thinking to replace that, since it lies to the planner about the number of rows returned. Also, the join on the array element probably isn't helping.

merlin
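Merlin's suggestion amounts to replacing the set-returning function with a real, analyzed table, so the planner sees actual row counts instead of the default estimate for generate_series. A minimal sketch of the substitution (the pos table is from his message; the ANALYZE step is an added assumption):

```sql
-- Materialize the series once, so the planner has real statistics for it.
CREATE TABLE pos AS SELECT n FROM generate_series(1, 32) n;
ANALYZE pos;

-- In the big query, replace this entry in the FROM list:
--     pg_catalog.generate_series(1, 32) pos(n)
-- with simply:
--     pos
-- and leave the rest of the query unchanged.
```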
Miguel Silva <miguel.silva@tactis.pt> writes:
> I work for a software company that has it's main program installed on
> over 200 clients. This program uses a small local database in
> postgresql. Always installed with the one-click installer and
> postgresql.conf left on default settings. This structure allows us to
> always install the latest version of postgresql both in new clients and
> older clients (when they are updated). And all was well for over 7 years.
> But with postgresql version 9.0.5 (in version 9.0.4 all was fine), we
> noticed the program was taking longer to start.

I poked at this a little bit. AFAICS the only potentially relevant planner change between 9.0.4 and 9.0.5 was the removal of eqjoinsel's ndistinct-clamping heuristic:
http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3505862a8d3e3b389ab926346061b7135fa44f79

Now that's something we took out because it seemed to be making more cases worse than better, but there were cases where it helped (for the wrong reasons, but nonetheless it sometimes adjusted the estimates to be closer to reality), and apparently you've got one such case. However, removing that logic just brought the behavior back to what it was pre-8.4, so I'm a bit dubious of the claim that this query has worked well for "over 7 years". Perhaps you had lots fewer tables and/or FKs back in pre-8.4 days?

I experimented with a toy database having 1000 tables of 30 columns each, with one foreign key per table, all in the "public" schema, and indeed this query is pretty slow on current releases. A big part of the problem is that the planner is unaware that the one row you're selecting from pg_namespace will join to almost all the rows in pg_class; so it underestimates the sizes of those join results, and that leads to picking a nestloop plan style where it's not appropriate.

I tried removing these WHERE conditions:

> AND pkn.nspname = 'public'
> AND fkn.nspname = 'public'

and got a decently fast plan.
If those are, as I suspect, also no-ops in your real database, perhaps that will do as a workaround.

regards, tom lane
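If the application cannot guarantee that everything lives in public, one way to apply Tom's workaround while keeping the same result set is to drop the two catalog-level filters and refilter the (much smaller) final rows instead; a sketch, and the outer-query wrapper is an assumption not taken from the thread:

```sql
-- Assuming all user tables live in the "public" schema, these two
-- conditions are no-ops and can simply be deleted from the original query:
--     AND pkn.nspname = 'public'
--     AND fkn.nspname = 'public'
-- If the filtering must be kept for correctness, wrap the query and
-- filter its output instead, so the planner's join estimates stay sane:
SELECT *
FROM (
    -- ... original query, without the two nspname conditions ...
) fk
WHERE fk.pktable_schem = 'public'
  AND fk.fktable_schem = 'public';
```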
On 30-12-2011 19:35, Merlin Moncure wrote:
> try this (curious):
>
> create table pos as select n from generate_series(1,32) n;
>
> and swap that for the in-query generate series call. your statistics
> in the query are completely off (not 100% sure why), so I'm thinking
> to replace that since it lies to the planner about the # rows
> returned. also the join on the array element probably isn't helping.
>
> merlin

Tried it. The query still takes around the same amount of time, but, out of curiosity, here's the explain analyze of it: http://explain.depesz.com/s/MvE
On 30-12-2011 22:29, Tom Lane wrote:
> I poked at this a little bit. AFAICS the only potentially relevant
> planner change between 9.0.4 and 9.0.5 was the removal of eqjoinsel's
> ndistinct-clamping heuristic,
> http://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3505862a8d3e3b389ab926346061b7135fa44f79
>
> Now that's something we took out because it seemed to be making more
> cases worse than better, but there were cases where it helped (for the
> wrong reasons, but nonetheless it sometimes adjusted the estimates to be
> closer to reality), and apparently you've got one such case. However,
> removing that logic just brought the behavior back to what it was
> pre-8.4, so I'm a bit dubious of the claim that this query has worked
> well for "over 7 years". Perhaps you had lots fewer tables and/or FKs
> back in pre-8.4 days?

Well, thanks, that clarifies the reason why this happens! Perhaps you are right. That's what I've been told, and I believe it really worked well for all that time. But since this is an auto-generated query, maybe it hasn't always been exactly like this. Or maybe there really were fewer tables/FKs back then.

> I experimented with a toy database having 1000 tables of 30 columns
> each, with one foreign key per table, all in the "public" schema, and
> indeed this query is pretty slow on current releases. A big part of the
> problem is that the planner is unaware that the one row you're selecting
> from pg_namespace will join to almost all the rows in pg_class; so it
> underestimates the sizes of those join results, and that leads to
> picking a nestloop plan style where it's not appropriate.
>
> I tried removing these WHERE conditions:
>
>> AND pkn.nspname = 'public'
>> AND fkn.nspname = 'public'
>
> and got a decently fast plan. If those are, as I suspect, also no-ops
> in your real database, perhaps that will do as a workaround.
> regards, tom lane

I tried running the query with that change, but it still takes around 25 seconds. What I did as a workaround was use this query instead of the auto-generated one:

SELECT
    tc.constraint_name AS FK_NAME,
    tc.table_name AS PKTABLE_NAME,
    kcu.column_name AS PKCOLUMN_NAME,
    ccu.table_name AS FKTABLE_NAME,
    ccu.column_name AS FKCOLUMN_NAME,
    CASE con.confupdtype WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN 4
        WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS UPDATE_RULE,
    CASE con.confdeltype WHEN 'c' THEN 0 WHEN 'n' THEN 2 WHEN 'd' THEN 4
        WHEN 'r' THEN 1 WHEN 'a' THEN 3 ELSE NULL END AS DELETE_RULE
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
    ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage AS ccu
    ON ccu.constraint_name = tc.constraint_name
JOIN pg_catalog.pg_constraint AS con
    ON con.conname = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY';

Thanks for looking into this!

Best regards,

Miguel Silva