Обсуждение: selecting data from information_schema.columns performance.

Поиск
Список
Период
Сортировка

selecting data from information_schema.columns performance.

От
Steve Martin
Дата:
Hi,

I am trying to determine if there is a way to improve the performance
when selecting data from the information_schema.columns view.

We use data from this view to inform our application information on the
columns on a table and is used when data is selected from a table.

Below is the output from EXPLAIN ANALYSE:

smf=> explain analyse select column_name, column_default, is_nullable,
data_type, character_maximum_length, numeric_precision,
numeric_precision_radix,
smf-> numeric_scale, udt_name from information_schema.columns where
table_name = 't_fph_tdrdw' order by ordinal_position;

                                                                                 
QUERY
PLAN
                                                                                      


-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5228.55..5228.64 rows=38 width=449) (actual
time=567.434..567.467 rows=47 loops=1)
   Sort Key: (a.attnum)::information_schema.cardinal_number
   ->  Hash Join  (cost=5071.47..5227.55 rows=38 width=449) (actual
time=547.207..567.113 rows=47 loops=1)
         Hash Cond: ("outer".oid = "inner".atttypid)
         ->  Hash Left Join  (cost=79.27..173.95 rows=1169 width=310)
(actual time=8.036..17.515 rows=1170 loops=1)
               Hash Cond: ("outer".typbasetype = "inner".oid)
               Join Filter: ("outer".typtype = 'd'::"char")
               ->  Hash Join  (cost=1.06..75.29 rows=1169 width=176)
(actual time=0.046..6.960 rows=1170 loops=1)
                     Hash Cond: ("outer".typnamespace = "inner".oid)
                     ->  Seq Scan on pg_type t  (cost=0.00..56.69
rows=1169 width=116) (actual time=0.006..3.868 rows=1170 loops=1)
                     ->  Hash  (cost=1.05..1.05 rows=5 width=68) (actual
time=0.025..0.025 rows=5 loops=1)
                           ->  Seq Scan on pg_namespace nt
(cost=0.00..1.05 rows=5 width=68) (actual time=0.003..0.013 rows=5 loops=1)
               ->  Hash  (cost=75.29..75.29 rows=1169 width=138) (actual
time=7.983..7.983 rows=1170 loops=1)
                     ->  Hash Join  (cost=1.06..75.29 rows=1169
width=138) (actual time=0.036..5.620 rows=1170 loops=1)
                           Hash Cond: ("outer".typnamespace = "inner".oid)
                           ->  Seq Scan on pg_type bt  (cost=0.00..56.69
rows=1169 width=78) (actual time=0.003..2.493 rows=1170 loops=1)
                           ->  Hash  (cost=1.05..1.05 rows=5 width=68)
(actual time=0.022..0.022 rows=5 loops=1)
                                 ->  Seq Scan on pg_namespace nbt
(cost=0.00..1.05 rows=5 width=68) (actual time=0.003..0.012 rows=5 loops=1)
         ->  Hash  (cost=4992.11..4992.11 rows=38 width=143) (actual
time=536.532..536.532 rows=47 loops=1)
               ->  Merge Join  (cost=4722.45..4992.11 rows=38 width=143)
(actual time=535.940..536.287 rows=47 loops=1)
                     Merge Cond: ("outer".attrelid = "inner".oid)
                     ->  Merge Left Join  (cost=4527.17..4730.67
rows=26238 width=143) (actual time=481.392..520.627 rows=10508 loops=1)
                           Merge Cond: (("outer".attrelid =
"inner".adrelid) AND ("outer".attnum = "inner".adnum))
                           ->  Sort  (cost=4471.90..4537.50 rows=26238
width=107) (actual time=481.345..497.647 rows=10508 loops=1)
                                 Sort Key: a.attrelid, a.attnum
                                 ->  Seq Scan on pg_attribute a
(cost=0.00..1474.20 rows=26238 width=107) (actual time=0.007..92.444
rows=26792 loops=1)
                                       Filter: ((attnum > 0) AND (NOT
attisdropped))
                           ->  Sort  (cost=55.27..57.22 rows=780
width=38) (actual time=0.035..0.035 rows=0 loops=1)
                                 Sort Key: ad.adrelid, ad.adnum
                                 ->  Seq Scan on pg_attrdef ad
(cost=0.00..17.80 rows=780 width=38) (actual time=0.003..0.003 rows=0
loops=1)
                     ->  Sort  (cost=195.27..195.28 rows=3 width=8)
(actual time=3.900..3.938 rows=1 loops=1)
                           Sort Key: c.oid
                           ->  Hash Join  (cost=194.12..195.25 rows=3
width=8) (actual time=3.889..3.892 rows=1 loops=1)
                                 Hash Cond: ("outer".oid =
"inner".relnamespace)
                                 ->  Seq Scan on pg_namespace nc
(cost=0.00..1.05 rows=5 width=4) (actual time=0.007..0.016 rows=5 loops=1)
                                 ->  Hash  (cost=194.11..194.11 rows=3
width=12) (actual time=3.826..3.826 rows=1 loops=1)
                                       ->  Seq Scan on pg_class c
(cost=0.00..194.11 rows=3 width=12) (actual time=2.504..3.818 rows=1
loops=1)
                                             Filter: (((relkind =
'r'::"char") OR (relkind = 'v'::"char")) AND (pg_has_role(relowner,
'MEMBER'::text) OR has_table_privilege(oid, 'SELECT'::text) OR
has_table_privilege(oid, 'INSERT'::text) OR has_table_privilege(oid,
'UPDATE'::text) OR has_table_privilege(oid, 'REFERENCES'::text)) AND
(((relname)::information_schema.sql_identifier)::text =
't_fph_tdrdw'::text))
 Total runtime: 568.211 ms
(39 rows)

smf=>


If I create a table from this view "create table
my_information_schema_columns as select * from
information_schema.columns;", naturally the performance is greatly improved.

smf=> explain analyse select column_name, column_default, is_nullable,
data_type, character_maximum_length, numeric_precision,
numeric_precision_radix,
smf-> numeric_scale, udt_name from my_information_schema_columns where
table_name = 't_fph_tdrdw' order by ordinal_position;
                                                             QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=605.75..605.81 rows=24 width=180) (actual
time=39.878..39.914 rows=47 loops=1)
   Sort Key: ordinal_position
   ->  Seq Scan on my_information_schema_columns  (cost=0.00..605.20
rows=24 width=180) (actual time=16.280..39.651 rows=47 loops=1)
         Filter: ((table_name)::text = 't_fph_tdrdw'::text)
 Total runtime: 40.049 ms
(5 rows)

smf=>

And if I add a index "create index my_information_schema_columns_index
on my_information_schema_columns (table_name);" , it is improved even more.

smf=> explain analyse select column_name, column_default, is_nullable,
data_type, character_maximum_length, numeric_precision,
numeric_precision_radix,
smf-> numeric_scale, udt_name from my_information_schema_columns where
table_name = 't_fph_tdrdw' order by
ordinal_position;

QUERY
PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=294.18..294.48 rows=119 width=180) (actual
time=0.520..0.558 rows=47 loops=1)
   Sort Key: ordinal_position
   ->  Bitmap Heap Scan on my_information_schema_columns
(cost=2.42..290.08 rows=119 width=180) (actual time=0.169..0.296 rows=47
loops=1)
         Recheck Cond: ((table_name)::text = 't_fph_tdrdw'::text)
         ->  Bitmap Index Scan on my_information_schema_columns_index
(cost=0.00..2.42 rows=119 width=0) (actual time=0.149..0.149 rows=47
loops=1)
               Index Cond: ((table_name)::text = 't_fph_tdrdw'::text)
 Total runtime: 0.691 ms
(7 rows)

smf=>

If a table is created from the information_schema.columns view, then we
have the problem of keeping the table up to date.

Any hints, rtfm's (locations please), where to look, etc,  will be
appreciated.

Regards
Steve Martin

--
               \\|//             From near to far,
                @ @              from here to there,
        ---oOOo-(_)-oOOo---      funny things are everywhere.  (Dr. Seuss)




Re: selecting data from information_schema.columns performance.

От
Tom Lane
Дата:
Steve Martin <smartin@nec.co.nz> writes:
> I am trying to determine if there is a way to improve the performance
> when selecting data from the information_schema.columns view.

In my experience, there isn't any single one of the information_schema
views whose performance doesn't suck :-(.  Somebody should work on that
sometime.  I haven't looked closely enough to determine where the
bottlenecks are.

            regards, tom lane

Re: selecting data from information_schema.columns performance.

От
"Jim C. Nasby"
Дата:
On Sun, Oct 01, 2006 at 11:01:19PM -0400, Tom Lane wrote:
> Steve Martin <smartin@nec.co.nz> writes:
> > I am trying to determine if there is a way to improve the performance
> > when selecting data from the information_schema.columns view.
>
> In my experience, there isn't any single one of the information_schema
> views whose performance doesn't suck :-(.  Somebody should work on that
> sometime.  I haven't looked closely enough to determine where the
> bottlenecks are.

Looking at the newsysviews stuff should prove enlightening...
AndrewSN spent a lot of time making sure those views are very
performant.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: selecting data from information_schema.columns

От
Steve Martin
Дата:
Hi

Thanks for you replies.

Regarding, newsysviews, what is the current state, I have had a quick
look at the pgFoundry site and the last updates were 9 months ago.

The most efficient way in the short term I can find to improve
performance for our application is to create a table from
information_schema.columns and update it when tables a created or
deleted, or columns added or removed. E.g.

=>  create table my_information_schema_columns as select * from
information_schema.columns;
=>  create index my_information_schema_columns_index
on my_information_schema_columns (table_name);

Update table with the following statements:

When tables or columns are added:
=>  insert into my_information_schema_columns select * from
information_schema.columns
->  except select * from my_information_schema_columns;

When tables are removed, does not work for column changes:
=>  delete from my_information_schema_columns
->  where table_name = (select table_name from my_information_schema_columns
->   except select table_name from information_schema.columns);

For column changes a script will need to be created, the following
returns the rows to be deleted.  (Any alternative ideas?)
=> select table_name, column_name, ordinal_position from
my_information_schema_columns
-> except select table_name, column_name, ordinal_position from
information_schema.columns;


My problem now is how to get the update statements to be executed when a
table is created or dropped, or columns are added or removed.  For our
application, this is not very often.  My understanding is that triggers
cannot be created for system tables, therefore the updates cannot be
triggered when pg_tables is modified.  Also how to detect column changes
is a problem.

Detecting when a table has been added is relatively easy and can be
performed by our application, e.g. check my_information_schema_columns,
if it does not exist, check information_schema.columns, if exist, run
update statements.

A simple method would be to run a cron job to do the updates, but I
would like to try to be a bit more intelligent about when  the update
statements are executed.

Regards
Steve Martin


Jim C. Nasby wrote:

>On Sun, Oct 01, 2006 at 11:01:19PM -0400, Tom Lane wrote:
>
>
>>Steve Martin <smartin@nec.co.nz> writes:
>>
>>
>>>I am trying to determine if there is a way to improve the performance
>>>when selecting data from the information_schema.columns view.
>>>
>>>
>>In my experience, there isn't any single one of the information_schema
>>views whose performance doesn't suck :-(.  Somebody should work on that
>>sometime.  I haven't looked closely enough to determine where the
>>bottlenecks are.
>>
>>
>
>Looking at the newsysviews stuff should prove enlightening...
>AndrewSN spent a lot of time making sure those views are very
>performant.
>
>

--
               \\|//             From near to far,
                @ @              from here to there,
        ---oOOo-(_)-oOOo---      funny things are everywhere.  (Dr. Seuss)



Re: selecting data from information_schema.columns

От
Jim Nasby
Дата:
On Oct 2, 2006, at 7:31 PM, Steve Martin wrote:
> Regarding, newsysviews, what is the current state, I have had a
> quick look at the pgFoundry site and the last updates were 9 months
> ago.

Well, the system catalogs don't change terribly often, so it's not
like a lot needs to be done. We'd hoped to get them into core, but
that didn't pan out. Theoretically, we should be making the views
look more like information_schema, but no one's gotten to it yet.

> The most efficient way in the short term I can find to improve
> performance for our application is to create a table from
> information_schema.columns and update it when tables a created or
> deleted, or columns added or removed. E.g.

Well, there's nothing that says you have to use information_schema.
You can always query the catalog tables directly. Even if you don't
want to use newsysviews as-is, the code there should be very helpful
for doing that.

There is no ability to put triggers on DDL, so the best you could do
with your caching table is to just periodically update it.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)