Обсуждение: using explain output within pgsql

Поиск
Список
Период
Сортировка

using explain output within pgsql

От
Uwe Bartels
Дата:
Hi,

I'm starting up a datawarehouse with patitioning.
my etl processes write directly into the corresponding partitions instead of using triggers.

The reports I run in the datawarehouse are stored in a cache within the same database.
Now I'd like to store besides the results the dependencies to the tables which were used to generate the report. with this information i could invalidate cache results for the tables I'm going to import with my etl processes.

explain analyze gives me the information which table or patition is read from for each report. e.g
explain analyze (FORMAT YAML) create table cache.report234 as select col1,col2,sum(clicks) from dwh.event_log_weekly where week >= '2011-06-27' and week <= '2011-07-11' group by col1,col2;

now I'd like to store the output of explain analyze in a pgsql variable for further processing. that looks something like this.

DO $$declare l_explain text;
begin
l_explain := explain analyze (FORMAT YAML) create table cache.report234 as select col1,col2,sum(clicks) from dwh.event_log_weekly where week >= '2011-06-27' and week <= '2011-07-11' group by col1,col2;
select l_explain;
end$$;

But that doesn't work. I get a syntax error.

Does anybody has an idea how to retrieve the output of explain within pgsql and store this in a variable?
An alternative would be any other way to extract the information about tables used by arbitrary sql statements.

best regards,
Uwe

Re: [SQL] using explain output within pgsql

От
Adrian Klaver
Дата:
On Sunday, July 10, 2011 11:54:10 am Uwe Bartels wrote:
> Hi,
>
> I'm starting up a datawarehouse with patitioning.
> my etl processes write directly into the corresponding partitions instead
> of using triggers.
>
> The reports I run in the datawarehouse are stored in a cache within the
> same database.
> Now I'd like to store besides the results the dependencies to the tables
> which were used to generate the report. with this information i could
> invalidate cache results for the tables I'm going to import with my etl
> processes.
>
> explain analyze gives me the information which table or patition is read
> from for each report. e.g
> explain analyze (FORMAT YAML) create table cache.report234 as select
> col1,col2,sum(clicks) from dwh.event_log_weekly where week >= '2011-06-27'
> and week <= '2011-07-11' group by col1,col2;
>
> now I'd like to store the output of explain analyze in a pgsql variable for
> further processing. that looks something like this.
>
> DO $$declare l_explain text;
> begin
> l_explain := explain analyze (FORMAT YAML) create table cache.report234 as
> select col1,col2,sum(clicks) from dwh.event_log_weekly where week >=
> '2011-06-27' and week <= '2011-07-11' group by col1,col2;
> select l_explain;
> end$$;
>
> But that doesn't work. I get a syntax error.

From here:
http://www.postgresql.org/docs/9.0/interactive/sql-explain.html

I believe you are looking for:
explain (ANALYZE, FORMAT YAML) create table...

>
> Does anybody has an idea how to retrieve the output of explain within pgsql
> and store this in a variable?
> An alternative would be any other way to extract the information about
> tables used by arbitrary sql statements.
>
> best regards,
> Uwe

--
Adrian Klaver
adrian.klaver@gmail.com

Re: [SQL] using explain output within pgsql

От
Pavel Stehule
Дата:
Hello



2011/7/10 Uwe Bartels <uwe.bartels@gmail.com>:
> Hi,
>
> I'm starting up a datawarehouse with patitioning.
> my etl processes write directly into the corresponding partitions instead of
> using triggers.
>
> The reports I run in the datawarehouse are stored in a cache within the same
> database.
> Now I'd like to store besides the results the dependencies to the tables
> which were used to generate the report. with this information i could
> invalidate cache results for the tables I'm going to import with my etl
Hello

try

FOR l_explain IN EXPLAIN ANALYZE ...
LOOP
   ...

Regards

Pavel Stehule

> processes.
>
> explain analyze gives me the information which table or patition is read
> from for each report. e.g
> explain analyze (FORMAT YAML) create table cache.report234 as select
> col1,col2,sum(clicks) from dwh.event_log_weekly where week >= '2011-06-27'
> and week <= '2011-07-11' group by col1,col2;
>
> now I'd like to store the output of explain analyze in a pgsql variable for
> further processing. that looks something like this.
>
> DO $$declare l_explain text;
> begin
> l_explain := explain analyze (FORMAT YAML) create table cache.report234 as
> select col1,col2,sum(clicks) from dwh.event_log_weekly where week >=
> '2011-06-27' and week <= '2011-07-11' group by col1,col2;
> select l_explain;
> end$$;
>
> But that doesn't work. I get a syntax error.
>
> Does anybody has an idea how to retrieve the output of explain within pgsql
> and store this in a variable?
> An alternative would be any other way to extract the information about
> tables used by arbitrary sql statements.
>
> best regards,
> Uwe
>

Re: [SQL] using explain output within pgsql

От
Uwe Bartels
Дата:
Hi Adrian,

yes. but this is not the cause for the syntax error.

thanks,
Uwe




On 10 July 2011 21:20, Adrian Klaver <adrian.klaver@gmail.com> wrote:
On Sunday, July 10, 2011 11:54:10 am Uwe Bartels wrote:
> Hi,
>
> I'm starting up a datawarehouse with patitioning.
> my etl processes write directly into the corresponding partitions instead
> of using triggers.
>
> The reports I run in the datawarehouse are stored in a cache within the
> same database.
> Now I'd like to store besides the results the dependencies to the tables
> which were used to generate the report. with this information i could
> invalidate cache results for the tables I'm going to import with my etl
> processes.
>
> explain analyze gives me the information which table or patition is read
> from for each report. e.g
> explain analyze (FORMAT YAML) create table cache.report234 as select
> col1,col2,sum(clicks) from dwh.event_log_weekly where week >= '2011-06-27'
> and week <= '2011-07-11' group by col1,col2;
>
> now I'd like to store the output of explain analyze in a pgsql variable for
> further processing. that looks something like this.
>
> DO $$declare l_explain text;
> begin
> l_explain := explain analyze (FORMAT YAML) create table cache.report234 as
> select col1,col2,sum(clicks) from dwh.event_log_weekly where week >=
> '2011-06-27' and week <= '2011-07-11' group by col1,col2;
> select l_explain;
> end$$;
>
> But that doesn't work. I get a syntax error.

From here:
http://www.postgresql.org/docs/9.0/interactive/sql-explain.html

I believe you are looking for:
explain (ANALYZE, FORMAT YAML) create table...

>
> Does anybody has an idea how to retrieve the output of explain within pgsql
> and store this in a variable?
> An alternative would be any other way to extract the information about
> tables used by arbitrary sql statements.
>
> best regards,
> Uwe

--
Adrian Klaver
adrian.klaver@gmail.com

Re: [SQL] using explain output within pgsql

От
Uwe Bartels
Дата:
Hi Pavel,

I don't believe it. the second answer on sunday evening within half an hour.
and it works. excellent!!!

best regards,
Uwe


On 10 July 2011 21:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello



2011/7/10 Uwe Bartels <uwe.bartels@gmail.com>:
> Hi,
>
> I'm starting up a datawarehouse with patitioning.
> my etl processes write directly into the corresponding partitions instead of
> using triggers.
>
> The reports I run in the datawarehouse are stored in a cache within the same
> database.
> Now I'd like to store besides the results the dependencies to the tables
> which were used to generate the report. with this information i could
> invalidate cache results for the tables I'm going to import with my etl
Hello

try

FOR l_explain IN EXPLAIN ANALYZE ...
LOOP
  ...

Regards

Pavel Stehule

> processes.
>
> explain analyze gives me the information which table or patition is read
> from for each report. e.g
> explain analyze (FORMAT YAML) create table cache.report234 as select
> col1,col2,sum(clicks) from dwh.event_log_weekly where week >= '2011-06-27'
> and week <= '2011-07-11' group by col1,col2;
>
> now I'd like to store the output of explain analyze in a pgsql variable for
> further processing. that looks something like this.
>
> DO $$declare l_explain text;
> begin
> l_explain := explain analyze (FORMAT YAML) create table cache.report234 as
> select col1,col2,sum(clicks) from dwh.event_log_weekly where week >=
> '2011-06-27' and week <= '2011-07-11' group by col1,col2;
> select l_explain;
> end$$;
>
> But that doesn't work. I get a syntax error.
>
> Does anybody has an idea how to retrieve the output of explain within pgsql
> and store this in a variable?
> An alternative would be any other way to extract the information about
> tables used by arbitrary sql statements.
>
> best regards,
> Uwe
>

Re: [SQL] using explain output within pgsql

От
Uwe Bartels
Дата:
Hi Pavel,

is it posible to get this running even with dynamic sql?
I didn't write that. I'm using execute to run this create table ....

best regards,
Uwe

On 10 July 2011 21:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello



2011/7/10 Uwe Bartels <uwe.bartels@gmail.com>:
> Hi,
>
> I'm starting up a datawarehouse with patitioning.
> my etl processes write directly into the corresponding partitions instead of
> using triggers.
>
> The reports I run in the datawarehouse are stored in a cache within the same
> database.
> Now I'd like to store besides the results the dependencies to the tables
> which were used to generate the report. with this information i could
> invalidate cache results for the tables I'm going to import with my etl
Hello

try

FOR l_explain IN EXPLAIN ANALYZE ...
LOOP
  ...

Regards

Pavel Stehule

> processes.
>
> explain analyze gives me the information which table or patition is read
> from for each report. e.g
> explain analyze (FORMAT YAML) create table cache.report234 as select
> col1,col2,sum(clicks) from dwh.event_log_weekly where week >= '2011-06-27'
> and week <= '2011-07-11' group by col1,col2;
>
> now I'd like to store the output of explain analyze in a pgsql variable for
> further processing. that looks something like this.
>
> DO $$declare l_explain text;
> begin
> l_explain := explain analyze (FORMAT YAML) create table cache.report234 as
> select col1,col2,sum(clicks) from dwh.event_log_weekly where week >=
> '2011-06-27' and week <= '2011-07-11' group by col1,col2;
> select l_explain;
> end$$;
>
> But that doesn't work. I get a syntax error.
>
> Does anybody has an idea how to retrieve the output of explain within pgsql
> and store this in a variable?
> An alternative would be any other way to extract the information about
> tables used by arbitrary sql statements.
>
> best regards,
> Uwe
>

Re: [SQL] using explain output within pgsql

От
Pavel Stehule
Дата:
2011/7/10 Uwe Bartels <uwe.bartels@gmail.com>:
> Hi Pavel,
>
> is it posible to get this running even with dynamic sql?
> I didn't write that. I'm using execute to run this create table ....
>

probably yes

postgres=# do $$
declare x text;
begin
execute e'explain(format yaml)  select * from data where value = \'a\'' into x;
raise notice '%', x;
end;
$$ language plpgsql;
NOTICE:  - Plan:
    Node Type: "Seq Scan"
    Relation Name: "data"
    Alias: "data"
    Startup Cost: 0.00
    Total Cost: 23.38
    Plan Rows: 5
    Plan Width: 46
    Filter: "((value)::text = 'a'::text)"
DO


> best regards,
> Uwe
>
> On 10 July 2011 21:20, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>
>> Hello
>>
>>
>>
>> 2011/7/10 Uwe Bartels <uwe.bartels@gmail.com>:
>> > Hi,
>> >
>> > I'm starting up a datawarehouse with patitioning.
>> > my etl processes write directly into the corresponding partitions
>> > instead of
>> > using triggers.
>> >
>> > The reports I run in the datawarehouse are stored in a cache within the
>> > same
>> > database.
>> > Now I'd like to store besides the results the dependencies to the tables
>> > which were used to generate the report. with this information i could
>> > invalidate cache results for the tables I'm going to import with my etl
>> Hello
>>
>> try
>>
>> FOR l_explain IN EXPLAIN ANALYZE ...
>> LOOP
>>   ...
>>
>> Regards
>>
>> Pavel Stehule
>>
>> > processes.
>> >
>> > explain analyze gives me the information which table or patition is read
>> > from for each report. e.g
>> > explain analyze (FORMAT YAML) create table cache.report234 as select
>> > col1,col2,sum(clicks) from dwh.event_log_weekly where week >=
>> > '2011-06-27'
>> > and week <= '2011-07-11' group by col1,col2;
>> >
>> > now I'd like to store the output of explain analyze in a pgsql variable
>> > for
>> > further processing. that looks something like this.
>> >
>> > DO $$declare l_explain text;
>> > begin
>> > l_explain := explain analyze (FORMAT YAML) create table cache.report234
>> > as
>> > select col1,col2,sum(clicks) from dwh.event_log_weekly where week >=
>> > '2011-06-27' and week <= '2011-07-11' group by col1,col2;
>> > select l_explain;
>> > end$$;
>> >
>> > But that doesn't work. I get a syntax error.
>> >
>> > Does anybody has an idea how to retrieve the output of explain within
>> > pgsql
>> > and store this in a variable?
>> > An alternative would be any other way to extract the information about
>> > tables used by arbitrary sql statements.
>> >
>> > best regards,
>> > Uwe
>> >
>
>

Re: [SQL] using explain output within pgsql

От
Gavin Flower
Дата:
On 11/07/11 08:18, Pavel Stehule wrote:
2011/7/10 Uwe Bartels <uwe.bartels@gmail.com>:
Hi Pavel,

is it posible to get this running even with dynamic sql?
I didn't write that. I'm using execute to run this create table ....

probably yes

postgres=# do $$
declare x text;
begin
execute e'explain(format yaml)  select * from data where value = \'a\'' into x;
raise notice '%', x;
end;
$$ language plpgsql;
NOTICE:  - Plan:   Node Type: "Seq Scan"   Relation Name: "data"   Alias: "data"   Startup Cost: 0.00   Total Cost: 23.38   Plan Rows: 5   Plan Width: 46   Filter: "((value)::text = 'a'::text)"
DO
[...]

I find that I understand things better if I rephrase things, so I took Pavel's code and converted it to use variables so I could see more clearly what is happening.

I think using variables makes the use of 'execute' more understandable.

I hope this version is of value to to others, I have included all the code required to run it as a working example.

CREATE TABLE data
(
    id      int,
    value   text
);

INSERT INTO
    data (id, value)
VALUES
    (1, 'a'),
    (2, 'b');

do $$
declare
    v_sql_query    text;
    v_sql_explain  text;
    v_result       text;
begin
v_sql_query :=e'SELECT * FROM data d WHERE value = \'a\'';
v_sql_explain :=e'EXPLAIN(FORMAT YAML) ' || v_sql_query;
execute v_sql_explain into v_result;
raise notice 'v_result: %', v_result;
end;
$$ language plpgsql;



Cheers,
Gavin