Обсуждение: PostgreSQL and VIEWS

Поиск
Список
Период
Сортировка

PostgreSQL and VIEWS

От
Misa Simic
Дата:
HI,

When I have met PostgreSQL for a first time - I have been really amazed - with many things...

But how we started to use it - and data jumps in - we meet performance problems...

Now, it is a bit tricky... any concrete performance problem - can be solved on some way...

However, I am more concerned, that "our way" how we do the things - is not best way for Postgres...

The thing is - our tables are highly normalised - so lot of joins... (NULLS do not exist in the tables)

to make things simplier we use VIEWS to "denormalise" data - though it is not that much we care about denormalisation - it is more to make things simplier and less error prone...

So every "thing" (Customer, product, transaction, employee, whatever) is built up from more tables... I am not sure we have even one "thing" built up just from 1 table...

Lot of "thing properties" are actually calculations: i.e. Invoice Amount, or for Employee:
We have in the one table: first_name and last_name fields, but Full name is concatented as:

last_name || ',' || first_name

So, whenever we need employee full name - is it on Employee Info window - or in an Invoice as Salesperson, or in Payslip report...instead of to everywhere have above formula - we have function...
but again, instead of to developer think about each possible function for concrete thing - we have made for each entity - the view... what have all relevant joins - and all relevant calculated properties... about the thing...

I have had a thought - somewhere read (but now I am more convenient I have read it wrong) that planner is that smart that if we we have:

CREATE VIEW person_view AS
SELECT person_id, first_name, last_name, fn_full_name(id) as full_name, date_of_birth, age(date_of_birth) from person LEFT JOIN  person_date_of_birth USING (person_id)

SELECT first_name FROM person_view WHERE id = 1

Planner is smart and:
-will not care about joins  - you ask for the field(s) - what do not belong to other tables - both belong to 1 - and it is pk!
-will not care about functions - you havent asked for any field what is function in your query... 

 However - how we met more and more problems with our approach... and spending more time on
EXPLAIN ANALYZE - instead of on business problems... It seems things are not that good...

for simple questions as above - results are acceptable - even looking into EXPLAIN ANALYZE i would not say it is the best possible plan... (i.e. planner spending time on Seq Scan on person_date_of_birth_table - and filter it - even no need to think about that table at all - LEFT JOIN (show me columns - if there is a matching row for pk column) - so could be check via index -however - there is no any column from that table in the query - I would simple discard that table from plan.... 

So query

SELECT id FROM view WHERE id = 5 (view is SELECT * FROM table1 LEFT JOIN table2)

I would treat the same as:
SELECT id FROM table1 = 5

ok in INNER JOIN it requires additional confimration - but even there is FK to PK join - that confirmation is not needed iether - but in our cases it is always FK to PK...

However - if we need to involve more "entities"/views - from some unknown reason to me - postgres always picks bad plan...

i.e. list of employees what work in New York

we have employees_contract table:
contract_id, person_id, department_id, 

a lot of others tables, but to make it shorter:


Department_view

Buidlings_view


and now query:
SELECT full_name FROM person_view INNER JOIN emplyee_contract USING (person_id) INNER JOIN department_view USING (department_id) INNER JOIN buildings_view USING (building_id) WHERE city_id = 'NY' 


from some unknown reason - gives bad plan - then if we "refactor" query and send different question - we get good result... I am pretty sure planner should be capable to "rephrase" my question instead of me...

I would like to hear your expirience with VIEWS in postgres... And some kind of best practice/advice  for described situation... So far it looks to me there is no way - to make things ready for any specific question - every "request" will need specific SQL syntax to drive planner in acceptable direction...

Many thanks,

Misa... 

Re: PostgreSQL and VIEWS

От
Serge Fonville
Дата:
Hi a couple of things:

from some unknown reason - gives bad plan - then if we "refactor" query and send different question - we get good result... I am pretty sure planner should be capable to "rephrase" my question instead of me...

You seem to be comparing PostgreSQL performance to some other implementation, could you share which one that is?
Have you defined things like Foreign Key constraints, Indexes, Statistics?
Also could you share some queries and their plans?

HTH

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl

Convince Microsoft!
They need to add TRUNCATE PARTITION in SQL Server


2013/3/24 Misa Simic <misa.simic@gmail.com>
HI,

When I have met PostgreSQL for a first time - I have been really amazed - with many things...

But how we started to use it - and data jumps in - we meet performance problems...

Now, it is a bit tricky... any concrete performance problem - can be solved on some way...

However, I am more concerned, that "our way" how we do the things - is not best way for Postgres...

The thing is - our tables are highly normalised - so lot of joins... (NULLS do not exist in the tables)

to make things simplier we use VIEWS to "denormalise" data - though it is not that much we care about denormalisation - it is more to make things simplier and less error prone...

So every "thing" (Customer, product, transaction, employee, whatever) is built up from more tables... I am not sure we have even one "thing" built up just from 1 table...

Lot of "thing properties" are actually calculations: i.e. Invoice Amount, or for Employee:
We have in the one table: first_name and last_name fields, but Full name is concatented as:

last_name || ',' || first_name

So, whenever we need employee full name - is it on Employee Info window - or in an Invoice as Salesperson, or in Payslip report...instead of to everywhere have above formula - we have function...
but again, instead of to developer think about each possible function for concrete thing - we have made for each entity - the view... what have all relevant joins - and all relevant calculated properties... about the thing...

I have had a thought - somewhere read (but now I am more convenient I have read it wrong) that planner is that smart that if we we have:

CREATE VIEW person_view AS
SELECT person_id, first_name, last_name, fn_full_name(id) as full_name, date_of_birth, age(date_of_birth) from person LEFT JOIN  person_date_of_birth USING (person_id)

SELECT first_name FROM person_view WHERE id = 1

Planner is smart and:
-will not care about joins  - you ask for the field(s) - what do not belong to other tables - both belong to 1 - and it is pk!
-will not care about functions - you havent asked for any field what is function in your query... 

 However - how we met more and more problems with our approach... and spending more time on
EXPLAIN ANALYZE - instead of on business problems... It seems things are not that good...

for simple questions as above - results are acceptable - even looking into EXPLAIN ANALYZE i would not say it is the best possible plan... (i.e. planner spending time on Seq Scan on person_date_of_birth_table - and filter it - even no need to think about that table at all - LEFT JOIN (show me columns - if there is a matching row for pk column) - so could be check via index -however - there is no any column from that table in the query - I would simple discard that table from plan.... 

So query

SELECT id FROM view WHERE id = 5 (view is SELECT * FROM table1 LEFT JOIN table2)

I would treat the same as:
SELECT id FROM table1 = 5

ok in INNER JOIN it requires additional confimration - but even there is FK to PK join - that confirmation is not needed iether - but in our cases it is always FK to PK...

However - if we need to involve more "entities"/views - from some unknown reason to me - postgres always picks bad plan...

i.e. list of employees what work in New York

we have employees_contract table:
contract_id, person_id, department_id, 

a lot of others tables, but to make it shorter:


Department_view

Buidlings_view


and now query:
SELECT full_name FROM person_view INNER JOIN emplyee_contract USING (person_id) INNER JOIN department_view USING (department_id) INNER JOIN buildings_view USING (building_id) WHERE city_id = 'NY' 


from some unknown reason - gives bad plan - then if we "refactor" query and send different question - we get good result... I am pretty sure planner should be capable to "rephrase" my question instead of me...

I would like to hear your expirience with VIEWS in postgres... And some kind of best practice/advice  for described situation... So far it looks to me there is no way - to make things ready for any specific question - every "request" will need specific SQL syntax to drive planner in acceptable direction...

Many thanks,

Misa... 

Re: PostgreSQL and VIEWS

От
Misa Simic
Дата:
Hi Serge,

here are the some examples I posted in PERFORM list:


No - I do not compare Pg to anything - I am developer - before, I used SQL Server - but two years ago I moved to Postgres - and I have no idea what SQL Server does now - neither I have time to do compare...

I just said something what is "logical" to me,,, "How little man imagine world should work"

Neither I have been thinking about planner before - because of always got results in acceptable time - just now when we have more data - we meet problems - and solution is always "rephrase" the question for particular thing...


But here is example about similar thing what someone else have:


 
"rephrase the question" algorythm - from my developer perspective - is not hard task to do - and it is something what we will need to do to solve our problem... But I think it is more generic thing what would be good for Postgres in general....


Thanks,

Misa


:





2013/3/24 Serge Fonville <serge.fonville@gmail.com>
Hi a couple of things:

from some unknown reason - gives bad plan - then if we "refactor" query and send different question - we get good result... I am pretty sure planner should be capable to "rephrase" my question instead of me...

You seem to be comparing PostgreSQL performance to some other implementation, could you share which one that is?
Have you defined things like Foreign Key constraints, Indexes, Statistics?
Also could you share some queries and their plans?

HTH

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl

Convince Microsoft!
They need to add TRUNCATE PARTITION in SQL Server


2013/3/24 Misa Simic <misa.simic@gmail.com>
HI,

When I have met PostgreSQL for a first time - I have been really amazed - with many things...

But how we started to use it - and data jumps in - we meet performance problems...

Now, it is a bit tricky... any concrete performance problem - can be solved on some way...

However, I am more concerned, that "our way" how we do the things - is not best way for Postgres...

The thing is - our tables are highly normalised - so lot of joins... (NULLS do not exist in the tables)

to make things simplier we use VIEWS to "denormalise" data - though it is not that much we care about denormalisation - it is more to make things simplier and less error prone...

So every "thing" (Customer, product, transaction, employee, whatever) is built up from more tables... I am not sure we have even one "thing" built up just from 1 table...

Lot of "thing properties" are actually calculations: i.e. Invoice Amount, or for Employee:
We have in the one table: first_name and last_name fields, but Full name is concatented as:

last_name || ',' || first_name

So, whenever we need employee full name - is it on Employee Info window - or in an Invoice as Salesperson, or in Payslip report...instead of to everywhere have above formula - we have function...
but again, instead of to developer think about each possible function for concrete thing - we have made for each entity - the view... what have all relevant joins - and all relevant calculated properties... about the thing...

I have had a thought - somewhere read (but now I am more convenient I have read it wrong) that planner is that smart that if we we have:

CREATE VIEW person_view AS
SELECT person_id, first_name, last_name, fn_full_name(id) as full_name, date_of_birth, age(date_of_birth) from person LEFT JOIN  person_date_of_birth USING (person_id)

SELECT first_name FROM person_view WHERE id = 1

Planner is smart and:
-will not care about joins  - you ask for the field(s) - what do not belong to other tables - both belong to 1 - and it is pk!
-will not care about functions - you havent asked for any field what is function in your query... 

 However - how we met more and more problems with our approach... and spending more time on
EXPLAIN ANALYZE - instead of on business problems... It seems things are not that good...

for simple questions as above - results are acceptable - even looking into EXPLAIN ANALYZE i would not say it is the best possible plan... (i.e. planner spending time on Seq Scan on person_date_of_birth_table - and filter it - even no need to think about that table at all - LEFT JOIN (show me columns - if there is a matching row for pk column) - so could be check via index -however - there is no any column from that table in the query - I would simple discard that table from plan.... 

So query

SELECT id FROM view WHERE id = 5 (view is SELECT * FROM table1 LEFT JOIN table2)

I would treat the same as:
SELECT id FROM table1 = 5

ok in INNER JOIN it requires additional confimration - but even there is FK to PK join - that confirmation is not needed iether - but in our cases it is always FK to PK...

However - if we need to involve more "entities"/views - from some unknown reason to me - postgres always picks bad plan...

i.e. list of employees what work in New York

we have employees_contract table:
contract_id, person_id, department_id, 

a lot of others tables, but to make it shorter:


Department_view

Buidlings_view


and now query:
SELECT full_name FROM person_view INNER JOIN emplyee_contract USING (person_id) INNER JOIN department_view USING (department_id) INNER JOIN buildings_view USING (building_id) WHERE city_id = 'NY' 


from some unknown reason - gives bad plan - then if we "refactor" query and send different question - we get good result... I am pretty sure planner should be capable to "rephrase" my question instead of me...

I would like to hear your expirience with VIEWS in postgres... And some kind of best practice/advice  for described situation... So far it looks to me there is no way - to make things ready for any specific question - every "request" will need specific SQL syntax to drive planner in acceptable direction...

Many thanks,

Misa... 


Re: PostgreSQL and VIEWS

От
Merlin Moncure
Дата:
On Sat, Mar 23, 2013 at 9:25 PM, Misa Simic <misa.simic@gmail.com> wrote:
> HI,
>
> When I have met PostgreSQL for a first time - I have been really amazed -
> with many things...
>
> But how we started to use it - and data jumps in - we meet performance
> problems...
>
> Now, it is a bit tricky... any concrete performance problem - can be solved
> on some way...
>
> However, I am more concerned, that "our way" how we do the things - is not
> best way for Postgres...
>
> The thing is - our tables are highly normalised - so lot of joins... (NULLS
> do not exist in the tables)
>
> to make things simplier we use VIEWS to "denormalise" data - though it is
> not that much we care about denormalisation - it is more to make things
> simplier and less error prone...
>
> So every "thing" (Customer, product, transaction, employee, whatever) is
> built up from more tables... I am not sure we have even one "thing" built up
> just from 1 table...
>
> Lot of "thing properties" are actually calculations: i.e. Invoice Amount, or
> for Employee:
> We have in the one table: first_name and last_name fields, but Full name is
> concatented as:
>
> last_name || ',' || first_name
>
> So, whenever we need employee full name - is it on Employee Info window - or
> in an Invoice as Salesperson, or in Payslip report...instead of to
> everywhere have above formula - we have function...
> but again, instead of to developer think about each possible function for
> concrete thing - we have made for each entity - the view... what have all
> relevant joins - and all relevant calculated properties... about the
> thing...
>
> I have had a thought - somewhere read (but now I am more convenient I have
> read it wrong) that planner is that smart that if we we have:
>
> CREATE VIEW person_view AS
> SELECT person_id, first_name, last_name, fn_full_name(id) as full_name,
> date_of_birth, age(date_of_birth) from person LEFT JOIN
> person_date_of_birth USING (person_id)
>
> SELECT first_name FROM person_view WHERE id = 1
>
> Planner is smart and:
> -will not care about joins  - you ask for the field(s) - what do not belong
> to other tables - both belong to 1 - and it is pk!
> -will not care about functions - you havent asked for any field what is
> function in your query...
>
>  However - how we met more and more problems with our approach... and
> spending more time on
> EXPLAIN ANALYZE - instead of on business problems... It seems things are not
> that good...
>
> for simple questions as above - results are acceptable - even looking into
> EXPLAIN ANALYZE i would not say it is the best possible plan... (i.e.
> planner spending time on Seq Scan on person_date_of_birth_table - and filter
> it - even no need to think about that table at all - LEFT JOIN (show me
> columns - if there is a matching row for pk column) - so could be check via
> index -however - there is no any column from that table in the query - I
> would simple discard that table from plan....
>
> So query
>
> SELECT id FROM view WHERE id = 5 (view is SELECT * FROM table1 LEFT JOIN
> table2)
>
> I would treat the same as:
> SELECT id FROM table1 = 5
>
> ok in INNER JOIN it requires additional confimration - but even there is FK
> to PK join - that confirmation is not needed iether - but in our cases it is
> always FK to PK...
>
> However - if we need to involve more "entities"/views - from some unknown
> reason to me - postgres always picks bad plan...
>
> i.e. list of employees what work in New York
>
> we have employees_contract table:
> contract_id, person_id, department_id,
>
> a lot of others tables, but to make it shorter:
>
>
> Department_view
>
> Buidlings_view
>
>
> and now query:
> SELECT full_name FROM person_view INNER JOIN emplyee_contract USING
> (person_id) INNER JOIN department_view USING (department_id) INNER JOIN
> buildings_view USING (building_id) WHERE city_id = 'NY'
>
>
> from some unknown reason - gives bad plan - then if we "refactor" query and
> send different question - we get good result... I am pretty sure planner
> should be capable to "rephrase" my question instead of me...
>
> I would like to hear your expirience with VIEWS in postgres... And some kind
> of best practice/advice  for described situation... So far it looks to me
> there is no way - to make things ready for any specific question - every
> "request" will need specific SQL syntax to drive planner in acceptable
> direction...


You asked some broad questions so you are going to get broad answers.

*) query planner is very complicated and changes are very incremental.
 only a very, very small number of people (Tom especially) are capable
of making major changes to it.  some known planner issues that might
get fixed in the short term are better handling of quals through UNION
ALL and/or pushing quals through partitioned windows functions.  these
are documented shortcomings -- other improvements have to be looked
through the lens of 'what else did you break', including,
unfortunately, plan time.

*) filtering in predicate expressions (select * from foo where a || b
= 'x') is going to always suck.  consider optimizing via indexes
and/or materialization

*) excessive use of views is for data abstraction (a technique I
advocate) is on one side of a tradeoff between materlization and run
time performance.  if you are exceeding your ability to quickly write
clean queries, perhaps it's time to start thinking about some
materialization steps.  especially you should be asking yourself if
users are needing 100% up to date data (if not, materialization
requirements relax significantly)

*) better, more experienced programmers write faster, cleaner code.
this is true in all backends.  @ two years in, you have some learning
to do that -hackers will not be able to solve for you.

merlin


Re: PostgreSQL and VIEWS

От
Misa Simic
Дата:
Thanks Merlin,

Well... sorry, It could be and my bad english... but let me explain chronologicaly things...

I have first written concrete case...


But because of I recognized the pattern - always is problem with JOIN to a view...

I have written this abroad generic question.... Because of, I think, Postgres have problem with JOIN to a view in general...So probably someone before me have had the same problem - and if that is the case I just wanted to hear thier solution... 

 But from others examples, and some tests EXPLAIN ANALYZE I have done...

i.e. SELECT t1.a FROM t1 LEFTJOIN t2 USING (a)

Planer includes some actions related to t2 - what are not necessary at all... again - it is just my opinion :) 
(Please, don't take this - I don't know... as some most important thing...)

So that are "small" problems - on our simplified examples - what have big impact in performance on a bit complex examples...

So what we have indentified until know - solution to our problem with views - is always: "rephrase the question" (not indexes - they exist - just not used...)

for example:

SELECT view.* FROM view  INNER JOIN t1 USING (col1) WHERE t1.col2 = 1

to get better performance, you need to say:

SELECT view.* FROM view WHERE col1 = (SELECT t.col1 FROM t1 WHERE t1.col2 = 1)

Logically - that are the same questions - result is the same.... just from some unknown reason to me - postgres in first case picks wrong plan - and we got very bad performance... :(

So solution to our problem - is to add "rephrase the question" tier... (analyze what "input question" is - and transform it to better SQL for Postgres"

And fortunately we have that flexibility in our app... And how the things are - we will need to do it,,,.

So, if input question is: SELECT t1.a FROM t1 LEFT JOIN t2 USING (a) - transform it to: SELECT t1.a FROM t1

etc...

But, don't you think that would be better for Postgres planner in general?


Nowhere in our examples are cases like where A || B = 'x' ... or WHERE volatile_function(a, b) = 5... etc...


Materalisation - well that is another thing why we use VIEWS....

So for calculated properties of the things - we use SQL stable functions...

i.e the thing: Customer - is the VIEW in postgres:

cust_id, cust_name, blablabla...columns, customer_balance

customer_balance - is calculated property of Entity: Customer - postgres function actually...

if custmer_balabce sucks in perform - involve mat view - and the function will return value  from mat_view - instead of to do calculation... But again - we dont have the problem with that :)


We have the problem with:


SELECT c.* FROM customers_view c INNER JOIN invoices USING (customer_id) WHERE invoice_id = 156


And solution to our problem is: "rephrase the question" :)

Kind Regards,

Misa
 

 


 




  


2013/3/25 Merlin Moncure <mmoncure@gmail.com>
On Sat, Mar 23, 2013 at 9:25 PM, Misa Simic <misa.simic@gmail.com> wrote:
> HI,
>
> When I have met PostgreSQL for a first time - I have been really amazed -
> with many things...
>
> But how we started to use it - and data jumps in - we meet performance
> problems...
>
> Now, it is a bit tricky... any concrete performance problem - can be solved
> on some way...
>
> However, I am more concerned, that "our way" how we do the things - is not
> best way for Postgres...
>
> The thing is - our tables are highly normalised - so lot of joins... (NULLS
> do not exist in the tables)
>
> to make things simplier we use VIEWS to "denormalise" data - though it is
> not that much we care about denormalisation - it is more to make things
> simplier and less error prone...
>
> So every "thing" (Customer, product, transaction, employee, whatever) is
> built up from more tables... I am not sure we have even one "thing" built up
> just from 1 table...
>
> Lot of "thing properties" are actually calculations: i.e. Invoice Amount, or
> for Employee:
> We have in the one table: first_name and last_name fields, but Full name is
> concatented as:
>
> last_name || ',' || first_name
>
> So, whenever we need employee full name - is it on Employee Info window - or
> in an Invoice as Salesperson, or in Payslip report...instead of to
> everywhere have above formula - we have function...
> but again, instead of to developer think about each possible function for
> concrete thing - we have made for each entity - the view... what have all
> relevant joins - and all relevant calculated properties... about the
> thing...
>
> I have had a thought - somewhere read (but now I am more convenient I have
> read it wrong) that planner is that smart that if we we have:
>
> CREATE VIEW person_view AS
> SELECT person_id, first_name, last_name, fn_full_name(id) as full_name,
> date_of_birth, age(date_of_birth) from person LEFT JOIN
> person_date_of_birth USING (person_id)
>
> SELECT first_name FROM person_view WHERE id = 1
>
> Planner is smart and:
> -will not care about joins  - you ask for the field(s) - what do not belong
> to other tables - both belong to 1 - and it is pk!
> -will not care about functions - you havent asked for any field what is
> function in your query...
>
>  However - how we met more and more problems with our approach... and
> spending more time on
> EXPLAIN ANALYZE - instead of on business problems... It seems things are not
> that good...
>
> for simple questions as above - results are acceptable - even looking into
> EXPLAIN ANALYZE i would not say it is the best possible plan... (i.e.
> planner spending time on Seq Scan on person_date_of_birth_table - and filter
> it - even no need to think about that table at all - LEFT JOIN (show me
> columns - if there is a matching row for pk column) - so could be check via
> index -however - there is no any column from that table in the query - I
> would simple discard that table from plan....
>
> So query
>
> SELECT id FROM view WHERE id = 5 (view is SELECT * FROM table1 LEFT JOIN
> table2)
>
> I would treat the same as:
> SELECT id FROM table1 = 5
>
> ok in INNER JOIN it requires additional confimration - but even there is FK
> to PK join - that confirmation is not needed iether - but in our cases it is
> always FK to PK...
>
> However - if we need to involve more "entities"/views - from some unknown
> reason to me - postgres always picks bad plan...
>
> i.e. list of employees what work in New York
>
> we have employees_contract table:
> contract_id, person_id, department_id,
>
> a lot of others tables, but to make it shorter:
>
>
> Department_view
>
> Buidlings_view
>
>
> and now query:
> SELECT full_name FROM person_view INNER JOIN emplyee_contract USING
> (person_id) INNER JOIN department_view USING (department_id) INNER JOIN
> buildings_view USING (building_id) WHERE city_id = 'NY'
>
>
> from some unknown reason - gives bad plan - then if we "refactor" query and
> send different question - we get good result... I am pretty sure planner
> should be capable to "rephrase" my question instead of me...
>
> I would like to hear your expirience with VIEWS in postgres... And some kind
> of best practice/advice  for described situation... So far it looks to me
> there is no way - to make things ready for any specific question - every
> "request" will need specific SQL syntax to drive planner in acceptable
> direction...


You asked some broad questions so you are going to get broad answers.

*) query planner is very complicated and changes are very incremental.
 only a very, very small number of people (Tom especially) are capable
of making major changes to it.  some known planner issues that might
get fixed in the short term are better handling of quals through UNION
ALL and/or pushing quals through partitioned windows functions.  these
are documented shortcomings -- other improvements have to be looked
through the lens of 'what else did you break', including,
unfortunately, plan time.

*) filtering in predicate expressions (select * from foo where a || b
= 'x') is going to always suck.  consider optimizing via indexes
and/or materialization

*) excessive use of views is for data abstraction (a technique I
advocate) is on one side of a tradeoff between materlization and run
time performance.  if you are exceeding your ability to quickly write
clean queries, perhaps it's time to start thinking about some
materialization steps.  especially you should be asking yourself if
users are needing 100% up to date data (if not, materialization
requirements relax significantly)

*) better, more experienced programmers write faster, cleaner code.
this is true in all backends.  @ two years in, you have some learning
to do that -hackers will not be able to solve for you.

merlin

Re: PostgreSQL and VIEWS

От
Merlin Moncure
Дата:
On Mon, Mar 25, 2013 at 4:32 PM, Misa Simic <misa.simic@gmail.com> wrote:
> Thanks Merlin,
>
> Well... sorry, It could be and my bad english... but let me explain
> chronologicaly things...
>
> I have first written concrete case...
>
> http://postgresql.1045698.n5.nabble.com/PostgreSQL-planner-tp5749427.html
>
> But because of I recognized the pattern - always is problem with JOIN to a
> view...
>
> I have written this abroad generic question.... Because of, I think,
> Postgres have problem with JOIN to a view in general...So probably someone
> before me have had the same problem - and if that is the case I just wanted
> to hear thier solution...
>
>  But from others examples, and some tests EXPLAIN ANALYZE I have done...
>
> i.e. SELECT t1.a FROM t1 LEFTJOIN t2 USING (a)
>
> Planer includes some actions related to t2 - what are not necessary at
> all... again - it is just my opinion :)
> (Please, don't take this - I don't know... as some most important thing...)
>
> So that are "small" problems - on our simplified examples - what have big
> impact in performance on a bit complex examples...
>
> So what we have indentified until know - solution to our problem with views
> - is always: "rephrase the question" (not indexes - they exist - just not
> used...)
>
> for example:
>
> SELECT view.* FROM view  INNER JOIN t1 USING (col1) WHERE t1.col2 = 1
>
> to get better performance, you need to say:
>
> SELECT view.* FROM view WHERE col1 = (SELECT t.col1 FROM t1 WHERE t1.col2 =
> 1)


yeah.  I understand -- it would help to see a test case there.  the
devil is always in the details.  point being, let's take your other
example

or the supplied test case you mentioned (where you evaluate a volatile
function in a view), things are working as designed.  the only
difference between  a view and a regular query is you get pushed down
one level in terms if subquery.  so,

select * from view;

is the same as:

select * from (<the view query>) q;

so, when using volatile function, the case basically boils down to:

SELECT * FROM (select volatile_func(), stuff FROM big_table) q WHERE
key = value;

that's a *very* different query vs:
select volatile_func(), stuff FROM big_table WHERE key = value;

the slower performance there is because logically you *have* to
evaluate volatile performance first -- things are working as designed.

merlin


Re: PostgreSQL and VIEWS

От
Misa Simic
Дата:
hm...

I have provided examples? Tables definitions and plan for each query? (in another thread..)

I am not sure I can buy it that that are *very* different queries.... I would say - they are the same - why would you need to evalute 100 rows  and reduce end result on one?

execute function   - is most expensive step... most expensive step - I would do on the end - not on the beginning... after i applied all filters - of course if my function is not part of the filter - if it is - then it is something ... unavoidable - and must be executed on all rows...) 

And, even I would do it - just if it is needed i.e.

on:

SELECT stuff FROM (select immutable_func(), stuff FROM big_table) q

I would never execute  the function - even it is immutable... how i understand it - immutable function has just advantage that it could be executed just once - instead of number of rows times - even you want all rows... but if it is not in top query - who cares...why to execute it at all...


I mean - I don't know - maybe it is "by design" - but is there some (hidden) reason why you must execute volatile function on all rows - not just after filter - number of filtered rows times?

P.S. I took volatile function as potentially worst possible scenario...

Though I dont think it is true...

Because of :

SELECT * FROM view_with_volatile_function WHERE indexed_column = 5 - uses index...

but

SELECT * FROM view_with_volatile_function INNER JOIN (SELECT 5 AS indexed_column) q USING (indexed_column) - does not!


Logically - that are the same queries...

Thanks,

Misa









 


2013/3/26 Merlin Moncure <mmoncure@gmail.com>
On Mon, Mar 25, 2013 at 4:32 PM, Misa Simic <misa.simic@gmail.com> wrote:
> Thanks Merlin,
>
> Well... sorry, It could be and my bad english... but let me explain
> chronologicaly things...
>
> I have first written concrete case...
>
> http://postgresql.1045698.n5.nabble.com/PostgreSQL-planner-tp5749427.html
>
> But because of I recognized the pattern - always is problem with JOIN to a
> view...
>
> I have written this abroad generic question.... Because of, I think,
> Postgres have problem with JOIN to a view in general...So probably someone
> before me have had the same problem - and if that is the case I just wanted
> to hear thier solution...
>
>  But from others examples, and some tests EXPLAIN ANALYZE I have done...
>
> i.e. SELECT t1.a FROM t1 LEFTJOIN t2 USING (a)
>
> Planer includes some actions related to t2 - what are not necessary at
> all... again - it is just my opinion :)
> (Please, don't take this - I don't know... as some most important thing...)
>
> So that are "small" problems - on our simplified examples - what have big
> impact in performance on a bit complex examples...
>
> So what we have indentified until know - solution to our problem with views
> - is always: "rephrase the question" (not indexes - they exist - just not
> used...)
>
> for example:
>
> SELECT view.* FROM view  INNER JOIN t1 USING (col1) WHERE t1.col2 = 1
>
> to get better performance, you need to say:
>
> SELECT view.* FROM view WHERE col1 = (SELECT t.col1 FROM t1 WHERE t1.col2 =
> 1)


yeah.  I understand -- it would help to see a test case there.  the
devil is always in the details.  point being, let's take your other
example

or the supplied test case you mentioned (where you evaluate a volatile
function in a view), things are working as designed.  the only
difference between  a view and a regular query is you get pushed down
one level in terms if subquery.  so,

select * from view;

is the same as:

select * from (<the view query>) q;

so, when using volatile function, the case basically boils down to:

SELECT * FROM (select volatile_func(), stuff FROM big_table) q WHERE
key = value;

that's a *very* different query vs:
select volatile_func(), stuff FROM big_table WHERE key = value;

the slower performance there is because logically you *have* to
evaluate volatile performance first -- things are working as designed.

merlin