Aggregated join vs. aggregate in column?

Поиск
Список
Период
Сортировка
От Durumdara
Тема Aggregated join vs. aggregate in column?
Дата
Msg-id CAEcMXhn5ex_bJ+1tJKDnhnb=G7c-ujpYfmnaM6MQtkTAMNHg1g@mail.gmail.com
обсуждение исходный текст
Ответы Re: Aggregated join vs. aggregate in column?  (David Rowley <david.rowley@2ndquadrant.com>)
Список pgsql-general
Dear Members!

I have a query which I extended with an extra calculated column.
I need to list the request, and the last date when they scheduled.

Example:

select 
    request.*,
    (
        select max(s_date) as s_date from schedule
        where schedule.product_id = request.product_id and schedule.ok = True
     ) as max_s_date,
    ...
from request
...

This can  find the last scheduled timestamp for that product.

It's working, but now they wanted to search for this max_s_date column).

I have two ways:

1.) With query / subquery:

select * from
(
select 
    request.*,
    (
        select max(s_date) as s_date from schedule
        where schedule.product_id = request.product_id and schedule.ok = True
     ) as max_s_date,
    ...
from request
...
) t where t.max_s_date between ...

2.) I may relocate this section as join...

select 
    request.*, s.max_s_date
from request 
left join 
    (
        select schedule.product_id, max(s_date) as max_s_date from schedule
        where schedule.ok = True
        group by  schedule.product_id  
     ) s on (s.product_id = request.product_id)
    ...

But I really don't know what the hell will happen in query optimizer with this method.

a.)
Optimizer is clever, and it calculates the aggregates only in the needed rows.
So it find the request.product_id-s, and execute the "s" query only in these rows.

b.)
Or it isn't enough wise, it executes the "s" subquery on whole schedule, and later joins to main table.
The schedule table is big in customers' database, so this is worst case. :-(

---

I asked this because sometimes I need to get more result columns in the select, but I couldn't retreive more...

    (
        select max(s_date) as s_date from schedule
        where schedule.product_id = request.product_id and schedule.ok = True
     ) as max_s_date,   <=== only one column

So sometimes the join is better.

But if the optimizer isn't enough wise, I can get these values only "WITH" queries (select the main rows to temp, run subselects with only these records, return the mix of main and subselects in one query).

What is your experience with these kind of problems?

Can I use this join or I need to avoid because of very slow (and slower-slower) running time.

Thank your for any help!

Best regards
    dd



В списке pgsql-general по дате отправления:

Предыдущее
От: lilu
Дата:
Сообщение: Re: How to run a task continuously in the background
Следующее
От: David Rowley
Дата:
Сообщение: Re: Aggregated join vs. aggregate in column?