Обсуждение: Whats the most efficient query for this result?

Поиск
Список
Период
Сортировка

Whats the most efficient query for this result?

От
Nick
Дата:
I have three tables (users, books, pencils) and would like to get a
list of all users with a count and total price of their books and
pencils for 2012-01-01...

So with this data...

users (user_id)
1
2
3

books (user_id, price, created)
1 | $10 | 2012-01-01
1 | $10 | 2012-01-01
3 | $10 | 2012-01-01

pencils
1 | $.50 | 2012-01-02
3 | $.50 | 2012-01-01
3 | $.50 | 2012-01-02

What is the most efficient way to get this result...

query_result (user_id, book_count, book_price_total, pencil_count,
pencil_price_total)
1 | 2 | $20 | 0 | $0
2 | 0 | $0 | 0 | $0
3 | 1 | $10 | 1 | $.50


Re: Whats the most efficient query for this result?

От
Tom Molesworth
Дата:
Hi Nick,

On 17/01/12 00:18, Nick wrote:
> I have three tables (users, books, pencils) and would like to get a
> list of all users with a count and total price of their books and
> pencils for 2012-01-01...
>
> So with this data...
>
> users (user_id)
> 1
> 2
> 3
>
> books (user_id, price, created)
> 1 | $10 | 2012-01-01
> 1 | $10 | 2012-01-01
> 3 | $10 | 2012-01-01
>
> pencils
> 1 | $.50 | 2012-01-02
> 3 | $.50 | 2012-01-01
> 3 | $.50 | 2012-01-02
>
> What is the most efficient way to get this result...
>
> query_result (user_id, book_count, book_price_total, pencil_count,
> pencil_price_total)
> 1 | 2 | $20 | 0 | $0
> 2 | 0 | $0 | 0 | $0
> 3 | 1 | $10 | 1 | $.50
>
>

Seems straightforward enough - left join the tables, group the result on
user_id - so I'd write it as:

select u.user_id, count(b.user_id) as "book_count",
coalesce(sum(b.price), 0) as "book_price_total", count(p.user_id) as
"pencil_count", coalesce(sum(b.price), 0) as "pencil_price_total"
from users u
left join books b on b.user_id = u.user_id and b.created = '2012-01-01'
left join pencils p on p.user_id = u.user_id and p.created = '2012-01-01'
group by u.user_id
order by u.user_id

If you need something more efficient, summary tables may help - hard to
say without knowing more about the real data.

Tom


Re: Whats the most efficient query for this result?

От
Nick
Дата:
On Jan 17, 3:33 am, t...@audioboundary.com (Tom Molesworth) wrote:
> Hi Nick,
>
> On 17/01/12 00:18, Nick wrote:
>
>
>
>
>
>
>
>
>
> > I have three tables (users, books, pencils) and would like to get a
> > list of all users with a count and total price of their books and
> > pencils for 2012-01-01...
>
> > So with this data...
>
> > users (user_id)
> > 1
> > 2
> > 3
>
> > books (user_id, price, created)
> > 1 | $10 | 2012-01-01
> > 1 | $10 | 2012-01-01
> > 3 | $10 | 2012-01-01
>
> > pencils
> > 1 | $.50 | 2012-01-02
> > 3 | $.50 | 2012-01-01
> > 3 | $.50 | 2012-01-02
>
> > What is the most efficient way to get this result...
>
> > query_result (user_id, book_count, book_price_total, pencil_count,
> > pencil_price_total)
> > 1 | 2 | $20 | 0 | $0
> > 2 | 0 | $0 | 0 | $0
> > 3 | 1 | $10 | 1 | $.50
>
> Seems straightforward enough - left join the tables, group the result on
> user_id - so I'd write it as:
>
> select u.user_id, count(b.user_id) as "book_count",
> coalesce(sum(b.price), 0) as "book_price_total", count(p.user_id) as
> "pencil_count", coalesce(sum(b.price), 0) as "pencil_price_total"
> from users u
> left join books b on b.user_id = u.user_id and b.created = '2012-01-01'
> left join pencils p on p.user_id = u.user_id and p.created = '2012-01-01'
> group by u.user_id
> order by u.user_id
>
> If you need something more efficient, summary tables may help - hard to
> say without knowing more about the real data.
>
> Tom
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general

I think I figured it out. I have to add a where clause...

select u.user_id, count(b.user_id) as "book_count",
coalesce(sum(b.price), 0) as "book_price_total", count(p.user_id) as
"pencil_count", coalesce(sum(b.price), 0) as "pencil_price_total"
from users u
left join books b on b.user_id = u.user_id and b.created =
'2012-01-01'
left join pencils p on p.user_id = u.user_id and p.created =
'2012-01-01'
WHERE b.user_id = u.user_id AND p.user_id = u.user_id
group by u.user_id
order by u.user_id

Re: Whats the most efficient query for this result?

От
Nick
Дата:
On Jan 17, 3:33 am, t...@audioboundary.com (Tom Molesworth) wrote:
> Hi Nick,
>
> On 17/01/12 00:18, Nick wrote:
>
>
>
>
>
>
>
>
>
> > I have three tables (users, books, pencils) and would like to get a
> > list of all users with a count and total price of their books and
> > pencils for 2012-01-01...
>
> > So with this data...
>
> > users (user_id)
> > 1
> > 2
> > 3
>
> > books (user_id, price, created)
> > 1 | $10 | 2012-01-01
> > 1 | $10 | 2012-01-01
> > 3 | $10 | 2012-01-01
>
> > pencils
> > 1 | $.50 | 2012-01-02
> > 3 | $.50 | 2012-01-01
> > 3 | $.50 | 2012-01-02
>
> > What is the most efficient way to get this result...
>
> > query_result (user_id, book_count, book_price_total, pencil_count,
> > pencil_price_total)
> > 1 | 2 | $20 | 0 | $0
> > 2 | 0 | $0 | 0 | $0
> > 3 | 1 | $10 | 1 | $.50
>
> Seems straightforward enough - left join the tables, group the result on
> user_id - so I'd write it as:
>
> select u.user_id, count(b.user_id) as "book_count",
> coalesce(sum(b.price), 0) as "book_price_total", count(p.user_id) as
> "pencil_count", coalesce(sum(b.price), 0) as "pencil_price_total"
> from users u
> left join books b on b.user_id = u.user_id and b.created = '2012-01-01'
> left join pencils p on p.user_id = u.user_id and p.created = '2012-01-01'
> group by u.user_id
> order by u.user_id
>
> If you need something more efficient, summary tables may help - hard to
> say without knowing more about the real data.
>
> Tom
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general

Thanks Tom. Thats what I originally thought it would be, but my
results (disregarding the date clause) show that user 1 has 2 pencils
instead of 1, and user 3 has 2 books instead of 1.

I guess the LEFT JOIN is joining the first LEFT JOIN instead of the
users table.

Any other thoughts on how to get books and pencils to individually
LEFT JOIN the users table?

Re: Whats the most efficient query for this result?

От
Nick
Дата:
On Jan 17, 3:33 am, t...@audioboundary.com (Tom Molesworth) wrote:
> Hi Nick,
>
> On 17/01/12 00:18, Nick wrote:
>
>
>
>
>
>
>
>
>
> > I have three tables (users, books, pencils) and would like to get a
> > list of all users with a count and total price of their books and
> > pencils for 2012-01-01...
>
> > So with this data...
>
> > users (user_id)
> > 1
> > 2
> > 3
>
> > books (user_id, price, created)
> > 1 | $10 | 2012-01-01
> > 1 | $10 | 2012-01-01
> > 3 | $10 | 2012-01-01
>
> > pencils
> > 1 | $.50 | 2012-01-02
> > 3 | $.50 | 2012-01-01
> > 3 | $.50 | 2012-01-02
>
> > What is the most efficient way to get this result...
>
> > query_result (user_id, book_count, book_price_total, pencil_count,
> > pencil_price_total)
> > 1 | 2 | $20 | 0 | $0
> > 2 | 0 | $0 | 0 | $0
> > 3 | 1 | $10 | 1 | $.50
>
> Seems straightforward enough - left join the tables, group the result on
> user_id - so I'd write it as:
>
> select u.user_id, count(b.user_id) as "book_count",
> coalesce(sum(b.price), 0) as "book_price_total", count(p.user_id) as
> "pencil_count", coalesce(sum(b.price), 0) as "pencil_price_total"
> from users u
> left join books b on b.user_id = u.user_id and b.created = '2012-01-01'
> left join pencils p on p.user_id = u.user_id and p.created = '2012-01-01'
> group by u.user_id
> order by u.user_id
>
> If you need something more efficient, summary tables may help - hard to
> say without knowing more about the real data.
>
> Tom
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general

I take that back, now im incorrectly not getting user 2's results

Re: Whats the most efficient query for this result?

От
"David Johnston"
Дата:
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Nick
Sent: Tuesday, January 17, 2012 12:59 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Whats the most efficient query for this result?

On Jan 17, 3:33 am, t...@audioboundary.com (Tom Molesworth) wrote:
> Hi Nick,
>
> On 17/01/12 00:18, Nick wrote:
>
> > I have three tables (users, books, pencils) and would like to get a
> > list of all users with a count and total price of their books and
> > pencils for 2012-01-01...
>
> > So with this data...
>
> > users (user_id)
> > 1
> > 2
> > 3
>
> > books (user_id, price, created)
> > 1 | $10 | 2012-01-01
> > 1 | $10 | 2012-01-01
> > 3 | $10 | 2012-01-01
>
> > pencils
> > 1 | $.50 | 2012-01-02
> > 3 | $.50 | 2012-01-01
> > 3 | $.50 | 2012-01-02
>
> > What is the most efficient way to get this result...
>
> > query_result (user_id, book_count, book_price_total, pencil_count,
> > pencil_price_total)
> > 1 | 2 | $20 | 0 | $0
> > 2 | 0 | $0 | 0 | $0
> > 3 | 1 | $10 | 1 | $.50
>
> Seems straightforward enough - left join the tables, group the result
> on user_id - so I'd write it as:
>
> select u.user_id, count(b.user_id) as "book_count",
> coalesce(sum(b.price), 0) as "book_price_total", count(p.user_id) as
> "pencil_count", coalesce(sum(b.price), 0) as "pencil_price_total"
> from users u
> left join books b on b.user_id = u.user_id and b.created = '2012-01-01'
> left join pencils p on p.user_id = u.user_id and p.created = '2012-01-01'
> group by u.user_id
> order by u.user_id
>
> If you need something more efficient, summary tables may help - hard
> to say without knowing more about the real data.
>
> Tom
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org) To
> make changes to your
> subscription:http://www.postgresql.org/mailpref/pgsql-general

I take that back, now im incorrectly not getting user 2's results

---------------------------------------------------------------------

Not Tested & Psuedo Code But...

SELECT user_id, COALESCE(books_agg.count,0), COALESCE(books_agg.sum,0.00),
... [same for pencils]
FROM user u
LEFT JOIN ( SELECT user_id, COUNT(*), SUM(price) FROM book GROUP BY user_id
) books_agg ON ( books_agg.user_id = u.user_id )
LEFT JOIN ( SELECT user_id, COUNT(*), SUM(price) FROM pencils GROUP BY
user_id ) pencils_agg USING ( pencils_agg.user_id = u.user_id )
-- NOTE: THERE IS NO GROUP BY IN THIS PART OF THE QUERY; only in the
sub-queries

Basically do all your INDEPENDENT calculations first then simply JOIN the
various results to each other while replacing missing JOINs with reasonable
default values.

David J.


Re: Whats the most efficient query for this result?

От
Tom Molesworth
Дата:
On 17/01/12 17:51, Nick wrote:
> On Jan 17, 3:33 am, t...@audioboundary.com (Tom Molesworth) wrote:
>> Hi Nick,
>>
>> On 17/01/12 00:18, Nick wrote:
>>
>>
>>> What is the most efficient way to get this result...
>>> query_result (user_id, book_count, book_price_total, pencil_count,
>>> pencil_price_total)
>>> 1 | 2 | $20 | 0 | $0
>>> 2 | 0 | $0 | 0 | $0
>>> 3 | 1 | $10 | 1 | $.50
>> Seems straightforward enough - left join the tables, group the result on
>> user_id - so I'd write it as:
>>
>> select u.user_id, count(b.user_id) as "book_count",
>> coalesce(sum(b.price), 0) as "book_price_total", count(p.user_id) as
>> "pencil_count", coalesce(sum(b.price), 0) as "pencil_price_total"
>> from users u
>> left join books b on b.user_id = u.user_id and b.created = '2012-01-01'
>> left join pencils p on p.user_id = u.user_id and p.created = '2012-01-01'
>> group by u.user_id
>> order by u.user_id
>>
>> If you need something more efficient, summary tables may help - hard to
>> say without knowing more about the real data.
>>
>> Tom
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
>> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general
> Thanks Tom. Thats what I originally thought it would be, but my
> results (disregarding the date clause) show that user 1 has 2 pencils
> instead of 1, and user 3 has 2 books instead of 1.
>
> I guess the LEFT JOIN is joining the first LEFT JOIN instead of the
> users table.
>
> Any other thoughts on how to get books and pencils to individually
> LEFT JOIN the users table?
>
Looking at it again, I think the missing part is the created fields -
you'll probably need both of those in the group by clause as well to get
meaningful numbers. I think that makes:

select u.user_id,
  count(b.user_id) as "book_count",
  coalesce(sum(b.price), 0) as "book_price_total",
  count(p.user_id) as "pencil_count",
  coalesce(sum(p.price), 0) as "pencil_price_total"
from tst.users u
left join tst.books b on b.user_id = u.user_id and b.created = '2012-01-01'
left join tst.pencils p on p.user_id = u.user_id and p.created =
'2012-01-01'
group by u.user_id, p.created, b.created
order by u.user_id;

Tom

Re: Whats the most efficient query for this result?

От
David Johnston
Дата:
On Jan 17, 2012, at 21:08, Tom Molesworth <tom@audioboundary.com> wrote:

> On 17/01/12 17:51, Nick wrote:
>> On Jan 17, 3:33 am, t...@audioboundary.com (Tom Molesworth) wrote:
> Looking at it again, I think the missing part is the created fields - you'll probably need both of those in the group
byclause as well to get meaningful numbers. I think that makes: 
>
> select u.user_id,
> count(b.user_id) as "book_count",
> coalesce(sum(b.price), 0) as "book_price_total",
> count(p.user_id) as "pencil_count",
> coalesce(sum(p.price), 0) as "pencil_price_total"
> from tst.users u
> left join tst.books b on b.user_id = u.user_id and b.created = '2012-01-01'
> left join tst.pencils p on p.user_id = u.user_id and p.created = '2012-01-01'
> group by u.user_id, p.created, b.created
> order by u.user_id;
>

Why?

What reason is there to include the "created" fields in the GROUP BY but not place them into the corresponding SELECT
output?

The true issue is that the aggregates are operating on two independent joins.  If you have 3 pencil records and two
bookrecords you end up effectively CROSS JOINing them to get 6 sale records which are then aggregated.  You have to
ensurethat at most one record is on the right side of each join so that 1 X 1 -> 1.  You can only do this by performing
separateaggregations for each independent dataset. 

David J.