Обсуждение: Help with rewriting query

Поиск

Список

Период

Сортировка

Help with rewriting query

От

Junaili Lie

Дата:

08 июня 2005 г., 19:34:35

Hi,
I have the following table:
person - primary key id, and some attributes
food - primary key id, foreign key p_id reference to table person.

table food store all the food that a person is eating. The more recent
food is indicated by the higher food.id.

I need to find what is the most recent food a person ate for every person.
The query:
select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
by f.p_id will work.
But I understand this is not the most efficient way. Is there another
way to rewrite this query? (maybe one that involves order by desc
limit 1)

Thank you in advance.

Re: Help with rewriting query

От

Tobias Brox

Дата:

08 июня 2005 г., 19:56:46

[Junaili Lie - Wed at 12:34:32PM -0700]
> select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> by f.p_id will work.
> But I understand this is not the most efficient way. Is there another
> way to rewrite this query? (maybe one that involves order by desc
> limit 1)

eventually, try something like

  select p.id,(select f.id from food f where f.p_id=p.id order by f.id desc limit 1)
  from person p

not tested, no warranties.

Since subqueries can be inefficient, use "explain analyze" to see which one
is actually better.

This issue will be solved in future versions of postgresql.

--
Tobias Brox, +47-91700050
Tallinn

Re: Help with rewriting query

От

Junaili Lie

Дата:

08 июня 2005 г., 22:48:31

Hi,
The suggested query below took forever when I tried it.
In addition, as suggested by Tobias, I also tried to create index on
food(p_id, id), but still no goal (same query plan).
Here is the explain:
TEST1=# explain select f.p_id, max(f.id) from Food f, Person p where
(f.p_id = p.id) group by p.id;
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=0.00..214585.51 rows=569 width=16)
  ->  Merge Join  (cost=0.00..200163.50 rows=2884117 width=16)
        Merge Cond: ("outer".id = "inner".p_id)
        ->  Index Scan using person_pkey on person p
(cost=0.00..25.17 rows=569 width=8)
        ->  Index Scan using person_id_food_index on food f
(cost=0.00..164085.54 rows=2884117 width=16)
(5 rows)




TEST1=# explain select p.id, (Select f.id from food f where
f.p_id=p.id order by f.id desc limit 1) from person p;
                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Seq Scan on Person p  (cost=100000000.00..100007015.24 rows=569 width=8)
  SubPlan
    ->  Limit  (cost=0.00..12.31 rows=1 width=8)
          ->  Index Scan Backward using food_pkey on food f
(cost=0.00..111261.90 rows=9042 width=8)
                Filter: (p_id = $0)
(5 rows)

any ideas or suggestions is appreciate.


On 6/8/05, Tobias Brox <tobias@nordicbet.com> wrote:
> [Junaili Lie - Wed at 12:34:32PM -0700]
> > select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> > by f.p_id will work.
> > But I understand this is not the most efficient way. Is there another
> > way to rewrite this query? (maybe one that involves order by desc
> > limit 1)
>
> eventually, try something like
>
>  select p.id,(select f.id from food f where f.p_id=p.id order by f.id desc limit 1)
>  from person p
>
> not tested, no warranties.
>
> Since subqueries can be inefficient, use "explain analyze" to see which one
> is actually better.
>
> This issue will be solved in future versions of postgresql.
>
> --
> Tobias Brox, +47-91700050
> Tallinn
>

Re: Help with rewriting query

От

Jim Johannsen

Дата:

09 июня 2005 г., 01:47:17

How about
    SELECT p_id, f_id
    FROM
       person as p
          LEFT JOIN
    (SELECT f.p_id, max(f.id), f_item
        FROM food)   as f
             ON   p.p_id   =   f.p_id

Create an index on Food (p_id, seq #)

This may not gain any performance, but worth a try.  I don't have any
data similar to this to test it on.  Let us know.

I assume that the food id is a sequential number across all people.
Have you thought of a date field and a number representing what meal was
last eaten, i.e. 1= breakfast, 2 = mid morning snack etc.  Or a date
field and the food id code?



Junaili Lie wrote:

>Hi,
>The suggested query below took forever when I tried it.
>In addition, as suggested by Tobias, I also tried to create index on
>food(p_id, id), but still no goal (same query plan).
>Here is the explain:
>TEST1=# explain select f.p_id, max(f.id) from Food f, Person p where
>(f.p_id = p.id) group by p.id;
>                                                  QUERY PLAN
>----------------------------------------------------------------------------------------------------------------
> GroupAggregate  (cost=0.00..214585.51 rows=569 width=16)
>  ->  Merge Join  (cost=0.00..200163.50 rows=2884117 width=16)
>        Merge Cond: ("outer".id = "inner".p_id)
>        ->  Index Scan using person_pkey on person p
>(cost=0.00..25.17 rows=569 width=8)
>        ->  Index Scan using person_id_food_index on food f
>(cost=0.00..164085.54 rows=2884117 width=16)
>(5 rows)
>
>
>
>
>TEST1=# explain select p.id, (Select f.id from food f where
>f.p_id=p.id order by f.id desc limit 1) from person p;
>                                               QUERY PLAN
>-----------------------------------------------------------------------------------------------------------
> Seq Scan on Person p  (cost=100000000.00..100007015.24 rows=569 width=8)
>  SubPlan
>    ->  Limit  (cost=0.00..12.31 rows=1 width=8)
>          ->  Index Scan Backward using food_pkey on food f
>(cost=0.00..111261.90 rows=9042 width=8)
>                Filter: (p_id = $0)
>(5 rows)
>
>any ideas or suggestions is appreciate.
>
>
>On 6/8/05, Tobias Brox <tobias@nordicbet.com> wrote:
>
>
>>[Junaili Lie - Wed at 12:34:32PM -0700]
>>
>>
>>>select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
>>>by f.p_id will work.
>>>But I understand this is not the most efficient way. Is there another
>>>way to rewrite this query? (maybe one that involves order by desc
>>>limit 1)
>>>
>>>
>>eventually, try something like
>>
>> select p.id,(select f.id from food f where f.p_id=p.id order by f.id desc limit 1)
>> from person p
>>
>>not tested, no warranties.
>>
>>Since subqueries can be inefficient, use "explain analyze" to see which one
>>is actually better.
>>
>>This issue will be solved in future versions of postgresql.
>>
>>--
>>Tobias Brox, +47-91700050
>>Tallinn
>>
>>
>>
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match
>
>
>
>

Re: Help with rewriting query

От

Bruno Wolff III

Дата:

09 июня 2005 г., 03:00:18

On Wed, Jun 08, 2005 at 15:48:27 -0700,
  Junaili Lie <junaili@gmail.com> wrote:
> Hi,
> The suggested query below took forever when I tried it.
> In addition, as suggested by Tobias, I also tried to create index on
> food(p_id, id), but still no goal (same query plan).
> Here is the explain:
> TEST1=# explain select f.p_id, max(f.id) from Food f, Person p where
> (f.p_id = p.id) group by p.id;

The above is going to require reading all the food table (assuming no
orphaned records), so the plan below seems reasonable.

>                                                   QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=0.00..214585.51 rows=569 width=16)
>   ->  Merge Join  (cost=0.00..200163.50 rows=2884117 width=16)
>         Merge Cond: ("outer".id = "inner".p_id)
>         ->  Index Scan using person_pkey on person p
> (cost=0.00..25.17 rows=569 width=8)
>         ->  Index Scan using person_id_food_index on food f
> (cost=0.00..164085.54 rows=2884117 width=16)
> (5 rows)
>
>
>
>
> TEST1=# explain select p.id, (Select f.id from food f where
> f.p_id=p.id order by f.id desc limit 1) from person p;

Using a subselect seems to be the best hope of getting better performance.
I think you almost got it right, but in order to use the index on
(p_id, id) you need to order by f.p_id desc, f.id desc. Postgres won't
deduce this index can be used because f.p_id is constant in the subselect,
you need to give it some help.

>                                                QUERY PLAN
> -----------------------------------------------------------------------------------------------------------
>  Seq Scan on Person p  (cost=100000000.00..100007015.24 rows=569 width=8)
>   SubPlan
>     ->  Limit  (cost=0.00..12.31 rows=1 width=8)
>           ->  Index Scan Backward using food_pkey on food f
> (cost=0.00..111261.90 rows=9042 width=8)
>                 Filter: (p_id = $0)
> (5 rows)
>
> any ideas or suggestions is appreciate.
>
>
> On 6/8/05, Tobias Brox <tobias@nordicbet.com> wrote:
> > [Junaili Lie - Wed at 12:34:32PM -0700]
> > > select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> > > by f.p_id will work.
> > > But I understand this is not the most efficient way. Is there another
> > > way to rewrite this query? (maybe one that involves order by desc
> > > limit 1)
> >
> > eventually, try something like
> >
> >  select p.id,(select f.id from food f where f.p_id=p.id order by f.id desc limit 1)
> >  from person p
> >
> > not tested, no warranties.
> >
> > Since subqueries can be inefficient, use "explain analyze" to see which one
> > is actually better.
> >
> > This issue will be solved in future versions of postgresql.
> >
> > --
> > Tobias Brox, +47-91700050
> > Tallinn
> >
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match

Re: Help with rewriting query

От

"Kevin Grittner"

Дата:

09 июня 2005 г., 04:02:00

This is a pattern which I've seen many of times.  I call it a "best
choice" query -- you can easily match a row from one table against any
of a number of rows in another, the trick is to pick the one that
matters most.  I've generally found that I want the query results to
show more than the columns used for making the choice (and there can be
many), which rules out the min/max technique.  What works in a pretty
straitforward way, and generally optimizes at least as well as the
alternatives, is to join to the set of candidate rows and add a "not
exists" test to eliminate all but the best choice.

For your example, I've taken some liberties and added hypothetical
columns from both tables to the result set, to demonstrate how that
works.  Feel free to drop them or substitute actual columns as you see
fit.  This will work best if there is an index for the food table on
p_id and id.  Please let me know whether this works for you.

select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts
from food f join person p
on f.p_id = p.id
and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id >
f.id)
order by p_id

Note that this construct works for inner or outer joins and works
regardless of how complex the logic for picking the best choice is.  I
think one reason this tends to optimize well is that an EXISTS test can
finish as soon as it finds one matching row.

-Kevin


>>> Junaili Lie <junaili@gmail.com> 06/08/05 2:34 PM >>>
Hi,
I have the following table:
person - primary key id, and some attributes
food - primary key id, foreign key p_id reference to table person.

table food store all the food that a person is eating. The more recent
food is indicated by the higher food.id.

I need to find what is the most recent food a person ate for every
person.
The query:
select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
by f.p_id will work.
But I understand this is not the most efficient way. Is there another
way to rewrite this query? (maybe one that involves order by desc
limit 1)

Thank you in advance.

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if
your
      joining column's datatypes do not match

Re: Help with rewriting query

От

Junaili Lie

Дата:

10 июня 2005 г., 01:26:13

Hi Bruno,
I followed your suggestion.
The query plan shows that it uses the index (id, person_id). However,
the execution time is still slow. I have to do ctl-C to stop it.
Maybe something is wrong with my postgresql config.
It's running Solaris on dual Opteron, 4GB.
I allocated around 128MB for sorting and more than 80% for
effective_cache_size and shared_buffers = 32768.
Any further ideas is much appreciated.




On 6/8/05, Bruno Wolff III <bruno@wolff.to> wrote:
> On Wed, Jun 08, 2005 at 15:48:27 -0700,
>  Junaili Lie <junaili@gmail.com> wrote:
> > Hi,
> > The suggested query below took forever when I tried it.
> > In addition, as suggested by Tobias, I also tried to create index on
> > food(p_id, id), but still no goal (same query plan).
> > Here is the explain:
> > TEST1=# explain select f.p_id, max(f.id) from Food f, Person p where
> > (f.p_id = p.id) group by p.id;
>
> The above is going to require reading all the food table (assuming no
> orphaned records), so the plan below seems reasonable.
>
> >                                                   QUERY PLAN
> > ----------------------------------------------------------------------------------------------------------------
> >  GroupAggregate  (cost=0.00..214585.51 rows=569 width=16)
> >   ->  Merge Join  (cost=0.00..200163.50 rows=2884117 width=16)
> >         Merge Cond: ("outer".id = "inner".p_id)
> >         ->  Index Scan using person_pkey on person p
> > (cost=0.00..25.17 rows=569 width=8)
> >         ->  Index Scan using person_id_food_index on food f
> > (cost=0.00..164085.54 rows=2884117 width=16)
> > (5 rows)
> >
> >
> >
> >
> > TEST1=# explain select p.id, (Select f.id from food f where
> > f.p_id=p.id order by f.id desc limit 1) from person p;
>
> Using a subselect seems to be the best hope of getting better performance.
> I think you almost got it right, but in order to use the index on
> (p_id, id) you need to order by f.p_id desc, f.id desc. Postgres won't
> deduce this index can be used because f.p_id is constant in the subselect,
> you need to give it some help.
>
> >                                                QUERY PLAN
> > -----------------------------------------------------------------------------------------------------------
> >  Seq Scan on Person p  (cost=100000000.00..100007015.24 rows=569 width=8)
> >   SubPlan
> >     ->  Limit  (cost=0.00..12.31 rows=1 width=8)
> >           ->  Index Scan Backward using food_pkey on food f
> > (cost=0.00..111261.90 rows=9042 width=8)
> >                 Filter: (p_id = $0)
> > (5 rows)
> >
> > any ideas or suggestions is appreciate.
> >
> >
> > On 6/8/05, Tobias Brox <tobias@nordicbet.com> wrote:
> > > [Junaili Lie - Wed at 12:34:32PM -0700]
> > > > select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> > > > by f.p_id will work.
> > > > But I understand this is not the most efficient way. Is there another
> > > > way to rewrite this query? (maybe one that involves order by desc
> > > > limit 1)
> > >
> > > eventually, try something like
> > >
> > >  select p.id,(select f.id from food f where f.p_id=p.id order by f.id desc limit 1)
> > >  from person p
> > >
> > > not tested, no warranties.
> > >
> > > Since subqueries can be inefficient, use "explain analyze" to see which one
> > > is actually better.
> > >
> > > This issue will be solved in future versions of postgresql.
> > >
> > > --
> > > Tobias Brox, +47-91700050
> > > Tallinn
> > >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: the planner will ignore your desire to choose an index scan if your
> >       joining column's datatypes do not match
>

Re: Help with rewriting query

От

Junaili Lie

Дата:

10 июня 2005 г., 01:30:40

Hi Kevin,
Thanks for the reply.
I tried that query. It definately faster, but not fast enough (took
around 50 second to complete).
I have around 2.5 million on food and 1000 on person.
Here is the query plan:
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..11662257.52 rows=1441579 width=16)
   Merge Cond: ("outer".id = "inner".p_id)
   ->  Index Scan using person_pkey on person p  (cost=0.00..25.17
rows=569 width=8)
   ->  Index Scan using p_id_food_index on food f
(cost=0.00..11644211.28 rows=1441579 width=16)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using p_id_food_index on food f2
(cost=0.00..11288.47 rows=2835 width=177)
                 Index Cond: (p_id = $0)
                 Filter: (id > $1)
(9 rows)

I appreciate if you have further ideas to troubleshoot this issue.
Thank you!

On 6/8/05, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> This is a pattern which I've seen many of times.  I call it a "best
> choice" query -- you can easily match a row from one table against any
> of a number of rows in another, the trick is to pick the one that
> matters most.  I've generally found that I want the query results to
> show more than the columns used for making the choice (and there can be
> many), which rules out the min/max technique.  What works in a pretty
> straitforward way, and generally optimizes at least as well as the
> alternatives, is to join to the set of candidate rows and add a "not
> exists" test to eliminate all but the best choice.
>
> For your example, I've taken some liberties and added hypothetical
> columns from both tables to the result set, to demonstrate how that
> works.  Feel free to drop them or substitute actual columns as you see
> fit.  This will work best if there is an index for the food table on
> p_id and id.  Please let me know whether this works for you.
>
> select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts
> from food f join person p
> on f.p_id = p.id
> and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id >
> f.id)
> order by p_id
>
> Note that this construct works for inner or outer joins and works
> regardless of how complex the logic for picking the best choice is.  I
> think one reason this tends to optimize well is that an EXISTS test can
> finish as soon as it finds one matching row.
>
> -Kevin
>
>
> >>> Junaili Lie <junaili@gmail.com> 06/08/05 2:34 PM >>>
> Hi,
> I have the following table:
> person - primary key id, and some attributes
> food - primary key id, foreign key p_id reference to table person.
>
> table food store all the food that a person is eating. The more recent
> food is indicated by the higher food.id.
>
> I need to find what is the most recent food a person ate for every
> person.
> The query:
> select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> by f.p_id will work.
> But I understand this is not the most efficient way. Is there another
> way to rewrite this query? (maybe one that involves order by desc
> limit 1)
>
> Thank you in advance.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>      joining column's datatypes do not match
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>

Re: Help with rewriting query

От

Tobias Brox

Дата:

10 июня 2005 г., 08:49:53

[Junaili Lie - Thu at 06:26:09PM -0700]
> Hi Bruno,
> I followed your suggestion.
> The query plan shows that it uses the index (id, person_id). However,
> the execution time is still slow. I have to do ctl-C to stop it.

What is the estimate planner cost?

> Maybe something is wrong with my postgresql config.
> It's running Solaris on dual Opteron, 4GB.
> I allocated around 128MB for sorting and more than 80% for
> effective_cache_size and shared_buffers = 32768.
> Any further ideas is much appreciated.

Sounds a bit excessive.  Compare with the vanilla configuration, and see
what is faster.

--
Tobias Brox, +47-91700050

Re: Help with rewriting query

От

Bruno Wolff III

Дата:

10 июня 2005 г., 12:07:19

On Thu, Jun 09, 2005 at 18:26:09 -0700,
  Junaili Lie <junaili@gmail.com> wrote:
> Hi Bruno,
> I followed your suggestion.
> The query plan shows that it uses the index (id, person_id). However,
> the execution time is still slow. I have to do ctl-C to stop it.
> Maybe something is wrong with my postgresql config.
> It's running Solaris on dual Opteron, 4GB.
> I allocated around 128MB for sorting and more than 80% for
> effective_cache_size and shared_buffers = 32768.
> Any further ideas is much appreciated.

It might be useful to see that plan and the actual query you used. There were
only 569 entries in the people table, so I find it hard to believe that an
index look up per person is taking so long that you need to cancel the query.

>
>
>
>
> On 6/8/05, Bruno Wolff III <bruno@wolff.to> wrote:
> > On Wed, Jun 08, 2005 at 15:48:27 -0700,
> >  Junaili Lie <junaili@gmail.com> wrote:
> > > Hi,
> > > The suggested query below took forever when I tried it.
> > > In addition, as suggested by Tobias, I also tried to create index on
> > > food(p_id, id), but still no goal (same query plan).
> > > Here is the explain:
> > > TEST1=# explain select f.p_id, max(f.id) from Food f, Person p where
> > > (f.p_id = p.id) group by p.id;
> >
> > The above is going to require reading all the food table (assuming no
> > orphaned records), so the plan below seems reasonable.
> >
> > >                                                   QUERY PLAN
> > > ----------------------------------------------------------------------------------------------------------------
> > >  GroupAggregate  (cost=0.00..214585.51 rows=569 width=16)
> > >   ->  Merge Join  (cost=0.00..200163.50 rows=2884117 width=16)
> > >         Merge Cond: ("outer".id = "inner".p_id)
> > >         ->  Index Scan using person_pkey on person p
> > > (cost=0.00..25.17 rows=569 width=8)
> > >         ->  Index Scan using person_id_food_index on food f
> > > (cost=0.00..164085.54 rows=2884117 width=16)
> > > (5 rows)
> > >
> > >
> > >
> > >
> > > TEST1=# explain select p.id, (Select f.id from food f where
> > > f.p_id=p.id order by f.id desc limit 1) from person p;
> >
> > Using a subselect seems to be the best hope of getting better performance.
> > I think you almost got it right, but in order to use the index on
> > (p_id, id) you need to order by f.p_id desc, f.id desc. Postgres won't
> > deduce this index can be used because f.p_id is constant in the subselect,
> > you need to give it some help.
> >
> > >                                                QUERY PLAN
> > > -----------------------------------------------------------------------------------------------------------
> > >  Seq Scan on Person p  (cost=100000000.00..100007015.24 rows=569 width=8)
> > >   SubPlan
> > >     ->  Limit  (cost=0.00..12.31 rows=1 width=8)
> > >           ->  Index Scan Backward using food_pkey on food f
> > > (cost=0.00..111261.90 rows=9042 width=8)
> > >                 Filter: (p_id = $0)
> > > (5 rows)
> > >
> > > any ideas or suggestions is appreciate.
> > >
> > >
> > > On 6/8/05, Tobias Brox <tobias@nordicbet.com> wrote:
> > > > [Junaili Lie - Wed at 12:34:32PM -0700]
> > > > > select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> > > > > by f.p_id will work.
> > > > > But I understand this is not the most efficient way. Is there another
> > > > > way to rewrite this query? (maybe one that involves order by desc
> > > > > limit 1)
> > > >
> > > > eventually, try something like
> > > >
> > > >  select p.id,(select f.id from food f where f.p_id=p.id order by f.id desc limit 1)
> > > >  from person p
> > > >
> > > > not tested, no warranties.
> > > >
> > > > Since subqueries can be inefficient, use "explain analyze" to see which one
> > > > is actually better.
> > > >
> > > > This issue will be solved in future versions of postgresql.
> > > >
> > > > --
> > > > Tobias Brox, +47-91700050
> > > > Tallinn
> > > >
> > >
> > > ---------------------------(end of broadcast)---------------------------
> > > TIP 9: the planner will ignore your desire to choose an index scan if your
> > >       joining column's datatypes do not match
> >

Re: Help with rewriting query

От

"Kevin Grittner"

Дата:

10 июня 2005 г., 19:50:17

With your current (apparently well-normalized) schema, I don't see how
you can get a better query plan than that.  There may be something you
can do in terms of memory configuration to get it to execute somewhat
faster, but the only way to make it really fast is to de-normalize.
This is something which is often necessary for performance.

If you add a column to the person table for "last_food_id" and triggers
to maintain it when the food table is modified, voila!  You have a
simple and fast way to get the results you want.

-Kevin


>>> Junaili Lie <junaili@gmail.com> 06/09/05 8:30 PM >>>
Hi Kevin,
Thanks for the reply.
I tried that query. It definately faster, but not fast enough (took
around 50 second to complete).
I have around 2.5 million on food and 1000 on person.
Here is the query plan:
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..11662257.52 rows=1441579 width=16)
   Merge Cond: ("outer".id = "inner".p_id)
   ->  Index Scan using person_pkey on person p  (cost=0.00..25.17
rows=569 width=8)
   ->  Index Scan using p_id_food_index on food f
(cost=0.00..11644211.28 rows=1441579 width=16)
         Filter: (NOT (subplan))
         SubPlan
           ->  Index Scan using p_id_food_index on food f2
(cost=0.00..11288.47 rows=2835 width=177)
                 Index Cond: (p_id = $0)
                 Filter: (id > $1)
(9 rows)

I appreciate if you have further ideas to troubleshoot this issue.
Thank you!

On 6/8/05, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> This is a pattern which I've seen many of times.  I call it a "best
> choice" query -- you can easily match a row from one table against any
> of a number of rows in another, the trick is to pick the one that
> matters most.  I've generally found that I want the query results to
> show more than the columns used for making the choice (and there can
be
> many), which rules out the min/max technique.  What works in a pretty
> straitforward way, and generally optimizes at least as well as the
> alternatives, is to join to the set of candidate rows and add a "not
> exists" test to eliminate all but the best choice.
>
> For your example, I've taken some liberties and added hypothetical
> columns from both tables to the result set, to demonstrate how that
> works.  Feel free to drop them or substitute actual columns as you see
> fit.  This will work best if there is an index for the food table on
> p_id and id.  Please let me know whether this works for you.
>
> select p.id as p_id, p.fullname, f.id, f.foodtype, f.ts
> from food f join person p
> on f.p_id = p.id
> and not exists (select * from food f2 where f2.p_id = f.p_id and f2.id
>
> f.id)
> order by p_id
>
> Note that this construct works for inner or outer joins and works
> regardless of how complex the logic for picking the best choice is.  I
> think one reason this tends to optimize well is that an EXISTS test
can
> finish as soon as it finds one matching row.
>
> -Kevin
>
>
> >>> Junaili Lie <junaili@gmail.com> 06/08/05 2:34 PM >>>
> Hi,
> I have the following table:
> person - primary key id, and some attributes
> food - primary key id, foreign key p_id reference to table person.
>
> table food store all the food that a person is eating. The more recent
> food is indicated by the higher food.id.
>
> I need to find what is the most recent food a person ate for every
> person.
> The query:
> select f.p_id, max(f.id) from person p, food f where p.id=f.p_id group
> by f.p_id will work.
> But I understand this is not the most efficient way. Is there another
> way to rewrite this query? (maybe one that involves order by desc
> limit 1)
>
> Thank you in advance.
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if
> your
>      joining column's datatypes do not match
>
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Re: Help with rewriting query

От

Tobias Brox

Дата:

11 июня 2005 г., 10:00:31

[Kevin Grittner - Fri at 02:49:57PM -0500]
> If you add a column to the person table for "last_food_id" and triggers
> to maintain it when the food table is modified, voila!  You have a
> simple and fast way to get the results you want.

Reminds me about the way the precursor software of our product was made,
whenever it was needed to check the balance of a customer, it was needed to
scan the whole transaction table and sum up all transactions.  This
operation eventually took 3-4 seconds before we released the new software,
and the customers balance was supposed to show up at several web pages :-)

By now we have the updated balance both in the customer table and as
"post_balance" in the transaction table.  Sometimes redundancy is good.
Much easier to solve inconsistency problems as well :-)

--
Tobias Brox, +47-91700050

Re: Help with rewriting query

От

"Kevin Grittner"

Дата:

13 июня 2005 г., 14:21:32

I've done a lot of work with a bookkeeping system where we have such
redundancy built in.  The auditors, however, need to be able to generate
lists of the financial transaction detail to support balances.  These
reports are among the most demanding in the system.  I shudder to think
how unacceptable performance would be without the redundancy.

Also, due to multiple media failures, and backup process problems (on
another database product), a large database was badly mangled.  The
redundancies allowed us to reconstruct much data, and to at least
identify what was missing for the rest.

There is, of course, some cost for the redundancy.  Up front, someone
needs to code routines to maintain it.  It needs to be checked against
the underlying detail periodically, to prevent "drift".  And there is a
cost, usually pretty minimal, for the software to do the work.

I strongly recommend that some form of trigger (either native to the
database or, if portability is an issue, within a middle tier framework)
do the work of maintaining the redundant data.  If you rely on
application code to maintain it, you can expect that sooner or later it
will get missed.


>>> Tobias Brox <tobias@nordicbet.com> 06/11/05 4:59 AM >>>
[
Reminds me about the way the precursor software of our product was made,
whenever it was needed to check the balance of a customer, it was needed
to
scan the whole transaction table and sum up all transactions.  This
operation eventually took 3-4 seconds before we released the new
software,
and the customers balance was supposed to show up at several web pages
:-)

By now we have the updated balance both in the customer table and as
"post_balance" in the transaction table.  Sometimes redundancy is good.
Much easier to solve inconsistency problems as well :-)

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Help with rewriting query