Обсуждение: need much better query perfomance

Поиск
Список
Период
Сортировка

need much better query perfomance

От
andy@mixonic.com (Andy)
Дата:
Here's the problem:

Table track has 100,000 rows
Table album has 20,000 rows
Table album_track is a lookup table linking tracks to albums and has
80,000 rows

The track table has a PK, a deleted field ('Y' for deleted tracks /
null otherwise), and several track-specific fields.
The album table also has a PK and a deleted field ('Y' or null), as
well as several album-specific fields.
The album_track table simple has two columns: The PKs of the album and
track tables.

A deleted track may still exist in an not-deleted album.  A deleted
track is simply one that cannot be added to new albums.
Likewise, a deleted album may contain not-deleted tracks.  It is
simply an album that the user has deleted.

So even though we've marked deleted albums & tracks in the database,
we haven't removed "deleted" tracks from our fileserver.  Now we're
almost out of space, so we need to find all tracks that are marked as
deleted and which are not a part of any non-deleted albums.

I wrote what must be a very naive query to find such tracks.  Doing an
explain on this query gave a huge time estimate.  My query is below.
Would someone please suggest a faster approach?

  select t.track_id
    from track t
    where t.deleted = 'Y'
      UNION
  select track_id from track where track_id not in (
    select at.track_id from album_track at, album alb
      where at.album_id = alb.album_id and alb.deleted is null
  )

The above query should work, but it takes too long to execute.  The
sample query below, however, is fast but incorrect:

  select t.track_id
    from track t
    where t.deleted = 'Y'
      UNION
  select at.track_id
    from album_track at, album alb
    where at.album_id = alb.album_id and alb.deleted = 'Y'

This query is incorrect because it the bottom-most select is getting
all deleted albums, but a track might be in both deleted AND
non-deleted albums.  This incorrect query will return all the
"completely deleted" tracks, but will also return tracks that are part
of both deleted and non-deleted albums, and we don't want this.

Re: need much better query perfomance

От
Stephan Szabo
Дата:
On 24 Jan 2003, Andy wrote:

> I wrote what must be a very naive query to find such tracks.  Doing an
> explain on this query gave a huge time estimate.  My query is below.
> Would someone please suggest a faster approach?
>
>   select t.track_id
>     from track t
>     where t.deleted = 'Y'
>       UNION
>   select track_id from track where track_id not in (
>     select at.track_id from album_track at, album alb
>       where at.album_id = alb.album_id and alb.deleted is null
>   )

Try converting the in query into EXISTS
something like
select track_id from track where not exists (
 select at.track_id from album_track at, album alb where
  at.album_id=alb.album_id and alb.deleted is null and
  track.track_id=at.track_id;
 );


Re: need much better query perfomance

От
Tom Lane
Дата:
andy@mixonic.com (Andy) writes:
>   select track_id from track where track_id not in (
>     select at.track_id from album_track at, album alb
>       where at.album_id = alb.album_id and alb.deleted is null
>   )

> The above query should work, but it takes too long to execute.

The performance of NOT IN pretty much sucks :-(.  (7.4 will be
better, but that doesn't help you today.)  I'd suggest rewriting
to avoid that.  Perhaps

CREATE TEMP TABLE keepers AS
select at.track_id from album_track at, album alb
where at.album_id = alb.album_id and alb.deleted is null;

CREATE TEMP TABLE zappers AS
SELECT track_id FROM track EXCEPT SELECT track_id FROM keepers;

DELETE FROM track WHERE track_id = zappers.track_id;

Untested, use at your own risk, etc.

            regards, tom lane

Re: need much better query perfomance

От
"Leon Oosterwijk"
Дата:
We had a lot of problems with the NOT IN. It is indeed horrible as far as
performance goes. I rewrote all our NOT IN queries to LEFT OUTER JOINS. this
boosted performance and does not change the basic structure of the query.


SELECT . . .
FROM a LEFT OUTER JOIN b
WHERE b.pk IS NOT NULL


Sincerely,

Leon Oosterwijk
Dave Ramsey Inc.
leono@daveramsey.com


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Wednesday, January 29, 2003 1:14 AM
> To: Andy
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] need much better query perfomance
>
>
> andy@mixonic.com (Andy) writes:
> >   select track_id from track where track_id not in (
> >     select at.track_id from album_track at, album alb
> >       where at.album_id = alb.album_id and alb.deleted is null
> >   )
>
> > The above query should work, but it takes too long to execute.
>
> The performance of NOT IN pretty much sucks :-(.  (7.4 will be
> better, but that doesn't help you today.)  I'd suggest rewriting
> to avoid that.  Perhaps
>
> CREATE TEMP TABLE keepers AS
> select at.track_id from album_track at, album alb
> where at.album_id = alb.album_id and alb.deleted is null;
>
> CREATE TEMP TABLE zappers AS
> SELECT track_id FROM track EXCEPT SELECT track_id FROM keepers;
>
> DELETE FROM track WHERE track_id = zappers.track_id;
>
> Untested, use at your own risk, etc.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly


Re: need much better query perfomance

От
Steve Crawford
Дата:
First the basics - have you run vacuum analyze? Also, if you can have
exclusive access to the database for a while, "vacuum full".

I haven't had my coffee but the query doesn't look right. If I understand the
question you want to find those tracks which are marked for deletion and do
not appear on any "active" (non-deleted) album.

I don't think union is the correct solution. I'd try something like:

select track_id from track where track.deleted = 'Y'
and
not exists (select * from album_track where
album_track.track_id = track.track_id
and
album_track.album_id = album.album.id
and
album.deleted is null)

Cheers,
Steve


>     from track t
>     where t.deleted = 'Y'
>       UNION
>   select track_id from track where track_id not in (
>     select at.track_id from album_track at, album alb
>       where at.album_id = alb.album_id and alb.deleted is null
>   )


On Friday 24 January 2003 3:28 pm, Andy wrote:
> Here's the problem:
>
> Table track has 100,000 rows
> Table album has 20,000 rows
> Table album_track is a lookup table linking tracks to albums and has
> 80,000 rows
>
> The track table has a PK, a deleted field ('Y' for deleted tracks /
> null otherwise), and several track-specific fields.
> The album table also has a PK and a deleted field ('Y' or null), as
> well as several album-specific fields.
> The album_track table simple has two columns: The PKs of the album and
> track tables.
>
> A deleted track may still exist in an not-deleted album.  A deleted
> track is simply one that cannot be added to new albums.
> Likewise, a deleted album may contain not-deleted tracks.  It is
> simply an album that the user has deleted.
>
> So even though we've marked deleted albums & tracks in the database,
> we haven't removed "deleted" tracks from our fileserver.  Now we're
> almost out of space, so we need to find all tracks that are marked as
> deleted and which are not a part of any non-deleted albums.
>
> I wrote what must be a very naive query to find such tracks.  Doing an
> explain on this query gave a huge time estimate.  My query is below.
> Would someone please suggest a faster approach?
>
>   select t.track_id
>     from track t
>     where t.deleted = 'Y'
>       UNION
>   select track_id from track where track_id not in (
>     select at.track_id from album_track at, album alb
>       where at.album_id = alb.album_id and alb.deleted is null
>   )
>
> The above query should work, but it takes too long to execute.  The
> sample query below, however, is fast but incorrect:
>
>   select t.track_id
>     from track t
>     where t.deleted = 'Y'
>       UNION
>   select at.track_id
>     from album_track at, album alb
>     where at.album_id = alb.album_id and alb.deleted = 'Y'
>
> This query is incorrect because it the bottom-most select is getting
> all deleted albums, but a track might be in both deleted AND
> non-deleted albums.  This incorrect query will return all the
> "completely deleted" tracks, but will also return tracks that are part
> of both deleted and non-deleted albums, and we don't want this.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org