Обсуждение: PostgreSQL 9.1 : why is this query slow?

Поиск
Список
Период
Сортировка

PostgreSQL 9.1 : why is this query slow?

От
Joost Kraaijeveld
Дата:
Hi All,

I have a table with 665605 rows (counted, vacuum-ed):
CREATE TABLE unique_words
( filename text NOT NULL,
  filetype text NOT NULL,
  word text NOT NULL,
  count integer,)

The query is:
select f.word , count(f.word) from
unique_words as f,
unique_words as s ,
unique_words as n
where
(f.word = s.word and s.word = n.word)
and
(f.filetype = 'f' and s.filetype = 's' and n.filetype = 'n')
group by f.word

Explain says:
"GroupAggregate  (cost=0.00..67237557.88 rows=1397 width=6)"
"  ->  Nested Loop  (cost=0.00..27856790.31 rows=7876150720 width=6)"
"        ->  Nested Loop  (cost=0.00..118722.04 rows=14770776 width=12)"
"              ->  Index Scan using idx_unique_words_filetype_word on unique_words f  (cost=0.00..19541.47 rows=92098
width=6)"
"                    Index Cond: (filetype = 'f'::text)"
"              ->  Index Scan using idx_unique_words_filetype_word on unique_words s  (cost=0.00..0.91 rows=13
width=6)"
"                    Index Cond: ((filetype = 's'::text) AND (word = f.word))"
"        ->  Index Scan using idx_unique_words_filetype_word on unique_words n  (cost=0.00..1.33 rows=44 width=6)"
"              Index Cond: ((filetype = 'n'::text) AND (word = f.word))"


The right answer should be 3808 different words (according to a Java
program I wrote).

This query takes more than 1 hour (after which I cancelled the query).
My questions are:
- Is this to be expected?
- Especially as the query over just 1 join takes 32 secs? (on f.word =
s.word omitting everything for n )
- Why does explain say it takes "7876150720 rows"?
- Is there a way to rephrase the query that makes it faster?
- Could another table layout help (f,s,n are all possibilities for
filetype)?
- Anything else?????

TIA

--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl


Re: PostgreSQL 9.1 : why is this query slow?

От
Joost Kraaijeveld
Дата:
On Mon, 2011-11-28 at 17:42 +0100, Joost Kraaijeveld wrote:
> - Is there a way to rephrase the query that makes it faster?
This query goes faster (6224 ms, but I am not sure it gives the correct
answer as the result differs from my Java program):

select word, count (word) from unique_words
where
word in (select word from unique_words where
     word in ( select word from unique_words where filetype = 'f')
     and
     filetype = 's')
and
filetype = 'n'
group by word


--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl


Re: PostgreSQL 9.1 : why is this query slow?

От
"Kevin Grittner"
Дата:
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> wrote:

> This query goes faster (6224 ms, but I am not sure it gives the
> correct answer as the result differs from my Java program):

It seems clear that you want to see words which appear with all
three types of files, but it's not clear what you want the count to
represent.  The number of times the word appears in filetype 'n'
references (as specified in your second query)?  The number of
permutations of documents which incorporate one 'f' document, one
's' document, and one 'n' document (as specified in your first
query).  Something else, like the total number of times the word
appears?

-Kevin

Re: PostgreSQL 9.1 : why is this query slow?

От
Joost Kraaijeveld
Дата:
On Mon, 2011-11-28 at 11:05 -0600, Kevin Grittner wrote:
> Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> wrote:
>
> > This query goes faster (6224 ms, but I am not sure it gives the
> > correct answer as the result differs from my Java program):
>
> It seems clear that you want to see words which appear with all
> three types of files, but it's not clear what you want the count to
> represent.  The number of times the word appears in filetype 'n'
> references (as specified in your second query)?  The number of
> permutations of documents which incorporate one 'f' document, one
> 's' document, and one 'n' document (as specified in your first
> query).  Something else, like the total number of times the word
> appears?
I would like the answer to be "the number of times the word appears in
all three the queries", the intersection of the three queries.

--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl


Re: PostgreSQL 9.1 : why is this query slow?

От
"Kevin Grittner"
Дата:
Joost Kraaijeveld <J.Kraaijeveld@Askesis.nl> wrote:

> I would like the answer to be "the number of times the word
> appears in all three the queries", the intersection of the three
> queries.

That's still not entirely clear to me.  If there are two 'f' rows,
three 's' rows, and four 'n' rows, do you want to see an answer of 2
(which seems like the intersection you request here), 9 (which is
the sum), 24 (which is the product), or something else?

If you really want the intersection, perhaps:

with x as
  (
    select
        word,
        count(*) as countall,
        count(case when filetype = 'f' then 1 else null end)
          as countf,
        count(case when filetype = 's' then 1 else null end) as
          as counts,
        count(case when filetype = 'n' then 1 else null end) as
          as countn
      from unique_words
  )
select word, least(countf, counts, countn) from x
  where countf > 0 and counts > 0 and countn > 0
  order by word;

-Kevin

Re: PostgreSQL 9.1 : why is this query slow?

От
"Kevin Grittner"
Дата:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:

> If you really want the intersection, perhaps:

Or maybe closer:

with x as
  (
    select
        word,
        count(*) as countall,
        count(case when filetype = 'f' then 1 else null end)
          as countf,
        count(case when filetype = 's' then 1 else null end)
          as counts,
        count(case when filetype = 'n' then 1 else null end)
          as countn
      from unique_words
      group by word
  )
select word, least(countf, counts, countn) from x
  where countf > 0 and counts > 0 and countn > 0
  order by word;

Cranked out rather quickly and untested.

-Kevin

Re: PostgreSQL 9.1 : why is this query slow?

От
Tomas Vondra
Дата:
On 28.11.2011 17:42, Joost Kraaijeveld wrote:
> - Why does explain say it takes "7876150720 rows"?

Any idea where this number came from? No matter what I do, the nested
loop row estimates are alway very close to the product of the two
estimates (outer rows * inner rows).

Tomas

Re: PostgreSQL 9.1 : why is this query slow?

От
Joost Kraaijeveld
Дата:
On Mon, 2011-11-28 at 11:36 -0600, Kevin Grittner wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> wrote:
>
> > If you really want the intersection, perhaps:
>
> Or maybe closer:
>
> with x as
>   (
>     select
>         word,
>         count(*) as countall,
>         count(case when filetype = 'f' then 1 else null end)
>           as countf,
>         count(case when filetype = 's' then 1 else null end)
>           as counts,
>         count(case when filetype = 'n' then 1 else null end)
>           as countn
>       from unique_words
>       group by word
>   )
> select word, least(countf, counts, countn) from x
>   where countf > 0 and counts > 0 and countn > 0
>   order by word;
>
> Cranked out rather quickly and untested.

I tested it and it worked as advertised. Takes ~ 3 secs to complete.
Thanks.

--
Groeten,

Joost Kraaijeveld
Askesis B.V.
Molukkenstraat 14
6524NB Nijmegen
tel: 024-3888063 / 06-51855277
fax: 024-3608416
web: www.askesis.nl