Re: varchar index joins not working?

From Richard Huxton
Subject Re: varchar index joins not working?
Date
Msg-id 47FDD3D1.9070208@archonet.com
In response to varchar index joins not working?  ("Adam Gundy" <adam@starsilk.net>)
Responses Re: varchar index joins not working?
List pgsql-performance
Adam Gundy wrote:
> I'm hitting an unexpected problem with postgres 8.3 - I have some
> tables which use varchar(32) for their unique IDs which I'm attempting
> to join using some simple SQL:
>
> select *
> from group_access, groups
> where group_access.groupid = groups.groupid and
>          group_access.uid = '7275359408f44591d0717e16890ce335';
>
> there's a unique index on group_access.groupid, and a non-unique index
> on groups.groupid. both are non-null.

What about group_access.uid? I'd have thought that an index on it, plus
the groups primary key, is probably the sensible combination here.
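
For example - a sketch only, the index name is invented and this assumes
uid is the column you always filter on:

  CREATE INDEX group_access_uid_idx ON group_access (uid);

That gives the planner the option of picking the matching group_access
rows by uid and then joining to groups through its groupid index.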

> the problem is: if groupid (in both tables) is varchar, I cannot force
> postgres (no matter how hard I try) to do an index scan. it ends up
> reading the entire groups table (pretty large!):

OK

>                                                         QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=8.89..41329.88 rows=119940 width=287) (actual time=0.202..935.136 rows=981 loops=1)

That's because it's expecting 119,940 rows to match (rather than the
981 you actually get). If you really were getting that many results,
this would probably be a sensible plan.

>    Hash Cond: ((groups.groupid)::text = (group_access.groupid)::text)
>    ->  Seq Scan on groups  (cost=0.00..31696.48 rows=1123348 width=177) (actual time=0.011..446.091 rows=1125239 loops=1)

It's got a good idea of the total number of rows in groups.

>    ->  Hash  (cost=8.51..8.51 rows=30 width=110) (actual time=0.148..0.148 rows=30 loops=1)
>          ->  Seq Scan on group_access  (cost=0.00..8.51 rows=30 width=110) (actual time=0.014..0.126 rows=30 loops=1)

And of group_access too. The seq-scan doesn't really matter here,
though: it probably *is* faster to read all 30 rows in one burst than
to go to the index and then back to the table.

>                Filter: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
>  Total runtime: 935.443 ms
> (7 rows)
>
> if I disable seq_scan, I get this:
>
>                                                                    QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=1.47..106189.61 rows=120004 width=287) (actual time=0.100..1532.353 rows=981 loops=1)

It's still thinking it's going to get 120 thousand rows.

> it's running an index scan across the entire table (no condition applied) :-(
>
> so, just for the hell of it, I tried making groupid a char(32),
> despite repeated assertions in this group that there's no performance
> difference between the two:

There's no performance difference between the two.

>  Nested Loop  (cost=4.48..253.85 rows=304 width=291) (actual time=0.715..22.906 rows=984 loops=1)

> (this last plan is actually against a smaller test DB, but I get the
> same behavior with it, seq scan for varchar or index scan for char,
> and the results returned are identical for this query)

The char(32) thing isn't important here; what is important is that it's
expecting ~300 rows rather than 120,000. That's still wrong, but it's
close enough to produce a sensible plan.

So - the question is - why is PG expecting so many matches to your join?
How many distinct values do you have in groups.groupid and
group_access.groupid?
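
Something like this would tell us, using the standard pg_stats view
(table and column names as in your query):

  SELECT count(DISTINCT groupid) FROM groups;
  SELECT count(DISTINCT groupid) FROM group_access;

  SELECT tablename, attname, n_distinct
  FROM pg_stats
  WHERE tablename IN ('groups', 'group_access')
    AND attname = 'groupid';

If n_distinct is badly off from the real counts, raising the column's
statistics target and re-analyzing (ALTER TABLE groups ALTER COLUMN
groupid SET STATISTICS 100; ANALYZE groups; - the 100 is just an
illustrative value) may bring the join estimate back into line.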

--
   Richard Huxton
   Archonet Ltd
