Re: varchar index joins not working?
From        | Richard Huxton
------------|----------------------------------------------------------------
Subject     | Re: varchar index joins not working?
Date        |
Msg-id      | 47FDD3D1.9070208@archonet.com
In reply to | varchar index joins not working? ("Adam Gundy" <adam@starsilk.net>)
Responses   | Re: varchar index joins not working?
List        | pgsql-performance
Adam Gundy wrote:
> I'm hitting an unexpected problem with postgres 8.3 - I have some
> tables which use varchar(32) for their unique IDs which I'm attempting
> to join using some simple SQL:
>
>   select *
>   from group_access, groups
>   where group_access.groupid = groups.groupid and
>         group_access.uid = '7275359408f44591d0717e16890ce335';
>
> there's a unique index on group_access.groupid, and a non-unique index
> on groups.groupid. both are non-null.

What about group_access.uid - I'd have thought that plus the groups pkey
is probably the sensible combination here.

> the problem is: if groupid (in both tables) is varchar, I cannot force
> postgres (no matter how hard I try) to do an index scan. it ends up
> reading the entire groups table (pretty large!):

OK

> QUERY PLAN
> ---------------------------------------------------------------------
>  Hash Join  (cost=8.89..41329.88 rows=119940 width=287)
>             (actual time=0.202..935.136 rows=981 loops=1)

That's because it's expecting 119,940 rows to match (rather than the
actual 981 you do get). If you were getting that many results this
would probably be a sensible plan.

>    Hash Cond: ((groups.groupid)::text = (group_access.groupid)::text)
>    ->  Seq Scan on groups  (cost=0.00..31696.48 rows=1123348 width=177)
>                            (actual time=0.011..446.091 rows=1125239 loops=1)

It's got a good idea of the total number of rows in groups.

>    ->  Hash  (cost=8.51..8.51 rows=30 width=110)
>              (actual time=0.148..0.148 rows=30 loops=1)
>          ->  Seq Scan on group_access  (cost=0.00..8.51 rows=30 width=110)
>                                        (actual time=0.014..0.126 rows=30 loops=1)

And also of group_access. Oh, the seq scan doesn't really matter here -
it probably *is* faster to read all 30 rows in one burst rather than go
to the index and then back to the table.

>                Filter: ((uid)::text = '7275359408f44591d0717e16890ce335'::text)
>  Total runtime: 935.443 ms
> (7 rows)
>
> if I disable seq_scan, I get this:
>
> QUERY PLAN
> ---------------------------------------------------------------------
>  Merge Join  (cost=1.47..106189.61 rows=120004 width=287)
>              (actual time=0.100..1532.353 rows=981 loops=1)

It's still thinking it's going to get 120 thousand rows.

> it's running an index scan across the entire table (no condition
> applied) :-(
>
> so, just for the hell of it, I tried making groupid a char(32),
> despite repeated assertions in this group that there's no performance
> difference between the two:

There's no performance difference between the two.

>  Nested Loop  (cost=4.48..253.85 rows=304 width=291)
>               (actual time=0.715..22.906 rows=984 loops=1)
>
> (this last plan is actually against a smaller test DB, but I get the
> same behavior with it, seq scan for varchar or index scan for char,
> and the results returned are identical for this query)

The char(32) thing isn't important here; what is important is that it's
now expecting ~300 rows rather than 120,000. That's still wrong, but it's
close enough to make sense.

So - the question is - why is PG expecting so many matches to your join?
How many distinct values do you have in groups.groupid and
group_access.groupid?

--
  Richard Huxton
  Archonet Ltd
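The distinct-value estimates Richard asks about can be read straight from
the planner's statistics. A minimal sketch (table and column names taken
from the thread; the statistics target of 1000 is only an illustrative
value) of how to check them and, if they are badly off, refine them:

    -- How many distinct groupid values does the planner believe each table holds?
    SELECT tablename, attname, n_distinct
    FROM pg_stats
    WHERE tablename IN ('groups', 'group_access')
      AND attname = 'groupid';

    -- If the estimate is far from reality, gather finer-grained statistics
    -- for the join column and re-analyze both tables.
    ALTER TABLE groups       ALTER COLUMN groupid SET STATISTICS 1000;
    ALTER TABLE group_access ALTER COLUMN groupid SET STATISTICS 1000;
    ANALYZE groups;
    ANALYZE group_access;

Re-running EXPLAIN ANALYZE on the join afterwards shows whether the row
estimate moves closer to the ~981 rows actually returned.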