Re: Join with an array
От | Oleg Bartunov |
---|---|
Тема | Re: Join with an array |
Дата | |
Msg-id | Pine.GSO.4.63.0602231502240.27379@ra.sai.msu.su обсуждение исходный текст |
Ответ на | Join with an array (Markus Schiltknecht <markus@bluegap.ch>) |
Ответы |
Re: Join with an array
(Markus Schiltknecht <markus@bluegap.ch>)
|
Список | pgsql-hackers |
Markus, have you seen contrib/intarray ? Oleg On Thu, 23 Feb 2006, Markus Schiltknecht wrote: > Hi, > > I'm trying to speed up a query with a lookup table. This lookup table > gets very big and should still fit into memory. It does not change very > often. Given these facts I decided to use an array, as follows: > > CREATE TABLE lookup_table (id INT PRIMARY KEY, items INT[] NOT NULL); > > I know this is not considered good database design, but it saves a lot > of overhead for tuple visibility compared to a 1:1 table. > > To fetch an item via the lookup_table I tried to use the following > query: > > SELECT i.id, i.title FROM item i > JOIN lookup_table lut ON i.id = ANY(lut.items) > WHERE lut.id = $LOOKUP_ID; > > Unfortunately that one seems to always use a sequential scan over items. > As the items array in the lookup table often has only 3 - 10 entries > (compared to about 1 mio rows in the item table) this is a very > expensive operation. > > I tried to circumvent the problem with generate_series: > > SELECT i.id, i.title FROM generate_series(0, $MAX) s > JOIN lookup_table lut ON s = ANY(lut.items) > JOIN item i ON s = i.id > WHERE lut.id = $LOOKUP_ID; > > That query uses the index to lookup the item, but as soon as $MAX gets > bigger than 10000 generate_series takes too long and too many > comparisons s = ANY(lut.items) need to be done. > > I think it would be possible to write a function generate_series(INT[]) > which returns all the elements of the array. The query would then look > something like: > > SELECT i.id, i.title > FROM generate_series(SELECT lut.items FROM lookup_table lut WHERE > lut.id = $LOOKUP_ID) s > JOIN item i ON s = i.id; > > Do you see any problem in implementing such function? Does something > similar already existt? > > Why does the first query use a seqscan instead of the index on items? Do > I miss anything? What problems do I face if I want to teach the planner > to use the index in the first query [1]? > > Regards > > Markus > > [1]: generally in most cases like "JOIN .. ON x IN ANY($ARRAY)" where > $ARRAY is reasonably small. > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
В списке pgsql-hackers по дате отправления: