many to many performance

Поиск
Список
Период
Сортировка
От Chavdar Kopoev
Тема many to many performance
Дата
Msg-id 200811261701375310108@kopoev.com
обсуждение исходный текст
Список pgsql-performance
Hello,

I have following common situation:

Category IDs: about 50 000
Document IDs: about 3 000 000
Many to many relationship.
A document id have a relation with 10 up to 1000 category ids
One query, with input set of document ids, resulting set of category ids, having relation with input ids. (very often
thisquery is called with more than 500k of input document ids) 

I use custom written datawarehouse, file storage, and memory kept "id offset" like indecies. So a query for all 3
milliondocument ids, resulting almost all category ids take less than a second on desktop machine. Abstracting from
concreterealization, query plan is like: 
1. for each input document id: look up an array by document id and retrive a list of ralated category ids.
1.1 for each category id in the list: look up an array value by category id and mark it as found
2. traverse category array to extract category ids marked as found

I want to use as a data storage postgresql. Tried several data structures, testing btree, gin, gist indecies over them,
butbest achieved performance for a 10 times smaller dataset (10k cat ids, 100k doc ids, 1m relations) is slower more
than5 times. 

I read about postgresql bitmap indecies and "data lookup" when scanning indecies to get a value for current
transaction.Downloaded latest v8.4 snapshot, compiled it, but as I see there is no bitmap index in it. Maybe if I
downloadHEAD revision I will find them there, dont know. 

Anyway, I want to ask, have anyone faced similar situation, and is there any way to achive closer to optimal
performanceusing postgresql functionality and extensibility? 

Regards,
Chavdar Kopoev


В списке pgsql-performance по дате отправления:

Предыдущее
От: "Andrus"
Дата:
Сообщение: Re: Increasing pattern index query speed
Следующее
От: Richard Huxton
Дата:
Сообщение: Re: Increasing pattern index query speed