Re: optimalisation with EXCEPT clause

Поиск
Список
Период
Сортировка
От Kincel, Martin
Тема Re: optimalisation with EXCEPT clause
Дата
Msg-id A5ED43533E983E4685C9E6156BE8874F0840DCBA@kenya.tronet.as
обсуждение исходный текст
Ответ на Re: optimalisation with EXCEPT clause  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
Список pgsql-general
Thank you for the answer Grzegorz.

> if you have a primary key on the table, and you should, you might get better performance using LEFT JOIN.

Well as far as I know, the result of such JOIN is a cartezian product, which is not exactly what I need. I need the
samestructure as table 'data' has. Or am I missing a trick how LEFT OUTER JOIN can be used instead of EXCEPT? :)
 

> EXCEPT will compare all columns, which might not be that fast, especially if those are text. (hence why I always tell
othersto use int as key in a table, but that's a different story). 
 

There is no good int to start using as a key in my 'data'. I would have to create one (out of some hash function,
diggestingthe whole row probably), but there is a strong possibility of adding colums into 'data' latter on, which
wouldrequire recalculation of such 'hash' column over and over again for millions of rows. While not impossible,
cerainlysomething I would like to avoid. 
 
Moreover, if one creates and maintains such hash column by hand and on his own, it is very likely, that he will forgot
something,or even mess it up completely. However, if there is a tool (something like an index on all colums) available
inthe database itself, I would be eager to use it.
 

Thanks again,
Winco


> -- 
> GJ



    * From: "Kincel, Martin" <MKincel@soitron.com>
    * To: <pgsql-general@postgresql.org>
    * Subject: optimalisation with EXCEPT clause
    * Date: Tue, 13 Apr 2010 17:01:18 +0200
    * Message-id: <A5ED43533E983E4685C9E6156BE8874F0840D83D@kenya.tronet.as>

Hello,


everyday I collect a couple of thousands rows of unique data from our
systems and I INSERT them into the table. Since I need no duplicate
data, I use EXCEPT clause when INSERTing, like this:

===
INSERT INTO data SELECT * FROM new_collected_data() EXCEPT SELECT * FROM
data;
===

It works exactly as I need, but there is a small issue I am thinking
about how to improve. Yes it's performance, what else? :)

Since I am INSERTing new_collected_data() in 10000-rows chunks into a
table already containing millions of rows, it takes a few minutes
(literally), which is something I understand and accept. 
However, I am wondering whether there is any way how to improve the
performance, either via indices, or ALTERing TABLE with UNIQUE
constraint or something else I might have completely forgot about.

Does anyone have any recommended approach how to speed up queries
containing EXCEPT clause? 


Thanks a lot,
Winco


В списке pgsql-general по дате отправления:

Предыдущее
От: Ostrovsky Eugene
Дата:
Сообщение: modification time & transaction synchronisation problem
Следующее
От: Greg Smith
Дата:
Сообщение: Re: readline library not found