Re: Cursor + upsert (astronomical data)

Поиск
Список
Период
Сортировка
От Marc Mamin
Тема Re: Cursor + upsert (astronomical data)
Дата
Msg-id B6F6FD62F2624C4C9916AC0175D56D8828AC3042@jenmbs01.ad.intershop.net
обсуждение исходный текст
Ответ на Re: Cursor + upsert (astronomical data)  (Craig James <cjames@emolecules.com>)
Список pgsql-performance
[Craig]
>>If you haven't looked at clustering algorithms yet, you might want to do so.
>>Your problem is a special case of clustering, where you have a large number
>>of small clusters.  A good place to start is the overview on Wikipedia:
>>http://en.wikipedia.org/wiki/Cluster_analysis

According to this list, your method is similar to http://en.wikipedia.org/wiki/Basic_sequential_algorithmic_scheme,
but with what seems to be some logical errors.

>The simplest approach I could think of is that I process each row of the 3e8 rows
>sequentially and ask:
>Do I have an identifier in the radius of 1 arcsec?
>No: Generate one and assign me to it.
>Yes: Update it and assigne me to it. The update is done as weighted average – I keep the
>number of how many observations the identifier has been computed. The result is that the
>identifier will have average coordinates of all the observations it identifies – it will
>be the center.
 

Let say you have 2 single measures on a line at arcsec 1 and 2.1. which hence correspond to 2 ipix_cat.
Now add a new measure at 1.9: as you choose any of the possible adjacent ipix_cat whitout considering the least distance, you may end with an ipix_at at 1.45 which is at less than 1 arcsec then the next one.
   Moreover you have raised the weight of both ipix_cat.
     Which increases the lock probability when trying a nutlithreaded upgrade.
     
The max distance between 2 observations belonging to a same ipix_cat tends to 2 arcsec with your method. If this is ok, you should probbaly modify your method so that the 2 first points of my example would have megred to a single ipix_cat. You could use your weigth for this: increase your search radius to 2arcsec and then reject the candidates located between 1 and 2 arsec depending on their weight. The additional work load might be compensated by the smaller number of ipix_cat that woul will have.     
     




В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Very slow planning performance on partition table
Следующее
От: Vitalii Tymchyshyn
Дата:
Сообщение: Re: Cursor + upsert (astronomical data)