Re: MD5 aggregate

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: MD5 aggregate
Дата
Msg-id CAEZATCUzc6V1t+rmQtyhssGAPJ6g=s6YwsROTVgq+Hrqi90TYQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: MD5 aggregate  (Hannu Krosing <hannu@2ndQuadrant.com>)
Список pgsql-hackers
On 14 June 2013 16:09, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> What skytools/pgq/londiste uses for comparing tables on master
> and slave is query like this
>
> select sum(hashtext(t.*::text)) from <yourtable> t;
>
> This is non-modulo sum and does not use md5 but relies on
> whatever the hashtext() du jour is :)
>
> So it is not comparable to anything external (like the md5sum
> compatible idea above) but is usually good enough for fast
> checks of compatible tables.
>
> As tables are unordered by definition anyway, this should be
> good enough for most SQL.
>
> The speed comes from both fast(er) hashtext() function and
> avoiding the sort.
>

That sounds like a pretty good approach. We could do that if we had a
version of md5() that returned numeric. My impression is that numeric
computations are pretty fast compared to the sorting overhead.

On the other hand, if there is a usable index, select md5_agg(..) from
(sub-query) will do and index scan rather than a sort, making it much
faster than using an ORDER BY in the aggregate.

Regards,
Dean



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fujii Masao
Дата:
Сообщение: pg_restore -l with a directory archive
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_filedump 9.3: checksums (and a few other fixes)