Collect frequency statistics for arrays

From: Alexander Korotkov
Subject: Collect frequency statistics for arrays
Msg-id: CAPpHfdvTfDZ7OeFGUdv9s=2EKV9cDF3AjXznbNrp1xbzwF7kpA@mail.gmail.com
Responses: Re: Collect frequency statistics for arrays
List: pgsql-hackers
Hi!

Here is an updated version of the patch. Changes since the reviewed version:
1) A distinct slot is used for the length histogram.
2) Standard statistics are collected for arrays.
3) Most common values and most common elements are mapped to distinct columns of the pg_stats view, because both are calculated for arrays.
4) The description of the lossy counting algorithm was copied from compute_tsvector_stats, with corresponding changes.
5) Comments about assumptions were added to the estimation functions.
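
For readers unfamiliar with lossy counting (item 4), here is a minimal Python sketch of the general algorithm as described by Manku and Motwani. It is not the patch's C code: the function name lossy_count, the bucket_width parameter and the flat input stream are assumptions made for illustration only.

```python
def lossy_count(stream, bucket_width):
    """Approximate element frequencies with lossy counting.

    Illustrative sketch only, not the patch's implementation.
    bucket_width is w = 1/epsilon, where epsilon is the allowed error.
    Returns (counts, n): counts maps element -> [frequency, delta],
    where delta is the maximum possible undercount for that element.
    """
    counts = {}
    n = 0
    for item in stream:
        n += 1
        # Id of the bucket the n-th element falls into (1-based).
        bucket = (n + bucket_width - 1) // bucket_width
        if item in counts:
            counts[item][0] += 1
        else:
            # A new entry may have been missed in all earlier buckets.
            counts[item] = [1, bucket - 1]
        if n % bucket_width == 0:
            # Bucket boundary: drop entries that cannot be frequent.
            stale = [k for k, (f, d) in counts.items() if f + d <= bucket]
            for k in stale:
                del counts[k]
    return counts, n
```

For array columns the stream would be the elements of the sampled arrays; whether duplicates inside a single array are counted once or several times is a design choice this sketch does not address.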

Accuracy testing

The following files are attached:
datasets.sql - SQL script that generates the test datasets
arrayanalyze.php - PHP script that runs the accuracy tests
results.sql - dump of the table with the test results

As the test results show, the estimates are quite accurate in most test cases. When the length of the constant array exceeds 30, the estimate for "column <@ const" is very inaccurate for the arrat_test3 table. This is because the length histogram is not used in that case, since using it costs too much CPU time during estimation (see array_sel.c:888).

------
With best regards,
Alexander Korotkov.