Re: Large DB

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Large DB
Дата
Msg-id 28229.1080922581@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Large DB  (Manfred Koizar <mkoi-pg@aon.at>)
Ответы Re: Large DB  (Manfred Koizar <mkoi-pg@aon.at>)
Список pgsql-general
Manfred Koizar <mkoi-pg@aon.at> writes:
> The first step, however, (acquire_sample_rows() in analyze.c) has to
> read more rows than finally end up in the sample.  It visits less than
> O(nblocks) pages but certainly more than O(1).

> A vague feeling tries to tell me that the number of page reads is
> somehow related to the harmonic numbers 1 + 1/2 + 1/3 + ... + 1/n, which
> grow like O(ln(n)).

Good guess.  Vitter's paper says the expected time to sample n rows from
a table of size N is O(n * (1 + log(N/n))).

> I have an idea how this could be done with O(1) page reads.

The hard part is getting a genuinely random sample when we don't know N
in advance.  We do however know the table size in blocks, so if you're
willing to make assumptions about constant tuple density you could do
something different.  (But the tuple density assumption is exactly the
weak spot of what we've got, so I'm unconvinced that would be a big step
forward.)

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Bruno Wolff III
Дата:
Сообщение: Re: row-level security model
Следующее
От:
Дата:
Сообщение: Re: Compound keys and foreign constraints