Re: O(samplesize) tuple sampling, proof of concept

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: O(samplesize) tuple sampling, proof of concept
Дата
Msg-id 16177.1081204229@sss.pgh.pa.us
обсуждение исходный текст
Ответ на O(samplesize) tuple sampling, proof of concept  (Manfred Koizar <mkoi-pg@aon.at>)
Ответы Re: O(samplesize) tuple sampling, proof of concept  (Manfred Koizar <mkoi-pg@aon.at>)
Список pgsql-patches
Manfred Koizar <mkoi-pg@aon.at> writes:
>>> Because I was afraid that method 1 might be too expensive in terms of
>>> CPU cycles, I implemented a small variation that skips tuples without
>>> checking them for visibility; this is sampling_method 2.
>>
>> And?  Does it matter?

> There's a clearly visible difference for mid-size relations.  I'm not
> sure whether this can be attributed to visibility bit updating or other
> noise-contributing factors.

I think it would have to be visibility-bit updates.  Can you try it with
a pre-vacuumed relation, so that there is no slowdown for updates?

The update cost will have to be paid sooner or later by some xact, and
on the whole it's better that it be done by a maintenance xact than by
a foreground client; so even if there is a noticeable slowdown here,
that doesn't seem like a reason not to inspect all the tuples.

>>> I find -u diffs close to unreadable for reviewing purposes.  Please
>>> submit diffs in -c format in future.

> De gustibus non est disputandum :-)
> Fortunately this patch wouldn't look much different.  There is just a
> bunch of "+" lines.

Yeah, so I managed to read it anyway ;-).  It's the ones with
intricately intermixed "-" and "+" that I find difficult to follow.

>> AFAICS the rows will *always* be sorted already, and so this qsort is an
>> utter waste of cycles.  No?

> I don't think so.  We get the *blocks* in the correct order.  But tuples
> are still sampled by the Vitter method which starts to replace random
> tuples after the pool is filled.

Duh, you're right --- I was thinking that the old code doesn't need a
qsort, but it does.  This seems a tad annoying considering that we know
the tuples were inserted into the pool in increasing order.  I wonder if
it's worth using a more complex data structure to keep track of both
orders at once?  I suppose that could easily wind up costing more than
the qsort though ...

            regards, tom lane

В списке pgsql-patches по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: O(samplesize) tuple sampling, proof of concept
Следующее
От: Manfred Koizar
Дата:
Сообщение: Re: O(samplesize) tuple sampling, proof of concept