Re: ANALYZE sampling is too good

Поиск
Список
Период
Сортировка
От Claudio Freire
Тема Re: ANALYZE sampling is too good
Дата
Msg-id CAGTBQpa-+wQg6ZL_+6X3dvX=8wn0LaLQ-3WDsVNfFpo_WYwu3g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: ANALYZE sampling is too good  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: ANALYZE sampling is too good
Список pgsql-hackers
On Mon, Dec 9, 2013 at 8:14 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 12/09/2013 11:56 PM, Claudio Freire wrote:
>> Without patches to the kernel, it is much better.
>>
>> posix_fadvise interferes with read-ahead, so posix_fadvise on, say,
>> bitmap heap scans (or similarly sorted analyze block samples) run at 1
>> IO / block, ie horrible, whereas aio can do read coalescence and
>> read-ahead when the kernel thinks it'll be profitable, significantly
>> increasing IOPS. I've seen everything from a 2x to 10x difference.
>
>
> How did you test that, given that we don't actually have an asynchronous I/O
> implementation? I don't recall any recent patches floating around either to
> do that. When Greg Stark investigated this back in 2007-2008 and implemented
> posix_fadvise() for bitmap heap scans, posix_fadvise certainly gave a
> significant speedup on the test data he used. What kind of a data
> distribution gives a slowdown like that?

That's basically my summarized experience from working on this[0]
patch, and the feedback given there about competing AIO work.

Sequential I/O was the biggest issue. I had to actively avoid
fadvising on sequential I/O, or sequential-ish I/O, which was a huge
burden on fadvise logic.

>
> I took a stab at using posix_fadvise() in ANALYZE. It turned out to be very
> easy, patch attached. Your mileage may vary, but I'm seeing a nice gain from
> this on my laptop. Taking a 30000 page sample of a table with 717717 pages
> (ie. slightly larger than RAM), ANALYZE takes about 6 seconds without the
> patch, and less than a second with the patch, with
> effective_io_concurrency=10. If anyone with a good test data set loaded
> would like to test this and post some numbers, that would be great.

Kernel version?

I raised this issue on LKML, and, while I got no news on this front,
they might have silently fixed it. I'd have to check the sources
again.

[0] http://www.postgresql.org/message-id/COL116-W162AEAA64173E77D4597EEA3670@phx.gbl



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jim Nasby
Дата:
Сообщение: Re: [RFC] Shouldn't we remove annoying FATAL messages from server log?
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: ANALYZE sampling is too good