Re: [HACKERS] Maintaining cluster order on insert

Поиск
Список
Период
Сортировка
От Ron Mayer
Тема Re: [HACKERS] Maintaining cluster order on insert
Дата
Msg-id 44DD13C6.2020609@cheapcomplexdevices.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Maintaining cluster order on insert  (Heikki Linnakangas <heikki@enterprisedb.com>)
Список pgsql-patches
Heikki Linnakangas wrote:
> Ron Mayer wrote:
>> In my case my biggest/slowest tables are clustered by zip-code (which
>> does a reasonable job at keeping counties/cities/etc on the
>> same pages too)....
>
> No deletes? If the tables grow over time, you probably would need to run
> CLUSTER every now and then to get the best performance, though the patch
> would alleviate that quite a lot.

Yup, pretty much no deletes; since it's a historical archive of
some government documents with address info.  Though I the
live system may periodically expunge data, say, 10+years old.

> Do you have a development environment where you could test what effect
> the patch would have? It would be interesting to have a real-world use
> case, since I don't have one myself at the moment.

I have a development environment, but it doesn't have the same
real-time-growing behavior, and only a small region of the country.
I suppose I could pre-load N-1 years and cluster it, and then
incrementally insert the last year of data to simulate the effect.

But sure, I'll attempt to try the patch; but don't really have any
good benchmarking environment to give any definitive results.  If
an anecdotal "this is how it feels to me" is useful, I can give one
of those.

>> > Your best bet might be to partition the table into two subtables, one
>> > with "stable" data and one with the fresh data.
>>
>> Hmm... that should work well for me too....
>
> Umm, if your inserts are uniformly distributed across the country, you
> wouldn't have a stable part, right?

Hmm.  Maybe.  I was thinking when archiving to the large table
an "order by" clause when inserting from the new partition to the
stable partition could at least make the big table "piecewise"
clustered so most records for a zip code fit in the same few disk
pages, even though those pages would still end up lying around
far apart on the disk.

I wonder what part of "CLUSTER" gives the most benefit - that
most records of a type fit on a few blocks;  or that those blocks
are next to each other so can be read sequentially?





В списке pgsql-patches по дате отправления:

Предыдущее
От: Jim Nasby
Дата:
Сообщение: Re: [HACKERS] Forcing current WAL file to be archived
Следующее
От: David Fetter
Дата:
Сообщение: Re: New variable server_version_num