Re: [HACKERS] sorting big tables :(

From: The Hermit Hacker
Subject: Re: [HACKERS] sorting big tables :(
Date:
Msg-id: Pine.BSF.3.96.980520103925.14056O-100000@hub.org
In response to: Re: [HACKERS] sorting big tables :(  (Bruce Momjian <maillist@candle.pha.pa.us>)
Responses: Re: [HACKERS] sorting big tables :(  (Bruce Momjian <maillist@candle.pha.pa.us>)
           Re: [HACKERS] sorting big tables :(  (Tom <tom@sdf.com>)
List: pgsql-hackers
On Wed, 20 May 1998, Bruce Momjian wrote:

> > > Well, I think it might be optimised slightly. Am I right that postgres
> > > uses heap files (i.e. they look like tables) during sorting? While this
> > > is a merge sort, those files don't have to be table-like files.
> > > Certainly, they could be variable-length records without pages (aren't
> > > they used sequentially?). Moreover, we could consider packing the tape
> > > files before writing them out if necessary. Of course that will cost
> > > some performance, but it's better to have lower performance than to be
> > > unable to sort at all.
> > >
> > > Last question... What's the purpose of such a big sort? If somebody gets
> > > 40M sorted records as the result of some query, what would he do with
> > > them? Is he going to spend the next several years reading them? I mean,
> > > wouldn't it be better to query the database for only the necessary
> > > information and then sort that?
> >
> >     this I don't know...I never even really thought about that,
> > actually...Michael? :)  Only you can answer that one.
>
> I have an idea.  Can he run CLUSTER on the data?  If so, the sort will
> not use small batches, and the disk space during sort will be reduced.
> However, I think CLUSTER will NEVER finish on such a file, unless it is
> already pretty well sorted.
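
    As a concrete reference for the CLUSTER suggestion above, a minimal
sketch, assuming a hypothetical table "bigtable" with a btree index
"bigtable_key_idx" on the sort column, and using the CLUSTER syntax of
this era:

        CREATE INDEX bigtable_key_idx ON bigtable (key);
        -- rewrite the table in index order; CLUSTER locks and rewrites the heap
        CLUSTER bigtable_key_idx ON bigtable;
        -- a later ORDER BY key then reads data that is already mostly sorted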

    Okay...then we *do* have a table size limit problem?  Tables that
just get too large to be manageable?  Maybe this is one area we should be
looking at as far as performance is concerned?

    One thing that just popped to mind, concerning the above CLUSTER
command...what would it take to have *auto-clustering*?  Maybe provide a
means of marking a field in a table for this purpose?
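
    Purely as a sketch of that idea (the syntax below is hypothetical, not
something the backend accepts), marking a field might look like:

        CREATE TABLE bigtable (
            key     int4,
            payload text
        );
        -- hypothetical extension: remember which index the table
        -- should be kept clustered on
        ALTER TABLE bigtable CLUSTER ON bigtable_key_idx;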

    One of the things that the Unix FS does is auto-defragmenting, at
least the UFS one does.  Whenever the system is idle (from my
understanding), the kernel uses that time to clean up the file systems and
reduce fragmentation.

    This is by no means SQL92, but it would be a neat
"extension"...let me specify a "CLUSTER on" field.  Then, as I'm entering
data into the database, the system would periodically check for
fragmentation of the data and clean up accordingly.  If done by the
system, reasonably often, it shouldn't take up *too* much time, as most of
the data should already be in order...
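
    Lacking such an extension, a crude approximation would be a maintenance
script that re-clusters during quiet hours (names again hypothetical):

        -- run from cron when the system is idle; each CLUSTER rewrites the table
        CLUSTER bigtable_key_idx ON bigtable;
        VACUUM ANALYZE bigtable;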

    That would have the side-benefit of speeding up the "ORDER by" on
that field also...



