Re: [HACKERS] \dt and disk access

From Bruce Momjian
Subject Re: [HACKERS] \dt and disk access
Date
Msg-id f336f4eae22a60b3f2c09348a2230c8b
In response to [HACKERS] \dt and disk access  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
>
> Bruce Momjian wrote:
> >
> > > Can you make the size of the result set above which diskfiles will be used
> > > configurable? That way ppl with loads of RAM can use huge buffers, and ppl
> > > with little RAM can keep that RAM free for other processes.
> >
> > If I need a configuration value, I will either determine the amount of
> > RAM portably, or base the value on the number of shared buffers
> > requested with -B.
>
> Why not use an additional flag, as suggested by Mark?
> Using -B is not good: the size of the shared memory segment may be
> limited by the OS (under FreeBSD 2.1.0 it's limited to 4M) or by the
> system administrator, and so the backend would use 2-4M of RAM for a
> sort on a box with 192M of RAM?

OK, I will use a new flag.

>
> This flag may be useful for joinpath.c:EnoughMemoryForHashjoin() too...
>
> Also note that following
> >       - make psort read directly from the executor node below it
> >       (instead of an input relation)
>
> it will be impossible to know the size of the result before the sort
> starts. So you have to palloc up to the in-memory limit and switch to
> 'tape' files dynamically.

I like this idea.  I was struggling with how I was going to determine the
size of the result anyway.

I have checked the Mariposa source changes Paul mentioned.  They do
indeed change the behavior of psort().  It still uses tape files, so I
will need to increase the memory used for each sort, and only create the
tape files if the initial sort does not fit within the allocated memory.
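
A minimal sketch of that approach, under stated assumptions: tuples are
plain ints, malloc() stands in for palloc(), tmpfile() stands in for a
psort tape, and SortBuf and the sortbuf_* names are hypothetical, not
taken from psort.c or Mariposa.  The final merge of spilled runs is left
out; the point is only that no tape file exists until the memory limit
is actually exceeded.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sort buffer: stays in memory until 'limit' items. */
typedef struct SortBuf
{
    int    *items;      /* in-memory items (stand-in for tuples) */
    size_t  nitems;     /* items currently held in memory */
    size_t  limit;      /* in-memory limit, in items */
    FILE   *tape;       /* NULL until the first spill */
    size_t  nspilled;   /* items written to the tape so far */
} SortBuf;

static int
cmp_int(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

int
sortbuf_init(SortBuf *sb, size_t limit)
{
    assert(limit > 0);
    sb->items = malloc(limit * sizeof(int));
    sb->nitems = 0;
    sb->limit = limit;
    sb->tape = NULL;
    sb->nspilled = 0;
    return sb->items != NULL ? 0 : -1;
}

/* Sort the in-memory run and append it to the tape, creating it lazily. */
static int
sortbuf_flush_run(SortBuf *sb)
{
    if (sb->tape == NULL && (sb->tape = tmpfile()) == NULL)
        return -1;
    qsort(sb->items, sb->nitems, sizeof(int), cmp_int);
    if (fwrite(sb->items, sizeof(int), sb->nitems, sb->tape) != sb->nitems)
        return -1;
    sb->nspilled += sb->nitems;
    sb->nitems = 0;
    return 0;
}

/* Add one item; spill the current run first if memory is full. */
int
sortbuf_add(SortBuf *sb, int value)
{
    if (sb->nitems == sb->limit && sortbuf_flush_run(sb) != 0)
        return -1;
    sb->items[sb->nitems++] = value;
    return 0;
}

/* Returns 0 if the result was sorted in memory, 1 if a tape was used. */
int
sortbuf_finish(SortBuf *sb)
{
    if (sb->tape == NULL)
    {
        qsort(sb->items, sb->nitems, sizeof(int), cmp_int);
        return 0;
    }
    if (sb->nitems > 0 && sortbuf_flush_run(sb) != 0)
        return -1;
    return 1;           /* sorted runs on tape still need a merge pass */
}

void
sortbuf_free(SortBuf *sb)
{
    free(sb->items);
    if (sb->tape != NULL)
        fclose(sb->tape);
}
```

With a limit of 4, adding three items and calling sortbuf_finish() never
touches the disk; with a limit of 2, adding five items creates the tape
on the third add.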

>
> Also
> >       - makes the Sort node read directly from the last set of psort runs
> >       (instead of an output relation)
>
> requires changes to ExecSortMarkPos()/ExecSortRestrPos(), which
> use heap_markpos()/heap_restrpos() (because the last set of
> psort runs is not a normal heap relation).
>
> But both changes to nodeSort.c are what we really need.

With the new psort(), you can do multiple sorts at the same time.  The
new psort() comment says:

 *      The old psort.c's routines formed a temporary relation from the merged
 * sort files. This version keeps the files around instead of generating the
 * relation from them, and provides interface functions to the file so that
 * you can grab tuples, mark a position in the file, restore a position in the
 * file. You must now explicitly call an interface function to end the sort,
 * psort_end, when you are done.
 *      Now most of the global variables are stuck in the Sort nodes, and
 * accessed from there (they are passed to all the psort routines) so that
 * each sort running has its own separate state. This is facilitated by having
 * the Sort nodes passed in to all the interface functions.
 *      The one global variable that all the sorts still share is SortMemory.
 *      You should now be allowed to run two or more psorts concurrently,
 * so long as the memory they eat up is not greater than SORTMEM, the initial
 * value of SortMemory.                                         -Rex 2.15.1995
 *
 *    Use the tape-splitting method (Knuth, Vol. III, pp281-86) in the future.
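
To illustrate what that comment describes, here is a rough sketch of
per-sort state with fetch, mark, restore, and end operations over a
result file.  It is not the Mariposa code: DemoSort and the demo_*
functions are hypothetical names, ints stand in for tuples, the input is
assumed to be already sorted, and tmpfile() stands in for the kept sort
file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-sort state; each running sort owns one of these. */
typedef struct DemoSort
{
    FILE *result;   /* sorted output kept in a file, not a relation */
    long  mark;     /* position saved by demo_mark() */
} DemoSort;

/* Write already-sorted values to a private file and rewind it. */
int
demo_begin(DemoSort *st, const int *sorted, size_t n)
{
    st->mark = 0;
    st->result = tmpfile();
    if (st->result == NULL)
        return -1;
    if (fwrite(sorted, sizeof(int), n, st->result) != n)
        return -1;
    rewind(st->result);
    return 0;
}

/* Fetch the next value; returns 1 on success, 0 at end of file. */
int
demo_next(DemoSort *st, int *out)
{
    assert(st->result != NULL);
    return fread(out, sizeof(int), 1, st->result) == 1;
}

/* Remember the current read position. */
void
demo_mark(DemoSort *st)
{
    st->mark = ftell(st->result);
}

/* Go back to the remembered position. */
void
demo_restore(DemoSort *st)
{
    fseek(st->result, st->mark, SEEK_SET);
}

/* The caller must end the sort explicitly to release the file. */
void
demo_end(DemoSort *st)
{
    fclose(st->result);
    st->result = NULL;
}
```

Because the position and file handle live in the state struct rather
than in globals, two DemoSort values can be read in interleaved fashion
without disturbing each other, which is the property the comment is
after.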

I am uploading mariposa-alpha-1.tar.gz to the PostgreSQL ftp incoming
directory because I think I am going to need help on this one.  The
official mariposa ftp site is very, very slow and unreliable.  This
release is dated June, 1996, and is the newest available.

--
Bruce Momjian
maillist@candle.pha.pa.us
