Thread: sort on huge table


sort on huge table

From: Tatsuo Ishii
I came across problems with sorting a huge (2.4GB) table.

o it took 46 minutes to complete the following query:

	select * from test2 order by i desc limit 100;

  to get 0 results:

	i|t
	-+-
	(0 rows)

  I assume this is a failure.
  note: this is a Pentium III x2 with 512MB RAM running RedHat Linux 6.0.

o I got "NOTICE:  BufFileRead: should have flushed after writing" at the very end of processing.

o it produced 7 sort temp files each having size of 1.4GB (total 10GB)

Here is the table I used for testing (no index):

CREATE TABLE test2 (
	i int4,
	t text
);

This has 10000000 records and the table file sizes are:

$ ls -ls test2*
1049604 -rw-------   1 postgres postgres 1073741824 Oct  4 18:32 test2
1049604 -rw-------   1 postgres postgres 1073741824 Oct  5 01:19 test2.1
 327420 -rw-------   1 postgres postgres  334946304 Oct 13 17:40 test2.2

--
Tatsuo Ishii


Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I came across problems with sorting a huge (2.4GB) table.

The current sorting code will fail if the data volume exceeds whatever
the maximum file size is on your OS.  (Actually, if long is 32 bits,
it might fail at 2gig even if your OS can handle 4gig; not sure, but
it is doing signed-long arithmetic with byte offsets...)
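
For illustration only (a standalone sketch with made-up numbers, not anything from the PostgreSQL sources), this is the kind of trouble a signed 32-bit long invites once byte offsets approach 2GB:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    long offset = 0x7FFFE000L;   /* a file position just under 2GB */
    long blocksz = 8192;         /* one more 8K block to add */

    /* Where long is 32 bits, LONG_MAX is 2^31 - 1, so adding another
     * block would push the offset past what a signed long can hold. */
    if (offset > LONG_MAX - blocksz)
        printf("offset %ld + %ld would overflow a signed long\n",
               offset, blocksz);
    else
        printf("next offset = %ld\n", offset + blocksz);
    return 0;
}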

I am just about to commit code that fixes this by allowing temp files
to have multiple segments like tables can.

> o it took 46 minutes to complete following query:

What -S setting are you using?  Increasing it should reduce the time
to sort, so long as you don't make it so large that the backend starts
to swap.  The current default seems to be 512 (Kb) which is probably
on the conservative side for modern machines.

> o it produced 7 sort temp files each having size of 1.4GB (total 10GB)

Yes, I've been seeing space consumption of about 4x the actual data
volume.  Next step is to revise the merge algorithm to reduce that.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tom Lane
I wrote:
> The current sorting code will fail if the data volume exceeds whatever
> the maximum file size is on your OS.  (Actually, if long is 32 bits,
> it might fail at 2gig even if your OS can handle 4gig; not sure, but
> it is doing signed-long arithmetic with byte offsets...)

> I am just about to commit code that fixes this by allowing temp files
> to have multiple segments like tables can.

OK, committed.  I have tested this code using a small RELSEG_SIZE,
and it seems to work, but I don't have the spare disk space to try
a full-scale test with > 4Gb of data.  Anyone care to try it?
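
For anyone curious how the multi-segment scheme works, here is a sketch of the offset arithmetic (invented names, counting in bytes purely for illustration; the real code lives in the buffile/fd layer and works in blocks):

#include <stdio.h>

#define SEG_BYTES ((long long) 1 << 30)    /* pretend each segment is 1GB */

/* Map an offset within the logical temp file onto (segment, offset-in-segment). */
static void map_offset(long long logical, int *segno, long long *seg_off)
{
    *segno = (int) (logical / SEG_BYTES);
    *seg_off = logical % SEG_BYTES;
}

int main(void)
{
    int segno;
    long long seg_off;

    map_offset((long long) 5 * 1024 * 1024 * 1024, &segno, &seg_off);
    printf("segment %d, offset %lld\n", segno, seg_off);   /* segment 5, offset 0 */
    return 0;
}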

I have not yet done anything about the excessive space consumption
(4x data volume), so plan on using 16+Gb of diskspace to sort a 4+Gb
table --- and that's not counting where you put the output ;-)
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
> > The current sorting code will fail if the data volume exceeds whatever
> > the maximum file size is on your OS.  (Actually, if long is 32 bits,
> > it might fail at 2gig even if your OS can handle 4gig; not sure, but
> > it is doing signed-long arithmetic with byte offsets...)
> 
> > I am just about to commit code that fixes this by allowing temp files
> > to have multiple segments like tables can.
> 
> OK, committed.  I have tested this code using a small RELSEG_SIZE,
> and it seems to work, but I don't have the spare disk space to try
> a full-scale test with > 4Gb of data.  Anyone care to try it?

I will test it with my 2GB table. Creating 4GB would probably be
possible, but I don't have enough sort space for that:-) I ran my
previous test on 6.5.2, not on current. I hope current is stable
enough to perform my testing.

> I have not yet done anything about the excessive space consumption
> (4x data volume), so plan on using 16+Gb of diskspace to sort a 4+Gb
> table --- and that's not counting where you put the output ;-)

Talking about -S, I used the default since setting -S seems to
consume too much memory. For example, when I set it to 128MB, the
backend process grew to over 512MB and was killed because it ran
out of swap space. Maybe the 4x law also applies to -S?
---
Tatsuo Ishii


Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> OK, committed.  I have tested this code using a small RELSEG_SIZE,
>> and it seems to work, but I don't have the spare disk space to try
>> a full-scale test with > 4Gb of data.  Anyone care to try it?

> I will test it with my 2GB table. Creating 4GB would probably be
> possible, but I don't have enough sort space for that:-)

OK.  I am working on reducing the space requirement, but it would be
nice to test the bottom-level multi-temp-file code before layering
more stuff on top of it.  Anyone else have a whole bunch of free
disk space they could try a big sort with?

> I ran my previous test on 6.5.2, not on current. I hope current is
> stable enough to perform my testing.

It seems reasonably stable here, though I'm not doing much except
testing... main problem is you'll need to initdb, which means importing
your large dataset...

> Talking about -S, I used the default since setting -S seems to
> consume too much memory. For example, when I set it to 128MB, the
> backend process grew to over 512MB and was killed because it ran
> out of swap space. Maybe the 4x law also applies to -S?

If the code is working correctly then -S should be obeyed ---
approximately, anyway, since psort.c only counts the actual tuple data;
it doesn't know anything about AllocSet overhead &etc.  But it looked
to me like there might be some plain old memory leaks in psort.c, which
could account for actual usage being much more than intended.  I am
going to work on cleaning up psort.c after I finish building
infrastructure for it.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
>> I will test it with my 2GB table. Creating 4GB would probably be
>> possible, but I don't have enough sort space for that:-)
>
>OK.  I am working on reducing the space requirement, but it would be
>nice to test the bottom-level multi-temp-file code before layering
>more stuff on top of it.  Anyone else have a whole bunch of free
>disk space they could try a big sort with?
>
>> I ran my previous test on 6.5.2, not on current. I hope current is
>> stable enough to perform my testing.
>
>It seems reasonably stable here, though I'm not doing much except
>testing... main problem is you'll need to initdb, which means importing
>your large dataset...

I have done the 2GB test on current (with your fixes). This time the
sorting query worked great! I saw lots of temp files, but the total
disk usage was almost same as before (~10GB). So I assume this is ok.

>> Talking about -S, I used the default since setting -S seems to
>> consume too much memory. For example, when I set it to 128MB, the
>> backend process grew to over 512MB and was killed because it ran
>> out of swap space. Maybe the 4x law also applies to -S?
>
>If the code is working correctly then -S should be obeyed ---
>approximately, anyway, since psort.c only counts the actual tuple data;
>it doesn't know anything about AllocSet overhead &etc.  But it looked
>to me like there might be some plain old memory leaks in psort.c, which
>could account for actual usage being much more than intended.  I am
>going to work on cleaning up psort.c after I finish building
>infrastructure for it.

I set -S to 8MB, and it seems to boost performance. It took
only 22:37 (the previous result was ~45:00).
---
Tatsuo Ishii



Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I have done the 2GB test on current (with your fixes). This time the
> sorting query worked great! I saw lots of temp files, but the total
> disk usage was almost same as before (~10GB). So I assume this is ok.

Sounds like it is working then.  Thanks for running the test.  I'll try
to finish the next step this weekend.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> I have done the 2GB test on current (with your fixes). This time the
> sorting query worked great! I saw lots of temp files, but the total
> disk usage was almost same as before (~10GB). So I assume this is ok.

I have now committed another round of changes that reduce the temp file
size to roughly the volume of data to be sorted.  It also reduces the
number of temp files --- there will be only one per GB of sort data.
If you could try sorting a table larger than 4GB with this code, I'd be
much obliged.  (It *should* work, of course, but I just want to be sure
there are no places that will have integer overflows when the logical
file size exceeds 4GB.)  I'd also be interested in how the speed
compares to the old code on a large table.

Still need to look at the memory-consumption issue ... and CREATE INDEX
hasn't been taught about any of these fixes yet.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tom Lane
OK, I have now finished up my psort reconstruction project.  Sort nodes
and btree CREATE INDEX now use the same sorting module, which is better
than either one was to start with.

This resolves the following TODO items:

* Make index creation use psort code, because it is now faster(Vadim)
* Allow creation of sort temp tables > 1 Gig

Also, sorting will now notice if it runs out of disk space, which it
frequently would not before :-(.  Both memory and disk space are used
more sparingly than before, as well.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Bruce Momjian
> OK, I have now finished up my psort reconstruction project.  Sort nodes
> and btree CREATE INDEX now use the same sorting module, which is better
> than either one was to start with.
> 
> This resolves the following TODO items:
> 
> * Make index creation use psort code, because it is now faster(Vadim)
> * Allow creation of sort temp tables > 1 Gig
> 
> Also, sorting will now notice if it runs out of disk space, which it
> frequently would not before :-(.  Both memory and disk space are used
> more sparingly than before, as well.

Great.  TODO changes made.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
>Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> I have done the 2GB test on current (with your fixes). This time the
>> sorting query worked great! I saw lots of temp files, but the total
>> disk usage was almost same as before (~10GB). So I assume this is ok.
>
>I have now committed another round of changes that reduce the temp file
>size to roughly the volume of data to be sorted.  It also reduces the
>number of temp files --- there will be only one per GB of sort data.
>If you could try sorting a table larger than 4GB with this code, I'd be
>much obliged.  (It *should* work, of course, but I just want to be sure
>there are no places that will have integer overflows when the logical
>file size exceeds 4GB.)  I'd also be interested in how the speed
>compares to the old code on a large table.
>
>Still need to look at the memory-consumption issue ... and CREATE INDEX
>hasn't been taught about any of these fixes yet.

I tested with a 1GB+ table (which has a segment file) and a 4GB+ table
(which has four segment files) and got the same error message:

ERROR:  ltsWriteBlock: failed to write block 131072 of temporary file
        Perhaps out of disk space?

Of course there is enough disk space, and no physical errors were
reported. It seems the error is raised when the temp file hits 1GB?
--
Tatsuo Ishii



Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> If you could try sorting a table larger than 4GB with this code, I'd be
>> much obliged.

> ERROR:  ltsWriteBlock: failed to write block 131072 of temporary file
>                 Perhaps out of disk space?

Drat.  I'll take a look --- thanks for running the test.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tom Lane
I wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>> If you could try sorting a table larger than 4GB with this code, I'd be
>>> much obliged.

>> ERROR:  ltsWriteBlock: failed to write block 131072 of temporary file
>> Perhaps out of disk space?

> Drat.  I'll take a look --- thanks for running the test.

That's what I get for not testing the interaction between logtape.c
and buffile.c at a segment boundary --- it didn't work, of course :-(.
I rebuilt with a small RELSEG_SIZE and debugged it.  I'm still concerned
about possible integer overflow problems, so please update and try again
with a large file.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
>That's what I get for not testing the interaction between logtape.c
>and buffile.c at a segment boundary --- it didn't work, of course :-(.
>I rebuilt with a small RELSEG_SIZE and debugged it.  I'm still concerned
>about possible integer overflow problems, so please update and try again
>with a large file.

It worked with 2GB+ table but was much slower than before.

Before(with 8MB sort memory): 22 minutes

After(with 8MB sort memory): 1 hour and 5 minutes
After(with 80MB sort memory): 42 minutes.
--
Tatsuo Ishii


Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> It worked with 2GB+ table but was much slower than before.
> Before(with 8MB sort memory): 22 minutes
> After(with 8MB sort memory): 1 hour and 5 minutes
> After(with 80MB sort memory): 42 minutes.

Oh dear.  I had tested it with smaller files and concluded that it was
no slower than before ... I guess there is some effect I'm not seeing
here.  Can you tell whether the extra time is computation or I/O (how
much does the runtime of the backend change between old and new code)?
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
>Oh dear.  I had tested it with smaller files and concluded that it was
>no slower than before ... I guess there is some effect I'm not seeing
>here.  Can you tell whether the extra time is computation or I/O (how
>much does the runtime of the backend change between old and new code)?

How can I do this? Maybe I should run the backend in stand alone mode?
---
Tatsuo Ishii



Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> It worked with 2GB+ table but was much slower than before.
> Before(with 8MB sort memory): 22 minutes
> After(with 8MB sort memory): 1 hour and 5 minutes
> After(with 80MB sort memory): 42 minutes.

I've committed some changes to tuplesort.c to try to improve
performance.  Would you try your test case again with current
sources?  Also, please see if you can record the CPU time
consumed by the backend while doing the sort.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
>Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> It worked with 2GB+ table but was much slower than before.
>> Before(with 8MB sort memory): 22 minutes
>> After(with 8MB sort memory): 1 hour and 5 minutes
>> After(with 80MB sort memory): 42 minutes.
>
>I've committed some changes to tuplesort.c to try to improve
>performance.  Would you try your test case again with current
>sources?  Also, please see if you can record the CPU time
>consumed by the backend while doing the sort.

It's getting better, but still slower than before.

52:50 (with 8MB sort memory)

ps shows 7:15 was consumed by the backend. I'm going to test with 80MB 
sort memory.
--
Tatsuo Ishii


Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
>>> It worked with 2GB+ table but was much slower than before.
>>> Before(with 8MB sort memory): 22 minutes
>>> After(with 8MB sort memory): 1 hour and 5 minutes
>>> After(with 80MB sort memory): 42 minutes.
>>
>>I've committed some changes to tuplesort.c to try to improve
>>performance.  Would you try your test case again with current
>>sources?  Also, please see if you can record the CPU time
>>consumed by the backend while doing the sort.
>
>It's getting better, but still slower than before.
>
>52:50 (with 8MB sort memory)
>
>ps shows 7:15 was consumed by the backend. I'm going to test with 80MB 
>sort memory.

Done.

32:06 (with 80MB sort memory)
CPU time was 5:11.
--
Tatsuo Ishii


Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>>> It worked with 2GB+ table but was much slower than before.
>>>> Before(with 8MB sort memory): 22 minutes
>>>> After(with 8MB sort memory): 1 hour and 5 minutes
>>>> After(with 80MB sort memory): 42 minutes.
>> 
> It's getting better, but still slower than before.
> 52:50 (with 8MB sort memory)
> ps shows 7:15 was consumed by the backend.
> 32:06 (with 80MB sort memory)
> CPU time was 5:11.

OK, so it's basically all I/O time, which is what I suspected.

What's causing this is the changes I made to reduce disk space usage;
the price of that savings is more-random access to the temporary file.
Apparently your setup is not coping very well with that.

The original code used seven separate temp files, each of which was
written and read in a purely sequential fashion.  Only problem: as
the merge steps proceed, all the data is read from one temp file and
dumped into another, and because of the way the merges are overlapped,
you end up with total space usage around 4X the actual data volume.

What's in there right now is just the same seven-tape merge algorithm,
but all the "tapes" are stored in a single temp file.  As soon as any
block of a "tape" is read in, it's recycled to become available space
for the current "output tape" (since we know we won't need to read that
block again).  This is why the disk space usage is roughly actual data
volume and not four times as much.  However, the access pattern to this
single temp file looks a lot more random to the OS than the access
patterns for the original temp files.
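
As a rough sketch of the recycling idea (invented structure and names, not the actual logtape.c code), the temp file behaves like a pool of blocks: blocks freed by reads go on a free list, and the output tape draws from that list before the file is extended, which is why peak usage stays near the data volume:

#include <stdlib.h>

typedef struct
{
    long   *freelist;        /* block numbers available for reuse */
    int     nfree;
    int     nalloc;
    long    next_new_block;  /* first block beyond the current end of file */
} BlockPool;

/* A block of some source tape has been read and won't be needed again:
 * make it available to whatever tape is being written. */
void recycle_block(BlockPool *p, long blkno)
{
    if (p->nfree >= p->nalloc)
    {
        p->nalloc = p->nalloc > 0 ? p->nalloc * 2 : 64;
        p->freelist = realloc(p->freelist, p->nalloc * sizeof(long));
    }
    p->freelist[p->nfree++] = blkno;
}

/* The output tape needs another block: prefer a recycled one, and only
 * extend the temp file when the free list is empty. */
long get_output_block(BlockPool *p)
{
    if (p->nfree > 0)
        return p->freelist[--p->nfree];
    return p->next_new_block++;
}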

I figured that I could get away with this from a performance standpoint
because, while the old code processed each temp file sequentially, the
read and write accesses were interleaved --- on average, you'd expect
a merge pass to read one block from each of the N source tapes in the
same time span that it is writing N blocks to the current output tape;
on average, no two successive block read or write requests will go to
the same temp file.  So it appears to me that the old code should cause
a disk seek for each block read or written.  The new code's behavior
can't be any worse than that; it's just doing those seeks within one
temp file instead of seven.

Of course the flaw in this reasoning is that it assumes the OS isn't
getting in the way.  On the HPUX system I've been testing on, the
performance does seem to be about the same, but evidently it's much
worse on your system.  (Exactly what OS are you running, anyway, and
on what hardware?)  I speculate that your OS is applying some sort of
read-ahead algorithm that is getting hopelessly confused by lots of
seeks within a single file.  Perhaps it's reading the next block in
sequence after every program-requested read, and then throwing away that
work when it sees the program lseek the file instead of reading.

Next question is what to do about it.  I don't suppose we have any way
of turning off the OS' read-ahead algorithm :-(.  We could forget about
this space-recycling improvement and go back to separate temp files.
The objection to that, of course, is that while sorting might be faster,
it doesn't matter how fast the algorithm is if you don't have the disk
space to execute it.

A possible compromise is to use separate temp files but drop the
polyphase merge and go to a balanced merge, which'd still access each
temp file sequentially but would have only a 2X space penalty instead of
4X (since all the data starts on one set of tapes and gets copied to the
other set during a complete merge pass).  The balanced merge is a little
slower than polyphase --- more merge passes --- but the space savings
probably justify it.

One thing I'd like to know before we make any decisions is whether
this problem is widespread.  Can anyone else run performance tests
of the speed of large sorts, using current sources vs. 6.5.* ?
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Bruce Momjian
> Of course the flaw in this reasoning is that it assumes the OS isn't
> getting in the way.  On the HPUX system I've been testing on, the
> performance does seem to be about the same, but evidently it's much
> worse on your system.  (Exactly what OS are you running, anyway, and
> on what hardware?)  I speculate that your OS is applying some sort of
> read-ahead algorithm that is getting hopelessly confused by lots of
> seeks within a single file.  Perhaps it's reading the next block in
> sequence after every program-requested read, and then throwing away that
> work when it sees the program lseek the file instead of reading.
> 
> Next question is what to do about it.  I don't suppose we have any way
> of turning off the OS' read-ahead algorithm :-(.  We could forget about
> this space-recycling improvement and go back to separate temp files.
> The objection to that, of course, is that while sorting might be faster,
> it doesn't matter how fast the algorithm is if you don't have the disk
> space to execute it.

That is the key.  On BSDI, the kernel code is more complicated.  If it
does a read on an already open file, and the requested buffer is not in
core, it assumes that the readahead that was performed by the previous
read was useless, and scales back the readahead algorithm.  At least
that is my interpretation of the code and comments.

I suspect other OS's do similar work, but it is possible they do it more
simplistically, saying if someone does _any_ seek, they must be
accessing it non-sequentially, so read-ahead should be turned off.

Read-ahead on random file access is a terrible thing, and most OS's
figure out a way to turn off read-ahead in non-sequential cases.  Of
course, lack of read-ahead in sequential access also is a problem.

Tatsuo, what OS are you using?  Maybe I can check the kernel to see how
it is behaving.

> 
> A possible compromise is to use separate temp files but drop the
> polyphase merge and go to a balanced merge, which'd still access each
> temp file sequentially but would have only a 2X space penalty instead of
> 4X (since all the data starts on one set of tapes and gets copied to the
> other set during a complete merge pass).  The balanced merge is a little
> slower than polyphase --- more merge passes --- but the space savings
> probably justify it.
> 
> One thing I'd like to know before we make any decisions is whether
> this problem is widespread.  Can anyone else run performance tests
> of the speed of large sorts, using current sources vs. 6.5.* ?

I may be able to test that today on BSDI, but I doubt BSDI is typical.
They are probably state-of-the-art in kernel algorithm design.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] sort on huge table

From: Bruce Momjian
> Next question is what to do about it.  I don't suppose we have any way
> of turning off the OS' read-ahead algorithm :-(.  We could forget about
> this space-recycling improvement and go back to separate temp files.
> The objection to that, of course, is that while sorting might be faster,
> it doesn't matter how fast the algorithm is if you don't have the disk
> space to execute it.


Look what I found. I downloaded Linux kernel source for 2.2.0, and
started looking for the word 'ahead' in the file system files.  I found
that read-ahead seems to be controlled by f_reada, and look where I
found it being turned off?  Seems like any seek turns off read-ahead on
Linux.

When you do a read or write, it seems to be turned on again.  Once you
read/write, the next read/write will do read-ahead, assuming you don't
do any lseek() before the second read/write().

Seems like the algorithm in psort now rarely gets read-ahead on
Linux, while other OS's check to see if the read-ahead was eventually
used, and control read-ahead that way.

Read-ahead also seems to be off on the first read from a file.
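
A quick way to check how much this costs on a particular kernel (a hypothetical test program, not something from the sources) is to time the same blocks read back sequentially versus with an explicit lseek() before every read, which is exactly the pattern that clears f_reada in the code quoted below:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
    char    buf[8192];
    long    blkno = 0;
    int     fd, seeky;

    if (argc < 2)
    {
        fprintf(stderr, "usage: %s file [seeky]\n", argv[0]);
        return 1;
    }
    seeky = (argc > 2);
    fd = open(argv[1], O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    for (;;)
    {
        /* "seeky" mode reads the same blocks in the same order, but with
         * an lseek() between reads, which defeats f_reada-style read-ahead */
        if (seeky && lseek(fd, blkno * (long) sizeof(buf), SEEK_SET) < 0)
            break;
        if (read(fd, buf, sizeof(buf)) <= 0)
            break;
        blkno++;
    }
    close(fd);
    return 0;
}

Run it both ways under time(1) on a file bigger than RAM; a large gap between the two would suggest the read-ahead heuristic is what is hurting the new sort code.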

---------------------------------------------------------------------------

/*
 *  linux/fs/ext2/file.c
 */
...
/*
 * Make sure the offset never goes beyond the 32-bit mark..
 */
static long long ext2_file_lseek(struct file *file, long long offset, int origin)
{
    struct inode *inode = file->f_dentry->d_inode;

    switch (origin) {
        case 2:
            offset += inode->i_size;
            break;
        case 1:
            offset += file->f_pos;
    }
    if (((unsigned long long) offset >> 32) != 0) {
#if BITS_PER_LONG < 64
        return -EINVAL;
#else
        if (offset > ext2_max_sizes[EXT2_BLOCK_SIZE_BITS(inode->i_sb)])
            return -EINVAL;
#endif
    }
    if (offset != file->f_pos) {
        file->f_pos = offset;
        file->f_reada = 0;
        file->f_version = ++event;
    }
    return offset;
}


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] sort on huge table

From: Bruce Momjian
> Next question is what to do about it.  I don't suppose we have any way
> of turning off the OS' read-ahead algorithm :-(.  We could forget about
> this space-recycling improvement and go back to separate temp files.
> The objection to that, of course, is that while sorting might be faster,
> it doesn't matter how fast the algorithm is if you don't have the disk
> space to execute it.

If I am correct on the Linux seek thing, and Tatsuo is running Linux, is
there any way to fake out the kernel on only Linux, so we issue two
reads in a row before doing a seek?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] sort on huge table

From: "Aaron J. Seigo"
hi...

> Look what I found. I downloaded Linux kernel source for 2.2.0, and
> started looking for the word 'ahead' in the file system files.  I found
> that read-ahead seems to be controlled by f_reada, and look where I
> found it being turned off?  Seems like any seek turns off read-ahead on
> Linux.

the current kernel is 2.2.13... =)

that said, the fs/ext2/file.c is the same in 2.2.13 as it is in 2.2.0 (just
checked).. i'm going to put this out on the linux kernel mailing list and see
what comes back, though, as this seems to be an issue that should be
resolved if accurate....



-- 
Aaron J. Seigo
Sys Admin


Re: [HACKERS] sort on huge table

From: Bruce Momjian
> hi...
> 
> > Look what I found. I downloaded Linux kernel source for 2.2.0, and
> > started looking for the word 'ahead' in the file system files.  I found
> > that read-ahead seems to be controlled by f_reada, and look where I
> > found it being turned off?  Seems like any seek turns off read-ahead on
> > Linux.
> 
> the current kernel is 2.2.13... =)

I need to know what kernel the tester is using.  I doubt it is the most
current one.

> that said, the fs/ext2/file.c is the same in 2.2.13 as it is in 2.2.0 (just
> checked).. i'm going to put this out on the linux kernel mailing list and see
> what comes back, though, as this seems to be an issue that should be
> resolved if accurate....

I am not sure I am accurate either, but I think I am.

It would be nice to get the kernel fixed, though a fix for that is
rarely trivial.

Let us know what you find out.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] sort on huge table

From: Tom Lane
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> If I am correct on the Linux seek thing, and Tatsuo is running Linux, is
> there any way to fake out the kernel on only Linux, so we issue two
> reads in a row before doing a seek?

I dunno.  I see that f_reada is turned off by a seek in the extract you
posted, but I wasn't clear on what turns it on again, nor what happens
after it is turned on.

After further thought I am not sure that read-ahead or lack of it is
the problem.  The changes I committed over the weekend were to try to
improve locality of access to the temp file by reading tuples from
logical tapes in bursts --- in a merge pass that's reading N logical
tapes, it now tries to grab SortMem/N bytes worth of tuples off any one
source tape at a time, rather than just reading an 8K block at a time
from each tape as the first cut did.  That seemed to improve performance
on both my system and Tatsuo's, but his is still far below the speed of
the 6.5 code.  I'm not sure I understand why.  The majority of the block
reads or writes *should* be sequential now, given a reasonable SortMem
(and he tested with quite large settings).  I'm afraid there is some
aspect of the kernel's behavior on his system that we don't have a clue
about...
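
As a sketch of what the burst change amounts to (invented names and structure, not the actual tuplesort.c code): each of the N source tapes gets roughly SortMem/N bytes of the memory budget, and a tape is refilled in one sequential gulp only when its share runs dry, instead of one 8K block per request:

#include <stddef.h>

#define N_TAPES 7

typedef struct
{
    size_t  bytes_buffered;   /* tuple bytes currently held in memory */
    int     exhausted;        /* no more tuples on this tape */
} TapeState;

/* Stand-in for the real routine that reads the next tuple from a tape into
 * memory and returns its size, or 0 at end of tape. */
extern size_t read_next_tuple(int tapenum);

/* Refill one tape's buffer up to its share of the sort memory budget. */
static void refill_tape(TapeState *t, int tapenum, size_t budget)
{
    while (!t->exhausted && t->bytes_buffered < budget)
    {
        size_t  got = read_next_tuple(tapenum);

        if (got == 0)
            t->exhausted = 1;
        else
            t->bytes_buffered += got;
    }
}

/* Called by the merge when tape 'tapenum' has no buffered tuples left. */
void on_tape_buffer_empty(TapeState tapes[], int tapenum, size_t sort_mem)
{
    refill_tape(&tapes[tapenum], tapenum, sort_mem / N_TAPES);
}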
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Lamar Owen
Tom Lane wrote:
> the 6.5 code.  I'm not sure I understand why.  The majority of the block
> reads or writes *should* be sequential now, given a reasonable SortMem
> (and he tested with quite large settings).  I'm afraid there is some
> aspect of the kernel's behavior on his system that we don't have a clue
> about...

How could I go about duplicating this?? Having multiple RedHat systems
available (both of the 2.2 and 2.0 variety), I'd be glad to test it
here. I'm pulling a cvs update as I write this.  If possible, I'd like
to duplicate it exactly.

Also, from prior discussions with Thomas, there is a RedHat 6.0 machine
at hub.org for testing purposes.

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: [HACKERS] sort on huge table

From: Tatsuo Ishii
>Of course the flaw in this reasoning is that it assumes the OS isn't
>getting in the way.  On the HPUX system I've been testing on, the
>performance does seem to be about the same, but evidently it's much
>worse on your system.  (Exactly what OS are you running, anyway, and
>on what hardware?)  I speculate that your OS is applying some sort of
>read-ahead algorithm that is getting hopelessly confused by lots of
>seeks within a single file.  Perhaps it's reading the next block in
>sequence after every program-requested read, and then throwing away that
>work when it sees the program lseek the file instead of reading.

Ok. Here are my settings.

RedHat Linux 6.0 (kernel 2.2.5-smp)
Pentium III 500MHz x 2
RAM: 512MB
Disk: Ultra Wide SCSI 9GB x 4 + Hardware RAID (RAID 5).

Also, I could provide testing scripts to reproduce my tests.

>Next question is what to do about it.  I don't suppose we have any way
>of turning off the OS' read-ahead algorithm :-(.  We could forget about
>this space-recycling improvement and go back to separate temp files.
>The objection to that, of course, is that while sorting might be faster,
>it doesn't matter how fast the algorithm is if you don't have the disk
>space to execute it.
>
>A possible compromise is to use separate temp files but drop the
>polyphase merge and go to a balanced merge, which'd still access each
>temp file sequentially but would have only a 2X space penalty instead of
>4X (since all the data starts on one set of tapes and gets copied to the
>other set during a complete merge pass).  The balanced merge is a little
>slower than polyphase --- more merge passes --- but the space savings
>probably justify it.

I think it depends on the disk space available. Ideally it should be
possible to choose the sort algorithm. If that's impossible, the
algorithm that requires the least sort space would be the way to go,
since the performance problem only occurs when a table is huge.

>One thing I'd like to know before we make any decisions is whether
>this problem is widespread.  Can anyone else run performance tests
>of the speed of large sorts, using current sources vs. 6.5.* ?

I will test with 6.5.2 again.
--
Tatsuo Ishii


Re: [HACKERS] sort on huge table

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> RedHat Linux 6.0 (kernel 2.2.5-smp)
> Pentium III 500MHz x 2
> RAM: 512MB
> Disk: Ultra Wide SCSI 9GB x 4 + Hardware RAID (RAID 5).

OK, no problem with inadequate hardware anyway ;-).  Bruce's concern
about simplistic read-ahead algorithm in Linux may apply though.

> Also, I could provide testing scripts to reproduce my tests.

Please.  That would be very handy so that we can make sure we are all
comparing the same thing.  I assume the scripts can be tweaked to vary
the amount of disk space used?  I can't scare up more than a couple
hundred meg at the moment.  (The natural state of a disk drive is
"full" ...)

> I think it depends on the disk space available. Ideally it should be
> possible to choose the sort algorithm.

I was hoping to avoid that, because of the extra difficulty of testing
and maintenance.  But it may be the only answer.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Tom Lane
Lamar Owen <lamar.owen@wgcr.org> writes:
> How could I go about duplicating this?? Having multiple RedHat systems
> available (both of the 2.2 and 2.0 variety), I'd be glad to test it
> here. I'm pulling a cvs update as I write this.  If possible, I'd like
> to duplicate it exactly.

Me too (modulo disk space issues --- maybe we should try to compare
sorts of say 100MB, rather than 2GB).  Tatsuo said he'd make his test
script available.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Bruce Momjian
> Lamar Owen <lamar.owen@wgcr.org> writes:
> > How could I go about duplicating this?? Having multiple RedHat systems
> > available (both of the 2.2 and 2.0 variety), I'd be glad to test it
> > here. I'm pulling a cvs update as I write this.  If possible, I'd like
> > to duplicate it exactly.
> 
> Me too (modulo disk space issues --- maybe we should try to compare
> sorts of say 100MB, rather than 2GB).  Tatsuo said he'd make his test
> script available.

I would be very interested if Tatsuo could comment out the f_reada line
in the function I posted, and see if the new kernel is faster on 7.0
sorts.  That would clearly show the cause.  I wouldn't be surprised if
7.0 sorts became faster than 6.5.* sorts.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] sort on huge table

From: Bruce Momjian
Was this resolved?


> >That's what I get for not testing the interaction between logtape.c
> >and buffile.c at a segment boundary --- it didn't work, of course :-(.
> >I rebuilt with a small RELSEG_SIZE and debugged it.  I'm still concerned
> >about possible integer overflow problems, so please update and try again
> >with a large file.
> 
> It worked with 2GB+ table but was much slower than before.
> 
> Before(with 8MB sort memory): 22 minutes
> 
> After(with 8MB sort memory): 1 hour and 5 minutes
> After(with 80MB sort memory): 42 minutes.
> --
> Tatsuo Ishii
> 
> 


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] sort on huge table

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Was this resolved?

I tweaked the code some, and am waiting for retest results from Tatsuo.

I think the poor results he is seeing might be platform-dependent; on
my machine current code seems to be faster than 6.5.* ... but on the
other hand I don't have the disk space to run a multi-gig sort test.

Can anyone else take the time to compare speed of large sorts between
6.5.* and current code?
        regards, tom lane


>> It worked with 2GB+ table but was much slower than before.
>> 
>> Before(with 8MB sort memory): 22 minutes
>> 
>> After(with 8MB sort memory): 1 hour and 5 minutes
>> After(with 80MB sort memory): 42 minutes.
>> --
>> Tatsuo Ishii


Re: [HACKERS] sort on huge table

From: Hannu Krosing
Tom Lane wrote:
> 
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Was this resolved?
> 
> I tweaked the code some, and am waiting for retest results from Tatsuo.
> 
> I think the poor results he is seeing might be platform-dependent; on
> my machine current code seems to be faster than 6.5.* ... but on the
> other hand I don't have the disk space to run a multi-gig sort test.
> 
> Can anyone else take the time to compare speed of large sorts between
> 6.5.* and current code?

Is there a howto for running an additional development backend?

If there is, I could test it on a dual P!!! 500MHz IBM Netfinity
M20 with 1GB memory and >30 GB RAID5 disks.

---------------
Hannu


Re: [HACKERS] sort on huge table

From: Peter Eisentraut
On Mon, 29 Nov 1999, Tom Lane wrote:

> Can anyone else take the time to compare speed of large sorts between
> 6.5.* and current code?

I have a few Linux and FreeBSD machines with rather normal hardware I
could use, but I'm not all that familiar with what you were working on, so
I'd need exact specifications or, better yet, a script.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden



Re: [HACKERS] sort on huge table

From: Tom Lane
Peter Eisentraut <e99re41@DoCS.UU.SE> writes:
> On Mon, 29 Nov 1999, Tom Lane wrote:
>> Can anyone else take the time to compare speed of large sorts between
>> 6.5.* and current code?

> I have a few Linux and FreeBSD machines with rather normal hardware I
> could use, but I'm not all that familiar with what you were working on, so
> I'd need exact specifications or, better yet, a script.

Tatsuo posted his sort test script to pgsql-hackers on 02 Nov 1999
13:07:32 +0900; you can get it from the archives.
        regards, tom lane


Re: [HACKERS] sort on huge table

From: Peter Eisentraut
I ran the sort script without change, the resulting file was about 250MB
in size. Not sure what kind of sizes you were looking for.

6.5.3     696.01 real         0.03 user         0.02 sys

"7.0" from last Saturday     957.73 real         0.03 user         0.02 sys
one more time     936.41 real         0.04 user         0.01 sys


FreeBSD 3.3, 200MHz Pentium (P55C), 128MB RAM
both installations were done without extras (bare ./configure)

That almost seems too wacko to be true. I'll be happy to rerun them, with
other sizes if you want.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden







initdb.sh fixed

From: Bruce Momjian
OK, initdb should now work.  There were a variety of non-portable things
in initdb.sh, like assuming $EUID is defined, and other shell script and
command args that do not exist on BSDI.

I think I got them all.  If anyone sees problems, let me know.  This is
not really Peter's fault.  It takes a long time to know what is
portable and what is not portable.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: initdb.sh fixed

From: Peter Eisentraut
People, I thought you would at least test this once before applying it. I
explicitly (maybe not explicitly enough) mentioned it, because writing
complex shell scripts is a nut job. Maybe this thing should really be
written in C. Then there will be no EUID, no echo, no function, no grep,
no whoami, or other problems. Perhaps the whole genbki.sh thing could be
scrapped then, with initdb interpreting the DATA() macros itself. It
would even reduce the overhead of calling postgres about 12 times and
could get it down to 2 or 3. A project for 7.1?

On 1999-12-17, Bruce Momjian mentioned:

> OK, initdb should now work.  There were a variety of non-portable things
> in initdb.sh, like assuming $EUID is defined, and other shell script and
> command args that do not exist on BSDI.

Hmm, that $EUID seems to have been the root of all trouble because then the
'insert ( data data data )' bootstrap commands are containing gaps. On the
other hand, this was one of the key things that were supposed to be
improved because relying on $USER was not su-safe. Maybe $UID would work,
since initdb isn't supposed to be setuid anyway.

> I think I got them all.  If anyone sees problems, let me know.  This is
> not really Peter's fault.  It takes a long time to know what is
> portable and what is not portable.

The more time I spend with this the more I think that the only thing
that's portable is echo. Oh wait, that's not portable either. :)

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden






Re: initdb.sh fixed

From: Bruce Momjian
[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> People, I thought you would at least test this once before applying it. I
> explicitly (maybe not explicitly enough) mentioned it, because writing
> complex shell scripts is a nut job. Maybe this thing should really be
> written in C. Then there will be no EUID, no echo, no function, no grep,
> no whoami, or other problems. Perhaps the whole genbki.sh thing could be
> scrapped then, with initdb interpreting the DATA() macros itself. It
> would even reduce the overhead of calling postgres about 12 times and
> could get it down to 2 or 3. A project for 7.1?

I had enough trouble applying the patch, let alone testing it...

Making it in C presents all sorts of portability problems that are even
harder to figure.  There is no portability free lunch.  I think a script
is the way to go with this.

The big problem seems to be reliance on bash-isms like $UID and
functions with spaces like:

function func () {
}

Only bash knows about that.  I have written enough shell scripts to
know that, but it is hard to get that knowledge.

Also, env args without quotes around them are a problem.

All fixed now.

> 
> On 1999-12-17, Bruce Momjian mentioned:
> 
> > OK, initdb should now work.  There were a variety of non-portable things
> > in initdb.sh, like assuming $EUID is defined, and other shell script and
> > command args that do not exist on BSDI.
> 
> Hmm, that $EUID seems to have been the root of all trouble because then the
> 'insert ( data data data )' bootstrap commands are containing gaps. On the
> other hand, this was one of the key things that were supposed to be
> improved because relying on $USER was not su-safe. Maybe $UID would work,
> since initdb isn't supposed to be setuid anyway.

Again, a bash-ism.  Let's face it, the postgres binary is going to
croak on root anyway, so we are just doing an extra check in initdb.

> 
> > I think I got them all.  If anyone sees problems, let me know.  This is
> > not really Peter's fault.  It takes a long time to know what is
> > portable and what is not portable.
> 
> The more time I spend with this the more I think that the only thing
> that's portable is echo. Oh wait, that's not portable either. :)

Don't think so.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: initdb.sh fixed

From: Peter Eisentraut
On 1999-12-18, Bruce Momjian mentioned:

> The big problem seems to be reliance on bash-isms like $UID and
> functions with spaces like:

Bash tells me that if it's invoked as 'sh' it will behave like 'sh', but
it's lying ...

> > 'insert ( data data data )' bootstrap commands are containing gaps. On the
> > other hand, this was one of the key things that were supposed to be
> > improved because relying on $USER was not su-safe. Maybe $UID would work,
> > since initdb isn't supposed to be setuid anyway.
> 
> > Again, a bash-ism.  Let's face it, the postgres binary is going to
> croak on root anyway, so we are just doing an extra check in initdb.

But the point was to initialize the superuser id in Postgres to that
number, but we might as well start them out at 0, like it is now.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: [HACKERS] Re: initdb.sh fixed

From: Bruce Momjian
> > > 'insert ( data data data )' bootstrap commands are containing gaps. On the
> > > other hand, this was one of the key things that were supposed to be
> > > improved because relying on $USER was not su-safe. Maybe $UID would work,
> > > since initdb isn't supposed to be setuid anyway.
> > 
> > Again, a bash-ism.  Let's face it, the postgres binary is going to
> > croak on root anyway, so we are just doing an extra check in initdb.
> 
> But the point was to initialize the superuser id in Postgres to that
> number, but we might as well start them out at 0, like it is now.

I am now using:
POSTGRES_SUPERUSERID="`id -u 2>/dev/null || echo 0`"

Let's see how portable that is?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: initdb.sh fixed

From: Bruce Momjian
[Charset ISO-8859-1 unsupported, filtering to ASCII...]
> On 1999-12-18, Bruce Momjian mentioned:
> 
> > The big problem seems to be reliance on bash-isms like $UID and
> > functions with spaces like:
> 
> Bash tells me that if it's invoked as 'sh' it will behave like 'sh', but
> it's lying ...

Yes, certain _extensions_ show through.

> 
> > > 'insert ( data data data )' bootstrap commands are containing gaps. On the
> > > other hand, this was one of the key things that were supposed to be
> > > improved because relying on $USER was not su-safe. Maybe $UID would work,
> > > since initdb isn't supposed to be setuid anyway.
> > 
> > Again, a bash-ism.  Let's face it, the postgres binary is going to
> > croak on root anyway, so we are just doing an extra check in initdb.
> 
> But the point was to initialize to superuser id in Postgres as that
> number, but we might as well start them out at 0, like it is now.

Seems either $USER or $LOGNAME should be set in all cases.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Re: initdb.sh fixed

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Seems either $USER or $LOGNAME should be set in all cases.

One or both is probably set in most shell environments ... but
it's not necessarily *right*.  If you've su'd to postgres from
your login account, these env vars may still reflect your login.

> I am now using:
>    POSTGRES_SUPERUSERID="`id -u 2>/dev/null || echo 0`"
> Let's see how portable that is?

Some quick experimentation shows that id -u isn't too trustworthy,
which is a shame because it's the POSIX standard.  But I find that
the SunOS implementation ignores -u:

$ id -u
uid=6902(tgl) gid=50(users0) groups=50(users0)

And no doubt there will be platforms that haven't got "id" at all.

It might be best to provide a little bitty C program that calls
geteuid() and prints the result...
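
The whole program could be as small as this (just a sketch of the suggestion, with pg_id as a made-up name for wherever it might get installed):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* print the effective user id so initdb can capture it from a script */
    printf("%ld\n", (long) geteuid());
    return 0;
}

initdb could then do something like POSTGRES_SUPERUSERID=`pg_id` instead of depending on the shell's id or whoami.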
        regards, tom lane


Re: [HACKERS] Re: initdb.sh fixed

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Seems either $USER or $LOGNAME should be set in all cases.
> 
> One or both is probably set in most shell environments ... but
> it's not necessarily *right*.  If you've su'd to postgres from
> your login account, these env vars may still reflect your login.
> 
> > I am now using:
> >    POSTGRES_SUPERUSERID="`id -u 2>/dev/null || echo 0`"
> > Let's see how portable that is?
> 
> Some quick experimentation shows that id -u isn't too trustworthy,
> which is a shame because it's the POSIX standard.  But I find that
> the SunOS implementation ignores -u:
> 
> $ id -u
> uid=6902(tgl) gid=50(users0) groups=50(users0)
> 
> And no doubt there will be platforms that haven't got "id" at all.
> 
> It might be best to provide a little bitty C program that calls
> geteuid() and prints the result...

We could argue that Postgres is the super-user for the database, it
should be zero userid.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Re: initdb.sh fixed

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> We could argue that Postgres is the super-user for the database, it
> should be zero userid.

Actually, that's quite a good thought --- is there *any* real need
for initdb to extract the UID of the postgres user?  What we do need,
I think, is the *name* of the postgres user, which we might perhaps
get with something like
whoami 2>/dev/null || id -u -n 2>/dev/null || echo postgres
        regards, tom lane


Re: [HACKERS] Re: initdb.sh fixed

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > We could argue that Postgres is the super-user for the database, it
> > should be zero userid.
> 
> Actually, that's quite a good thought --- is there *any* real need
> for initdb to extract the UID of the postgres user?  What we do need,
> I think, is the *name* of the postgres user, which we might perhaps
> get with something like
> 
>     whoami 2>/dev/null || id -u -n 2>/dev/null || echo postgres

We currently have:
 EffectiveUser=`id -n -u 2> /dev/null` || EffectiveUser=`whoami 2> /dev/null`

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Re: initdb.sh fixed

From: Tom Lane
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> We currently have:
>   EffectiveUser=`id -n -u 2>/dev/null` || EffectiveUser=`whoami 2>/dev/null`

OK, but is that really portable?  I'd feel more comfortable with

EffectiveUser=`id -n -u 2>/dev/null || whoami 2>/dev/null`

because it's clearer what will happen.  I wouldn't have expected an
error inside a backquoted subcommand to determine the error result of
the command as a whole, which is what the first example is depending on.
In a quick test it seemed to work with the ksh I tried it on, but I
wonder how many shells work that way...
        regards, tom lane


Re: [HACKERS] Re: initdb.sh fixed

From: Bruce Momjian
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > We currently have:
> >   EffectiveUser=`id -n -u 2>/dev/null` || EffectiveUser=`whoami 2>/dev/null`
> 
> OK, but is that really portable?  I'd feel more comfortable with
> 
> EffectiveUser=`id -n -u 2>/dev/null || whoami 2>/dev/null`
> 
> because it's clearer what will happen.  I wouldn't have expected an
> error inside a backquoted subcommand to determine the error result of
> the command as a whole, which is what the first example is depending on.
> In a quick test it seemed to work with the ksh I tried it on, but I
> wonder how many shells work that way...

Change applied.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] Re: initdb.sh fixed

From: Peter Eisentraut
On Sun, 19 Dec 1999, Bruce Momjian wrote:

> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > We could argue that Postgres is the super-user for the database, it
> > > should be zero userid.
> > 
> > Actually, that's quite a good thought --- is there *any* real need
> > for initdb to extract the UID of the postgres user?  What we do need,
> > I think, is the *name* of the postgres user, which we might perhaps
> > get with something like
> > 
> >     whoami 2>/dev/null || id -u -n 2>/dev/null || echo postgres
> 
> We currently have:
> 
>   EffectiveUser=`id -n -u 2> /dev/null` || EffectiveUser=`whoami 2> /dev/null`
> 

If neither one of these resulted in anything it will ask you to provide a
string with --username. But I figure one must have one of those.

-- 
Peter Eisentraut                  Sernanders vaeg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden