Re: [HACKERS] sorting big tables :(
From | The Hermit Hacker |
---|---|
Subject | Re: [HACKERS] sorting big tables :( |
Date | |
Msg-id | Pine.BSF.3.96.980519234300.7420A-100000@thelab.hub.org |
In reply to | Re: [HACKERS] sorting big tables :( (Bruce Momjian <maillist@candle.pha.pa.us>) |
List | pgsql-hackers |
On Tue, 19 May 1998, Bruce Momjian wrote:

> > On Sun, 17 May 1998, Bruce Momjian wrote:
> >
> > > > I have a big table.  40M rows.
> > > >
> > > > On the disk, its size is:
> > > >
> > > > 2,090,369,024 bytes.  So 2 gigs.  On a 9 gig drive I can't sort
> > > > this table.
> > > >
> > > > How should one decide based on table size how much room is needed?
> > >
> > > Tape sort is a standard Knuth sorting.  It basically sorts in pieces,
> > > and merges.  If you don't do this, the accessing around gets very poor
> > > as you page fault all over the file, and the cache becomes useless.
> >
> > Right.  I wasn't reading the right chapter.  Internal sorting is much
> > different than external sorts.  Internal suggests the use of a Quicksort
> > algorithm.
> >
> > Marc and I discussed over lunch.  If I did a select * into, would it not
> > make more sense to sort the results into the resulting table rather than
> > into pieces and then copy into a table?  From my limited knowledge, I
> > think this should save 8/7 N the space.
>
> In this issue, I think there must be a lot more overhead than necessary.
>
> Not sure if the internal tape is the same structure as a real table, but
> I doubt it.  I seem to remember there is less overhead.
>
> > The table consists of only
> > int4, int4, int2
> >
> > I read 10 bytes / row of actual data here.
> > Instead, 40M/2gigs is about 50 bytes / record.
> >
> > What is there other than oid (4? bytes)?
>
> Internal stuff so it looks like a real table, even though it is a
> result, I think.

Okay...I get to jump in here with both feet and arms flailing :)

Michael and I had lunch today and talked about this, and I asked him to
send an email in to the list about it...unfortunately, he didn't translate
our chat very well for here :)

This whole thing makes absolutely no sense to me, as far as why it takes
2.5 times more space to *sort* the table than the table size itself.  He
starts with a 2 gig table, and it runs out of disk space on a 9 gig file
system...

Now, looking at question 3.26 in the FAQ, we have:

    40 bytes + each row header (approximate)
    10 bytes + two int4 fields + one int2 field
     4 bytes + pointer on page to tuple
    --------
    54 bytes per row

The data page size in PostgreSQL is 8192 (8k) bytes, so:

    8192 bytes per page
    -------------------  =  157 rows per database page (rounded up)
     54 bytes per row

    40000000 data rows
    ------------------  =  254777 database pages
    157 rows per page

    254777 database pages * 8192 bytes per page = 2,087,133,184, or ~1.9 gig

Now, as a text file, this would amount to, what...~50MB?

So, if I were to do a 'copy out' to a text file, a Unix sort, and then a
'copy in', I would use up *less* disk space (by several orders of
magnitude) than doing the sort inside of PostgreSQL?  Why?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
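For anyone re-deriving the numbers above, the FAQ 3.26 arithmetic can be written out in a few lines of Python. The per-row figures are the approximations quoted in the message, not exact values from the on-disk page format; note that straight division gives 151 rows per page rather than the 157 quoted, but either way the estimate lands right around the observed 2,090,369,024 bytes.

```python
# Rough table-size estimate following the FAQ 3.26 arithmetic quoted above.
# The per-row overheads are the approximate figures from the message.
ROW_HEADER   = 40          # approximate per-row header
ROW_DATA     = 4 + 4 + 2   # two int4 fields + one int2 field
ITEM_POINTER = 4           # pointer on page to tuple
PAGE_SIZE    = 8192        # PostgreSQL data page size
N_ROWS       = 40_000_000

bytes_per_row = ROW_HEADER + ROW_DATA + ITEM_POINTER   # 54 bytes per row
rows_per_page = PAGE_SIZE // bytes_per_row              # 151 (the message rounds this to 157)
pages         = -(-N_ROWS // rows_per_page)              # ceiling division: 264,901 pages
table_bytes   = pages * PAGE_SIZE                        # ~2.17e9 bytes, i.e. right around 2 gigs

print(f"{bytes_per_row} B/row, {rows_per_page} rows/page, "
      f"{pages} pages, {table_bytes:,} bytes")
```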
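The "tape sort" Bruce refers to is Knuth's external merge sort: sort memory-sized runs with an internal sort, write each sorted run out, then merge the runs. The sketch below is only a minimal illustration of that idea in Python, not how PostgreSQL's sort code actually works; even this toy version needs the sorted runs plus the merged output on disk at the same time, which is the kind of scratch space the thread is puzzling over.

```python
# Minimal external (Knuth-style "tape") merge sort sketch: sort a text file
# of one integer key per line using bounded memory.  Illustration only.
import heapq, itertools, os, tempfile

def external_sort(in_path, out_path, max_rows_in_memory=1_000_000):
    run_paths = []
    with open(in_path) as src:
        while True:
            run = [int(line) for line in itertools.islice(src, max_rows_in_memory)]
            if not run:
                break
            run.sort()                                     # internal sort of one memory-sized run
            fd, path = tempfile.mkstemp(text=True)
            with os.fdopen(fd, "w") as tmp:
                tmp.writelines(f"{key}\n" for key in run)  # write the sorted run ("tape") to disk
            run_paths.append(path)

    runs = [open(p) for p in run_paths]
    try:
        with open(out_path, "w") as dst:
            # heapq.merge streams the sorted runs, holding only one line per run in memory
            for key in heapq.merge(*((int(line) for line in r) for r in runs)):
                dst.write(f"{key}\n")
    finally:
        for r in runs:
            r.close()
        for p in run_paths:
            os.remove(p)
```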