Re: Compression and on-disk sorting
| От | Bruce Momjian |
|---|---|
| Тема | Re: Compression and on-disk sorting |
| Дата | |
| Msg-id | 200605181607.k4IG75Y09778@candle.pha.pa.us обсуждение исходный текст |
| Ответ на | Re: Compression and on-disk sorting (Greg Stark <gsstark@mit.edu>) |
| Ответы |
Re: Compression and on-disk sorting
|
| Список | pgsql-hackers |
Uh, TODO already has:
o %Add a GUC variable to control the tablespace for temporary objects and sort files
It could start with a random tablespace from a supplied list and cycle through the list.
Do we need to add to this?
---------------------------------------------------------------------------
Greg Stark wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>
> > Which means we need all the interface bits to be able to tell PostgreSQL
> > where every single temp storage area is. Presumably much of the
> > tablespace mechanism could be used for this, but it's still a bunch of
> > work. And you can't just say "I have 8 spindles", you have to tell
> > PostgreSQL exactly where to put each temporary area (unless you just
> > have it put one on every tablespace you have defined).
>
> Yes, if you have more than one temporary area you definitely need a way to
> tell Postgres where to put them since obviously they won't be in the postgres
> base directory. But I think that's all Postgres really needs to know.
>
> One could imagine a more complex version where Postgres has meta information
> about the bandwidth and seek penalty for each sort area separately. Presumably
> also for each table space. But that's a whole lot more complexity than
> Postgres's current cost model.
>
>
> > > that it should strive to maximize sequential reads within one temp area and
> > > expect switching between temp areas (which represent multiple spindles) to be
> > > better than multiplexing multiple tapes within a single temp area (which
> > > represents a single spindle).
> >
> > Which adds yet more complexity to all the code that uses the temp area.
> > And as others have brought up, you still have to allow for the case when
> > splitting all of this out into multiple files means you end up using
> > substantially more disk space. That further drives up the complexity.
>
> You also have to consider that it won't always be a benefit to spread the sort
> over multiple sort areas. If there's only one sort going on and you can reach
> a 1:1 ratio between tapes and spindles then I think it would be a huge
> benefit. Effectively boosting the sort speed by random_page_cost.
>
> But if you don't have as many spindles as your algorithm needs tapes
> then it's unclear which to multiplex down and whether you gain any benefit
> once you're multiplexing over simply using a single sort area.
>
> And worse, if there are multiple sorts going on in the system then you're not
> going to get sequential access even if you have multiple sort areas available.
> If you have N sort areas and N sorts are going on then you're probably better
> off multiplexing each one down to a single sort area and letting them each
> proceed without interfering with each other rather than having each one hog
> all the sort areas and forcing the OS to do the multiplexing blindly.
>
> > My point is that unless someone shows that there's a non-trivial
> > performance gain here, it's not going to happen.
>
> I think two extreme cases are well worth pursuing:
>
> 1) Use n sort areas for n tapes making everything purely sequential access.
> That would be useful for large DSS systems where large sorts are running
> and i/o bandwidth is high for sequential access. That gives effectively a
> random_page_cost speed boost.
>
> 2) Use the current algorithm unchanged but have each sort use a different sort
> area in some sort of round-robin fashion. That helps the OLTP type
> environment (ignoring for the moment that OLTP environments really ought
> not be doing disk sorts) where people complain about unreliable execution
> times more than slow execution times. If you can provide enough sort areas
> then it would remove one big reason other queries concurrent impact the
> execution time of queries.
>
> --
> greg
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>
-- Bruce Momjian http://candle.pha.pa.us EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
В списке pgsql-hackers по дате отправления: