Thread: PostgreSQL underestimates sorting


PostgreSQL underestimates sorting

From:
Markus Schaber
Date:
Hi,

PostgreSQL 8.1 (and, back then, 7.4) has a tendency to underestimate
the cost of sort operations compared to index scans.

The backend allocates gigs of memory (we've set sort_mem to 1 gig), and
then starts spilling further gigs of temporary data to disk. So the
execution ends up, in the end, much slower than an index scan, and
wastes lots of disk space.

We did not manage to tune the config values appropriately, at least not
without causing other query plans to suffer badly.

Are there any good ways to shift the planner's preference slightly
towards index scans, without affecting other queries?
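For illustration, one way to nudge the planner toward index scans for a single query, without touching the global configuration, is to override cost settings per transaction. This is only a sketch; the parameter values shown are assumptions, not recommendations:

```sql
BEGIN;
-- Lower the assumed penalty for random I/O so index scans look cheaper.
SET LOCAL random_page_cost = 2.0;
-- As a heavier hammer, discourage explicit sort steps entirely.
SET LOCAL enable_sort = off;

-- ... run the affected query here ...

COMMIT;  -- SET LOCAL settings revert automatically at transaction end
```

Because SET LOCAL is scoped to the transaction, other queries on the same connection are unaffected.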

There's one thing that most of those queries have in common: They
include TOAST data (large strings, PostGIS geometries etc.), and I
remember that there are known problems with estimating the TOAST costs.
This may be part of the problem, or may be irrelevant.


Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: PostgreSQL underestimates sorting

From:
"Steinar H. Gunderson"
Date:
On Wed, Nov 22, 2006 at 11:17:23AM +0100, Markus Schaber wrote:
> The Backend allocates gigs of memory (we've set sort_mem to 1 gig), and
> then starts spilling out more Gigs of temporary data to the disk.

How much RAM is in the server? Remember that sort_mem is _per sort_, so if
you have multiple sorts, it might allocate several multiples of the amount
you set up.
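To illustrate the point above: a single query can contain several sort steps, and each Sort node in the plan may use up to sort_mem on its own. A hypothetical example (table and column names are made up):

```sql
-- Each Sort node in the resulting plan may claim up to sort_mem
-- independently of the others:
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id        -- may sort the input for the aggregation
ORDER BY total DESC;        -- sorts again for the final ordering
```

With sort_mem set to 1 gig, a plan with two sort nodes could allocate up to 2 gigs, and concurrent backends multiply that further.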

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: PostgreSQL underestimates sorting

From:
Markus Schaber
Date:
Hi, Steinar,

Steinar H. Gunderson wrote:
> On Wed, Nov 22, 2006 at 11:17:23AM +0100, Markus Schaber wrote:
>> The Backend allocates gigs of memory (we've set sort_mem to 1 gig), and
>> then starts spilling out more Gigs of temporary data to the disk.
>
> How much RAM is in the server? Remember that sort_mem is _per sort_, so if
> you have multiple sorts, it might allocate several multiples of the amount
> you set up.

That one machine has 16 gigs of RAM, and about 10 gigs tend to be "free"
/ part of the Linux block-layer cache.

The temporary data is not swapping, it's the Postgres on-disk sort
algorithm.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: PostgreSQL underestimates sorting

From:
Markus Schaber
Date:
Hi, Frank,

Frank Wiles wrote:


>> The temporary data is not swapping, it's the Postgres on-disk sort
>> algorithm.
>
>    Are you actually running a query where you have a GB of data
>    you need to sort?  If not I fear you may be causing the system
>    to swap by setting it this high.

Yes, the table itself is about 40 gigs in size, so much larger than
memory. The machine has 16 gigs of RAM, and 10-12 gigs are available
for PostgreSQL plus the disk cache.

There's no swapping, only 23 MB of swap are used (40 Gigs are available).

That's one example configuration; there are others on different machines
where it turns out that forcing index usage leads to faster queries and
less overall resource consumption (or, at least, faster delivery of the
first part of the result, so the application can begin processing it
asynchronously).
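One common way to get the first rows early without waiting for a full sort, assuming an index on the sort column exists, is a cursor, which tends to favour fast-start plans such as index scans. A sketch with made-up names:

```sql
-- Assuming big_table has an index on sort_col, a cursor lets the
-- application start consuming rows before the full result exists.
BEGIN;
DECLARE c CURSOR FOR
    SELECT * FROM big_table ORDER BY sort_col;
FETCH 100 FROM c;   -- first batch, ideally served by an index scan
```

The application can then FETCH further batches while it processes the first one.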

Thanks,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

Re: PostgreSQL underestimates sorting

From:
Frank Wiles
Date:
On Wed, 22 Nov 2006 15:28:12 +0100
Markus Schaber <schabi@logix-tt.com> wrote:

> Hi, Steinar,
>
> Steinar H. Gunderson wrote:
> > On Wed, Nov 22, 2006 at 11:17:23AM +0100, Markus Schaber wrote:
> >> The Backend allocates gigs of memory (we've set sort_mem to 1
> >> gig), and then starts spilling out more Gigs of temporary data to
> >> the disk.
> >
> > How much RAM is in the server? Remember that sort_mem is _per
> > sort_, so if you have multiple sorts, it might allocate several
> > multiples of the amount you set up.
>
> That one machine has 16 Gigs of ram, and about 10 Gigs tend to be
> "free" / part of the Linux blocklayer cache.
>
> The temporary data is not swapping, it's the Postgres on-disk sort
> algorithm.

   Are you actually running a query where you have a GB of data
   you need to sort?  If not, I fear you may be causing the system
   to swap by setting it this high.

 ---------------------------------
   Frank Wiles <frank@wiles.org>
   http://www.wiles.org
 ---------------------------------


Re: PostgreSQL underestimates sorting

From:
"Simon Riggs"
Date:
On Wed, 2006-11-22 at 11:17 +0100, Markus Schaber wrote:

> PostgreSQL 8.1 (and, back then, 7.4) have the tendency to underestimate
> the costs of sort operations, compared to index scans.
>
> The Backend allocates gigs of memory (we've set sort_mem to 1 gig), and
> then starts spilling out more Gigs of temporary data to the disk. So the
> execution gets - in the end - much slower compared to an index scan, and
> wastes lots of disk space.
>
> We did not manage to tune the config values appropriately, at least not
> without causing other query plans to suffer badly.

8.2 has substantial changes to the sort code, so you may want to give
the beta version a try to see how much better it works. That's another
way of saying that sorting in 8.1 and earlier has performance problems
when you are sorting more than 6 * 2 * work_mem (on randomly ordered
data), and the cost model doesn't get this right, as you observe.
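Taking that rule of thumb at face value, the threshold for the problematic regime can be computed directly. A quick sanity check against the numbers mentioned in this thread, assuming the 6 * 2 multiplier above:

```python
# Rough check of the rule of thumb: sorts larger than
# 6 * 2 * work_mem hit the slow path in 8.1 and earlier.
GB = 1024 ** 3

work_mem = 1 * GB            # the sort_mem setting from this thread
threshold = 6 * 2 * work_mem

table_size = 40 * GB         # the 40-gig table mentioned above
print(threshold // GB)       # 12 (GB)
print(table_size > threshold)  # True: well inside the problem range
```

So with sort_mem at 1 gig, anything past roughly 12 gigs of sort input falls into the badly-modelled regime, and the 40-gig table is well beyond that.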

Try enabling trace_sort (available in both 8.1 and 8.2) and post the
results here, please; it would be very useful to have results for such
a large real-world sort.
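For reference, trace_sort can be enabled per session; a minimal sketch of how one might capture its output (the exact log lines vary by version):

```sql
SET trace_sort = on;
SET client_min_messages = log;  -- so the LOG lines reach the client
-- now run the slow query; tuplesort progress appears as LOG messages
```

The resulting messages report how the sort progressed and how much memory or disk it used, which is what would be useful to see here.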

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com