Обсуждение: Sort method: external merge

Поиск
Список
Период
Сортировка

Sort method: external merge

От
wstrzalka
Дата:
It's kind of lame questions, possibly I'm missing something but my
doubts are as follow:

When planner/executor needs to sort rowsit sorts whole records (i
think so). So in the case when there are many wide columns it takes
quite a lot of memory and sort goes out to the disk because it excess
the work_mem.

Isn't it possible to sort only fields that order matters & some row
identifier/position (don't really know what - oid/ctid are tight to
table but something temporary tight to 'resultset')? It would take
much less memory and could be processed in the work_mem more often.


#  select sum(length(title)) from contacts;
 sum
------
 4225
(1 row)


# explain analyze SELECT * FROM contacts ORDER BY title;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Sort  (cost=6303.07..6369.65 rows=26634 width=269) (actual
time=71.945..92.989 rows=26634 loops=1)
   Sort Key: title
   Sort Method:  external sort  Disk: 7368kB
   ->  Seq Scan on contacts  (cost=0.00..1456.34 rows=26634 width=269)
(actual time=0.008..10.995 rows=26634 loops=1)


Re: Sort method: external merge

От
Jeff Davis
Дата:
On Wed, 2009-02-04 at 02:15 -0800, wstrzalka wrote:
> Isn't it possible to sort only fields that order matters & some row
> identifier/position (don't really know what - oid/ctid are tight to
> table but something temporary tight to 'resultset')? It would take
> much less memory and could be processed in the work_mem more often.

ctid is not sufficient for all uses of Sort, because sometimes you are
sorting tuples that don't come from a table. Consider sorting the
results after a function has been applied to some other field -- how can
you get the result of that function?

What you're talking about is kind of like building an index on the fly.
You might as well just make a BTree normally, which is exactly the kind
of result structure you are suggesting: a sorted mapping between the
sort key and the ctid.

Even in the case where you already have an index, the index scan can
actually be more expensive. The reason is that accessing tuples in a
table by ctid is random access (except in the case of a clustered
table), while pulling the tuples from an external sort is more
sequential.

If you are interested, you can take this discussion to pgsql-hackers.

Regards,
    Jeff Davis