Re: PoC: Partial sort

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: PoC: Partial sort
Дата
Msg-id CAApHDvrbq348M8dYj-7O4VaE5PS9ZoQ_34rGvaaN1QYXL2SP_A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PoC: Partial sort  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: PoC: Partial sort
Список pgsql-hackers
On Sun, Dec 29, 2013 at 4:51 AM, Alexander Korotkov <aekorotkov@gmail.com> wrote:
I've compiled it with clang. Yeah, there was mixed declarations. I've rechecked it with gcc, now it gives no warnings. I didn't try it with visual studio, but I hope it will be OK.


Thanks for the patch. It now compiles without any problems.
I've been doing a bit of testing with the patch testing a few different workloads. One thing that I've found is that in my test case when the table only contains 1 tuple for any given presort columns that the query is actually slower than when there are say 100 tuples to sort for any given presort group.

Here is my test case:

DROP TABLE IF EXISTS temperature_readings;

CREATE TABLE temperature_readings (
  readingid SERIAL NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  locationid INT NOT NULL,
  temperature INT NOT NULL,
  PRIMARY KEY (readingid)
);

INSERT INTO temperature_readings (timestamp,locationid,temperature)
SELECT ts.timestamp, loc.locationid, -10 + random() * 40
FROM generate_series('1900-04-01','2000-04-01','1 day'::interval) ts(timestamp)
CROSS JOIN generate_series(1,1) loc(locationid);

VACUUM ANALYZE temperature_readings;

-- Warm buffers
SELECT AVG(temperature) FROM temperature_readings;

explain (buffers, analyze) select * from temperature_readings order by timestamp,locationid; -- (seqscan -> sort) 70.805ms

-- create an index to allow presorting on timestamp.
CREATE INDEX temperature_readings_timestamp_idx ON temperature_readings (timestamp);

-- warm index buffers
SELECT COUNT(*) FROM (SELECT timestamp FROM temperature_readings ORDER BY timestamp) c;

explain (buffers, analyze) select * from temperature_readings order by timestamp,locationid; -- index scan -> partial sort 253.032ms

The first query without the index to presort on runs in 70.805 ms, the 2nd query uses the index to presort and runs in 253.032 ms.

I ran the code through a performance profiler and found that about 80% of the time is spent in tuplesort_end and tuplesort_begin_heap. 

If it was possible to devise some way to reuse any previous tuplesortstate perhaps just inventing a reset method which clears out tuples, then we could see performance exceed the standard seqscan -> sort. The code the way it is seems to lookup the sort functions from the syscache for each group then allocate some sort space, so quite a bit of time is also spent in palloc0() and pfree()

If it was not possible to do this then maybe adding a cost to the number of sort groups would be better so that the optimization is skipped if there are too many sort groups.

Regards

David Rowley




 
------
With best regards,
Alexander Korotkov.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Upgrade to Autoconf 2.69