Re: POC: GROUP BY optimization

Поиск
Список
Период
Сортировка
От Andrey Lepikhov
Тема Re: POC: GROUP BY optimization
Дата
Msg-id e3602ccb-e643-2e79-ed2c-1175a80533a1@postgrespro.ru
обсуждение исходный текст
Ответ на Re: POC: GROUP BY optimization  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
On 20/7/2023 18:46, Tomas Vondra wrote:
> On 7/20/23 08:37, Andrey Lepikhov wrote:
>> On 3/10/2022 21:56, Tom Lane wrote:
>>> Revert "Optimize order of GROUP BY keys".
>>>
>>> This reverts commit db0d67db2401eb6238ccc04c6407a4fd4f985832 and
>>> several follow-on fixes.
>>> ...
>>> Since we're hard up against the release deadline for v15, let's
>>> revert these changes for now.  We can always try again later.
>>
>> It may be time to restart the project. As a first step, I rebased the
>> patch on the current master. It wasn't trivial because of some latest
>> optimizations (a29eab, 1349d27 and 8d83a5d).
>> Now, Let's repeat the review and rewrite the current path according to
>> the reasons uttered in the revert commit.
> 1) procost = 1.0 - I guess we could make this more realistic by doing
> some microbenchmarks and tuning the costs for the most expensive cases.
Ok, some thoughts on this part of the task. As I see, we have not so 
many different operators: 26 with fixed width and 16 with variable width:

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
   JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen>0
ORDER BY o.oid;

SELECT o.oid,oprcode,typname,typlen FROM pg_operator o
   JOIN pg_type t ON (oprleft = t.oid)
WHERE (oprname='<') AND oprleft=oprright AND typlen<0
ORDER BY o.oid;

Benchmarking procedure of types with fixed length can be something like 
below:

CREATE OR REPLACE FUNCTION pass_sort(typ regtype) RETURNS TABLE (
     nrows integer,
     exec_time float
)  AS $$
DECLARE
   data json;
   step integer;
BEGIN
   SET work_mem='1GB';

   FOR step IN 0..3 LOOP
     SELECT pow(100, step)::integer INTO nrows;
     DROP TABLE IF EXISTS test CASCADE;
     EXECUTE format('CREATE TABLE test AS SELECT gs::%s AS x
                     FROM generate_series(1,%s) AS gs;', typ, nrows);

     EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, FORMAT JSON)
       SELECT * FROM test ORDER BY (x) DESC INTO data;
     SELECT data->0->'Execution Time' INTO exec_time;
     RETURN NEXT;
   END LOOP;
END;
$$ LANGUAGE plpgsql VOLATILE;

Execution of SELECT * FROM pass_sort('integer'); shows quite linear grow 
of the execution time. So, using '2.0 * cpu_operator_cost' as a cost for 
the integer type (as a basis) we can calculate costs for other 
operators. Variable-width types, i think, could require more complex 
technique to check dependency on the length.

Does this way look worthwhile?

-- 
regards,
Andrey Lepikhov
Postgres Professional




В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Zhijie Hou (Fujitsu)"
Дата:
Сообщение: Move global variables of pgoutput to plugin private scope.
Следующее
От: shveta malik
Дата:
Сообщение: Re: Synchronizing slots from primary to standby