Discussion: parallel query evaluation
Hi,
I have

    create table x ( att bigint, val bigint, hash varchar(30) );
with 693 million rows. The query

    create table y as select att, val, count(*) as cnt from x
    group by att, val;

ran for more than 2000 minutes and used 14 GB of memory on a machine
with 8 GB of physical RAM -- eventually I stopped it. Doing

    create table y ( att bigint, val bigint, cnt int );

and something like:

    seq 0 255 | xargs -P 6 -I{} \
        psql -c "insert into y select att, val, count(*)
                 from x where att % 256 = {} group by att, val" test

runs 6 of the 256 partitions in 10 minutes -- so at that rate the whole
job should take about 256/6 * 10 ≈ 430 minutes, i.e. just over 7 hours.
Question 1: do you see any reason why the second method would yield a
different result from the first method?
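
For what it's worth, a symmetric difference would settle this
empirically. The table names below are invented, assuming the two
results are materialised under separate names:

    -- empty result iff both methods produced exactly the same rows
    (select * from y_ctas except all select * from y_part)
    union all
    (select * from y_part except all select * from y_ctas);
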
Question 2: is that method generalisable, so that it could be included in
the base system without manual shell glue?
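
One shell-free variant, sketched here with the contrib dblink
extension (the connection string, worker names, and six-way fan-out
are assumptions for illustration; only partitions 0..5 are shown):

    create extension if not exists dblink;
    -- open six worker connections named w0 .. w5
    select dblink_connect('w' || i, 'dbname=test')
      from generate_series(0, 5) as i;
    -- hand each worker one partition; dblink_send_query returns
    -- immediately, so the six inserts run concurrently
    select dblink_send_query('w' || i, format(
             'insert into y select att, val, count(*) from x
               where att %% 256 = %s group by att, val', i))
      from generate_series(0, 5) as i;
    -- reap one worker; this blocks until its insert has finished
    select * from dblink_get_result('w0') as t(status text);
    -- ... likewise for w1 .. w5, then dblink_disconnect each one
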
Thanks,
Oliver
Oliver Seidel <postgresql@os10000.net> writes:
> I have
> create table x ( att bigint, val bigint, hash varchar(30)
> );
> with 693 million rows. The query
> create table y as select att, val, count(*) as cnt from x
> group by att, val;
> ran for more than 2000 minutes and used 14 GB of memory on a machine
> with 8 GB of physical RAM
What was the plan for that query? What did you have work_mem set to?
I can believe such a thing overrunning memory if the planner chose to
use a hash-aggregation plan instead of sort-and-unique, but it would
only do that if it had made a drastic underestimate of the number of
groups implied by the GROUP BY clause. Do you have up-to-date
statistics for the source table?
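
Concretely, a minimal way to gather that information, using the names
from this thread:

    show work_mem;    -- the per-operation sort/hash memory budget
    analyze x;        -- refresh the planner statistics for x
    -- a HashAggregate node with a low row estimate here would explain
    -- the memory blow-up; a GroupAggregate plan sorts instead
    explain select att, val, count(*) as cnt
      from x group by att, val;
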
regards, tom lane