Re: Index only scan

Поиск
Список
Период
Сортировка
От Ondrej Ivanič
Тема Re: Index only scan
Дата
Msg-id CAM6mieKhkwhgkzsO2aqe51v_HOHGvB89g9BEsE_fypOBLhSm=w@mail.gmail.com
обсуждение исходный текст
Ответ на Index only scan  (Lars Helge Øverland <larshelge@gmail.com>)
Список pgsql-general
Hi,

On 10 October 2012 23:03, Lars Helge Øverland <larshelge@gmail.com> wrote:
> We are now in the process of designing a new component for analytics
> and this feature got me thinking we could utilize postgres over other
> alternatives like column-oriented databases. Basically we will have a
> wide, denormalized table with 20+ columns with relatively low
> cardinality. Typically we will have queries which sums a fact column
> based on where/group by clauses on several dimension columns (standard
> data warehouse stuff). An example would be "select a, b, c, sum(d)
> from analytics where a=1 and b=2 group by a,b,c";
>
> Finally, is there anyone else who are using postgres for this purpose
> and have some good tips to share in order to achieve good performance,
> including index strategies, beyond the standard config best practices?

yes, we had fact table which has around 250 columns and 250mil rows.
The question is if you can partition your data set. For example,
monthly partition. This keeps indexes small but all queries must be
constrained by the same column as is used for partitioning (ie.
monthly partitions -> every query should have "datetime between ...
and ...")

From my experience postgres is not good with large group by queries.
For example, your query:
select a, b, c, sum(d) from analytics where a=1 and b=2 group by a,b,c

could be executed over multiple connections:
insert into t select select a, b, c, sum(d) as d from analytics where
c >= val1 and c < val2 and a=1 and b=2 group by a,b,c
insert into t select select a, b, c, sum(d) as d from analytics where
c >= val2 and c < val3 and a=1 and b=2 group by a,b,c
...
insert into t select select a, b, c, sum(d) as d from analytics where
c >= valN-1 and c < valN and a=1 and b=2 group by a,b,c

and then get the final result:
select a, b, c, sum(d) from t group by a,b,c

You can use pgpool-II parallel query feature instead of manual slicing.

--
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
(http://www.linkedin.com/in/ondrejivanic)


В списке pgsql-general по дате отправления:

Предыдущее
От: Gavin Flower
Дата:
Сообщение: Re: Index only scan
Следующее
От: Greg Sabino Mullane
Дата:
Сообщение: Re: Planner chooses multi-column index in 9.2 when maybe it should not