Re: Hardware upgrade for a high-traffic database

From: Merlin Moncure
Subject: Re: Hardware upgrade for a high-traffic database
Msg-id: 6EE64EF3AB31D5448D0007DD34EEB3412A744E@Herge.rcsinc.local
In reply to: Hardware upgrade for a high-traffic database  ("Jason Coene" <jcoene@gotfrag.com>)
List: pgsql-performance
> This example looks fine, but since userid 51 evidently only has 35
> posts, there's not much time needed to read 'em all and sort 'em.  The
> place where the double-column index will win big is on userids with
> hundreds of posts.
>
> You have to keep in mind that each index costs time to maintain during
> inserts/updates.  So adding an index just because it makes a few
> queries a little faster probably isn't a win.  You need to make
> tradeoffs.

IMNSHO, in Jason's case he needs to do everything possible to get his
frequently run queries going as quickly as possible.  ISTM he can give up
a little on the update side, especially since he is running fsync=false.
A .3-.5 sec query multiplied over 50-100 users running concurrently adds
up quickly.  Ideally, you are looking up records based on a key that
takes you directly to the first record you want and then walks the
remaining records you need in ascending order.  I can't stress enough how
important this is, so long as you can deal with the index/update
overhead.

I don't have a huge amount of experience with this in pg, but one of the
tricks we do in the ISAM world is a 'reverse date' system, so that you
can scan forwards on the key to pick up datetimes in descending order.
This is often a win because the o/s cache may assume forward reads
(read-ahead), giving you more cache hits.  There are a few different ways
to do this, but imagine:

create table t
(
    id int,
    ts  timestamp default now(),
    iv  interval  default ('01/01/2050'::timestamp - now())
);

create index t_idx on t(id, iv);
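-- k is a placeholder for the id you are filtering on; ascending iv means
-- descending ts, so this pulls the five newest rows for that id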
select * from t where id = k order by id, iv limit 5;

The above query should do a much better job pulling up data and should
be easier on your cache.  A further win might be to cluster the table on
this key if the table is really big.
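
To cluster on the new key, something like this would do it (sketch only;
CLUSTER takes an exclusive lock and rewrites the table, so do it
off-peak):

cluster t_idx on t;   -- physically reorder t into (id, iv) order
analyze t;            -- refresh planner statistics afterwards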

note: interval is a poor type to do this with, because it's a 12 byte
type (it's just used here for demonstration purposes because it's easy).
With a little trickery you can stuff it into a time type or an int4 type
(even better!).  If you want to be really clever you can do it without
adding any data to your table at all through functional indexes.
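
For example, an expression (functional) index along these lines should
work -- a rough sketch, untested, and the query has to spell the
expression exactly the same way so the planner can match it to the index:

create index t_fidx on t (id, ('01/01/2050'::timestamp - ts));

select * from t
where id = k
order by id, ('01/01/2050'::timestamp - ts)
limit 5;

There is no extra column to maintain; the computed value lives only in
the index, and ascending order on the expression is descending order on
ts.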

Since the planner can use the same index in the extraction and ordering,
you get some savings...not much, but worthwhile when applied over a lot
of users.  Knowing when and how to apply multi-column and functional
indexes will make you feel like you have 10 times the database you are
using right now.

Merlin
