Re: Digesting explain analyze

Поиск
Список
Период
Сортировка
От Matthew Wakeling
Тема Re: Digesting explain analyze
Дата
Msg-id alpine.DEB.2.00.1001071115180.25000@aragorn.flymine.org
обсуждение исходный текст
Ответ на Re: Digesting explain analyze  (Jesper Krogh <jesper@krogh.cc>)
Список pgsql-performance
On Thu, 7 Jan 2010, Jesper Krogh wrote:
>> If disk seeks are killing you a kinda crazy idea would be to
>> duplicate the table - clustering one by (id1) and
>> the other one by an index on (id2) and unioning the
>> results of each.
>
> That's doubling the disk space needs for the table. Is there any odds
> that this would benefit when the intitial table significantly exceeds
> available memory by itself?

If the table already greatly exceeds the available RAM, then doubling the
amount of data won't make a massive difference to performance. You're
going to disc for everything anyway.

>> Since each of these duplicates of the table will be clustered
>> by the column you're querying it on, it should just take one
>> seek in each table.
>>
>> Then your query could be something like
>>
>>   select * from (
>>     select * from t1 where id1=2067 order by evalue limit 100
>>     union
>>     select * from t2 where id2=2067 order by evalue limit 100
>>   ) as foo order by evalue limit 100;
>
> This is actually what I ended up with as the best performing query, just
> still on a single table, because without duplication I can add index and
> optimize this one by (id1,evalue) and (id2,evalue). It is still getting
> killed quite a lot by disk IO. So I guess I'm up to:

You're kind of missing the point. The crucial step in the above suggestion
is to cluster the table on the index. This will mean that all the rows
that are fetched together are located together on disc, and you will no
longer be killed by disc IO.

> 1) By better disk (I need to get an estimate how large it actually is
> going to get).

Unless you cluster, you are still going to be limited by the rate at which
the discs can seek. Postgres 8.4 has some improvements here for bitmap
index scans if you have a RAID array, and set the effective_concurrency
setting correctly.

> 2) Stick with one table, but make sure to have enough activity to get a
> large part of the index in the OS-cache anyway. (and add more memory if
> nessesary).

In order to win here, you will need to make memory at least as big as the
commonly-accessed parts of the database. This could get expensive.

> I didnt cluster it, since clustering "locks everything".

You can also test out the hypothesis by copying the table instead:

CREATE NEW TABLE test1 AS SELECT * FROM table1 ORDER BY id1;

Then create an index on id1, and test against that table. The copy will
become out of date quickly, but it will allow you to see whether the
performance benefit is worth it. It will also tell you how long a cluster
will actually take, without actually locking anything.

Matthew

--
 In the beginning was the word, and the word was unsigned,
 and the main() {} was without form and void...

В списке pgsql-performance по дате отправления:

Предыдущее
От: Oleg Bartunov
Дата:
Сообщение: Re: Digesting explain analyze
Следующее
От: Lefteris
Дата:
Сообщение: Air-traffic benchmark