Обсуждение: how fast index works?

Поиск
Список
Период
Сортировка

how fast index works?

От
"Anibal David Acosta"
Дата:

Hi everyone,

 

My question is, if I have a table with 500,000 rows, and a SELECT of one row is returned in 10 milliseconds, if the table has 6,000,000 of rows and everything is OK (statistics, vacuum etc)

can i suppose that elapsed time will be near to 10?

 

 

 

 

Re: how fast index works?

От
Craig James
Дата:
On 9/6/11 11:31 AM, Anibal David Acosta wrote:

Hi everyone,

 

My question is, if I have a table with 500,000 rows, and a SELECT of one row is returned in 10 milliseconds, if the table has 6,000,000 of rows and everything is OK (statistics, vacuum etc)

can i suppose that elapsed time will be near to 10?


Theoretically the index is a B-tree with log(N) performance, so a larger table could be slower.  But in a real database, the entire subtree might fall together in one spot on the disk, so retrieving a record from a 500,000 record database could take the same time as a 6,000,000 record database.

On the other hand, if you do a lot of updates and don't have your autovacuum parameters set right, a 500,000 record index might get quite bloated and slow as it digs through several disk blocks to find one record.

There is no simple answer to your question.  In a well-maintained database, 6,000,000 records are not a problem.

Craig

Re: how fast index works?

От
"Anibal David Acosta"
Дата:

Thanks!

 

 

De: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] En nombre de Craig James
Enviado el: martes, 06 de septiembre de 2011 03:18 p.m.
Para: pgsql-performance@postgresql.org
Asunto: Re: [PERFORM] how fast index works?

 

On 9/6/11 11:31 AM, Anibal David Acosta wrote:

Hi everyone,

 

My question is, if I have a table with 500,000 rows, and a SELECT of one row is returned in 10 milliseconds, if the table has 6,000,000 of rows and everything is OK (statistics, vacuum etc)

can i suppose that elapsed time will be near to 10?


Theoretically the index is a B-tree with log(N) performance, so a larger table could be slower.  But in a real database, the entire subtree might fall together in one spot on the disk, so retrieving a record from a 500,000 record database could take the same time as a 6,000,000 record database.

On the other hand, if you do a lot of updates and don't have your autovacuum parameters set right, a 500,000 record index might get quite bloated and slow as it digs through several disk blocks to find one record.

There is no simple answer to your question.  In a well-maintained database, 6,000,000 records are not a problem.

Craig

Re: how fast index works?

От
Craig Ringer
Дата:
On 7/09/2011 2:31 AM, Anibal David Acosta wrote:

Hi everyone,

 

My question is, if I have a table with 500,000 rows, and a SELECT of one row is returned in 10 milliseconds, if the table has 6,000,000 of rows and everything is OK (statistics, vacuum etc)

can i suppose that elapsed time will be near to 10?



It's not that simple. In addition to the performance scaling Craig James mentioned, there are cache effects.

Your 500,000 row index might fit entirely in RAM. This means that no disk access is required to query and search it, making it extremely fast. If the index on the larger table does NOT fit entirely in RAM, or competes for cache space with other things so it isn't always cached in RAM, then it might be vastly slower.

This is hard to test, because it's not easy to empty the caches. On Linux you can the the VM's drop_caches feature, but that drops *all* caches, including cached disk data from running programs, the PostgreSQL system catalogs, etc. That makes it a rather unrealistic test when the only thing you really want to remove from cache is your index and the table associated with it.

The best way to test whether data of a certain size will perform well is to create dummy data of that size and test with it. Anything else is guesswork.

--
Craig Ringer

Re: how fast index works?

От
Merlin Moncure
Дата:
On Tue, Sep 6, 2011 at 1:31 PM, Anibal David Acosta <aa@devshock.com> wrote:
> Hi everyone,
>
>
>
> My question is, if I have a table with 500,000 rows, and a SELECT of one row
> is returned in 10 milliseconds, if the table has 6,000,000 of rows and
> everything is OK (statistics, vacuum etc)
>
> can i suppose that elapsed time will be near to 10?

The problem with large datasets does not come from the index, but that
they increase cache pressure.  On today's typical servers it's all
about cache, and the fact that disks (at least non ssd drives) are
several orders of magnitude slower than memory.  Supposing you had
infinite memory holding your data files in cache or infinitely fast
disks, looking up a record from a trillion record table would still be
faster than reading a record from a hundred record table that had to
fault to a spinning disk to pull up the data.

merlin

Re: how fast index works?

От
Hany ABOU-GHOURY
Дата:
Hi all,
I am working on project to migrate PostgreSQL from V8.2 to 9.0 and optimise the new DB
has any one done some thing like that before ?
my main Task is the Optimisation part so please share some thoughts
 
Regards
 
Hany

On Wed, Sep 7, 2011 at 1:03 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
On 7/09/2011 2:31 AM, Anibal David Acosta wrote:

Hi everyone,

 

My question is, if I have a table with 500,000 rows, and a SELECT of one row is returned in 10 milliseconds, if the table has 6,000,000 of rows and everything is OK (statistics, vacuum etc)

can i suppose that elapsed time will be near to 10?



It's not that simple. In addition to the performance scaling Craig James mentioned, there are cache effects.

Your 500,000 row index might fit entirely in RAM. This means that no disk access is required to query and search it, making it extremely fast. If the index on the larger table does NOT fit entirely in RAM, or competes for cache space with other things so it isn't always cached in RAM, then it might be vastly slower.

This is hard to test, because it's not easy to empty the caches. On Linux you can the the VM's drop_caches feature, but that drops *all* caches, including cached disk data from running programs, the PostgreSQL system catalogs, etc. That makes it a rather unrealistic test when the only thing you really want to remove from cache is your index and the table associated with it.

The best way to test whether data of a certain size will perform well is to create dummy data of that size and test with it. Anything else is guesswork.

--
Craig Ringer

Re: how fast index works?

От
"Kevin Grittner"
Дата:
Hany ABOU-GHOURY <hghoury@gmail.com> wrote:

> I am working on project to migrate PostgreSQL from V8.2 to 9.0 and
> optimise the new DB

Please don't hijack a thread to start a new topic.  Start a new
thread with a subject line which describes the new topic.

-Kevin