Discussion: how fast index works?
Hi everyone,
My question is: if I have a table with 500,000 rows and a SELECT of one row returns in 10 milliseconds, and the table instead has 6,000,000 rows with everything in good shape (statistics, vacuum, etc.),
can I assume that the elapsed time will still be close to 10 ms?
Theoretically the index is a B-tree with log(N) performance, so a larger table could be slower. But in a real database, the entire subtree might fall together in one spot on the disk, so retrieving a record from a 500,000 record database could take about the same time as from a 6,000,000 record database.
On the other hand, if you do a lot of updates and don't have your autovacuum parameters set right, a 500,000 record index might get quite bloated and slow as it digs through several disk blocks to find one record.
There is no simple answer to your question. In a well-maintained database, 6,000,000 records are not a problem.
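If you want to check this on a real index, something along these lines will show the tree depth and leaf density (a rough sketch: it assumes the pgstattuple contrib module is installed, and 'my_table_pkey' is just a placeholder index name):

    -- requires the pgstattuple contrib module (CREATE EXTENSION works on 9.1+;
    -- on older releases install it from the contrib SQL scripts)
    CREATE EXTENSION pgstattuple;

    -- 'my_table_pkey' is a placeholder; substitute your own index name
    SELECT tree_level,        -- B-tree depth: grows roughly with log(N)
           index_size,        -- total index size in bytes
           avg_leaf_density   -- low values suggest bloat from heavy updates
      FROM pgstatindex('my_table_pkey');

The depth typically differs by at most a level or so between 500,000 and 6,000,000 rows, which is why the raw lookup cost barely changes.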
Craig
Thanks!
It's not that simple. In addition to the performance scaling Craig James mentioned, there are cache effects.
Your 500,000 row index might fit entirely in RAM. This means that no disk access is required to query and search it, making it extremely fast. If the index on the larger table does NOT fit entirely in RAM, or competes for cache space with other things so it isn't always cached in RAM, then it might be vastly slower.
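A quick sanity check is to compare the on-disk size of the index against your RAM and shared_buffers; for example (a sketch; 'my_table_pkey' is a placeholder index name):

    -- compare index size to the configured shared_buffers
    -- ('my_table_pkey' is a placeholder index name)
    SELECT pg_size_pretty(pg_relation_size('my_table_pkey')) AS index_size,
           current_setting('shared_buffers') AS shared_buffers;

If the index (plus the heap pages it points to) is much larger than what you can realistically keep cached, expect extra disk reads.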
This is hard to test, because it's not easy to empty the caches. On Linux you can use the VM's drop_caches feature, but that drops *all* caches, including cached disk data from running programs, the PostgreSQL system catalogs, etc. That makes it a rather unrealistic test when the only thing you really want to remove from cache is your index and the table associated with it.
The best way to test whether data of a certain size will perform well is to create dummy data of that size and test with it. Anything else is guesswork.
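For example, a throwaway table of the target size is easy to build (a sketch; the table, column, and index names are made up, and 6,000,000 is just the row count being asked about):

    -- build a 6,000,000 row dummy table and index it
    CREATE TABLE test_big AS
      SELECT g AS id, md5(g::text) AS payload
        FROM generate_series(1, 6000000) AS g;
    CREATE INDEX test_big_id_idx ON test_big (id);
    ANALYZE test_big;

    -- time a single-row lookup; BUFFERS shows how many pages came from
    -- shared_buffers vs. had to be read in
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM test_big WHERE id = 123456;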
--
Craig Ringer
On Tue, Sep 6, 2011 at 1:31 PM, Anibal David Acosta <aa@devshock.com> wrote:
> [...] can i suppose that elapsed time will be near to 10?

The problem with large datasets does not come from the index itself, but from the extra cache pressure they create. On today's typical servers it's all about cache, and the fact that disks (at least non-SSD drives) are several orders of magnitude slower than memory. Supposing you had infinite memory holding your data files in cache, or infinitely fast disks, looking up a record from a trillion-record table would still be faster than reading a record from a hundred-record table that had to fault to a spinning disk to pull up the data.

merlin
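If you want a rough idea of how much a given table is hitting the cache versus going to disk, the statistics views can help (a sketch; 'test_big' is a placeholder table name, and the numbers only reflect shared_buffers, not the OS page cache):

    -- heap blocks found in shared_buffers vs. requested from the OS/disk
    -- ('test_big' is a placeholder table name)
    SELECT heap_blks_read,
           heap_blks_hit,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 3) AS hit_ratio
      FROM pg_statio_user_tables
     WHERE relname = 'test_big';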
Hany ABOU-GHOURY <hghoury@gmail.com> wrote:
> I am working on project to migrate PostgreSQL from V8.2 to 9.0 and
> optimise the new DB

Please don't hijack a thread to start a new topic. Start a new thread with a subject line which describes the new topic.

-Kevin