Обсуждение: When does CLUSTER occur?

Поиск

Список

Период

Сортировка

When does CLUSTER occur?

От

"Schnabel, Robert D."

Дата:

29 ноября 2012 г., 19:11:53

Hi,

I’m looking for some general info on the behavior of CLUSTER. I add large amounts of data to an existing table something like this:

BEGIN;

DROP INDEX IF EXISTS xgen1011_si_sn;

COMMIT;

BEGIN;

INSERT INTO gen1011

[snip]

COMMIT;

BEGIN;

CREATE INDEX xgen1011_si_sn

ON gen1011

USING btree

(sample_id, snp_number)

WITH (FILLFACTOR=100)

TABLESPACE index_tablespace;

ALTER TABLE gen1011 CLUSTER ON xgen1011_si_sn;

COMMIT;

My question is whether or not the “ALTER TABLE gen1011 CLUSTER ON xgen1011_si_sn” actually clusters the table at that point or if it just tells it to use that index for clustering? If the latter I assume I need to add a “CLUSTER gen1011 ON xgen1011_si_sn” line at the end along with an ANALYSE line?

Thanks,

Bob

Re: When does CLUSTER occur?

От

Ryan Kelly

Дата:

29 ноября 2012 г., 19:15:54

On Thu, Nov 29, 2012 at 04:11:44PM +0000, Schnabel, Robert D. wrote:
> Hi,
>
> I'm looking for some general info on the behavior of CLUSTER.  I add large amounts of data to an existing table
somethinglike this: 
>
> BEGIN;
> DROP INDEX IF EXISTS xgen1011_si_sn;
> COMMIT;
>
> BEGIN;
> INSERT INTO gen1011
> [snip]
> COMMIT;
>
> BEGIN;
> CREATE INDEX xgen1011_si_sn
>   ON gen1011
>   USING btree
>   (sample_id, snp_number)
>   WITH (FILLFACTOR=100)
> TABLESPACE index_tablespace;
> ALTER TABLE gen1011 CLUSTER ON xgen1011_si_sn;
CLUSTER happens right here. Any subsequent modifications to your data
means that the table is no longer "clustered". CLUSTER physically
reorganizes the table then and there, it does not ensure that the table
stays clustered.

> COMMIT;
>
> My question is whether or not the "ALTER TABLE gen1011 CLUSTER ON xgen1011_si_sn" actually clusters the table at that
pointor if it just tells it to use that index for clustering?  If the latter I assume I need to add a "CLUSTER gen1011
ONxgen1011_si_sn" line at the end along with an ANALYSE line? 
>
> Thanks,
> Bob

-Ryan Kelly

Re: When does CLUSTER occur?

От

Jeff Janes

Дата:

29 ноября 2012 г., 20:19:32

On Thu, Nov 29, 2012 at 8:11 AM, Schnabel, Robert D.
<schnabelr@missouri.edu> wrote:

> My question is whether or not the “ALTER TABLE gen1011 CLUSTER ON
> xgen1011_si_sn” actually clusters the table at that point or if it just
> tells it to use that index for clustering?

It just marks it to use that index if/when it is clustered, it doesn't
do the clustering.


Cheers,

Jeff

Re: When does CLUSTER occur?

От

Mike Christensen

Дата:

29 ноября 2012 г., 20:28:31

On Thu, Nov 29, 2012 at 9:19 AM, Jeff Janes <jeff.janes@gmail.com> wrote:

On Thu, Nov 29, 2012 at 8:11 AM, Schnabel, Robert D.
<schnabelr@missouri.edu> wrote:

> My question is whether or not the “ALTER TABLE gen1011 CLUSTER ON
> xgen1011_si_sn” actually clusters the table at that point or if it just
> tells it to use that index for clustering?

It just marks it to use that index if/when it is clustered, it doesn't
do the clustering.

It's always kinda annoyed me that the CLUSTER command in Postgres doesn't work like it does on Microsoft SQL. In MSSQL, things will stay clustered as more data is inserted. In PG, you'd have to keep running CLUSTER as some sort of maintenance task every so often, so it's more of a one time command..

Re: When does CLUSTER occur?

От

Shaun Thomas

Дата:

29 ноября 2012 г., 20:33:12

On 11/29/2012 11:28 AM, Mike Christensen wrote:

> It's always kinda annoyed me that the CLUSTER command in Postgres
> doesn't work like it does on Microsoft SQL.

It's a natural side-effect of MVCC, unfortunately. Data goes wherever it
goes according to the free space map, or at the end if there are no free
blocks in the table.

I guess that's one thing rollback segments are good for. In-place
modifications of data so order can be maintained.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Re: When does CLUSTER occur?

От

"Schnabel, Robert D."

Дата:

29 ноября 2012 г., 20:42:59

-----Original Message-----
From: Jeff Janes [mailto:jeff.janes@gmail.com]
Sent: Thursday, November 29, 2012 11:19 AM
To: Schnabel, Robert D.
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] When does CLUSTER occur?

On Thu, Nov 29, 2012 at 8:11 AM, Schnabel, Robert D.
<schnabelr@missouri.edu> wrote:

> My question is whether or not the "ALTER TABLE gen1011 CLUSTER ON
> xgen1011_si_sn" actually clusters the table at that point or if it
> just tells it to use that index for clustering?

It just marks it to use that index if/when it is clustered, it doesn't do the clustering.

Cheers,

Jeff
[Schnabel, Robert D.]
That's what I thought because I just went through this exercise for another table which was previously clustered.  The
correlationon the column for the index was 1.  After I did the drop index, insert data, create index with the cluster
commandthe correlation was <1. 

Thanks for the clarification.

Bob

Re: When does CLUSTER occur?

От

Mike Christensen

Дата:

29 ноября 2012 г., 20:43:37

On Thu, Nov 29, 2012 at 9:33 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:

On 11/29/2012 11:28 AM, Mike Christensen wrote:

It's always kinda annoyed me that the CLUSTER command in Postgres
doesn't work like it does on Microsoft SQL.

It's a natural side-effect of MVCC, unfortunately. Data goes wherever it goes according to the free space map, or at the end if there are no free blocks in the table.

I guess that's one thing rollback segments are good for. In-place modifications of data so order can be maintained.

Yea, in MS SQL, inserts are more expensive with a clustered index because it has to go move junk around on the heap table..

I've just kinda gotten used to the PG approach, as you can always just re-cluster if you do some huge insert of a bunch of data. I guess what I'd be asking for is a feature that would automatically re-cluster whenever new data was committed, however you could just do that with a trigger I suppose..

Re: When does CLUSTER occur?

От

Jeff Janes

Дата:

29 ноября 2012 г., 21:20:18

On Thu, Nov 29, 2012 at 9:33 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
> On 11/29/2012 11:28 AM, Mike Christensen wrote:
>
>> It's always kinda annoyed me that the CLUSTER command in Postgres
>> doesn't work like it does on Microsoft SQL.
>
>
> It's a natural side-effect of MVCC, unfortunately. Data goes wherever it
> goes according to the free space map, or at the end if there are no free
> blocks in the table.

I think the two things are pretty much orthogonal.  MVCC might
complicate the coding for maintaining clustering (either a little or
lot, depending the approach) but surely doesn't forbid it.

It should be relatively easy, when inserting a new tuple, to first try
blocks which have "nearby" tuples to see if they have room, before
falling back on normal fsm or extension.  It would maintain an
imperfect clustering, but still much better than current behavior.

Cheers,

Jeff

Re: When does CLUSTER occur?

От

Shaun Thomas

Дата:

29 ноября 2012 г., 21:29:00

On 11/29/2012 12:20 PM, Jeff Janes wrote:

> It would maintain an imperfect clustering, but still much better than
> current behavior.

I thought about that, too. The "imperfect clustering" made me erase
everything I'd written. If the clustering is imperfect, it's not really
clustering. It would mean less random reads to restart the seek chain
though, so it would be a perceptible gain. But it's still not real
clustering until the order is maintained indefinitely.

So far as I know, that simply can't be done with MVCC. Especially on an
insert-only table that's clustered on a column unrelated to insert order.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Re: When does CLUSTER occur?

От

Mike Christensen

Дата:

29 ноября 2012 г., 21:34:59

On Thu, Nov 29, 2012 at 10:28 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:

On 11/29/2012 12:20 PM, Jeff Janes wrote:

It would maintain an imperfect clustering, but still much better than
current behavior.

I thought about that, too. The "imperfect clustering" made me erase everything I'd written. If the clustering is imperfect, it's not really clustering. It would mean less random reads to restart the seek chain though, so it would be a perceptible gain. But it's still not real clustering until the order is maintained indefinitely.

So far as I know, that simply can't be done with MVCC. Especially on an insert-only table that's clustered on a column unrelated to insert order.

How is this implemented in MS SQL then? Obviously, MS SQL supports real clustering and has MVCC..

Re: When does CLUSTER occur?

От

Serge Fonville

Дата:

29 ноября 2012 г., 22:27:15

MSSQL uses an index to maintain the cluster.

If a record fits between it places it there, if it doesn't, it considers either moving data or adding it physically out of order (while maintaining the index)

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl

Convince Microsoft!

They need to add TRUNCATE PARTITION in SQL Server

https://connect.microsoft.com/SQLServer/feedback/details/417926/truncate-partition-of-partitioned-table

2012/11/29 Mike Christensen <mike@kitchenpc.com>

On Thu, Nov 29, 2012 at 10:28 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
On 11/29/2012 12:20 PM, Jeff Janes wrote:

It would maintain an imperfect clustering, but still much better than
current behavior.

I thought about that, too. The "imperfect clustering" made me erase everything I'd written. If the clustering is imperfect, it's not really clustering. It would mean less random reads to restart the seek chain though, so it would be a perceptible gain. But it's still not real clustering until the order is maintained indefinitely.

So far as I know, that simply can't be done with MVCC. Especially on an insert-only table that's clustered on a column unrelated to insert order.

How is this implemented in MS SQL then? Obviously, MS SQL supports real clustering and has MVCC..

Re: When does CLUSTER occur?

От

Igor Neyman

Дата:

29 ноября 2012 г., 22:41:05

From: Mike Christensen [mailto:mike@kitchenpc.com]
Sent: Thursday, November 29, 2012 1:35 PM
To: sthomas@optionshouse.com
Cc: Jeff Janes; Schnabel, Robert D.; pgsql-general@postgresql.org
Subject: Re: When does CLUSTER occur?

On Thu, Nov 29, 2012 at 10:28 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
On 11/29/2012 12:20 PM, Jeff Janes wrote:
It would maintain an imperfect clustering, but still much better than
current behavior.

I thought about that, too. The "imperfect clustering" made me erase everything I'd written. If the clustering is
imperfect,it's not really clustering. It would mean less random reads to restart the seek chain though, so it would be
aperceptible gain. But it's still not real clustering until the order is maintained indefinitely. 

So far as I know, that simply can't be done with MVCC. Especially on an insert-only table that's clustered on a column
unrelatedto insert order. 

How is this implemented in MS SQL then?  Obviously, MS SQL supports real clustering and has MVCC..

--

The way MVCC implemented in Postgres (not allowing "in place" updates) is obviously complicates things in regards to
clustering.

As for MS SQL, here is how it implements clustering:
The index on which the table is clustered actually becomes a "physical holder" (for the lack of better term) of the
table,meaning the complete table tuples are stored in the leaf blocks of the index it is clustered on.  There is no
otherplace in the database where they are stored physically. 

Regards,
Igor Neyman

Re: When does CLUSTER occur?

От

Serge Fonville

Дата:

29 ноября 2012 г., 22:54:14

The data files contain only one active version of every record. Different versions of that same record are maintained in a version store.

Kind regards/met vriendelijke groet,

Serge Fonville

http://www.sergefonville.nl

Convince Microsoft!

They need to add TRUNCATE PARTITION in SQL Server

https://connect.microsoft.com/SQLServer/feedback/details/417926/truncate-partition-of-partitioned-table

2012/11/29 Igor Neyman <ineyman@perceptron.com>

From: Mike Christensen [mailto:mike@kitchenpc.com]
Sent: Thursday, November 29, 2012 1:35 PM
To: sthomas@optionshouse.com
Cc: Jeff Janes; Schnabel, Robert D.; pgsql-general@postgresql.org
Subject: Re: When does CLUSTER occur?

On Thu, Nov 29, 2012 at 10:28 AM, Shaun Thomas <sthomas@optionshouse.com> wrote:
On 11/29/2012 12:20 PM, Jeff Janes wrote:
It would maintain an imperfect clustering, but still much better than
current behavior.

I thought about that, too. The "imperfect clustering" made me erase everything I'd written. If the clustering is imperfect, it's not really clustering. It would mean less random reads to restart the seek chain though, so it would be a perceptible gain. But it's still not real clustering until the order is maintained indefinitely.

So far as I know, that simply can't be done with MVCC. Especially on an insert-only table that's clustered on a column unrelated to insert order.

How is this implemented in MS SQL then? Obviously, MS SQL supports real clustering and has MVCC..

--

The way MVCC implemented in Postgres (not allowing "in place" updates) is obviously complicates things in regards to clustering.

As for MS SQL, here is how it implements clustering:
The index on which the table is clustered actually becomes a "physical holder" (for the lack of better term) of the table, meaning the complete table tuples are stored in the leaf blocks of the index it is clustered on. There is no other place in the database where they are stored physically.

Regards,
Igor Neyman

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: When does CLUSTER occur?

От

"Kevin Grittner"

Дата:

29 ноября 2012 г., 22:55:27

Igor Neyman wrote:

> As for MS SQL, here is how it implements clustering:
> The index on which the table is clustered actually becomes a
> "physical holder" (for the lack of better term) of the table,
> meaning the complete table tuples are stored in the leaf blocks
> of the index it is clustered on. There is no other place in the
> database where they are stored physically.

Right, in a SQL Server clustered table the data row is stored in
the leaf level of the clustered index, and in PostgreSQL we don't
store tuple visibility information within any index; so we would
have a fair amount of work to do before we could use a similar
approach.

-Kevin

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: When does CLUSTER occur?