Re: proposed TODO: non-locking CREATE INDEX / REINDEX

From: Hannu Krosing
Subject: Re: proposed TODO: non-locking CREATE INDEX / REINDEX
Date:
Msg-id: 1118664652.4802.6.camel@fuji.krosing.net
In reply to: Re: proposed TODO: non-locking CREATE INDEX / REINDEX  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Sat, 2005-06-11 at 12:25 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > On Fri, 2005-06-10 at 12:12 -0400, Tom Lane wrote:
> >> Have you forgotten Zeno's paradox?  I don't see a
> >> reason to assume the indexer can *ever* catch up --- it's entirely
> >> likely that adding a new unindexed row is faster than adding an index
> >> entry for it.
> 
> > The same is true of doing a lazy vacuum over a table where tuples are
> > constantly added -  there is no guarantee that the vacuum will ever
> > finish.
> 
> No, there definitely is such a guarantee: the vacuum only scans as many
> blocks as were in the relation when it started.  The vacuum need not
> worry about tuples added after it starts, because it couldn't delete
> them under MVCC rules.  And there is no logical-consistency requirement
> for it to promise to scan every tuple, anyway.

But why is such a guarantee needed at all?

I could not care less whether there is a theoretical guarantee that some
query will end, even if that end comes in 3 billion years, after filling
all available disk space on earth and exhausting 32-bit int OIDs over
and over again.

In other words: is there some fundamental reason why all
commands/queries need to finish in finite time (however long)?

Is finite-time execution mandated by some SQL standard (ISO/ANSI)?

If it is, can't that be handled with 'set statement_timeout=NNNN'?

> >> This implies that you are hoping for an asynchronous change in the
> >> behavior of other processes, which you are not going to get without
> >> taking out locks, which is what you wanted to avoid.

It seems that an "asynchronous change" and "locks" really are two names
for the same thing here, and the most sensible way to achieve such a
change is to introduce a new kind of lock.

> > One way to avoid locking is to allow the "add tuple to index" routine
> > to silently succeed if the index already has it.
> 
> ... thereby silently breaking unique-index checking, you mean?

No, no. I mean that it would silently succeed only if _exactly_ the same
tuple (the same value AND the same ctid) is already present.
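To spell that out as a toy C sketch (ToyCtid, ToyUniqueIndex and
toy_index_insert() are all names I am making up for illustration,
nothing from the backend sources):

  #include <stdbool.h>
  #include <stddef.h>

  typedef struct { unsigned blkno, offno; } ToyCtid;        /* heap tuple address */
  typedef struct { int key; ToyCtid ctid; } ToyIndexEntry;  /* one index entry    */

  typedef struct {
      ToyIndexEntry entries[1024];   /* no bounds checking in this toy */
      size_t        nentries;
  } ToyUniqueIndex;

  typedef enum { INSERT_OK, INSERT_ALREADY_PRESENT, INSERT_UNIQUE_VIOLATION } ToyInsertResult;

  /*
   * Insert (key, ctid) unless *exactly* the same entry is already there.
   * The same key pointing at a different ctid is still reported as a
   * uniqueness violation, so unique-index semantics are not weakened.
   */
  static ToyInsertResult
  toy_index_insert(ToyUniqueIndex *idx, int key, ToyCtid ctid)
  {
      for (size_t i = 0; i < idx->nentries; i++)
      {
          if (idx->entries[i].key != key)
              continue;
          if (idx->entries[i].ctid.blkno == ctid.blkno &&
              idx->entries[i].ctid.offno == ctid.offno)
              return INSERT_ALREADY_PRESENT;   /* the index builder got here first */
          return INSERT_UNIQUE_VIOLATION;      /* genuinely conflicting tuple */
      }
      idx->entries[idx->nentries++] = (ToyIndexEntry){ key, ctid };
      return INSERT_OK;
  }

A backend inserting concurrently with the index builder then either
finds its own (value, ctid) pair already there and does nothing, or
behaves exactly as it does today.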

> > Then we can announce the change in behaviour to running backends, wait
> > for all backends to confirm they have learned about it and only then
> > record CTID_INDEX_MAX.
> 
> You can't wait for confirmation from all other backends without
> expecting to create deadlock issues left and right.  And what is it you
> are waiting for, anyway?  For a backend to confirm that it is prepared
> to insert into an index that it can't even see yet (because the CREATE
> INDEX hasn't committed)? 

I am ready to do the CREATE INDEX in several transactions, just as
VACUUM is done, so that after the fast-build path is finished, all
backends will see the index. It just has to be flagged so that the
planner will not consider it for queries until it is up to date.
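As another invented toy sketch (these are not actual catalog columns,
just illustration of the flag I have in mind):

  #include <stdbool.h>

  typedef struct
  {
      bool maintained_on_write;   /* writers must add new tuples to it   */
      bool usable_by_planner;     /* planner may pick it for query plans */
  } ToyIndexState;

  /*
   * Transaction 1: register the index; writers start maintaining it,
   * but the planner still ignores it because it is incomplete.
   */
  static void
  announce_new_index(ToyIndexState *idx)
  {
      idx->maintained_on_write = true;
      idx->usable_by_planner   = false;
  }

  /*
   * Transactions 2..n: scan the heap and insert the pre-existing tuples
   * (the idempotent insert sketched above makes repeats harmless).
   *
   * Final transaction: the index has caught up, let the planner use it.
   */
  static void
  mark_index_ready(ToyIndexState *idx)
  {
      idx->usable_by_planner = true;
  }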

>  In the case of REINDEX, are you expecting
> that backends will be prepared to insert into *both* old and new
> versions of the index?  They'd better, since there's still every
> prospect of the REINDEX failing and rolling back, leaving the old
> version as the live copy.  

Yes, I am prepared for them (in fact I expect them) to insert into both
indexes until the new one is ready. The cost of doing so is peanuts
compared to the alternatives.
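In the same invented toy terms (again, not backend code), a writing
backend would simply insert into every index that is currently flagged
as maintained, the old one and the half-built replacement alike:

  #include <stdbool.h>
  #include <stddef.h>

  typedef struct ToyIndex
  {
      bool maintained_on_write;   /* same flag as in the sketch above */
      void (*insert)(struct ToyIndex *idx, int key, unsigned blkno, unsigned offno);
  } ToyIndex;

  /*
   * Called once per new heap tuple.  While a non-locking REINDEX is
   * running, the list simply contains one extra half-built index, so
   * the extra cost is one additional index insertion per written tuple.
   */
  static void
  insert_into_all_indexes(ToyIndex **indexes, size_t nindexes,
                          int key, unsigned blkno, unsigned offno)
  {
      for (size_t i = 0; i < nindexes; i++)
          if (indexes[i]->maintained_on_write)
              indexes[i]->insert(indexes[i], key, blkno, offno);
  }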

> Speaking of rollback, what happens when
> those backends try to insert into the new copy just after the REINDEX
> has failed and rolled back and deleted the new copy?  Or equivalently,
> what happens when they are still trying to insert into the old copy
> just after the REINDEX commits and deletes that one?

I am perfectly prepared for the CREATE INDEX to take 24 hours, if the
database can continue serving requests during that period and that
saves me 24 minutes of "maintenance" downtime.

I am also willing to wait for all other backends to commit or roll back
their running transactions. Heck, I am even ready to restart every other
backend (close and reconnect all clients) if that lets me avoid the
downtime.

> The complexity and fragility of what you are proposing vastly outweighs
> any potential benefit, even if it could be made to work at all, which I
> continue to doubt.

The current workaround, replicating to a second database with Slony
(log shipping does not cut it), is far more expensive and complex still.
It requires at least 2x the write capacity of the whole database
(compared to 2x the writes for a single index), plus another computer,
plus a fast network.

And the ability to add new indexes to a 24/7 database without
significant downtime is a requirement for any fast-growing business.

-- 
Hannu Krosing <hannu@skype.net>



