2 phase commit: performance implications?

From: "Andy Ballingall"
Date:
Hello,

Clearly, I shouldn't use these distributed transactions unless I have to,
and where I do, I expect the transaction's completion time to depend on the
speed of every participating database. But are there any additional
overheads carrying a significant time penalty that might commonly be
overlooked by someone like me, with no previous experience of two-phase
commit (2PC)?

---

The application:

I'm evaluating a design in which the nation is partitioned into small areas
(e.g. 2km squares), each area serviced solely by its own dedicated
database.

All queries are pinned to a location with a limited search radius, and the
database whose area encloses the query's centre is responsible for
executing it.
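
As an aside, the routing itself can be a pure function of the coordinates.
A minimal sketch, assuming squares aligned to a national grid in metres;
the naming scheme and host layout here are invented for illustration:

    GRID_METRES = 2000  # 2 km squares

    def cell_for(easting_m, northing_m):
        """Return the (column, row) of the grid square containing a point."""
        return int(easting_m // GRID_METRES), int(northing_m // GRID_METRES)

    def dsn_for(easting_m, northing_m):
        """Map a point to the connection string of its dedicated database."""
        col, row = cell_for(easting_m, northing_m)
        return f"host=db-{col}-{row}.internal dbname=area_{col}_{row}"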

The only issue is to ensure that a query near a boundary between two
adjacent areas behaves as though there were no partitioning. To do this,
I'm looking into using PostgreSQL 8.1's new two-phase commit (2PC) support
to let me selectively copy data inserted near a boundary into the adjacent
neighbouring databases, so that this data will appear in boundary searches
carried out by the neighbours. Roughly 25% of inserts would be copied to
neighbours, most involving just a single copy.
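
For concreteness, here is a minimal sketch of that copy step, assuming
psycopg2's two-phase-commit API (which issues PREPARE TRANSACTION /
COMMIT PREPARED under the hood); the items table and DSNs are invented:

    import psycopg2

    def insert_with_copy(dsn_home, dsn_neighbour, row_id, payload):
        home = psycopg2.connect(dsn_home)
        neighbour = psycopg2.connect(dsn_neighbour)
        conns = [home, neighbour]
        try:
            # Do the insert on every participant under one global id.
            for i, conn in enumerate(conns):
                xid = conn.xid(0, f"boundary-{row_id}", f"branch-{i}")
                conn.tpc_begin(xid)
                with conn.cursor() as cur:
                    cur.execute("INSERT INTO items (id, payload) VALUES (%s, %s)",
                                (row_id, payload))
            # Phase 1: prepare everywhere; any failure raises before commit.
            for conn in conns:
                conn.tpc_prepare()
            # Phase 2: commit the prepared transactions.
            for conn in conns:
                conn.tpc_commit()
        except Exception:
            for conn in conns:
                try:
                    conn.tpc_rollback()
                except Exception:
                    pass
            raise
        finally:
            for conn in conns:
                conn.close()

One overhead worth knowing about: the server refuses PREPARE TRANSACTION
unless max_prepared_transactions is above zero (recent releases default it
to 0), and a prepared transaction that is never committed or rolled back
keeps holding its locks until an administrator resolves it.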

The scheme is designed so that each database fits entirely in RAM, and in
addition the amount of data in each database will be relatively small (and
therefore quick to sift through). Inserted rows are small, and most of the
time each database is servicing read requests rather than writing, updating
or inserting.

A single nationwide database would be logically simpler, but in my case the
application is a website, and I want a hardware solution that is cheap to
start with, easily extensible, and allows close coupling between the Apache
server responsible for a region and the database it hits.

Any insights gratefully received!

Andy Ballingall


Re: 2 phase commit: performance implications?

From: David Roussel
Date:

> The only issue is to ensure that a query near a boundary between two
> adjacent areas behaves as though there were no partitioning. To do this,
> I'm looking into using PostgreSQL 8.1's new 2PC to allow me to selectively
> copy data inserted near a boundary into the adjacent neighbouring
> databases, so that this data will appear in boundary searches carried out
> by the neighbours.

Why not just query adjacent databases, rather than copying the data around?

If you really want to do this, do you need 2PC? Once data has been uploaded
to the database for region A, you can asynchronously copy it to B, C, D and
E later, using a queue. If you try to commit to all of them at once, then
if one fails, none of them gets the data.

It all depends on what type of data you are dealing with and how important
consistency is, i.e. will it cost you money if the data is inconsistent
between nodes?

Generally, queuing is your friend. You can use 2PC to ensure your queues
work correctly if you like.
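
To illustrate the queue idea (a hypothetical sketch; the copy_queue table
and all names are invented here): the insert and its queue entry commit
atomically in the home database, and a worker drains the queue to the
neighbour afterwards.

    import psycopg2

    def insert_and_enqueue(dsn_home, row_id, payload, target):
        # One local transaction: the row and its pending-copy marker
        # commit or roll back together.
        with psycopg2.connect(dsn_home) as conn:
            with conn.cursor() as cur:
                cur.execute("INSERT INTO items (id, payload) VALUES (%s, %s)",
                            (row_id, payload))
                cur.execute("INSERT INTO copy_queue (item_id, target) "
                            "VALUES (%s, %s)", (row_id, target))
        # psycopg2 commits on leaving the connection's with-block

    def drain_once(dsn_home, dsn_neighbour):
        home = psycopg2.connect(dsn_home)
        nb = psycopg2.connect(dsn_neighbour)
        try:
            with home.cursor() as cur:
                cur.execute("SELECT item_id FROM copy_queue")
                pending = [r[0] for r in cur.fetchall()]
            for item_id in pending:
                with home.cursor() as cur:
                    cur.execute("SELECT id, payload FROM items WHERE id = %s",
                                (item_id,))
                    row = cur.fetchone()
                with nb.cursor() as cur:
                    # Idempotent insert (ON CONFLICT needs PostgreSQL 9.5+;
                    # on 8.1 you would test for the row first).
                    cur.execute("INSERT INTO items (id, payload) "
                                "VALUES (%s, %s) ON CONFLICT (id) DO NOTHING",
                                row)
                nb.commit()
                with home.cursor() as cur:
                    cur.execute("DELETE FROM copy_queue WHERE item_id = %s",
                                (item_id,))
                home.commit()
        finally:
            home.close()
            nb.close()

Committing the copy before deleting the queue entry gives at-least-once
delivery, and the idempotent insert makes a replay harmless. The
dequeue-plus-remote-insert pair is also the one spot where 2PC could be
applied, if you wanted it.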

David

Re: 2 phase commit: performance implications?

From: "Andy Ballingall"
Date:


> Why not just query adjacent databases, rather than copying the data around?

The reasons I didn't choose that approach were:

1) I didn't think there was a way to write a query that acts on the data in
two databases as though it were all in one, and I didn't want to get into
merging multiple query results on the application side (a sketch of what
that would involve follows this list). I'd rather have all the needed data
sitting in a single database so that I can perform whatever query I like
without complication.
2) Most database accesses are reads, and I didn't want a big network
overhead for these, especially since I'm aiming for each database to be
entirely RAM resident.
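
Purely for illustration, the application-side merge I want to avoid would
look something like this (hypothetical table and columns, and a naive
distance filter):

    import psycopg2

    def boundary_search(dsns, x, y, radius):
        # Query every database whose area the search circle touches and
        # merge the results in the application.
        seen = {}
        for dsn in dsns:
            with psycopg2.connect(dsn) as conn:
                with conn.cursor() as cur:
                    cur.execute("SELECT id, x, y FROM items "
                                "WHERE (x - %s)^2 + (y - %s)^2 <= %s^2",
                                (x, y, radius))
                    for row in cur.fetchall():
                        seen[row[0]] = row  # de-duplicate by id
        # Re-sorting by distance now has to happen here, not in SQL.
        return sorted(seen.values(),
                      key=lambda r: (r[1] - x) ** 2 + (r[2] - y) ** 2)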

> If you really want to do this, do you need 2PC? Once data has been
> uploaded to the database for region A, you can asynchronously copy it to
> B, C, D and E later, using a queue.

I've always assumed that my data needed to be consistent. I guess there are
some circumstances where it isn't really a problem, but each would need to
be carefully evaluated. The easy answer is to say 'yes, it must be
consistent'.

> If you try to commit to all of them at once, then if one fails, none of
> them gets the data.

Yes, I'd prefer things to be that way in any event.

Regards,
Andy