Oracle DBA, How To, Error, Cause and Action

Two-Phase Commit (2PC)

Following is the explanation by C.J. Date

Two-phase commit is important whenever a given transaction can interact with several independent "resource managers" each managing its own set of recoverable resources and maintaining its own recovery log. For example consider a transaction running on an IBM mainframe that updates both an IMS database and a DB2 database (such a transaction is perfectly legal, by the way). If the transaction completes successfully, then all of its updates, to both IMS data and DB2 data must be committed: conversely, if it fails, then all of its updates must be rolled back. In other words, it must not be possible for the IMS updates to be committed and the DB2 updates rolled back, or vice versa for then the transaction would no longer, be atomic (all or nothing).

It follows that it does not make sense for the transaction to issue, say. a COMMIT to IMS and a ROLLBACK to DB2; and even if it issued the same instruction to both, the system could still crash between the two, with unfortunate results. Instead, therefore, the transaction issues a single "global" or system-wide COMMIT (or ROLLBACK). That COMMIT or ROLLBACK is handled by a system component called the coordinator, whose task it is to guarantee that both resource managers (i.e., IMS and DB2, in the example) commit or roll back the updates they are responsible for in unison-and furthermore to provide that guarantee even if the system fails in the middle of the process.
And it is the two-phase commit protocol that enables the coordinator to provide such a guarantee.

Here is how it works. Assume for simplicity that the transaction has completed its database processing successfully, so that the system-wide instruction it issues is COMMIT, not ROLLBACK. On receiving that COMMIT request, the coordinator goes through the following two-phase process:

Prepare:
First, it instructs all resource managers to get ready to "go either way" on the transaction. In practice, this means that each participant in the process-that is, each resource manager involved-must force all log records for local resources used by the transaction out to its own physical log (i.e. out to nonvolatile storage; whatever happens thereafter, the resource manager will now have a permanent record of the work it did on behalf of the transaction, and so will be able to commit its updates or roll them back, as necessary), Assuming the forced write is successful, the resource manager now replies "OK" to the coordinator; otherwise, it replies "Not OK."

Commit:
When the coordinator has received replies from all participants, it forces a record to its own physical log, recording its decision regarding the transaction. If all replies were "OK." that decision is "commit"; if any reply was "Not OK," the decision is "rollback." Either way, the coordinator then informs each participant of its decision, or each participant must then commit or roll back the transaction locally, as instructed. Note that each participant must do what it is told by the coordinator in Phase 2-that is the protocol. Note too that it is the appearance of the decision record in the coordinator's physical log that marks the transition from Phase 1 to Phase 2.

If the system fails at some point during the foregoing process, the restart procedure will look for the decision record in the coordinator's log. If it finds it, then the two-phase commit process can pick up where it left off. If it does not find it, then it assumes that the decision was "rollback," and again the process can complete appropriately. Note: It is worth pointing out that if the coordinator and the participants are executing on different machines, as they might be in a distributed system, then a failure on the part of the coordinator might keep some participant waiting a long time for the coordinator's decision and, as long as it is waiting, any updates made by the transaction via that participant must be kept hidden from other transactions (i.e. those updates will probably have to be kept locked).

Oracle, The Distributed Database Capability (Oracle 2PC)

No comments:

Post a Comment

Thanks for your comment.