1. Reasons for distributed transactions
1. Originally, data is stored in a single database and a single table. As the business grows, the data volume keeps increasing until a single database can no longer sustain the update and query load. To remove this bottleneck, the database is split horizontally, and operations that used to run inside one database transaction become cross-database transactions.
2. As the business keeps growing, its modules are split into separate microservices. An operation that calls several microservices at once then becomes a cross-service distributed transaction.
2. Various concepts in distributed transactions
Material on distributed transactions uses many overlapping terms, and the relationships between them are easy to confuse. This section introduces them in turn.
First of all, X/Open is a standards organization (now part of The Open Group).
1.X/Open DTP model
X/Open DTP (Distributed Transaction Processing) is a distributed transaction model. It relies mainly on two-phase commit (2PC) to guarantee the integrity of distributed transactions. The model defines three roles:
AP: Application Program, that is, the business layer. The AP defines which operations belong to a transaction.
TM: Transaction Manager. It receives transaction requests from the AP, manages global transactions and the status of transaction branches, coordinates the RMs, and tells each RM which operations belong to which global transaction and transaction branch. It is the core of the whole transaction scheduling model.
RM: Resource Manager. Usually a database, but it can also be another kind of resource manager, such as a message queue (for example a JMS data source) or a file system.
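The division of labour between the three roles can be sketched in a few lines. This is a minimal illustration, not the X/Open interfaces themselves: the class names, the `gtx-N` transaction id format, and the `transfer` function are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """RM: owns one resource (for example a database) and tracks branch work."""
    name: str
    branches: dict = field(default_factory=dict)  # global tx id -> operations

    def do_work(self, xid: str, operation: str) -> None:
        self.branches.setdefault(xid, []).append(operation)


@dataclass
class TransactionManager:
    """TM: hands out global transaction ids and records which RMs take part."""
    next_id: int = 1
    enlisted: dict = field(default_factory=dict)  # global tx id -> RM names

    def begin(self) -> str:
        xid = f"gtx-{self.next_id}"  # hypothetical id format
        self.next_id += 1
        self.enlisted[xid] = []
        return xid

    def enlist(self, xid: str, rm: ResourceManager) -> None:
        if rm.name not in self.enlisted[xid]:
            self.enlisted[xid].append(rm.name)


def transfer(tm: TransactionManager, accounts_db: ResourceManager,
             ledger_db: ResourceManager) -> str:
    """AP: the business code decides which operations form one transaction."""
    xid = tm.begin()
    for rm, op in ((accounts_db, "debit A 100"), (ledger_db, "record transfer")):
        tm.enlist(xid, rm)
        rm.do_work(xid, op)
    return xid
```

The AP only marks the transaction boundary and issues the work; the TM keeps the global view of which RMs hold a branch of the transaction.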
2.XA Protocol (XA Specification)
XA is a distributed transaction processing specification proposed by X/Open. It standardizes the communication interface between the TM and the RMs, forming a two-way bridge between one TM and multiple RMs, and thereby preserves the four ACID properties across multiple database resources. Well-known databases such as Oracle, DB2, and MySQL all implement the XA interface and can act as an RM.
XA provides database-level distributed transactions with strong consistency. The data stays locked throughout: for the whole span from prepare to commit or rollback, the global transaction coordinated by the TM keeps its locks in the database. Anyone else who wants to modify that data must wait for the locks to be released, so there is a risk of long-running transactions.
3.TCC
TCC provides business-level distributed transactions with eventual consistency, and it avoids the lock risk of long-running transactions. Try is a local transaction: it reserves the resource and then commits. Confirm and cancel are also local transactions that commit directly. Because the work is done as several short transactions, there is no long-running transaction.
3. Theoretical basis of distributed transactions
1. Two-phase commit (2PC)
Two-phase commit is the mechanism XA uses to coordinate multiple resources in a global transaction.
As the name suggests, it processes a distributed transaction in two phases: the voting phase (sometimes called the prepare phase) and the commit phase.
2PC has two roles: the transaction coordinator and the transaction participants. In terms of the DTP model above, the coordinator corresponds to the TM and the participants to the RMs, usually the application's databases.
The two-phase commit protocol addresses strong consistency of data across distributed databases.
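The two phases can be sketched as follows. This is a simplified in-memory illustration under stated assumptions: the `Participant` class and its `can_commit` flag are hypothetical, there is no network, no durable log, and no failure handling.

```python
class Participant:
    """Hypothetical participant; `can_commit` decides how it votes."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. A real participant would persist its changes
        # and hold locks before answering yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    """Coordinator logic: collect all votes, then commit or roll back everyone."""
    # Phase 1 (voting): every participant is asked to prepare.
    votes = [p.prepare() for p in participants]

    # Phase 2 (commit): commit only if every vote was yes.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"

    for p in participants:
        p.rollback()
    return "rolled_back"
```

A single "no" vote in phase 1 is enough to roll back every participant, which is what gives the protocol its all-or-nothing behaviour.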
Drawbacks:
- Synchronous blocking. During execution, all participating nodes block on the transaction. While participants hold shared resources, any other node that needs those resources has to wait.
- Single point of failure. The coordinator is critical: once it fails, the participants stay blocked. This is worst in the second phase, where a coordinator failure leaves every participant holding its locked transaction resources and unable to finish the transaction. (If the coordinator fails, a new coordinator can be elected, but that does not resolve the blocking of participants caused by the original coordinator's downtime.)
- Data inconsistency. In the second phase, if a local network fault occurs or the coordinator fails while it is sending commit requests, only some of the participants receive the request. Those participants commit, while the ones that never received the request cannot. As a result, the data in the distributed system becomes inconsistent.
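The blocking and inconsistency cases can be reproduced with a small simulation. It is only a sketch: the `Node` class, the `crash_after` parameter, and the node names are assumptions invented for the example, and the "crash" is simply the coordinator loop stopping early.

```python
class Node:
    """Hypothetical participant that already voted yes in phase 1."""

    def __init__(self, name):
        self.name = name
        self.state = "prepared"
        self.locked = True  # resources stay locked until phase 2 arrives

    def commit(self):
        self.state = "committed"
        self.locked = False


def send_commits(nodes, crash_after):
    """Deliver commit to the first `crash_after` nodes, then stop.

    Stopping early stands in for a coordinator crash or a network fault.
    """
    for delivered, node in enumerate(nodes):
        if delivered == crash_after:
            return
        node.commit()


nodes = [Node("db1"), Node("db2"), Node("db3")]
send_commits(nodes, crash_after=1)

states = {n.name: n.state for n in nodes}
still_locked = [n.name for n in nodes if n.locked]
print(states)
print(still_locked)
```

After the simulated failure, `db1` has committed while `db2` and `db3` are still prepared and still hold their locks, which shows both the inconsistency and the blocking described above.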
2. Three-phase commit (3PC)
The three-phase commit protocol introduces a timeout mechanism in both the coordinator and the participants, and splits the first phase of two-phase commit into two steps: first ask, then lock the resources, and finally commit. The three phases are CanCommit, PreCommit, and DoCommit.
In the DoCommit phase, if a participant does not receive a DoCommit or abort request from the coordinator in time, it goes ahead and commits the transaction once the wait times out. The reasoning is probabilistic. A participant that has reached the third phase has already received PreCommit in the second phase, and the coordinator only sends PreCommit after every participant has answered Yes to CanCommit. So once a participant receives PreCommit, it knows that all participants agreed to the change. In short: even though a network timeout or a similar fault means the participant received neither a commit nor an abort, it has good reason to believe that the transaction will most likely commit.
The difference between 2PC and 3PC:
Compared with 2PC, 3PC mainly addresses the single point of failure and reduces blocking: once a participant fails to hear from the coordinator in time, it commits by default instead of holding transaction resources and blocking indefinitely. However, this mechanism can also cause data inconsistency. If, because of a network problem, the abort sent by the coordinator does not reach a participant in time, that participant commits after its wait times out. Its data is then inconsistent with the participants that did receive the abort and rolled back.
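The timeout rule and the inconsistency it can cause fit in a few lines. This is a sketch under simplifying assumptions: the function name `on_third_phase` and the message strings are made up, and a lost message is modelled as `None`.

```python
def on_third_phase(message):
    """Action of a participant that has already acknowledged PreCommit.

    `message` is "do_commit", "abort", or None when the wait timed out.
    """
    if message == "abort":
        return "rolled_back"
    # Either DoCommit arrived or the wait timed out: commit by default.
    return "committed"


# The coordinator decided to abort. p1 receives the abort,
# but the message to p2 is lost, so p2 only sees a timeout.
outcomes = {
    "p1": on_third_phase("abort"),
    "p2": on_third_phase(None),
}
print(outcomes)
```

Here `p1` rolls back while `p2` commits, which is exactly the divergence described in the comparison above.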
3.TCC(Try-Confirm-Cancel)
Try phase:
Complete all business checks (consistency) and reserve the business resources (quasi-isolation).
Confirm phase:
Execute the business operation for real. No business checks are performed, and only the business resources reserved in the Try phase are used.
Cancel phase:
Release the business resources reserved in the Try phase.
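The three phases can be illustrated with a seat inventory. This is a minimal sketch, not a TCC framework: the `SeatInventory` class, its method names, and the transaction ids are assumptions for illustration, and each method stands in for what would be one short local transaction.

```python
class SeatInventory:
    """Hypothetical service exposing Try / Confirm / Cancel for seat booking."""

    def __init__(self, available):
        self.available = available
        self.reserved = {}  # tx id -> number of seats reserved by Try
        self.sold = 0

    def try_reserve(self, tx_id, seats):
        # Try: run the business check and reserve the resource.
        if seats > self.available:
            return False
        self.available -= seats
        self.reserved[tx_id] = seats
        return True

    def confirm(self, tx_id):
        # Confirm: no further checks, only consume what Try reserved.
        # pop() with a default makes a repeated confirm a no-op.
        self.sold += self.reserved.pop(tx_id, 0)

    def cancel(self, tx_id):
        # Cancel: give the reserved seats back. Also safe to repeat.
        self.available += self.reserved.pop(tx_id, 0)
```

Making confirm and cancel safe to repeat is a common design choice, because a coordinator that is unsure whether phase 2 arrived will typically retry it.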
TCC and XA two-phase commit play the same role. The two are compared below.
1) Phase 1:
In XA, each RM prepares to commit its own transaction branch, which in practice means preparing to commit its resource updates (insert, delete, update, and so on).
In TCC, the main business activity sends a request (try) to each secondary business service to reserve resources.
2) Phase 2:
XA decides whether to commit or roll back based on whether every RM prepared successfully in phase 1. If all prepares succeeded, every transaction branch is committed; otherwise every transaction branch is rolled back.
In TCC, if all business resources were reserved successfully in phase 1, every secondary business service is confirmed; otherwise the resource reservation on every secondary business service is cancelled.
The differences between TCC two-phase commit and XA two-phase commit are:
XA is a resource-level distributed transaction with strong consistency. Resource locks are held for the entire two-phase commit process.
The two-phase commit inside an XA transaction is hidden from developers, who cannot see it at the code level. During the transaction manager's two-phase commit, from prepare to commit or rollback, the resources stay locked the whole time. Anyone else who needs to update the same records must wait for the locks to be released.
TCC is a business-level distributed transaction with eventual consistency, and it does not hold resource locks the whole time.
The two-phase commit in TCC is not fully hidden from developers: at the code level, they can see that two phases exist. Each of try, confirm, and cancel usually runs in its own local transaction to guarantee the ACID properties of the business logic inside the method. Specifically:
1. The local transaction in try guarantees the correctness of the resource reservation logic.
2. The local transaction in confirm/cancel confirms or cancels the reserved resources to guarantee eventual consistency. This is the so-called compensation-based transaction. Because the work consists of several independent local transactions, resources are not locked the whole time.
In addition, the local transaction executed by confirm/cancel mentioned here is a compensating transaction:
Compensation is an independent local transaction that supports the ACID properties. It is used to logically cancel the effect of an earlier ACID transaction on the service provider. For a long-running transaction, rather than implementing one huge distributed ACID transaction, it is better to use a compensation-based solution: treat each service call as a short local ACID transaction and commit it immediately after it executes.
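The compensation idea can be sketched as a runner that executes each step as if it were a short, immediately committed local transaction and undoes the completed steps in reverse order when a later step fails. The function `run_with_compensation`, the `balance` dictionary, and the four step functions are hypothetical names used only for this illustration.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    completed = []  # compensations for the steps that already committed
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # Logically cancel earlier steps, most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


balance = {"A": 100, "B": 0}


def debit_a():
    balance["A"] -= 30


def refund_a():
    balance["A"] += 30


def credit_b():
    # Simulated failure of the second service.
    raise RuntimeError("service B unavailable")


def uncredit_b():
    balance["B"] -= 30


ok = run_with_compensation([(debit_a, refund_a), (credit_b, uncredit_b)])
print(ok, balance)
```

The debit on A is committed on its own and later reversed by `refund_a`, so no lock has to be held while waiting for the second service.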
4.TCC transaction model VS DTP transaction model
Compare the TCC transaction model with the DTP transaction model, as shown below:
The two pictures look quite different, but in fact they are similar in many places.
1. The main business service in the TCC model corresponds to the AP in the DTP model, and the secondary business services in the TCC model correspond to the RMs in the DTP model
In the DTP model, the AP operates on the resources of multiple resource managers (RMs); in the TCC model, the main business service operates on the resources of multiple secondary business services. For example, in the flight booking case, the Meituan App is the main business service, while Sichuan Airlines and China Eastern Airlines are the secondary business services, and the main business service needs the ticket resources held by the secondary business services. The difference is that the resource provider in the DTP model is a relational database such as MySQL, while the resource provider in the TCC model is another business service.
2. In the TCC model, the try, confirm, and cancel interfaces provided by the secondary business services correspond to the prepare, commit, and rollback interfaces provided by the RM in the DTP model
The XA protocol stipulates that the RM in the DTP model must provide prepare, commit, and rollback interfaces for the TM to call in order to achieve two-phase commit.
In the TCC model, the secondary business service plays the role of the RM and provides the analogous try, confirm, and cancel interfaces.
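The correspondence between the two sets of interfaces can be made concrete with a small adapter. This is only an illustration of the mapping, not how any real XA driver or TCC framework is built: `TccService`, `TccAsResource`, and `TicketService` are hypothetical names, and the trailing underscore in `try_` is there only because `try` is a Python keyword.

```python
from typing import Protocol


class TccService(Protocol):
    """What a secondary business service exposes in the TCC model."""

    def try_(self, xid: str) -> bool: ...
    def confirm(self, xid: str) -> None: ...
    def cancel(self, xid: str) -> None: ...


class TccAsResource:
    """Adapter giving a TCC service the prepare/commit/rollback shape of an RM."""

    def __init__(self, service: TccService):
        self.service = service

    def prepare(self, xid: str) -> bool:
        return self.service.try_(xid)      # try     <-> prepare

    def commit(self, xid: str) -> None:
        self.service.confirm(xid)          # confirm <-> commit

    def rollback(self, xid: str) -> None:
        self.service.cancel(xid)           # cancel  <-> rollback


class TicketService:
    """Hypothetical secondary business service that records the calls it gets."""

    def __init__(self):
        self.calls = []

    def try_(self, xid):
        self.calls.append(("try", xid))
        return True

    def confirm(self, xid):
        self.calls.append(("confirm", xid))

    def cancel(self, xid):
        self.calls.append(("cancel", xid))
```

The adapter adds no behaviour of its own, which is the point: the two models differ in who provides the resource, not in the shape of the two-phase interface.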
3. Transaction Manager
Both the DTP model and the TCC model have a transaction manager. The difference is:
In the DTP model, phase 1 (prepare) and phase 2 (commit, rollback) are both called by the TM.
In the TCC model, the try interface of phase 1 is called by the main business service (green arrow), and the phase 2 interfaces (confirm, cancel) are called by the transaction manager TM (red arrow). This is what makes phase 2 of the TCC distributed transaction model asynchronous: once phase 1 of every secondary business service has succeeded, the main business service can commit and finish, and the transaction manager framework then executes phase 2 of each secondary business service asynchronously. This sacrifices a certain degree of isolation and consistency, but improves the availability of long transactions.
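The split of responsibilities between the main business service and the transaction manager can be sketched as follows. The example is a simplified assumption-laden model: `TccTransactionManager`, `Airline`, and `book_trip` are hypothetical names, and "asynchronous" is modelled as a queue that is drained later rather than as real background execution.

```python
from collections import deque


class Airline:
    """Hypothetical secondary business service holding ticket resources."""

    def __init__(self, name, seats):
        self.name = name
        self.seats = seats
        self.state = "idle"

    def try_(self):
        if self.seats > 0:
            self.seats -= 1
            self.state = "reserved"
            return True
        return False

    def confirm(self):
        if self.state == "reserved":
            self.state = "confirmed"

    def cancel(self):
        if self.state == "reserved":
            self.seats += 1
            self.state = "cancelled"


class TccTransactionManager:
    """TM: only drives phase 2, after the main service has returned."""

    def __init__(self):
        self.pending = deque()  # (decision, participants) awaiting phase 2

    def submit(self, participants, all_tried):
        decision = "confirm" if all_tried else "cancel"
        self.pending.append((decision, participants))

    def run_phase_two(self):
        while self.pending:
            decision, participants = self.pending.popleft()
            for p in participants:
                getattr(p, decision)()  # calls p.confirm() or p.cancel()


def book_trip(tm, airlines):
    """Main business service: calls Try itself, then hands phase 2 to the TM."""
    results = [a.try_() for a in airlines]
    tm.submit(airlines, all(results))
    return all(results)
```

Between `book_trip` returning and `run_phase_two` running, the seats are reserved but not yet confirmed; that window is the isolation and consistency the model gives up in exchange for availability.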