Saga design pattern and transactions in Microservices architectures

Ahmet DELLAL
13 min read · Mar 3, 2022

As is generally known, a design pattern is a method that solves a design problem in the simplest and most effective way during project development. Say you are going from Beşiktaş to Üsküdar: your design pattern should tell you to take the ferry. Otherwise, it is entirely possible to go from Beşiktaş to Üsküdar via Australia.

For more detailed information, see here.

PS: Design patterns either show up in the design from the very beginning or are adopted later out of necessity. A design pattern belongs at the points where the code is known to change or is likely to change. The singleton is the most frequently cited example: you can blow up a multithreaded application with a singleton, and for this and other reasons there is an ongoing debate about whether the singleton is an antipattern. If we say that a design pattern is merely the solution design for a problem, we define it incompletely. What turns a solution into a pattern is that it has been applied three or four times in real-world applications and has become a kind of common knowledge.

I’ll continue without digressing any further :)

As is well known, in microservice architectures every service should ideally have its own database. A transaction that spans several such services is called a distributed transaction. Since each service has its own database, the data must stay consistent across all of the services as a whole. Let me explain with an example.

Let’s say you are a developer in the order department at Amazon. In this software, order taking is handled by the ‘Order’ service, the stock operations that follow an order by the ‘Stock’ service, and payment by the ‘Payment’ service. As mentioned above, each of these services has its own database. Assume the scenario proceeds as follows:

When the user places an order for any product, the ‘Order’ service performs this operation and adds the order to its own database. The ‘Payment’ service then takes the payment and, if the payment succeeds, the stock amount of the relevant product is reduced in the ‘Stock’ service.

Now, in this process, if the ‘Payment’ service successfully completes the payment after the ‘Order’ service creates the order, the stock count of the ordered product must be reduced in the ‘Stock’ service. Otherwise, an inconsistency occurs. Think about it: the order is placed and the payment is made, yet the stock stays the same. Imagine the damage this does to the statistics in the software and to the business that relies on it!

What are Transactions, Distributed Transactions and Compensable Transactions?

Transaction:

You could say a transaction covers everything we do in the database: it is the scope within which a set of operations runs. Basically, if a problem occurs while an operation on the database is in progress, the transaction can be cancelled and rolled back.

Transaction status

Distributed Transaction:

This term refers to the situation where more than one database works together as a whole, and it is used for distributed database systems. In approaches such as microservices, where each service owns its own database, an operation that spans several services is generally considered a distributed transaction.

Compensable Transaction:

A compensable transaction, on the other hand, reverses what a transaction has done. Once a transaction has been committed, it can only be undone afterwards through compensation. This backward compensation process is called a compensable transaction.

I believe it will be more enlightening if we examine the figure above. Let me explain with a simple example. A customer arrives with an order. We route it to the calculate service to compute the total price of the order. In the next step, it is routed to the payment service, where the payment is taken. Put more simply, the customer has selected products worth x units of money and the charge has been created. To collect it, the relevant bank is asked to deduct x units of money, and x units are deducted from the customer’s bank account. If any problem occurs, the x units deducted are returned to the customer’s bank account. Reverting the work done in all services in case of any failure is a compensable transaction.
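The payment example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real payment integration: `BankAccount`, `charge`, `refund` and `place_order` are hypothetical names, and the bank is just an in-memory object.

```python
class BankAccount:
    """In-memory stand-in for the customer's bank account (hypothetical)."""

    def __init__(self, balance):
        self.balance = balance

    def charge(self, amount):
        # Local transaction: deduct x units of money from the account.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount

    def refund(self, amount):
        # Compensating transaction: reverses a charge that was already committed.
        self.balance += amount


def place_order(account, price, in_stock):
    account.charge(price)        # step 1 is committed at this point
    if not in_stock:             # a later step in another service fails
        account.refund(price)    # undo step 1 by compensation, not by rollback
        return "FAILED"
    return "COMPLETED"


account = BankAccount(balance=100)
ok_status = place_order(account, price=30, in_stock=True)
failed_status = place_order(account, price=30, in_stock=False)
print(ok_status, failed_status, account.balance)
```

Note that the refund is a new operation in its own right: the charge really happened and was visible for a while, which is exactly what distinguishes compensation from a database rollback.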

The core characteristic of the microservices architecture is the loose coupling of services. To achieve that, each service must have its own private data store. So, building the database architecture for microservices almost always requires following the database-per-service pattern, at least until the application becomes too complex, with numerous services involved.

Let’s take a look at an online store application. Order Service and Customer Service each store data in their own databases. Changes to one database don’t impact other microservices.
Actually, this method has a name: the Database-per-Service pattern.

The most important drawback of this pattern, we can say, is that querying data by joining across databases and producing holistic statistical results becomes very difficult and laborious.
The service’s database can’t be accessed directly by other microservices. Each service’s persistent data can only be accessed via API.
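To make the "access only via API" rule concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the two service classes, their method names and the sample records are hypothetical, and plain dictionaries stand in for the private databases. Since a cross-database join is not possible, the combined result is assembled by calling each service's API.

```python
class OrderService:
    """Owns the order data; other services never touch _db directly."""

    def __init__(self):
        self._db = {
            1: {"customer_id": 7, "total": 40},
            2: {"customer_id": 7, "total": 60},
            3: {"customer_id": 9, "total": 15},
        }

    def get_orders_for_customer(self, customer_id):
        # Public API: returns copies so callers cannot mutate the private store.
        return [dict(order) for order in self._db.values()
                if order["customer_id"] == customer_id]


class CustomerService:
    """Owns the customer data in its own private store."""

    def __init__(self):
        self._db = {7: {"name": "Ada"}, 9: {"name": "Bora"}}

    def get_customer(self, customer_id):
        return dict(self._db[customer_id])


def customer_order_summary(customers, orders, customer_id):
    # The "join" happens in application code, through the two APIs.
    customer = customers.get_customer(customer_id)
    customer_orders = orders.get_orders_for_customer(customer_id)
    return {
        "name": customer["name"],
        "order_count": len(customer_orders),
        "total_spent": sum(order["total"] for order in customer_orders),
    }


summary = customer_order_summary(CustomerService(), OrderService(), 7)
print(summary)
```

The extra code in `customer_order_summary` is the price of the pattern: what would be a single SQL join in a shared database becomes two API calls and some merging logic.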

ACID properties of transactions

Whenever software modifies data on distributed databases, we evaluate how the operation should be carried out in line with certain principles.

In the context of transaction processing, the acronym ACID refers to the four key properties of a transaction: atomicity, consistency, isolation, and durability.

Atomicity
All changes to data are performed as if they are a single operation. That is, all the changes are performed, or none of them are. For example, in an application that transfers funds from one account to another, the atomicity property ensures that, if a debit is made successfully from one account, the corresponding credit is made to the other account.
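The fund-transfer example can be tried out with Python's built-in `sqlite3` module. This is a small sketch under stated assumptions: the `accounts` table, the account names and the `transfer` helper are made up for illustration, and a CHECK constraint is used to make the debit fail when funds are insufficient.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit; returns True if committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            # Violates the CHECK constraint when src lacks the funds.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


first = transfer(conn, "alice", "bob", 30)    # both updates are applied
second = transfer(conn, "alice", "bob", 500)  # debit fails, credit is undone
print(first, second, balances(conn))
```

In the second call the credit to `bob` has already been executed when the debit fails, yet after the rollback neither change is visible. That is the all-or-nothing behavior a single database gives us for free, and the one we lose once the two accounts live in different services.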

Consistency
Data is in a consistent state when a transaction starts and when it ends. For example, in an application that transfers funds from one account to another, the consistency property ensures that the total value of funds in both the accounts is the same at the start and end of each transaction.

Together with atomicity, this means that every operation in a transaction must succeed for the transaction to take effect. If any operation fails, the entire transaction is undone and the data returns to its previous consistent state. In short, all or nothing.

Consider two services. If record X in the first service corresponds to record Y in the second, there is no problem. But if the first service has no X record while the second service has a Y record, alarm bells should ring, because data must be consistent across distributed databases. Otherwise, we may run into unexpected situations later on.

Isolation

The intermediate state of a transaction is invisible to other transactions. As a result, transactions that run concurrently appear to be serialized. For example, in an application that transfers funds from one account to another, the isolation property ensures that another transaction sees the transferred funds in one account or the other, but not in both, nor in neither.

Isolation between services is important as well. It means that in sequential transactions, if a transaction depends on the result of the previous one, it must wait for that result; if it does not wait, alarm bells should ring and the process should stop. Let me explain with an example. Consider an e-commerce application in which the customer has added products to the cart. Normally, the next step is payment, but suppose we took the order directly and started delivering the product without going through the payment step. At that point, the question of who wrote this code comes up.

Durability
After a transaction successfully completes, changes to data persist and are not undone, even in the event of a system failure. For example, in an application that transfers funds from one account to another, the durability property ensures that the changes made to each account will not be reversed. In short, store the data in the safest place, whether that is a CD, a disk or a cloud service :)

What is SAGA?

A saga is a sequence of local transactions where each transaction updates data within a single service. The first transaction is initiated by an external request corresponding to the system operation, and then each subsequent step is triggered by the completion of the previous one.

Types of saga pattern

There are two ways to perform sagas:

Choreography sagas:
In this type, domain events act as triggers. The first transaction is initiated by an external request or user input; then each local transaction publishes domain events to the event bus, which trigger local transactions in other services.

Benefits:

This is a natural and easy way of implementing the saga pattern.

It is easy to understand and does not require much effort to develop.

All the services are loosely coupled, so it does not violate the principles of microservices.

Drawbacks:

As the number of services or local transactions increases, this method becomes more complex and cumbersome, because cyclic dependencies can arise between the stages of the saga.

Testing becomes complex, as all the participants must be running.

Orchestration sagas:

An orchestrator or centralized controller tells the participants or services what local transactions to execute.

Benefits:

Cyclic dependencies between participants can be avoided.

Complexity is reduced, as participants only have to execute commands and reply to them.

Drawbacks:

The orchestrator or centralized controller can become overloaded, because the saga logic is concentrated in it.

Infrastructure complexity increases, because an extra service has to be added.

SAGA — EVENTS/CHOREOGRAPHY

In this approach every change to the state of an application is captured as an event. This event is stored in the database/event store (for tracking purposes) and is also published in the event-bus for other parties to consume.

In other words, when the transaction that started in the first service finishes, that service sends an event via the message broker (Kafka, RabbitMQ, etc.). When the transaction completes in the service that receives the event, it sends another event via the message broker in the same way. This continues until the process requested by the client is completed.

In choreography, each service listens to the queue. When a message of an event type it subscribes to arrives, the service performs the necessary operations and then adds the outcome, successful or not, to the queue as an event. Other services then either continue their work based on this event, or all transactions are undone and data consistency is restored.

The example below is from vinsguru.

The business workflow is implemented as shown here.

  • order-service receives a POST request for a new order
  • It places an order request in the DB in the ORDER_CREATED state and raises an event
  • payment-service listens to the event and confirms the credit reservation
  • inventory-service also listens to the order event and confirms the inventory reservation
  • order-service fulfills or rejects the order based on the credit & inventory reservation status.

In summary: in this implementation, microservices communicate with each other through events, without a central point of control and communication. In other words, communication between services should be designed to be asynchronous.

Code example: GitHub link.

Let’s see what it would look like in our e-commerce example:

  1. Order Service saves a new order, sets its state as pending and publishes an event called ORDER_CREATED_EVENT.
  2. The Payment Service listens to ORDER_CREATED_EVENT, charges the client and publishes the event BILLED_ORDER_EVENT.
  3. The Stock Service listens to BILLED_ORDER_EVENT, updates the stock, prepares the products bought in the order and publishes ORDER_PREPARED_EVENT.
  4. Delivery Service listens to ORDER_PREPARED_EVENT and then picks up and delivers the product. At the end, it publishes an ORDER_DELIVERED_EVENT.
  5. Finally, Order Service listens to ORDER_DELIVERED_EVENT and sets the state of the order as concluded.

In the case above, if the state of the order needs to be tracked, Order Service could simply listen to all events and update its state.
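The five steps above can be played through with a tiny in-memory event bus. This is a sketch under stated assumptions, not how a real broker behaves: `EventBus` is a hypothetical stand-in for Kafka or RabbitMQ, all "services" are plain functions in one process, and the actual work (charging, packing, delivering) is left out so that only the event flow remains.

```python
from collections import defaultdict, deque


class EventBus:
    """Tiny in-memory stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self.handlers = defaultdict(list)
        self.queue = deque()
        self.delivered = []   # event types, in the order they were delivered

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, order_id):
        self.queue.append((event_type, order_id))

    def run(self):
        # Deliver events in publication order until no service has more work.
        while self.queue:
            event_type, order_id = self.queue.popleft()
            self.delivered.append(event_type)
            for handler in self.handlers[event_type]:
                handler(order_id)


bus = EventBus()
order_states = {}


def create_order(order_id):        # Order Service, step 1
    order_states[order_id] = "PENDING"
    bus.publish("ORDER_CREATED_EVENT", order_id)


def charge_client(order_id):       # Payment Service, step 2
    bus.publish("BILLED_ORDER_EVENT", order_id)


def prepare_order(order_id):       # Stock Service, step 3
    bus.publish("ORDER_PREPARED_EVENT", order_id)


def deliver_order(order_id):       # Delivery Service, step 4
    bus.publish("ORDER_DELIVERED_EVENT", order_id)


def conclude_order(order_id):      # Order Service, step 5
    order_states[order_id] = "CONCLUDED"


bus.subscribe("ORDER_CREATED_EVENT", charge_client)
bus.subscribe("BILLED_ORDER_EVENT", prepare_order)
bus.subscribe("ORDER_PREPARED_EVENT", deliver_order)
bus.subscribe("ORDER_DELIVERED_EVENT", conclude_order)

create_order(42)
bus.run()
print(order_states[42], bus.delivered)
```

Notice that no function calls another service directly. The only thing the services share is the event names, which is where the loose coupling comes from.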

Rollbacks in distributed transactions

Rolling back a distributed transaction does not come for free. Normally you have to implement another compensating transaction for what has been done before.

Suppose that Stock Service has failed during a transaction. Let’s see what the rollback would look like:

  1. Stock Service produces PRODUCT_OUT_OF_STOCK_EVENT.
  2. Both Order Service and Payment Service listen to the previous message:
    2.a Payment Service refunds the client.
    2.b Order Service sets the order state as failed.

Note that it is crucial to define a common shared ID for each transaction, so whenever you throw an event, all listeners can know right away which transaction it refers to.
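Here is a minimal sketch of that rollback in Python, with the shared ID passed along in every event. The assumptions are all mine: the `subscribe`/`publish` helpers deliver events synchronously inside one process, the handler names are hypothetical, and `"saga-001"` is just an example ID.

```python
from collections import defaultdict

handlers = defaultdict(list)


def subscribe(event_type, handler):
    handlers[event_type].append(handler)


def publish(event_type, saga_id):
    # Synchronous delivery keeps the sketch short; a real broker is asynchronous.
    for handler in handlers[event_type]:
        handler(saga_id)


order_state = {}   # Order Service data, keyed by the shared saga ID
payments = {}      # Payment Service data, keyed by the same ID


def on_order_created(saga_id):     # Payment Service charges the client
    payments[saga_id] = "CHARGED"
    publish("BILLED_ORDER_EVENT", saga_id)


def on_order_billed(saga_id):      # Stock Service: assume nothing is in stock
    publish("PRODUCT_OUT_OF_STOCK_EVENT", saga_id)


def refund_client(saga_id):        # Payment Service compensation (2.a)
    payments[saga_id] = "REFUNDED"


def fail_order(saga_id):           # Order Service compensation (2.b)
    order_state[saga_id] = "FAILED"


subscribe("ORDER_CREATED_EVENT", on_order_created)
subscribe("BILLED_ORDER_EVENT", on_order_billed)
subscribe("PRODUCT_OUT_OF_STOCK_EVENT", refund_client)
subscribe("PRODUCT_OUT_OF_STOCK_EVENT", fail_order)

saga_id = "saga-001"
order_state[saga_id] = "PENDING"
publish("ORDER_CREATED_EVENT", saga_id)
print(order_state[saga_id], payments[saga_id])
```

Because every event carries the same `saga_id`, each listener can find its own record for that transaction without asking any other service.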

Benefits and drawbacks of using Saga’s Event/Choreography design

Events/Choreography is a natural way to implement the Saga pattern. It is simple, easy to understand, does not require much effort to build, and all participants are loosely coupled as they don’t have direct knowledge of each other. If your transaction involves 2 to 4 steps, it might be a very good fit.

However, this approach can rapidly become confusing if you keep adding extra steps in your transaction as it is difficult to track which services listen to which events. Moreover, it also might add a cyclic dependency between services as they have to subscribe to one another’s events.

Finally, testing would be tricky to implement using this design: in order to simulate the transaction behavior, you need to have all services running.

SAGA — COMMAND/ORCHESTRATION

In this approach, the distributed transaction between services is coordinated by a central controller. This controller is called the Saga State Machine or, by another name, the Saga Orchestrator. The Saga Orchestrator manages all operations between services and tells each service which operation to perform based on events.

As its other name, Saga State Machine, suggests, the Saga Orchestrator keeps the application state of each request from each user, interprets it, and applies compensating operations when necessary.

Let’s see how it looks using our e-commerce example below:

  1. Order Service saves a pending order and asks Order Saga Orchestrator (OSO) to start a create order transaction.
  2. OSO sends an Execute Payment command to Payment Service, and it replies with a Payment Executed message
  3. OSO sends a Prepare Order command to Stock Service, and it replies with an Order Prepared message
  4. OSO sends a Deliver Order command to Delivery Service, and it replies with an Order Delivered message

In the case above, Order Saga Orchestrator knows the flow needed to execute a “create order” transaction. If anything fails, it is also responsible for coordinating the rollback by sending commands to each participant to undo the previous operation.

A standard way to model a saga orchestrator is a State Machine where each transition corresponds to a command or message. State machines are an excellent pattern for structuring well-defined behavior, as they are easy to implement and particularly well suited to testing.
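As a rough illustration of that idea, the happy path of the create order flow can be written as a transition table in Python. The command and reply names come from the steps above; the state names, the `START` message and the `OrderSagaOrchestrator` class are assumptions made for this sketch. Sending the commands to real services is left out, so the orchestrator only records which command it would send.

```python
# (current state, incoming message) -> (next state, command to send next)
TRANSITIONS = {
    ("ORDER_PENDING", "START"): ("AWAITING_PAYMENT", "Execute Payment"),
    ("AWAITING_PAYMENT", "Payment Executed"): ("AWAITING_STOCK", "Prepare Order"),
    ("AWAITING_STOCK", "Order Prepared"): ("AWAITING_DELIVERY", "Deliver Order"),
    ("AWAITING_DELIVERY", "Order Delivered"): ("COMPLETED", None),
}


class OrderSagaOrchestrator:
    """Keeps the saga state for one order and decides the next command."""

    def __init__(self):
        self.state = "ORDER_PENDING"
        self.sent_commands = []

    def handle(self, message):
        key = (self.state, message)
        if key not in TRANSITIONS:
            raise ValueError(
                f"unexpected message {message!r} in state {self.state}")
        self.state, command = TRANSITIONS[key]
        if command is not None:
            self.sent_commands.append(command)
        return command


oso = OrderSagaOrchestrator()
for reply in ["START", "Payment Executed", "Order Prepared", "Order Delivered"]:
    oso.handle(reply)
print(oso.state, oso.sent_commands)
```

Because the whole flow lives in one table, it can be tested by feeding messages to `handle` without any participant service running, which is the testing advantage mentioned above.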

Rolling Back in Saga’s Command/Orchestration

Rollbacks are a lot easier when you have an orchestrator to coordinate everything:

  1. Stock Service replies to OSO with an Out-Of-Stock message;
  2. OSO recognizes that the transaction has failed and starts the rollback
    2.1 In this case, only a single operation was executed successfully before the failure, so OSO sends a Refund Client command to Payment Service and sets the order state as failed

Here, when an error occurs or the transaction fails, the orchestrator is informed of the failure and starts the rollback transactions (compensable transactions). Keeping state information in the saga for each process makes it easier to see at which step the process went wrong.
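A minimal sketch of this rollback logic in Python follows. It assumes, for illustration only, that each step is a pair of callables: an action that returns whether the participant succeeded, and a compensation that undoes it. `run_saga` and the lambdas are hypothetical; in a real system they would be commands sent to the services.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps in order.

    If an action reports failure, compensate the completed steps
    in reverse order and stop.
    """
    completed = []
    log = []
    for name, action, compensation in steps:
        if action():
            log.append(name)
            completed.append((name, compensation))
        else:
            log.append(f"{name} failed")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"undo {done_name}")
            return "FAILED", log
    return "COMPLETED", log


refund_commands = []
steps = [
    # Payment succeeds; its compensation is the Refund Client command.
    ("Execute Payment", lambda: True,
     lambda: refund_commands.append("Refund Client")),
    # Stock Service replies with Out-Of-Stock, so this action fails.
    ("Prepare Order", lambda: False, lambda: None),
    # Never reached, because the saga stops at the failure.
    ("Deliver Order", lambda: True, lambda: None),
]

status, log = run_saga(steps)
print(status, log, refund_commands)
```

The `completed` list plays the role of the state information kept by the orchestrator: it records how far the saga got, so only the steps that really happened are compensated.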

Benefits and Drawbacks of Using Saga’s Command/Orchestration Design

Orchestration-based sagas have a variety of benefits:

  • Avoid cyclic dependencies between services, as the saga orchestrator invokes the saga participants but the participants do not invoke the orchestrator
  • Centralize the orchestration of the distributed transaction
  • Reduce participants’ complexity, as they only need to execute commands and reply to them.
  • Easier to implement and test
  • The transaction complexity remains linear when new steps are added
  • Rollbacks are easier to manage
  • If a second transaction wants to change the same target object, you can easily put it on hold in the orchestrator until the first transaction ends.

However, this approach still has some drawbacks. One of them is the risk of concentrating too much logic in the orchestrator and ending up with an architecture where the smart orchestrator tells dumb services what to do.

Another downside of the orchestration-based saga is that it slightly increases your infrastructure complexity, as you will need to manage an extra service.

The links to the sources I used during the writing period are below.

Sources:

  • https://blog.couchbase.com/saga-pattern-implement-business-transactions-using-microservices-part/
  • https://www.irjet.net/archives/V7/i5/IRJET-V7I5124.pdf
  • https://microservices.io/patterns/data/saga.html
