The business logic in a solution often acts as a Process Manager, or orchestrator: it may invoke a number of operations and then control the process flow based on the results. Depending on what you are building, you may be committing changes to both your own systems and external ones.

Consider the following code.

private void MyBusinessProcess()
{
    var result = external.DoSomethingImportant(args, out errors);
    if (errors.Any())
    {
        MyDb.CommitFailure(result);
    }
    else
    {
        MyDb.CommitSuccess(result);
    }
}

Here the logic invokes an operation on an external class and then, depending on the result, records the success or failure to the application’s database.

So the solution above is pushed into Live.

Soon support incidents start coming in indicating that sometimes the calls to CommitFailure or CommitSuccess fail to write their changes. Because there is then no record that the call to DoSomethingImportant ever happened, the application tells the user that the operation never executed. When MyBusinessProcess is retried, DoSomethingImportant throws an exception, because it is not idempotent and calling it twice with the same arguments is not allowed.

For the sake of this post let’s assume that there is no trivial way to stop the transient problems that cause the exceptions in CommitFailure or CommitSuccess. However, there remains the requirement that MyBusinessProcess must operate consistently.

The developer who picks up this issue asks around the team about how the external class works. They find out that not many people really understand this system, but they are aware that when the call to DoSomethingImportant completes it commits its result to its own database, and that when it sees the same arguments again it throws the exception that is the cause of the support incidents. The developer examines their development environment and, sure enough, on their local SQL Server alongside MyAppDB there is another database called ExternalDB.

Great. So they implement the following code, which wraps the two database operations in a transaction. Now either all calls commit or all calls are rolled back.

private void MyBusinessProcess()
{
    using (TransactionScope tx = new TransactionScope())
    {
        var result = external.DoSomethingImportant(args, out errors);
        if (errors.Any())
        {
            MyDb.CommitFailure(result);
        }
        else
        {
            MyDb.CommitSuccess(result);
        }
        tx.Complete();
    }
}

This is tested locally and it seems to work. However, once it hits the first test environment, which in this case is hosted in Azure and specifically uses separate SQL Azure nodes for each database, MyBusinessProcess fails every time. This is because, for a transaction to span two SQL Azure nodes, a distributed transaction must be used. And until recently the only way a TransactionScope could achieve this was to enlist in a transaction managed by the Microsoft Distributed Transaction Coordinator (MSDTC), which is not supported on SQL Azure.
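The escalation is easy to reproduce: the ambient transaction starts as a lightweight, single-resource transaction, and System.Transactions silently promotes it to a distributed one as soon as a second resource manager (here, a connection to a different server) enlists. A minimal sketch of what happens under the hood, with placeholder connection strings standing in for the two databases:

```csharp
using System.Data.SqlClient;
using System.Transactions;

using (var tx = new TransactionScope())
{
    using (var conn1 = new SqlConnection(myAppDbConnectionString))
    {
        conn1.Open();   // enlists in a lightweight (local) transaction
        // ... write to MyAppDB ...
    }

    using (var conn2 = new SqlConnection(externalDbConnectionString))
    {
        // Opening a connection to a *different* server inside the same
        // ambient transaction forces promotion to a distributed
        // transaction coordinated by MSDTC. On SQL Azure this is the
        // point at which MyBusinessProcess starts failing.
        conn2.Open();
        // ... write to ExternalDB ...
    }

    // After promotion, Transaction.Current.TransactionInformation
    //     .DistributedIdentifier is non-empty.
    tx.Complete();
}
```

Nothing in the calling code changes when promotion happens, which is exactly why the problem only surfaces once the two databases stop living on the same server.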

I have encountered this problem a couple of times now. I find it interesting that the default option is often to wrap everything up in a transaction scope. Microsoft have done a great job in the language syntax of hiding the complexity of deciding whether a distributed transaction is required and then dealing with the two-phase commit. And that convenience often becomes a problem. Distributed transactions are complex and using them can have a large impact on your application. But because the complexity is hidden, many people have forgotten, or had no incentive to learn in the first place, what is going on under the hood.

When I have challenged people about this, the usual defence is that it is easy to implement, so why wouldn’t you use it? However, even today it is common for the people who implement the code not to be the same people who have to get it working in a bigger environment.

As in the example, it is typical for the configuration of development, test and production environments to differ, so you may only find problems like the one highlighted above late in the day. You don’t want to discover that none of your transactional logic works just as you are trying to put your solution live. The second thing I have seen is that distributed transactions can seriously constrain the performance of your system; in that situation you may only find you have a problem just as your product is becoming successful.

Distributed transactions, and transactions in general, are said to be ACID: Atomic, Consistent, Isolated and Durable. It is the C that causes all the problems here. Trying to be consistent limits concurrency, as only one party at a time can commit a change, while allowing multiple parties to commit at the same time compromises consistency. When you are trying to get a system working it makes complete sense to be consistent. But when you are trying to implement a system that will see a lot of use, that trade-off no longer makes sense.
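One way to meet the consistency requirement without a cross-database transaction is to make the business process resumable: journal the attempt and the external call’s result durably in the application’s own database, so that a retry can detect that DoSomethingImportant already ran instead of calling it again. A minimal sketch of the idea; the journal helpers (FindOrCreateAttempt, MarkAttemptCompleted) and types are hypothetical, not part of the original code:

```csharp
private void MyBusinessProcess()
{
    // Record the attempt first, in the application's own database.
    // Hypothetical helper: returns the existing journal entry for these
    // arguments, or creates a new, incomplete one.
    var attempt = MyDb.FindOrCreateAttempt(args);

    if (attempt.ExternalCallCompleted)
    {
        // The external call already ran on a previous attempt; recover
        // its captured result rather than invoking it a second time.
        RecordOutcome(attempt.Result, attempt.Errors);
        return;
    }

    var result = external.DoSomethingImportant(args, out var errors);

    // Capture the result durably before acting on it, so a transient
    // failure in CommitSuccess/CommitFailure can be retried safely.
    MyDb.MarkAttemptCompleted(attempt, result, errors);

    RecordOutcome(result, errors);
}

private void RecordOutcome(Result result, IEnumerable<Error> errors)
{
    if (errors.Any())
    {
        MyDb.CommitFailure(result);
    }
    else
    {
        MyDb.CommitSuccess(result);
    }
}
```

This trades the single atomic commit for eventual consistency: each step commits locally, and the journal makes the retry path safe against the non-idempotent external call.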
