Getting your arms around “Versioning”

Once you have spent some time integrating different systems, you start to see the same problems coming up, presented in different ways. The question I hear most often usually takes a form similar to this:

We need to version this interface to avoid breaking an existing consumer. What is the best way to do it?

This is usually followed by some furious Googling and a lack of consensus.

To understand why this question keeps recurring and why there is no satisfactory resolution, it is useful to really break down the question itself.

What is meant by versioning?

Often the need for “versioning” is driven by the need to evolve the functionality of a service that is already live and being consumed by at least one client. For whatever reason, whether that be a bug fix or an enhancement, the change to the service is such that there is a risk of one or more clients failing to function after the change is made. It follows that “versioning” in its purest sense is the means to add new behaviour to a service without impacting existing consumers.
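As a simple illustration (the contract below is hypothetical, not taken from any particular system), the cheapest kind of change is an additive, backward-compatible one: existing consumers keep working because nothing they already rely on has changed.

public class CustomerResponse
{
    public string Name { get; set; }
    public string Email { get; set; }

    // Added in a later release. Existing consumers simply ignore the new
    // property, so no parallel "v2" of the contract is required.
    public string PreferredLanguage { get; set; }
}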

What do you mean by the best way to do it?

We are always looking for the best solution to any problem, but the “best” solution is subjective. It depends on context, and it depends heavily on whether we really understand the problem. There is also an assumption that a silver-bullet solution exists that will deal with all eventualities, but this usually proves to be false, especially in this context. The reality is that there are many approaches, each with pros and cons, and in this problem space you may need to apply several of them.

Are we asking the right question?

The reason that versioning approaches are often incomplete or unsuccessful is that there isn’t a “one size fits all” approach. In fact, versioning is not really the problem; it is a potential solution to the real problem. We are not dealing with the problem of versioning. Instead, the problem is:

Minimising the cost of change

As you know, change is the only constant. Clients and services are in constant flux. We want to be able to make these changes (and to run the resulting solutions) easily and cheaply, without concerning ourselves with the impact. Some versioning approaches I’ve seen in the past are anything but cheap and simple!

Are you trying to solve a problem that already has a solution?

I have to admit that whenever the question of versioning comes up I find myself Googling in the hope that someone has come up with a silver bullet since I last looked. I always draw a blank, but I do notice that the same articles come up time and again. Take a look at the index at this link.

https://msdn.microsoft.com/en-us/library/bb969123.aspx

If you start drilling into them, you notice a common theme:

  • A lot of clever people have done a lot of thinking in this problem space
  • Many of the articles are old (in IT terms)

Whilst things have moved on, especially in terms of the cost of running different instances of the same service side by side, we don’t always have the luxury of being able to use all the latest tech and trendy architectural patterns on our project. So the first step to getting a grip on this problem space is to understand that the problem is really about making change cheap and easy, and to read about how people have tried to do that in the past.

AngularJS, Protractor and TeamCity – Part2

In my last post, I covered the basics of creating a Protractor test suite to test an AngularJS application. The next step for any self-respecting developer is having the suite run as part of a CI/CD pipeline. I promised to outline how you might do this with TeamCity. Before I do that, I want to cover some of the stages that you might otherwise miss.

Writing a Protractor test suite boils down to writing JavaScript. As with any development effort, this might result in good code or it might result in a ball of mud. Whilst the success of the test suite depends on the quality of the tests, good automated suites also depend on the quality of the code that makes them up. The automated test suite will run regularly, so it needs to be reliable and efficient. The software will evolve and change, so the tests will need to change too. Your test code needs to be simple and easy to maintain. The software principles that apply to production code, such as SOLID, should also apply to your test code.

Testing Patterns

A typical pattern used when writing automated end-to-end tests is Page Objects. The idea is that the complexity of driving a particular page is removed from the test itself and encapsulated in a page object. The page object has intimate knowledge of a page in the application and exposes the common features that tests will need. The pattern reduces the amount of repeated code and provides an obvious point of reuse. The page object evolves at the same rate as the underlying application functionality. Done correctly, the tests themselves become decoupled from changes in the application.

As the page object and the underlying page evolve in step, it becomes natural for the developer and the tester to work closely. The developer helps the tester understand the capabilities of the page. The tester then builds a page object to expose this functionality to tests. Done correctly, this collaboration drives out an application that is easy to test. As new functionality is created, the tester can influence the developer to write the page in a way that makes it easy to test. One common area is the use of selectors to find elements within the page’s DOM.

Tests written after a page is finished often end up with complex and/or unreliable selectors. Complex XPath expressions, or custom logic to search the DOM tree for the correct element, are common. Tests written like that become hard to maintain. On the other hand, if the tester and developer work together and, importantly, the developer understands how their work will be tested, then pages that are easy to test are the result.

Testing the Tests

Writing test code starts to look and feel like writing production code, so similar tooling should be in place. Don’t let your testers struggle with basic editors while your developers are using fully featured IDEs with syntax highlighting and IntelliSense. Testers working with Protractor should use a decent editor such as VSCode, Sublime or ATOM with a complement of JavaScript plugins. In particular, use linting tools to ensure that the JavaScript written for tests follows similar style guidelines to the main application’s AngularJS code.

You should also ensure that it is easy to test your tests. This link demonstrates how to set up VSCode to run a test specification by hitting F5. The article also advertises that you are able to debug Protractor tests. Whilst I have managed to hit breakpoints in my tests, the debugging experience has fallen well short of my expectations. I have not had the time to investigate further.

TeamCity Integration

The final step is to have your test suite running as part of your CI/CD pipeline. Protractor is an end-to-end (e2e) testing framework, and end-to-end tests may take some time to execute; large test suites may take several hours. CI builds should take minutes, not hours, so it is advisable to run these types of test suite overnight. Daily feedback is better than no feedback at all. If you have a deployment build that deploys to a specific environment each night, you could tag the execution of the test suite onto the end of that. If you deploy to an environment less regularly, you can still run the suite each night.

TeamCity happens to offer comprehensive statistical analysis of repeated test runs. It provides trends that show whether particular tests are getting slower, and it will also highlight tests that are flaky – i.e. ones that sometimes pass and sometimes fail even though nothing has changed.

The simplest way to integrate your suite into TeamCity is to use the TeamCity Jasmine reporter. You can add the reporter to your conf.js like this. Don’t forget to add it to your package.json file.

exports.config = {
    framework: "jasmine2",
    capabilities: {
        browserName: "chrome"
    },
    onPrepare() {
        // Swap the default reporter for one that emits TeamCity service
        // messages, so the build server can record each test result.
        let jasmineReporters = require("jasmine-reporters");
        jasmine.getEnv().addReporter(new jasmineReporters.TeamCityReporter());
    },
};

In the last post, I created a gulp task to execute the suite. The final step is to add a build step to execute it.

[Image: protractor1 – TeamCity build step configuration]

I have an npm install step that ensures all the dependencies required by the tests are downloaded. I’m then executing gulp e2e to run the tests.

You can see this is working on the build summary, which shows a count of passing and failing tests.

[Image: protractor2 – TeamCity build summary]

As your test suite is outputting results just the way TeamCity likes them, you can drill into your tests and get all sorts of useful stats, like this.

[Image: protractor3 – TeamCity test statistics]

Distributed transactions are not a comfort blanket

The business logic in a solution often acts as a process manager or orchestrator. This may involve invoking a number of operations and then controlling the process flow based on the results. Depending on what you are building, you may be committing changes to both your own systems and external ones.

Consider the following code.

private void MyBusinessProcess()
{
    // Call the external system, then record the outcome in our own database.
    var result = external.DoSomethingImportant(args, out var errors);
    if (errors.Any())
    {
        MyDb.CommitFailure(result);
    }
    else
    {
        MyDb.CommitSuccess(result);
    }
}

Here the logic invokes an operation on an external class and then, depending on the result, records the success or failure to the application’s database.

So the solution above is pushed into Live.

Soon support incidents start coming in, indicating that sometimes the calls to CommitFailure or CommitSuccess fail to write their changes. As there is no record that the call to DoSomethingImportant ever happened, the application tells the user that the operation never executed. When MyBusinessProcess is retried, DoSomethingImportant throws an exception because it is not idempotent and calling it with the same arguments is not allowed.

For the sake of this post, let’s assume that there is no trivial way to stop the transient problems that cause the exceptions in CommitFailure or CommitSuccess. However, the requirement remains that MyBusinessProcess must behave consistently.

The developer who picks up this issue asks around the team about how the external class works. They find out that not many people really understand this system, but they do know that when the call to DoSomethingImportant completes it commits its result to its own database, and that when it sees the same arguments again it throws the exception that is the cause of the support incident. The developer examines their development environment and, sure enough, on their local SQL Server alongside MyAppDB there is another database called ExternalDB.

Great. So they implement this code to wrap the two database operations in a transaction. Now either all calls commit or all calls are rolled back.

private void MyBusinessProcess()
{
    // Wrap the external call and the local commit in a single transaction,
    // so that either everything commits or everything rolls back.
    using (TransactionScope tx = new TransactionScope())
    {
        var result = external.DoSomethingImportant(args, out var errors);
        if (errors.Any())
        {
            MyDb.CommitFailure(result);
        }
        else
        {
            MyDb.CommitSuccess(result);
        }
        tx.Complete();
    }
}

This is tested locally and it seems to work. However, once it hits the first test environment, which in this case is hosted in Azure and uses separate SQL Azure nodes for each database, MyBusinessProcess fails every time. This is because, for a transaction to span two SQL Azure nodes, a distributed transaction must be used, and until recently the only way a TransactionScope could achieve this was to enlist in a transaction managed by the Microsoft Distributed Transaction Coordinator (MSDTC), which is not supported on SQL Azure.

I have encountered this problem a couple of times now. I find it interesting that the default option is often to wrap everything up in a transaction scope. Microsoft has done a great job in the language syntax of hiding the complexity of deciding whether a distributed transaction is required and then dealing with the two-phase commit. And that convenience often becomes a problem. Distributed transactions are complex and using them can have a large impact on your application. But because the complexity is hidden, many people have forgotten, or have no incentive to learn in the first place, what is going on under the hood.

When I have challenged people about this, the normal defence is that it is easy to implement, so why wouldn’t you? However, even today it is common for the people who implemented the code not to be the people who have to get it working in a bigger environment.

As in the example, it is typical for the configuration of development, test and production environments to differ, so you may only find problems like the one highlighted above late in the day. You don’t want to discover that none of your transactional logic works just as you are trying to put your solution live. The second thing I have seen is that distributed transactions can seriously constrain the performance of your system; in that situation you may only find you have a problem just as your product is becoming successful.

Distributed transactions, and transactions in general, are said to be ACID – Atomic, Consistent, Isolated and Durable. It is the C that causes all the problems here. Trying to be consistent limits concurrency, as only one party at a time can commit a change; allowing multiple parties to commit at the same time compromises consistency. When you are trying to get a system working, it makes complete sense to be consistent. But when you are trying to build a system that will see lots of use, that trade-off no longer seems to make sense.

Integration Styles

If you work in IT and don’t live under a stone, it is likely that you will have come across the term Microservices. It is the new saviour of Enterprise IT: it doesn’t matter what question you ask, the answer currently seems to be Microservices. Out in the enterprise, SOA and N-Tier architectures are being rebadged as Microservices, and the conversations about applying heavyweight governance to Microservices architectures are starting. On the Hype Cycle we are probably near the top of the curve, at the Peak of Inflated Expectations, and it won’t be long before we are rapidly descending into the Trough of Disillusionment.

I’m not going to use this post to define Microservices nor am I going to try to outline what challenges you might encounter when delivering a Microservices architecture. Instead I’m going to build on some previous posts.

Making Choices, When the Fallacies of Distributed Computing don’t apply and Conservation of Complexity.

These posts cover how important it is to understand the context when making big strategic bets, and how many technically savvy organisations don’t really understand what it means to build a distributed application. They also make the point that reconfiguring your architecture doesn’t remove complexity; it simply moves it.

oOo

If there is one thing you need to think about when doing Microservices it is the inherent distributed nature of the system that will be produced. Therefore, you need to accept the fallacies and be comfortable thinking about the styles in which your services will integrate.

There is a reference manual for people working in the field of integration. Enterprise Integration Patterns has been around for some years, but the patterns it describes don’t just exist at the enterprise level; they are still appropriate at the application level when those applications leverage a Microservices architecture.

The first section of the book talks about integration styles. It discusses the journey from File Transfer, to Shared Database, to Remote Procedure Invocation and then to Messaging. Microservices architectures sit somewhere between remote procedure invocation and messaging: you might use REST APIs or you might use some type of message queue. There is also the question of whether to use synchronous or asynchronous communication. Messaging sits on the asynchronous side, whilst REST tends to be more synchronous.

[Image: integration styles]

This transition of styles represents a journey. It is important to understand the benefits that remote procedure invocation gives over a shared database model. It is important to have felt the pain of not being able to make a simple change to a database schema for fear of breaking something important. You need to have felt the sinking feeling when you encounter table names such as customer1, customer2 and so on. And you need to have experienced the frustration of senior stakeholders when they realise that that simple change is going to take a big chunk out of their IT budget.

It is the same when considering messaging over remote procedure invocation. When you have experienced the temporal coupling that causes your whole SOA to fail because a single shared service is unavailable, you start looking for alternatives. The same is true when you have seen with your own eyes the cascading failures caused when services are available but taking a long time to respond.

There are two forces at play. Many young developers are unlikely to have experienced the pain us oldies suffered in the past, so there is a risk that they make the same mistakes again. Secondly, many oldies are now in senior positions, riding on the success of past projects, and have conveniently forgotten about the pain. In hierarchical organisations they are the decision makers, and they are creating the same systems over and over with different names: N-Tier, SOA and Microservices.

When embarking on a Microservices project, look around you. What styles do other projects use? Which projects are successful and which are not? What do senior and influential stakeholders understand about modern integration styles and architectures?

If you conclude that the rest of the organisational landscape uses a shared database model, you need to ask yourself whether a pure message-based, event-driven Microservices architecture is the way to go. Shared database integration is about building a canonical central data model and forcing (ermm, governing) everyone to use it. An event-driven approach is very different. It is asynchronous, disconnected and autonomous by design, and it can take a bit of getting used to. It is possible to deliver a successful project this way, but buy-in and experience across the whole team are required. The project team just wanting to go this way is not enough; there must be wide and deep-seated experience within the team.

Ask yourself – who is driving this change? If it is the client, maybe there are simpler ways to solve their problems. If it is the project team, are they biting off too much, storing up problems for the future? The reality is that the client may be moving too far from their current maturity level, which increases the risk of failure. It is as though you are swinging a pendulum too far and at any moment it could swing back. Swinging back may mean the project has to change its integration style at a late stage, or is left with a hybrid model, with some parts of the system using messaging and others going back to old styles such as shared database.

In summary, most organisations are most comfortable with shared database and remote procedure invocation because these are based on mental models that most people can get their heads around. Whilst some organisations will get asynchronous messaging, care must be taken when introducing it for the first time. Using REST in a Microservices architecture seems a good compromise: it provides a mental model that most people understand, and it is still possible to layer on more advanced integration styles if the application or business domain requires it.

Assess the context in which the project is being delivered and keep it simple.

Creating Chaos from Simplicity

Software is complex, often too complex for the human mind to comprehend.

As IT professionals exposed to this complexity day in and day out, we create mental models, models that simplify problems down to a level that our puny minds can handle. Design Patterns as popularised by the Gang of Four are a response to the inherent complexity in software. Here we have a set of patterns or models that 9 times out of 10 provide solutions to software architecture challenges. Patterns provide a common language that we can use to describe complex concepts in a consistent way.

Ah, consistency, something that software developers are very familiar with. Consistency is drilled into us at programming school – encapsulate the commonality once and then reuse often. This is what frameworks are all about. Who writes a string class from scratch these days?

So the thing that has amazed me enough to spur me to write this post is this: why do these fundamentals fall by the wayside when we find ourselves under pressure? Why do people reinvent the wheel when a simple, well-known pattern would suffice?

Let me explain a very simple pattern. This is a pattern we are applying in a service-based architecture. The idea is simple:

  • Each service exposes one or more units of work.
  • Each unit of work is invoked by a Command message
  • The command message relates directly to the service business domain. It is a command to invoke a business process or operation.
  • Once the unit of work is complete, either successfully or not, this is reported by way of an Event message

The pseudo code for this might be

public void Handle(MortgageApproveCommand message)
{
    // Anti-corruption layer: validate the command and map it onto the
    // internal business model before any logic runs.
    ValidateMessage(message);
    var mortgageApproval = Transform(message);

    // Invoke the business operation for this unit of work.
    var result = Mortgage.Approve(mortgageApproval);

    // Report the outcome, success or failure, as an event.
    if (result.OK)
    {
        PublishEvent(new MortgageApprovedEvent
        {
            Result = result
        });
    }
    else
    {
        PublishEvent(new MortgageApprovalFailedEvent
        {
            Result = result
        });
    }
}

I’m not suggesting this is perfect, but it demonstrates how to keep the implementation clean by stopping the logic of other services creeping in, and how to provide an anti-corruption layer between the service interface and the internal business logic. I also like the simplicity of the command -> process -> event pattern.

I recently reviewed some code implemented by a project team and found the following. I’ve mapped the code onto a very simple service model (Road and Car) so that the complexity of the real business domains doesn’t cause confusion.

public void Handle(AccelerateCarCommand message)
{
    Road road = message.CurrentRoad;
    if (road == null)
    {
        throw new ArgumentNullException("No road");
    }

    if (road.SpeedLimit == null)
    {
        throw new ArgumentNullException("No speed limit");
    }

    if (road.TrafficCondition == null)
    {
        throw new ArgumentNullException("No traffic");
    }

    var actualSpeed = Car.AccelerateToOptimalSpeed(road);

    if (actualSpeed < road.SpeedLimit)
    {
        road.State = Busy;
    }
    else
    {
        road.State = Clear;
    }

    PublishEvent(new RoadStateChangedEvent
    {
        Road = road
    });
}

So what are the issues with this?

1) Passing Service objects/interface structures into external services

Road road = message.CurrentRoad;
if (road == null)
{
    throw new ArgumentNullException("No road");
}
…

The Car service is passed an instance of a Road. This couples the Road and Car services together. A change to the structure of the Road object now impacts the Car. Ideally the Car would have exposed a command that allowed the caller to pass the Speed Limit and Traffic Conditions without passing the entire state that comprises a Road. Both services do need a shared understanding of these concepts but that does not mean they have to share a representation.
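As a rough sketch of that idea (the property names here are illustrative, not taken from the real codebase), the command could carry just the facts the Car needs:

public class AccelerateCarCommand
{
    // Only the values the Car service needs, not the entire Road
    // structure owned by the Road service.
    public int SpeedLimit { get; set; }
    public string TrafficCondition { get; set; }
}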

2) Passing interface objects into the business logic

var actualSpeed = Car.AccelerateToOptimalSpeed(road)

Not only does this compound the previous issue, but now there is no separation between the service interface and the internal logic. If the interface changes (which it will, by the way, when you least want it to), it has an impact across the implementation, not just at the perimeter. There must be an anti-corruption layer between the interface and the implementation to allow them to change at different rates. I see this as very similar to the Model-View-ViewModel (MVVM) pattern you see in many web applications.
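A minimal sketch of that separation, assuming a hypothetical internal AccelerationRequest model and a Transform method along the lines of the earlier mortgage example:

public void Handle(AccelerateCarCommand message)
{
    ValidateMessage(message);

    // Anti-corruption layer: map the interface contract onto an internal
    // model so the business logic never depends on the message structure.
    AccelerationRequest request = Transform(message);

    var actualSpeed = Car.AccelerateToOptimalSpeed(request);
    // ...
}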

3) Changing the state of data you don’t own

if (actualSpeed < road.SpeedLimit)
{
    road.State = Busy;
}
else
{
    road.State = Clear;
}

Here we are changing the state of Road in the Car service.

With the previous points there are ways to refactor our way out of a corner. However, this point and the following one have much worse smells, and they are clues to what was going on in the developer’s mind during implementation.

What I see here is the developer making assumptions about what other services will do. They have changed the state of an object they don’t own because they know how the Road and Car services will coordinate to realise an end-to-end process. This is just another form of coupling… “implementation coupling”. We are using a nice interface, but the services on each side of it are making assumptions about what the other is doing. Change one implementation and you break the other.

4) Raising Domain Events the service doesn’t own

PublishEvent( new RoadStateChangedEvent
{
    Road = road
});

This is the second piece of the implementation coupling jigsaw. Now that we have changed the state of the Road, we need the Road service to commit the change. So let’s forget the problems associated with implementation coupling for a moment and think about how we would deal with this. I would reach for a command message, calling the Road service to commit my change. The command message is a point-to-point message that is only consumed by the Road service.
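A sketch of what that might look like, assuming a hypothetical Send helper and a ChangeRoadStateCommand message (its RoadId and RequestedState properties are invented for the example):

Send(new ChangeRoadStateCommand
{
    RoadId = road.Id,
    RequestedState = RoadState.Busy
});

The Road service can then apply its own rules and decide whether to accept or reject the change.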

So what has the developer done? They are publishing an event!

In this system an Event message is a convention for using a Publisher/Subscriber pattern. By definition the publisher has no knowledge of the message subscribers, nor does it know when the message is consumed. There are absolutely no guarantees that the message will reach the Road service, and even if it does, the service may not be able to process it.

[Image: BadEvents]

If you are following a domain event model, a domain service should only raise events related to its business domain. Here we are raising an event from the Car service that should be owned by the Road service. Any other service could be subscribing to this event and they will assume that this was raised in response to a state change in the Road service. They may get this event before the Road service has a chance to commit the change and it is entirely possible that the change could be rejected. Any one of the subscribing services may kick off another business process which in turn may trigger further events. A rejection by the Road service would leave all the services that had consumed the event in an inconsistent state.

What a mess!

The implementation that spurred this example emerged from good intentions. What I have presented here was the result of a series of incremental design choices, each one not too bad on its own, but the result took us a long way from where we needed to be. Going back to the introduction, the real problem is that the pattern for building these units of work was not well understood, and where it was known there was a lack of buy-in. Delivery pressure meant that the team had no time to stop and understand the pattern or the concepts it is built on. Instead they forged ahead, reinventing the wheel as they went.

So we are left with coupling and chaos where we should be enjoying simplicity and consistency.