Connecting Web Apps to External Services – Virtual Appliance

Last time I proposed a solution which enables Web Apps hosted in Azure App Services to communicate with services running in a private network. That solution relied on a Site to Site VPN, which requires network configuration both in Azure and on the private network. Sometimes that is not possible, so alternative options should be considered. Below I restate the problem we are trying to solve and then describe an alternative approach.

Problem:

Code hosted in an App Service needs access to a web service endpoint hosted on an on premise private network. It must be possible for the on premise endpoint to identify traffic from your application hosted on App Services in Azure and only allow access to that traffic.

Solution Option – Deploy Virtual Appliance

In this scenario, there is no private connection between the Web App and the private network. Instead, all traffic destined for services on the private network is routed onto an Azure VNET and then through a Virtual Appliance. The Virtual Appliance acts as a Firewall and an Outbound Proxy, forwarding traffic over the Internet to the external system endpoints. The Public IP of the appliance would need to be whitelisted on those systems.

Challenges

The software that runs on a Virtual Appliance is “Enterprise Class” which means it can be difficult to understand and configure correctly. It may also require additional effort to support over the course of the solution’s lifetime.

Context

The following diagram explains the configuration we are trying to achieve. This time there is no need to have an understanding of the private network topology. The only things that need to be shared are the IP address of the outbound traffic originating from the Virtual Appliance and the public IP address of the service we are connecting to.

[Diagram: Web App connecting to an external service via a Virtual Appliance (IP addresses omitted)]

The following list highlights the differences from the Site to Site VPN solution.

Azure VNet: The main difference with the VNET is that User Defined Routes, held in a routing table, must be created and maintained to ensure the correct traffic is routed through the Virtual Appliance. A rough sketch of this configuration follows this list.

Frontend Subnet: Acts as the perimeter subnet for the Azure VNet. Having this subnet contain only the virtual appliance makes routing simpler.

Barracuda F-Series: This is the software providing the Virtual Appliance. It is a fully featured Firewall and as such requires some investment to understand it properly. Not only do you need to know how to operate it, you also need to understand how to secure it properly.

In Azure, this is a preconfigured VM which is provisioned with a Network Security Group and a Public IP.

It must be licensed; you can operate it on a Pay As You Go basis or you can pre-purchase a license. By default, the Virtual Appliance is a single point of failure. If the firewall were to go down in production, at best all connectivity to the private network would be lost and at worst all Internet connectivity from your Web App would be lost (depending on how the Azure VNET and Point to Site VPN are configured).

Test Endpoint: The test endpoint only needs to be Internet accessible for this configuration to work. In real world scenarios, the public endpoint is likely to be exposed through some sort of perimeter network and your originating IP address will need to be whitelisted before access can be established.
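
The following sketch shows one way to create the User Defined Route mentioned above, using the AzureRM PowerShell module. It is illustrative only: the resource names, address ranges, the external endpoint’s public IP and the appliance’s private IP are placeholders rather than values from this walkthrough.

# Sketch only: route traffic bound for the external endpoint (placeholder 203.0.113.10)
# through the virtual appliance (placeholder private IP 10.0.1.4) in the frontend subnet.
$route = New-AzureRmRouteConfig -Name "ToExternalService" `
    -AddressPrefix "203.0.113.10/32" `
    -NextHopType VirtualAppliance `
    -NextHopIpAddress "10.0.1.4"

$routeTable = New-AzureRmRouteTable -Name "ApplianceRoutes" `
    -ResourceGroupName "MyResourceGroup" -Location "westeurope" -Route $route

# Associate the route table with any subnet whose outbound traffic should be
# forced through the appliance rather than going straight out to the Internet.
$vnet = Get-AzureRmVirtualNetwork -Name "MyVnet" -ResourceGroupName "MyResourceGroup"
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name "Backend" `
    -AddressPrefix "10.0.2.0/24" -RouteTable $routeTable
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet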

Connecting Web Apps to External Services – Site to Site VPN

Last time I set the scene for a common scenario when using Web Apps hosted on Azure App Services. How do I connect to services hosted on a private network? This time I’ll walk through the first potential solution option.

Problem:

Code hosted in an App Service needs access to a web service endpoint hosted in an on premise private network. It must be possible for the on premise endpoint to identify traffic from your application hosted on App Services in Azure and only allow access to that traffic.

Solution Option – Site to Site VPN

Build a private network segment in Azure as an Azure VNET. Connect the App Service to the private Azure VNET using a Point to Site VPN. This acts as a private connection between your application, hosted in the Azure multi tenanted App Service infrastructure, and the VNET, allowing the application to access resources routable via the Azure VNET. Resources on the VNET are not able to access the application. The on premise network is connected to the Azure VNET via a Site to Site VPN. This effectively extends the on premise network to the cloud, allowing bi-directional communication between resources hosted on premise and those hosted in Azure via private network addressing.

Challenges

Network configuration is required within the on premise network to enable the VPN connection to function. This includes setting up either VPN software or an appliance and configuring network routing to ensure that traffic destined for Azure is routed through the VPN.

The network in Azure must be designed with the on premise network in mind. As a minimum, you need to understand the on premise network design well enough to avoid address conflicts when creating the Azure VNET. More likely, any design principles in play on the on premise network will also extend to the cloud hosted network.

What this means in practice is that there needs to be collaboration and coordination between the people managing the on premise network and yourself. Depending on the situation this may not be desirable or even possible.

Context

The following diagram explains the configuration we are trying to achieve.

[Diagram: Web App connecting to an on premise service via a Site to Site VPN (IP addresses omitted)]

The main components are:

Azure App Services: When setting up the point to site VPN you must define a network range. This is a range of addresses that Azure will select from as the outbound IP addresses that the App Service hosted application presents into the Azure VNET. Whilst you might assume that this is the IP address of the server hosting your application, it is not quite that straightforward, as Azure is working under the covers to make this all work. However, you can assume that traffic from your application will always originate from this range of addresses, so if you make it sufficiently small it is suitable for whitelisting in firewalls, etc. without compromising security.

Azure VNET: Represents your virtual networking space in Azure. You define an address space in which all of your subnets and resources will live.

GatewaySubnet: This is created automatically when you create the VPN gateway in Azure. From experience, it is better to leave it alone. If you add a virtual machine or other networkable devices into this subnet, routing becomes more of a challenge. Consider this subnet to be the place where external traffic enters and leaves the Azure VNET. The gateway subnet exists inside your Azure VNET, so its address range must sit entirely within the Azure VNET address space. A sketch of creating the gateway and its subnet follows this list.

Backend Subnet: This is an optional subnet. Its primary purpose in this walkthrough is testing. It is relatively simple to add a VM to the subnet so you can test whether traffic is propagating correctly. For instance, you can test that a Point to Site VPN is working if an App Service application can hit an endpoint exposed on the VM. Additionally, you can test that your Site to Site VPN is working if a VM on this subnet can connect to an endpoint on a machine on your on premise network via its private IP address. The subnet must have an address range within that of the Azure VNET and must not clash with any other subnet. In practice, this subnet can be the location for any Azure resource that needs to be network connected. For example, if you wanted to use Service Fabric, a VM Scale Set is required. That scale set could be connected to the backend subnet, which means it is accessible to applications hosted as App Services. In this configuration, it has a two-way connection into the on premise network but a one-way connection from Azure App Service to resources on the backend subnet.

On Premise: This represents your internal network. For demonstration purposes, you should try to build something isolated from your primary Azure subscription. This builds confidence that you have everything configured correctly and you understand why things are working rather than it being a case of Azure “magic”. You could set this up in something completely different from Azure such as in Amazon Web Services and in a later post I’ll walk through how to do that. However, if you are using Azure ensure that your representation of your on premise network is isolated from the Azure resources you are testing. The IP address space of the on premise network and the Azure VNET must not overlap.
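
For reference, here is a rough sketch of building the Azure side of this configuration with the AzureRM PowerShell module. Everything shown – names, address spaces, the on premise device’s public IP and the shared key – is a placeholder, and the on premise VPN device must be configured with matching settings.

# Sketch only: VNET with a GatewaySubnet and a Backend subnet inside 10.1.0.0/16.
$gatewaySubnet = New-AzureRmVirtualNetworkSubnetConfig -Name "GatewaySubnet" -AddressPrefix "10.1.255.0/27"
$backendSubnet = New-AzureRmVirtualNetworkSubnetConfig -Name "Backend" -AddressPrefix "10.1.1.0/24"
$vnet = New-AzureRmVirtualNetwork -Name "MyVnet" -ResourceGroupName "MyResourceGroup" `
    -Location "westeurope" -AddressPrefix "10.1.0.0/16" -Subnet $gatewaySubnet, $backendSubnet

# VPN gateway placed in the GatewaySubnet, fronted by a public IP.
$pip = New-AzureRmPublicIpAddress -Name "VpnGatewayIp" -ResourceGroupName "MyResourceGroup" `
    -Location "westeurope" -AllocationMethod Dynamic
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -Name "GatewaySubnet" -VirtualNetwork $vnet
$ipConfig = New-AzureRmVirtualNetworkGatewayIpConfig -Name "gwipconfig" -SubnetId $subnet.Id -PublicIpAddressId $pip.Id
$gateway = New-AzureRmVirtualNetworkGateway -Name "MyVpnGateway" -ResourceGroupName "MyResourceGroup" `
    -Location "westeurope" -IpConfigurations $ipConfig -GatewayType Vpn -VpnType RouteBased

# The on premise network as Azure sees it: its VPN device's public IP and its address space.
$local = New-AzureRmLocalNetworkGateway -Name "OnPremise" -ResourceGroupName "MyResourceGroup" `
    -Location "westeurope" -GatewayIpAddress "198.51.100.4" -AddressPrefix "192.168.0.0/24"

# The Site to Site connection itself; the shared key must match the on premise device.
New-AzureRmVirtualNetworkGatewayConnection -Name "AzureToOnPremise" -ResourceGroupName "MyResourceGroup" `
    -Location "westeurope" -VirtualNetworkGateway1 $gateway -LocalNetworkGateway2 $local `
    -ConnectionType IPsec -SharedKey "replace-with-a-strong-shared-key"

Note that the on premise address space (192.168.0.0/24 in this sketch) must not overlap with the Azure VNET address space, as described above.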

Connecting Web Apps to external services – Overview

When starting out building websites with Azure, it is likely that you’ll start by deploying a Web App to Azure App Services. This is a great way to get going but, as with all things, as you invest more and more time and your solution grows up, you might experience some growing pains.

All credit to Microsoft. You can build a sophisticated solution with Azure Web Apps and you can have it connect to an Azure SQL instance without really thinking about the underlying infrastructure. It just works.

The growing pains start when you want to connect this Web App to something else. Perhaps you need to connect to some other service hosted in Azure. Or perhaps you need to leverage a third-party system that does not allow connections over the Internet.

Some development focused organisations stumble here. They have chosen Azure for its PaaS capability so they don’t have to think about infrastructure. They can code, then click deploy in Visual Studio – job done. Unfortunately, breaking out of this closed world requires some different skills – some understanding of basic networking is required.

Getting through this journey is not hard but it requires breaking the problem down into more manageable pieces. Once these basics are understood they become a foundation for more sophisticated solutions. Over the next few posts I’m going to go through some of these foundation elements that allow you to break out of a Web App running on Azure App Services (or any other type of App Service), first to leverage other resources running in your Azure subscription, such as databases or web services running on VMs, and then out to services running on other infrastructure, whether cloud hosted or private.

Over this series of posts I’ll be addressing the following scenario.

The code running in a Web App hosted on Azure App Services needs to call a Web Service endpoint hosted in a private network behind a firewall. The organisation says that they’ll only open the firewall to enable access to IP addresses that you own.

This discounts opening the firewall for the range of outbound IP addresses exposed by Azure App Services, as there is no guarantee that you have exclusive use of them.

So the approach will be to build a network in Azure to which the Web App can connect. Then connect the Azure network to the private network by way of a private connection or by way of a connection over the Internet where traffic is routed through a network appliance whose outbound IP is one controlled by you.

Backlog Black Hole – Agile Anti Patterns

If I create a story and add it to the backlog, it will be lost forever and will never get done

A backlog is a prioritised list of work that needs to be done. The important stuff is at the top and the least important stuff at the bottom. If you find that work is “disappearing” in your backlog, what could be the cause?

  1. The backlog is not being maintained. The backlog is a living thing and as such needs feeding and watering. By that I mean it needs near constant refinement. As new work is discovered it gets added to the backlog, but what is happening to the existing stuff? All stories need to be reviewed, not just the new ones. They need updating based on current knowledge. That might mean that a story is no longer required and should be removed. Some people go so far as purging stories that have not been delivered, based on their age. The thinking is that if a story reaches 3 to 6 months old without being delivered, the chances are it will not be delivered in its current form at all.
  2. The newest work is the highest priority. Just because you have thought of the next killer feature, it doesn’t automatically mean delivering that work is the highest priority. It should be assessed against all the work in the backlog. If new work is always added to the top, it starts to push older work down, often meaning the team never get a chance to work on it.
  3. The work is not well defined. In order for someone to understand the work involved in a story it must be clear. If you are going to the trouble of adding work to the backlog that you think needs to be done, you should also put in some effort to describe it. I’m not saying that you need to write “War and Peace” but you do need to represent the work to the Product Owner in ceremonies such as backlog refinement. In some circumstances, there are benefits in having a triage process for new work. This provides a chance for the work to be reviewed by the necessary parties to ensure that it is understood, prioritised and actually needed.
  4. You don’t have the right tools. A small team might get away with managing their backlog with sticky notes on a board. Large teams may need some tooling. Tooling can be an inhibitor as well as an enabler, so perhaps a tool has been implemented that is hard to use or that requires the team to be trained on it. This might make it hard to find stories when you need them. Often it is possible to configure tools to provide reports of stories added in the last week, or to enable integration with messaging tools such as Slack so you have a constant stream of messages indicating new work entering the backlog.

Up to now this discussion has focused on the negative position that it is a bad thing that work is “being lost” in the backlog. However, when you think about it, this may be a sign that you are doing the right thing. The work coming in may be aspirational or simply a wish list, which is not what your customers really need. If you have an effective feedback loop you’ll be reacting to your customers’ needs rather than focusing on the things that they don’t care about.

Therefore, if you are the one coming up with the ideas that are not making it into the system you need to understand why. You can’t be precious about the work because it was “your idea”. This is looking at the product you’re building from a personal point of view and not considering how the product is used in reality. Perhaps you don’t understand the product as well as you think you do.

Finally, it is worth making a point around continuous technical improvement. My point of view is that for a product to be successful over a long period of time the technology it is built with needs to continuously evolve. Whether you call this technical debt or something else the point is that there will always be technical work that needs to be done that may not have direct value for the customer. The value is actually to your business as you’ll be able to continue to serve your customers in the future.

How you deal with this depends on the organisation. Often people implement a capacity tax that says that a given percentage of the team’s capacity goes towards technical improvement. This way the team are not asking for permission to improve things, but there is still a need to document and prioritise the technical work that needs to be done. This is still a backlog. In other situations, where the product owner is technically savvy and understands the relative value between delivering new features and technical improvement, technical stories can be treated like any other work in the backlog.

Whichever way you look at this, it boils down to the fact that there is a pile of work that needs to be done. The work needs to be prioritised, and each work item will have a different potential value to your customer and your business. And there needs to be a way to make this work visible and transparent in an efficient manner.

Is Scrum anti Agile?

When an organisation is moving from a top down process such as waterfall to an Agile methodology like Scrum, for the people involved it can feel like everything is coming off the rails. All the comfortable and reliable “process” is gone and now you really have to think. Change can be difficult and this type of change is no different.

When moving in this direction, a process or a framework is a safety net or a comfort blanket. If you are not careful, people can miss the point of the Agile transformation and instead focus on the framework, methodology and tooling. Work management tools such as JIRA, which are a bit of a swiss army knife, can, if you are not careful, become another facet of the process safety net. Before long your work management tool is configured with so much “process” that people have stopped thinking, and any Agility that was blossoming in the organisation slowly evaporates.

But let’s look at it from the other angle. Some small organisations have little to no process. From an outsider’s perspective, it looks like chaos but the reality is that these organisations have started from nothing and now have paying customers, so they must be doing something right.

These organisations might be looking to frameworks like Scrum to provide some stability and some predictability. They want to build on a successful foundation and grow without losing what made them special in the first place. So, you might look at implementing Scrum and related tooling simply to manage stories in a backlog. You might encourage using sprints to create a delivery cadence.

And then the backlash starts. In the same way waterfall practitioners think you are trying to take away their comfort blanket, so do the developers in the start-up that needs to mature. Whichever way you look at Scrum it has some rules. Okay, they may be called a framework but they are still rules. These rules drive home the point that to be stable and predictable you cannot have a free for all. The transforming organisation may start to realise that their current ways of working are not special and instead they need to conform with what the majority of the industry is doing.

In this situation, you must realise that processes or frameworks, even lightweight ones like Scrum, can be seen as a burden when transforming chaos into stability. While it might seem like common sense to you, the people undergoing the transformation may believe that agility is being lost, along with the innovation that got the organisation to that point in the first place.

Painting the Forth Bridge

They say that it takes so long to paint the Forth Bridge near Edinburgh that, by the time the painting team have worked their way across the bridge, the paint at the start needs renewing, so they have to start again. This is, of course, a myth, but if it were true the workers painting the bridge would have a job for life.

Sometimes software projects are like this. They are in a state of perpetual rewrite. The rewrite may be needed because the wrong JavaScript framework was selected at the start, so the team are moving to framework N which, when completed, will solve all problems. Or the application is considered a monolith, so the team are “doing the right thing” by breaking the solution up into Microservices. The “rewrite” is done with the best intentions but the outcome is often the same. The rewrite takes so long that the IT world has moved on, and the goal the team is working towards is now old fashioned and out of date. Fresh thinking is required, which triggers the next big rewrite, and so the cycle continues, much like painting the proverbial bridge.

As professional techies, developers like solving the hard problems. They like using new technologies and the latest frameworks. However, it is a fact of life that most development work isn’t sexy or glamorous. Often developers spend a lot of time grinding out “business logic” or fixing bugs. The work can become repetitive and boring. There is often a tension between the motivation to keep software development simple and predictable through standardisation and the technical team’s desire to keep their skills fresh on the job.

For freelancers or developers working for software consultancies, getting stuck in one technical stack for a single project is not a problem. The next one is never too far away and it is likely to be very different. Change doesn’t come so frequently for developers working in software houses. Typically they will be working for longer on a smaller portfolio of projects and products. For software houses the economics are straightforward: ship more products – make more money! Investing time in rewrites is a big challenge. Redirecting effort into large scale technical changes means they are not fixing bugs, nor are they delivering as many features.

But if a product isn’t changing in pace with the technology landscape it is in danger of stagnating and becoming irrelevant. The software used to build it becomes out of date. The development team start to feel deskilled and may start to leave the business, taking critical knowledge with them. It becomes harder to replace them as your technology stack is no longer attractive to the job market. Before you know it, all the innovation that took you from a start up to a mature software house in the first place has leaked away.

As with everything there is a balance to be found. The development team need to be able to stay current but the organisation still needs to pay the bills. Here are a few things to look at to ensure that this balance is maintained.

Be aware of technical debt and pay it down frequently. This is simple really. The best way to avoid big changes in the first place is to fix problems soon after they occur. If they are left to mount up over time they become much harder to fix. Therefore, ensure that the team have the opportunity to fix things as part of the development process.

Ensure that the business value of large technical changes is understood. All work the development teams do should have a business value, so ensure that this is understood when it comes to technical changes. There are often valid business reasons for changing from framework X to framework Y, but they are often hard to articulate. There is a temptation to avoid identifying the business value because it is hard to do, and instead the change is delivered as a side project or, worse, as someone’s pet project. Avoid this temptation: the term “side project” implies a lower priority, so it is likely to be pushed to the side when your important customers are hammering down your door asking for the next great feature. Technical changes and evolving architecture are just as important as new features, so all the work should be in the same pot. The Product Owner must be given the hard problem of deciding whether to improve the system itself or deliver new features.

Ensure that large technical changes are delivered as a series of steps as part of a roadmap. Agile development is based on short feedback loops. This is no different when it comes to technical changes. Therefore, a big change should be broken into a roadmap. At the end of the roadmap is a goal and a vision, and at the beginning are the next few steps to get there. The idea is that you don’t create a detailed plan; you might only define the next few steps. This approach also allows the goals to change with little impact. It should be easy to get started as there is no long planning exercise, which also means there is no temptation to follow through on a now invalid plan simply because too much cost has been sunk into the planning exercise.

Speeding up Azure Cloud Service deployment in Octopus Deploy

This post is a brain dump of something I discovered working with Azure Cloud Services specifically, when deploying them to Azure with Octopus Deploy.

In the beginning.

Cloud Service deployments have been designed by Microsoft to provide a seamless upgrade experience. If your cloud service infrastructure comprises multiple cloud service instances, the fabric controller in Azure, which controls deployments, will perform a rolling upgrade. The underlying instances will be gradually upgraded until all are done. When you are deploying to a production slot this all makes sense. You want to avoid downtime and minimise the impact on your customers. However, this convenience comes at a cost – time. If you have a large number of Cloud Service instances and web roles the process seems to take forever. That is the last thing you want if you are watching over a Live release in the evening.

What other choices are there?

In some deployment scenarios you might deploy to the staging slot of the Cloud Service, do all of your testing and then perform a slot swap to get this version into Live. In this case you don’t want to incur the cost of a rolling upgrade as customers don’t use the staging slot.

The Cloud Service upgrade documentation talks about a deployment mode called Simultaneous. Unfortunately there is not a lot of documentation around describing what it does. This Stack Overflow question highlights that simultaneous mode is referred to as BlastUpgrade in the topologyChangeDiscovery attribute in the Cloud Service’s Service Definition File. What I determined by experimenting is that in this mode the fabric controller ignores all upgrade domains, meaning all instances are upgraded at once. This was a lot quicker and exactly what I wanted when deploying to Staging slots.

So the obvious answer would be to update the service definition file? Wrong! This didn’t work with Octopus Deploy so I was faced with looking for other options. This led to an investigation of how Octopus Deploy actually deploys Cloud Services. I found that it uses this script by default.

function CreateOrUpdate()
{
    $deployment = Get-AzureDeployment -ServiceName $OctopusAzureServiceName -Slot $OctopusAzureSlot -ErrorVariable a -ErrorAction silentlycontinue
    if (($a[0] -ne $null) -or ($deployment.Name -eq $null))
    {
        CreateNewDeployment
        return
    }
    UpdateDeployment
}

function UpdateDeployment()
{
    Write-Verbose "A deployment already exists in $OctopusAzureServiceName for slot $OctopusAzureSlot. Upgrading deployment..."
    Set-AzureDeployment -Upgrade -ServiceName $OctopusAzureServiceName -Package $OctopusAzurePackageUri -Configuration $OctopusAzureConfigurationFile -Slot $OctopusAzureSlot -Mode Auto -label $OctopusAzureDeploymentLabel -Force
}

function CreateNewDeployment()
{

    Write-Verbose "Creating a new deployment..."
    New-AzureDeployment -Slot $OctopusAzureSlot -Package $OctopusAzurePackageUri -Configuration $OctopusAzureConfigurationFile -label $OctopusAzureDeploymentLabel -ServiceName $OctopusAzureServiceName
}

function WaitForComplete()
{

    $completeDeployment = Get-AzureDeployment -ServiceName $OctopusAzureServiceName -Slot $OctopusAzureSlot
    $completeDeploymentID = $completeDeployment.DeploymentId
    Write-Host "Deployment complete; Deployment ID: $completeDeploymentID"
}

CreateOrUpdate
WaitForComplete

You’ll notice the call to Set-AzureDeployment where the Mode parameter is set to Auto. However, the documentation for this PowerShell cmdlet states that the optional Mode argument can be set to Simultaneous. How do you get Octopus to do something different? Luckily, if you drop a PowerShell script called DeployToAzure.ps1 into the root of the package you are deploying, Octopus will use your script rather than its own. Therefore you can adjust the script so that the upgrade call looks like this.

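# Replaces the single Set-AzureDeployment call in UpdateDeployment: the upgrade
# mode is switched based on an Octopus variable (see below).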
if ($UseSimultaneousUpgradeMode -eq "True")
{
    Write-Verbose "Using Simultaneous Upgrade Mode"
    Set-AzureDeployment -Upgrade -ServiceName $OctopusAzureServiceName -Package $OctopusAzurePackageUri -Configuration $OctopusAzureConfigurationFile -Slot $OctopusAzureSlot -Mode Simultaneous -label $OctopusAzureDeploymentLabel -Force
}
else
{
    Set-AzureDeployment -Upgrade -ServiceName $OctopusAzureServiceName -Package $OctopusAzurePackageUri -Configuration $OctopusAzureConfigurationFile -Slot $OctopusAzureSlot -Mode Auto -label $OctopusAzureDeploymentLabel -Force
}

Where $UseSimultaneousUpgradeMode is an Octopus variable that can be used to control which mode is used when.

One word of warning. You see the function in the script called WaitForComplete()? This is used by Octopus to determine when the release is complete. It works by querying the relevant Azure deployment. I have found that this reports back as complete before the Cloud Service instances have finished upgrading. And if you were to swap from Staging to Production whilst they were still upgrading… oops, you have a temporary outage. So if you are doing this, remember to physically check the status of the staging deployment before swapping. A sketch of one way to automate that check follows.
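
If you would rather script the check than eyeball the portal, something along these lines could be appended to the custom DeployToAzure.ps1. This is only a sketch, assuming the classic Azure PowerShell module used above; WaitForInstancesReady is a name I have made up, and it simply polls the slot until every role instance reports ReadyRole.

# Sketch only: block until every role instance in the target slot reports ReadyRole,
# so a slot swap is not triggered while instances are still upgrading.
function WaitForInstancesReady()
{
    do
    {
        Start-Sleep -Seconds 30
        $instances = Get-AzureRole -ServiceName $OctopusAzureServiceName -Slot $OctopusAzureSlot -InstanceDetails
        $notReady = @($instances | Where-Object { $_.InstanceStatus -ne "ReadyRole" })
        Write-Host "Waiting for $($notReady.Count) instance(s) to reach ReadyRole..."
    } while ($notReady.Count -gt 0)
}

WaitForInstancesReady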