Azure, Cloud Service & Reserved IPs

Azure, Cloud Service & Reserved IPs

Azure is pretty good at getting you up and running quickly. You can get from nothing to a solution in production very quickly. Whilst this approach definitely reduces time to market, it can introduce growing pains along the way. Let’s consider Cloud Services as a specific example of how growing pains might manifest themselves.

When you create a Cloud Service you get two IP addresses, one for each slot, Staging and Production. These are allocated from a huge range Azure manages for each region and you have no guarantee of which IP you’ll get. When you’re setting up your Cloud Service you probably didn’t worry about that. As time passes and the solution matures you may have used those IP addresses to create firewall rules to your databases in Azure and perhaps even giving them to third parties in order for them to be whitelisted to allow your application to access another service.

Now the specific IP addresses that were allocated at random by Azure are now critical to the success of your solution. And guess what, those IP are not as secure as you might think. If for whatever reason your Cloud Service is deallocated, the IP addresses will be lost. When you recreate the Cloud Service it will be allocated a new IP address. All of your firewall rules now don’t work. That might not be a major problem for your own rules which you can hopefully change rapidly but it might be a problem if you are working with a supplier that has a 2 week turn around SLA for “minor changes”.

This is where Reserved IPs come in. They are a means to control the life span of an IP address by effectively taking ownership of it in your Azure subscription. Now Azure will never reclaim an IP Address whilst it is reserved in your subscription. The following PowerShell command will create a new reserved IP address.

New-AzureReservedIP –ReservedIPName very-important-ip –Location "UK South"

And this command associates the IP address with an existing Cloud Service.

Set-AzureReservedIPAssociation -ReservedIPName very-important-ip -ServiceName MyBrilliantService

However, we might already have a Cloud Service and its IP address has become important. Changing it would cause unacceptable problems. Luckily it is possible to create a reserved IP from an existing cloud service. The following Powershell command creates the reserved IP using the IP address of the Staging slot of a Cloud Service and also creates the association between the reserved IP and the Cloud Service

New-AzureReservedIP –ReservedIPName very-important-ip –Location "UK South" -ServiceName MyBrilliantService -Slot Staging

A couple of notes about these commands

  1. The -Location is the region in which the reserved IP will be created
  2. The -Slot is an optional argument on these commands. It lets you target either the Staging or Production Cloud Service deployment slots. Production is the default.
  3. Reserved IPs are a “classic” Azure feature. As such resource groups are a meaningless concept. You’ll see all your reserved IPs deployed to a Default resource group.
  4. The first five reserved IPs are free but you should be aware managing more than this is not. You are charged based on the time you hold on to an IP address, which is in effective, to dissuade you from holding on to a large number of publically routable IPV4 addresses which are increasingly become a limited resource. https://azure.microsoft.com/en-us/pricing/details/ip-addresses/

Let’s talk about deallocation. Over the lifetime of your solution your architecture will evolve. You might need to move an IP address from one resource to another. You may want to release IP addresses that you no longer use. Once you have an IP address reserved you own it until the reserved IP resource is deleted. The only way it can used is by creating an association. To use it elsewhere it must be deallocated from the original resource and associated with the new resource. It is important to note that during the process the original resource will receive a new IP address from Azure’s pool.

When playing around with reserved IPs I noticed a couple of behaviours that are worth noting.

Firstly, once a Cloud Service has a reserved IP you must specify its name when deploying the Cloud Service. Remember that the name of the IP should map to the one used for the particular slot. You do this by adding a NetworkConfiguration section to the service ServiceConfiguration file

<?xml version="1.0" encoding="utf-8"?>
<ServiceConfiguration serviceName="My Service" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="4" osVersion="*" schemaVersion="2015-04.2.6">
<Role name="MyRole">
…
</Role>
<NetworkConfiguration>
<AddressAssignments>
<ReservedIPs>
<ReservedIP name="very-important-ip"/>
</ReservedIPs>
</AddressAssignments>
</NetworkConfiguration>
</ServiceConfiguration>

I found when the reserved IP was not referenced I received the following error when deploying. I believe by not specifying the IP the deployment process assumes you are changing the IP which is not allowed

Set-AzureDeployment : BadRequest: A reserved IP cannot be added, removed or changed during deployment update or upgrade.

Secondly if you added a reserved IP to one slot of your Cloud Service you must also add one to the other if you want to be able swap the deployment slots. You’ll get this error if you forget.

Move-AzureDeployment : BadRequest: Cannot swap VIPs when only one deployment has a Reserved IP.

Finally, as the number of Cloud Services in a particular environment grows and the number of environments increases the management overhead for the individual reserved IPs increases greatly. Let’s say you have 6 cloud services in 4 environments. That is:

6 cloud services * 2 deployment slots * 4 environments = 48 reserved IPs

In that case it might be better in the long run to build up a VNET with a subnets for each environment and then have a Virtual Network Appliance presenting these network to the Internet on a smaller range of IPs.

For further reading on Reserved IPs take a look at the following links.

The state of “Not Invented Here Syndrome” in 2017

The state of “Not Invented Here Syndrome” in 2017

Development teams often build up high levels of trust internally due to the nature of the constant collaboration between team members.  Whilst that internal trust increases and increases, it can cause a lack of trust of outsiders whether that be 3rd parties or even other internal teams. So, when there is a genuine case for reuse there is often a strong argument against it. A common one is that the high-quality standards of the team can only be assured if code is written in house.

And why not. Developers like writing code, therefore given the chance they will write “all the code”. But code has a cost in terms on maintaining a solution over time. And we will have to support the solution because software isn’t written once then forgotten about, it continuously evolves. And let’s not forget that writing scalable, reliable and adaptable distributed systems is hard.  Who really wants to be debugging a custom load balancing solution when your system is on its knees and customers are beating down your door. Why invest the next couple of months building yet another custom security solution when your competitors seem to be releasing new features every few weeks.

The IT industry is seeing trends that will hopefully consign that old insular mindset to the history books.

Cloud computing offers, amongst many other advantages, the opportunity of offloading complexity on to some other party. Why worry about heating and air conditioning in a custom data centre when all you really need to do is build a website? Economies of scale means that costs are substantially reduced but you need to remember that cloud offerings are built for the masses and if you don’t fit then you may not get the benefits you expected. Cloud solutions such as Azure and Amazon Web Services practically offer a menu of services that you pick based on your requirements for ease of use vs the flexibility and control that you need. At the extreme, serverless computing promises that you can deploy and run code in the cloud without ever worrying about how the underlying infrastructure will be scaled to meet demand.

There is a trend where many companies are reinventing themselves as tech companies – Netflix and Amazon are just a couple of examples of companies that in order to be disruptive in their particular marketplaces transformed themselves into technology companies. Over the last few years this has reached a tipping point and now many organisations are trying the same thing and expecting the same results. Whilst it is true that IT is fundamental to many business models and being technically savvy as an organisation has a key role to play it is unlikely that everyone needs to code their IT from the ground up.

By looking at the first movers in that space you see technologies being developed in house to solve a particular problem and then shared back to the community. Google created AngularJS and Facebook the Cassandra NoSQL database. Today anyone can pick up these projects for their own use and perhaps more importantly they can contribute to them allowing them to evolve independently.

So, my vision of a team that is successfully avoiding NIH Syndrome in 2017 is one that

  • Has a wide understanding of the technology landscape
  • Does not exhibit siloed thinking about technology stacks, particular products or architectures
  • Has the time and space to try new things
  • And is encourage to contribute back into the community that they take from.

Reusing open source software is not like picking apples from somebody else’s orchard. It is a two-way proposition. You use an open source project to enhance your own product – usually to save cost and time. Therefore, you should invest some of that time back even if it is to simply fix bugs or answer questions on Stack overflow. And here in lies the challenge. Many organisations do not yet see the value of reinvesting in the community that bootstrapped them to where they are today and are so single minded they cannot see beyond their own immediate business pressures to deliver more features. Whether this approach is sustainable – I’m not sure. But as more and more companies transform into technology companies the cream of the development world will come to expect certain values from their employers and as you know the cream rises to the top.

 

 

 

 

Permission, Ability and Skills

Permission, Ability and Skills

It could be argued that the primary purpose of a software developer is to provide solutions to problems. Problems are presented in the form of requirements or user stories and it is the software developers job to provide a solution. They provide a solution with the tools that are available and within the constraints that exist within the organisation in which they work. Normally this is done unconsciously and we don’t stop to think about it. We open of our IDE of choice and start coding.

It is only when we start to encounter what might be considered more difficult problems, we start to see the limitations of this approach. What happens if the problem is that of building up a continuous delivery approach for your team or providing a level of disaster recovery for your product’s infrastructure.

Torturous Analogy Time.

Let imagine that the problem is that I need to travel from London to New York for conference. I have permission from my organisation to do this but I have no other help.  Let say I work for an aircraft manufacturer and they have a plane sitting outside ready to be used. They don’t want to waste money on buying a standard airline ticket when they have a £multi-million asset sitting outside. (I told you this is torturous!). So now I have the ability to get from London to New York but that still isn’t helping me because I can’t fly the plane, I don’t have the necessary skills.

In order to solve the problem I need the permission to tackle it, the ability, through technical support and the skills to use that technology to solve the problem at hand.

In software development we’ll nearly always have the permission and tools but that is not always enough to solve the big problems such as those a mentioned above – building up a continuous delivery approach for your team or providing a level of disaster recovery for your product’s infrastructure.

Solving the big problems often requires organisational transformations which require at least three aspects to change – people, process and tools.  Simply having the tools is not normally enough.  It like having the plane without the skill to fly it. As technologists, we want to believe that it is the tool that provides the solution to a problem, but it isn’t.

We are doing ourselves a misjustice really. We provide the skills that ultimately solve the problem. The technology simple provides the ability to bring those skills to bear.

Traffic Manager Profile – Automating

Traffic Manager Profile – Automating

If you have been following my series, (part1 & part2) on Traffic Manager profiles you should understand how to create a setup that can select between a number of App Service endpoints based on a Routing Method. You’ll also be able to create custom domains, so your customers don’t have to use *.trafficmanager.net and *.azurewebsites.net based addresses.

Up to now, all of these changes have been achieved manually through the Azure portal. Whilst this acceptable for proof of concepts and small scale deployments,  doing everything by hand very quickly becomes error prone and time consuming. When I need to provision a number of environments with near identical setups I prefer to automate via Azure Resource Manager (ARM) templates.

I have written about ARM template previously so I am not going to cover old ground. Instead I will cover one of the problems I bumped up against when trying to automate the setup described in the diagram below.

part3.1

Here I have three resource groups.

  1. One for the primary region that contains a web site and the API supporting it
  2. One for the secondary region which is a copy of the first, providing basic business continuity in the case of failure
  3. One containing traffic manager profiles for both the web site and the API.

Before I go on, there is one important thing to highlight about ARM templates. You cannot have one template that effects multiple resource groups. Therefore I had to have at least three template deployments to achieve this automation. Full disclosure: I actually achieved this with two templates. One that had two different parameters sets for the web/api sites in UK West and UK South and one for the traffic manger profiles, making three deployments.

Creating a template to deploy the web site and API turned out to be quite simple.  I could use different parameter files against the same templates to give me two of the three resource groups I needed. This created the website, API and a custom domain for each.

The difficultly came when creating the traffic manager profiles. Whilst setting up the profile rules and the endpoints was easy enough there was no clear way to provide a custom domain so customers don’t need to use the *.trafficmanager.net address. Why?…

The custom domain must be added to the app services, not the traffic manager profile or the endpoints. As these app services live in different resource groups I could not affect them from the traffic manager profile deployment.

So why not add this step to the template that provisions the App Services? I did try this but still hit problems. If I tried to provision the app service first this didn’t work because the CNAME for custom domain I was provisioning was configured at my DNS registrar to point at the traffic manager profile address and that didn’t exist yet. As part of provisioning the custom domain Azure couldn’t verify that it was valid.

Reversing the provisioning order did not help either. It is not possible to provision endpoints to a traffic manager profile if the app services they point at do not exist yet.

I am not the sort of person to give up easily so I wanted to see if there was an resource group topology that could be provisioned automatically. So this time I tried the configuration below.

part3.2

The main change is that the traffic manager now exists in the same resource group as the app services it is managing, so this should just work…

Unfortunately this was not the case.  This time something more subtle was causing problems.

Remember last time that I pointed out that when you create the traffic manager profile in the Azure Portal a *.trafficmanager.net custom domain is added to each underlying app service. And in order to access the traffic managed site through a friendly domain name you also need to add a custom binding to each app service.

In my template, I was provisioning the traffic manager profile and it endpoints once the app services were created. Once the endpoints were available I’d then add the custom domain to each one. This final step failed because when provisioning the traffic manager profile endpoints, added the *.trafficmanager.net domain to the app services is done asynchronously. Adding my custom domain whilst this was happening caused a conflict.

This stack overflow question covers a very similar issue. I tried the recommendation of using the dependsOn element to change the order the resources were provisioned. The best I could achieve was a template that would fail on the first attempt but then work on subsequent runs. Not great, but least it failed reliably, and I could get a working environment eventually.

I have not been able to get any further than this. I can live with this for now but this is something I’ll keep an eye on and update this post if I find a resolution.