I ran into an issue recently with one of my customers, where the need for Security conflicted with the need for Backups.
With some help from others who had similar issues, along with the awesome Azure Backup team (they are very responsive and supportive), I was able to successfully overcome the challenge.
I thought I would write a quick article to share this with everyone else, in case you run into the issue as well.
First, allow me to explain the scenario.
This customer has an Azure IaaS Virtual Machine. I had also on-boarded the VM into Operations Management Suite (OMS), aka Log Analytics, for monitoring, and we had enabled the Standard Tier of Azure Security Center (ASC).
Recently, I noticed more frequent RDP Brute Force Attacks against the VM. Thankfully they were all “failed” attacks; otherwise this would be a totally different type of post.
So, as any good IT person would do, I took steps to protect the environment. In this case, I enabled the Azure Security Center (ASC) feature called Just-In-Time VM access, or JIT for short.
If you are not familiar with JIT VM Access, simply put, it blocks inbound traffic to common management ports, such as RDP, at the networking layer. This means that if you do need RDP access into a VM, you first have to request access, and the ports are then opened for a specific period of time (by default it’s 3 hours, though you can configure this to be as low as 1 hour and as high as 24 hours).
Great! So now our Azure VMs are protected. What’s the issue?
The issue I encountered was with the automated backups I had configured. These were VM-level backups, not anything with the Direct Agent (i.e. no File/Folder level backups).
With Just-In-Time (JIT) VM Access configured, the backup jobs would fail with the following error:
Error Code: GuestAgentSnapshotTaskStatusError
Error Message: Could not communicate with the VM agent for snapshot status.
Recommended Action: Please retry the backup operation. If the issue repeats, follow the instructions at http://go.microsoft.com/fwlink/?LinkId=800034. If it fails further, please contact Microsoft support.
Notice that the backups are failing because the Azure VM Backup Agent cannot trigger the snapshot.
So, what do we do? Do we disable the security to ensure our backups will continue? Or do we forego our backups in order to implement security protection?
Thankfully we can have our Security cake, and eat our Just-In-Time VM Access too!
At first, when you start to troubleshoot the issue, you will probably end up at the Azure Backup troubleshooting page. The documentation states:
To manage the VM snapshots, the backup extension needs connectivity to the Azure public IP addresses. Without the right internet connectivity, the virtual machine’s HTTP requests time out and the backup operation fails.
Their solution is to whitelist the Azure datacenter IP ranges, but that can be difficult. After all, if you look at the XML file that lists the IP ranges, it’s quite long. On top of that, you have to modify your firewall rules whenever the IP ranges change.
A Better Solution
But, there is a better solution.
Recently, Microsoft started providing something called Service Tags. A Service Tag represents a group of IP address prefixes for a given Azure service, which simplifies the process of listing IP Addresses in security rules. The good news is that Microsoft manages the address prefixes encompassed by the service tag, and automatically updates the service tag as addresses change.
So, we can use a Service Tag like “Storage” and then sit back and relax, knowing that our resources will be able to properly and securely access the underlying Azure Storage infrastructure as needed.
Even better is the fact that the Service Tags are further segregated by Region. So you don’t have to open up access to all of the Azure Storage infrastructure if you’re only using the Canada Central region, which gives you more granular security.
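To make the idea concrete, here is a rough sketch of what an NSG rule targeting a regional Storage Service Tag might look like, built as a plain Python dictionary in the shape of an Azure Resource Manager security rule. Treat the details as assumptions rather than the exact rule from my environment: the rule name, the priority of 100, the Outbound direction, and the restriction to port 443 are all illustrative choices.

```python
import json


def storage_service_tag_rule(region: str, priority: int = 100) -> dict:
    """Build an NSG rule allowing traffic to a regional Storage service tag.

    Assumptions (not taken from the article): the rule is outbound,
    limited to HTTPS on port 443, and uses a made-up naming scheme.
    """
    return {
        "name": f"Allow-Storage-{region}-Outbound",  # hypothetical rule name
        "properties": {
            "description": "Allow VM backup snapshot traffic to Azure Storage",
            "protocol": "Tcp",
            "sourceAddressPrefix": "VirtualNetwork",
            "sourcePortRange": "*",
            # Regional service tag, e.g. Storage.CanadaCentral
            "destinationAddressPrefix": f"Storage.{region}",
            "destinationPortRange": "443",  # assumption: HTTPS only
            "access": "Allow",
            "direction": "Outbound",  # assumption: the VM initiates the call
            "priority": priority,  # NSG priorities start at 100
        },
    }


rule = storage_service_tag_rule("CanadaCentral")
print(json.dumps(rule, indent=2))
```

The point of the sketch is the destinationAddressPrefix value: instead of a long list of IP ranges, the rule names the Service Tag and Microsoft keeps the underlying prefixes up to date.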
After adding the Network Security Group (NSG) rule targeting the Service Tag for Canada Central Storage, I re-tested the backup functionality, and the backup job completed successfully.
So, we don’t have to compromise our security or loosen our control, and we can even use more advanced features like Just-In-Time (JIT) VM Access, all while still meeting our Business Continuity and Disaster Recovery (BCDR) needs through Azure Backup.
Side Note: You may be wondering why we needed to open access in the Network Security Group (NSG) to Azure Storage, when there is no reference to Azure Backup.
According to the Azure Backup Team, during the snapshot phase, the Azure VM Agent (via the VM Extension) initiates a call to a [Name].blob.core.windows.net URL. This means that using the Azure Storage Service Tag is sufficient since access to Azure Blob storage is all that is needed for the snapshot process to work.
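If you want to sanity-check this from inside the VM, a simple TCP connection test against a blob endpoint is one way to see whether the NSG is letting the traffic out. The following is only a minimal sketch: the helper names are my own, port 443 is an assumption (HTTPS), and the storage account name is a placeholder you would need to replace with a real one.

```python
import socket


def blob_endpoint(account_name: str) -> str:
    """Return the blob hostname for a storage account name."""
    return f"{account_name}.blob.core.windows.net"


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, calling can_reach(blob_endpoint("mystorageaccount")) from the VM, where "mystorageaccount" is a hypothetical name, should return True once the Storage Service Tag rule is in place. A False result only tells you the connection did not succeed; it does not say which rule blocked it.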