Azure governance describes the guard rails and degrees of freedom for working in Azure. On the one hand, it defines general requirements that apply to all employees, such as how resources are to be deployed or how authorizations are assigned. On the other hand, governance should allow users as much freedom and flexibility as possible when working with Azure.
Microsoft defines the following 5 disciplines for governance: Cost Management, Security Baseline, Resource Consistency, Identity Baseline and Deployment Acceleration.
Five Disciplines Become Nine Clusters
In my opinion, these thematic blocks are important for governance. However, I find them not concrete enough, and they are difficult to structure for workshops and documentation. Therefore, I divide the 5 disciplines into the following 9 topic clusters.
Each of the 9 topic clusters is discussed in its own workshop or as a separate topic in a workshop. For each cluster, we have a catalog of questions with key points that have a major impact on governance and that should be considered.
Network is an important topic from two points of view. On the one hand, network resources often cannot be changed afterwards, i.e. if the network sizing is wrong, the resources may have to be recreated. On the other hand, the network is the basis of everything, since most services build on it. For example, the following questions might be important:
- Do we need Hub&Spoke? How many spokes do we need, and in what size?
- How can spokes communicate with each other? How do external employees access it?
- What are VPN connections used for? Do we need an ExpressRoute?
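Because sizing mistakes are expensive to correct, it can help to sketch the address plan before anything is deployed. The following is a minimal sketch using Python's standard `ipaddress` module. The address space `10.0.0.0/16`, the prefix lengths and the function names are assumptions for illustration only, not a recommendation for a concrete environment.

```python
import ipaddress

# Azure reserves 5 addresses in every subnet (network, gateway, 2x DNS, broadcast).
AZURE_RESERVED_PER_SUBNET = 5


def plan_hub_and_spokes(address_space, hub_prefix, spoke_prefix, spoke_count):
    """Carve one hub and `spoke_count` spokes out of one address space, without overlaps."""
    space = ipaddress.ip_network(address_space)
    # The first block of the requested size becomes the hub.
    hub = next(space.subnets(new_prefix=hub_prefix))
    spokes = []
    for candidate in space.subnets(new_prefix=spoke_prefix):
        if candidate.overlaps(hub):
            continue  # skip ranges already used by the hub
        spokes.append(candidate)
        if len(spokes) == spoke_count:
            return hub, spokes
    raise ValueError("address space is too small for the requested spokes")


def usable_addresses(subnet):
    """Addresses left for workloads if the range is deployed as a single subnet."""
    return subnet.num_addresses - AZURE_RESERVED_PER_SUBNET


# Hypothetical example values: one /22 hub and three /24 spokes.
hub, spokes = plan_hub_and_spokes("10.0.0.0/16", hub_prefix=22, spoke_prefix=24, spoke_count=3)
print("hub:", hub)
for spoke in spokes:
    print("spoke:", spoke, "usable:", usable_addresses(spoke))
```

Running the plan with different spoke counts and sizes in the workshop quickly shows whether the chosen address space leaves room for growth.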
The next important topic is technical security. This is not about the technical security of the Azure data centers, but about the exchange with the customer's security experts. Many customers divide their networks into zones and secure them with firewalls. In Azure there are completely different options, e.g. microsegmentation, and a firewall is not always necessary; a network security group may be sufficient. These issues must be discussed until a uniform picture emerges. Nothing is worse than a first workload that is deployed unprotected in Azure, maybe even attacked, while the security experts have been bypassed. For example, the following questions might be important:
- Does the customer have to bring their own key? Which additional security features should be activated?
- For which areas and with which features should the Azure Security Center be activated?
- Does further hardening have to be implemented?
After the first workloads have been deployed, the first costs are also incurred. Depending on the type of contract and the structure of the company, billing can be a difficult topic. For example, provisions may need to be made because the invoice only comes every 3 months. You may even need to determine and allocate costs for individual resources. For example, the following questions might be important:
- How does the billing process work with Microsoft? How does the internal billing work?
- Are budgets and warnings needed?
- How are the costs reviewed regularly for further optimization?
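The question about budgets and warnings can be made concrete with a small calculation. The sketch below is only an illustration of the idea, not the Azure Cost Management API. The `Budget` class, the thresholds of 50 %, 80 % and 100 % and the linear forecast are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    name: str
    monthly_limit: float
    # Hypothetical warning thresholds as fractions of the monthly limit.
    warn_at: tuple = (0.5, 0.8, 1.0)


def triggered_warnings(budget, spend_to_date):
    """Return every threshold that the current spend has reached or exceeded."""
    ratio = spend_to_date / budget.monthly_limit
    return [threshold for threshold in budget.warn_at if ratio >= threshold]


def forecast_month_end(spend_to_date, day_of_month, days_in_month):
    """Naive linear forecast: assume the daily spend so far continues unchanged."""
    return spend_to_date / day_of_month * days_in_month


poc_budget = Budget(name="poc-subscription", monthly_limit=1000.0)
print(triggered_warnings(poc_budget, spend_to_date=850.0))
print(forecast_month_end(spend_to_date=300.0, day_of_month=10, days_in_month=30))
```

Even such a simple model forces the workshop to decide who owns a budget, which thresholds trigger a warning and who receives it.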
Even if the first workload in Azure does not usually contain any critical data, you should speak to the data protection officer at an early stage. On the one hand, educational work is often necessary here, e.g. explaining that data is assigned to certain regions and remains there. On the other hand, some basic decisions should be requested. For example, the following questions might be important:
- Which regions are allowed and when?
- Are there different levels of confidentiality and what is the respective protection for them?
Billing and contract aspects also have an impact on the structure of the Azure account. Enterprise Agreements have several hierarchical levels (department accounts, accounts) that should be used as sparingly as possible. The invoices are delivered at subscription level. Authorization and the monitoring of policies are handled with the help of management groups. A balance between all of these influencing factors has to be found in order to structure the account in a meaningful way. For example, the following questions might be important:
- How many accounts are required and for whom?
- How many subscriptions are required for which purpose?
- How should management groups be structured and used?
Organizational security covers all authorization and authentication aspects of Azure. Azure has a large number of built-in roles and allows many configurations, so tasks can be delegated to users or restricted. For example, the following questions might be important:
- What is the access concept for the subscriptions and resources?
- Who are the Global Admins and how is PIM configured?
- Who can create service principals?
At the beginning, the processes are certainly carried out manually and implemented in the form of projects. Later, however, as much as possible should be automated, for example orders, approvals or the counting of licenses. For example, the following questions might be important:
- How are activated hybrid benefit licenses (e.g. Microsoft SQL Server or SLES) counted?
- How can Reserved Instances be ordered?
- Who orders the change of tags and when should this be done?
Many of the points from the previous clusters are then defined in policies and distributed via blueprints or management groups. Policies are a powerful tool for checking or enforcing rules. For example, the following questions might be important:
- Which policies are defined with the audit effect and which with deny?
- Which mandatory tags must be set?
- Who will review the rule violations?
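The difference between audit and deny, and the idea of a mandatory tag, can be shown with a small generator for policy rules. The dictionary below follows the `if`/`then` shape of an Azure Policy definition as I understand it. The tag names `CostCenter` and `Owner` and the helper `require_tag_policy` are hypothetical and only serve as an example.

```python
import json

# Only the two effects discussed in the workshop question are allowed here.
ALLOWED_EFFECTS = {"audit", "deny"}


def require_tag_policy(tag_name, effect):
    """Build a policy rule that audits or denies resources missing `tag_name`."""
    if effect not in ALLOWED_EFFECTS:
        raise ValueError(f"unsupported effect: {effect}")
    return {
        # "Indexed" restricts evaluation to resource types that support tags.
        "mode": "Indexed",
        "policyRule": {
            "if": {"field": f"tags['{tag_name}']", "exists": "false"},
            "then": {"effect": effect},
        },
    }


# Hypothetical decision: CostCenter is enforced, Owner is only reported.
mandatory_tags = {"CostCenter": "deny", "Owner": "audit"}
policies = {tag: require_tag_policy(tag, effect) for tag, effect in mandatory_tags.items()}
print(json.dumps(policies["CostCenter"], indent=2))
```

A common approach is to start a new rule with audit, review the reported violations, and switch to deny only once the existing resources comply.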
Many customers want recommendations for Azure for their employees. The recommendations differ depending on how employees are allowed to use Azure (e.g. provision resources by themselves). For example, the following questions might be important:
- What should be considered when using VMs (sizing, automation)?
- Which Azure services should be considered for which use cases?
Start with a Minimum Viable Product
Not all of the topics need to be discussed at the beginning. The numbers next to the clusters show the recommended priority with which the topics should be processed. At least the network and technical security should be discussed before the first workload is actively set up. In parallel to the technical security, you can also speak to the data protection officer, even if the first workload is usually just a PoC. The billing issue must also be clarified very quickly afterwards, because after the first workload it only takes 4 to 12 weeks to receive the first invoice. All other issues can be clarified later.
I also recommend clarifying only as many details as necessary. If the first workloads are not internet-facing systems, the inbound path from the internet does not have to be finalized yet. The MVP should consist of network and security fundamentals; it should not prevent future extensions, but it does not have to define them yet. With every new workload, governance is allowed to grow and develop.
There are many different forms of governance. In my opinion, the quality is only good and sufficient if the aspects have been clarified with sufficient maturity. If a topic is not relevant, it should still be mentioned briefly, together with the reasons why it does not need to be considered further. If a topic is relevant, it should be worked out in such depth that it can be implemented directly. Often there are concepts that describe that something should be done, but not how it should be done. For example, a concept states "we use management groups to structure the subscriptions", but does not define how many management groups there will be, what they are called, what they define etc.
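For the management group example, "implementable depth" could look like the following sketch: every group has a name, a purpose, its policies and its subscriptions. All names (`mg-root`, `mg-prod`, `sub-prod-01` and so on) are hypothetical. The only Azure behavior the code relies on is that policies assigned to a management group are inherited by the groups and subscriptions below it.

```python
from dataclasses import dataclass, field


@dataclass
class ManagementGroup:
    name: str
    purpose: str
    policies: list = field(default_factory=list)
    subscriptions: list = field(default_factory=list)
    children: list = field(default_factory=list)


def walk(group, depth=0):
    """Yield every group in the hierarchy together with its depth."""
    yield depth, group
    for child in group.children:
        yield from walk(child, depth + 1)


def effective_policies(group, subscription, inherited=()):
    """Collect the policies a subscription inherits on the path from the root."""
    current = tuple(inherited) + tuple(group.policies)
    if subscription in group.subscriptions:
        return list(current)
    for child in group.children:
        found = effective_policies(child, subscription, current)
        if found is not None:
            return found
    return None  # subscription is not part of this hierarchy


# Hypothetical structure: one root with a production and a sandbox group.
root = ManagementGroup(
    name="mg-root",
    purpose="company-wide rules",
    policies=["allowed-regions"],
    children=[
        ManagementGroup("mg-prod", "production workloads",
                        policies=["require-tag-costcenter"],
                        subscriptions=["sub-prod-01"]),
        ManagementGroup("mg-sandbox", "PoCs and experiments",
                        subscriptions=["sub-sandbox-01"]),
    ],
)

for depth, group in walk(root):
    print("  " * depth + f"{group.name}: {group.purpose}")
print(effective_policies(root, "sub-prod-01"))
```

A definition at this level answers the open questions from the example directly: how many groups exist, what they are called and which rules each one defines.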