Edit: After running into a few issues, I have updated the Terraform template. It not only fixes some issues, but it now also deploys Active Directory into the DC01 VM. You still need to perform the post-installation AD steps inside the VM.
The other day I posted a blog on how to deploy an AKS cluster that is ready for Windows workloads using Terraform. Today, I wanted to expand that to include gMSA, a highly requested feature from Windows customers running containers on AKS. Naturally, the complexity of the Terraform template grows quite a bit, so this blog post provides the details on what is needed for that to work.
gMSA requirements and items outside of Terraform scope
Before diving into the Terraform template, it’s important to review the gMSA prerequisites and what falls outside Terraform’s scope when deploying the Azure resources:
A few notes on the Terraform template: since this is a more complex template, I invite you to collaborate on it. If you see an opportunity for improvement, please send your suggestions!
gMSA on AKS Terraform template
The Terraform deployment has two files. The main.tf file contains the resources to be deployed, and the variables.tf file contains the variables used during the deployment. Note that some of the variables’ values are not set in the file, both because you need to define them for your own deployment and because some are sensitive, such as passwords.
Here is the main.tf file:
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "=3.55.0"
}
}
}
data "azurerm_client_config" "current" {}
data "azurerm_subscription" "current" {}
provider "azurerm" {
features {
key_vault {
purge_soft_delete_on_destroy = true
recover_soft_deleted_key_vaults = false
}
}
}
#Creates Azure Resource Group
resource "azurerm_resource_group" "rg" {
name = var.resource_group
location = var.location
}
#Creates Azure User Assigned Managed Identity
resource "azurerm_user_assigned_identity" "managed_identity" {
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
name = "gmsami"
}
#Creates Azure Key Vault
resource "azurerm_key_vault" "akv" {
name = "gmsatestviniap"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
tenant_id = data.azurerm_client_config.current.tenant_id
soft_delete_retention_days = 90
purge_protection_enabled = false
sku_name = "standard"
}
#Assign reader role to MI on Azure Key Vault
resource "azurerm_role_assignment" "mi_akv_reader" {
scope = azurerm_key_vault.akv.id
role_definition_name = "Reader"
principal_id = azurerm_user_assigned_identity.managed_identity.principal_id
}
#Define AKV access policy for MI
resource "azurerm_key_vault_access_policy" "akvpolicy" {
key_vault_id = azurerm_key_vault.akv.id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_user_assigned_identity.managed_identity.principal_id
secret_permissions = [
"Get"
]
}
#Assign reader role to Terraform session on Azure Key Vault
resource "azurerm_role_assignment" "tf_akv_reader" {
scope = azurerm_key_vault.akv.id
role_definition_name = "Reader"
principal_id = data.azurerm_client_config.current.object_id
}
#Define AKV access for terraform session
resource "azurerm_key_vault_access_policy" "tfpolicy" {
key_vault_id = azurerm_key_vault.akv.id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = data.azurerm_client_config.current.object_id
secret_permissions = [
"Get",
"List",
"Set"
]
}
#Creates the secret on Azure Key Vault (careful: this is the standard user on your AD)
resource "azurerm_key_vault_secret" "gmsa_secret" {
name = "gmsasecret"
value = "${var.netbios_name}\\${var.gmsa_username}:${var.gmsa_userpassword}"
key_vault_id = azurerm_key_vault.akv.id
}
#Creates Azure Virtual Network
resource "azurerm_virtual_network" "vnet" {
name = "gmsavnet"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
address_space = ["10.0.0.0/16","10.1.0.0/26"]
}
#Creates the gMSA Subnet - both pods and Domain Controller will use this subnet
resource "azurerm_subnet" "gmsasubnet" {
name = "gmsasubnet"
resource_group_name = azurerm_resource_group.rg.name
virtual_network_name = azurerm_virtual_network.vnet.name
address_prefixes = ["10.0.0.0/16"]
}
#Optional: Creates the Azure Bastion subnet for RDP into DC01
resource "azurerm_subnet" "AzureBastionSubnet" {
name = "AzureBastionSubnet"
resource_group_name = azurerm_resource_group.rg.name
virtual_network_name = azurerm_virtual_network.vnet.name
address_prefixes = ["10.1.0.0/26"]
}
#Creates a vNIC for the DC VM - remove this if you have an existing DC
resource "azurerm_network_interface" "dc01_nic" {
name = "dc01_nic"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
ip_configuration {
name = "dc01_nic"
subnet_id = azurerm_subnet.gmsasubnet.id
private_ip_address_allocation = "Dynamic"
}
}
#Creates the DC VM - remove this if you have an existing VM
#You need to connect to this VM and finish the Active Directory configuration
resource "azurerm_windows_virtual_machine" "dc01" {
name = "DC01"
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
size = "Standard_D4s_v3"
admin_username = var.win_username
admin_password = var.win_userpass
network_interface_ids = [
azurerm_network_interface.dc01_nic.id
]
os_disk {
caching = "ReadWrite"
storage_account_type = "Standard_LRS"
}
source_image_reference {
publisher = "MicrosoftWindowsServer"
offer = "WindowsServer"
sku = "2022-Datacenter"
version = "latest"
}
}
#Install Active Directory on the DC01 VM
resource "azurerm_virtual_machine_extension" "install_ad" {
name = "install_ad"
virtual_machine_id = azurerm_windows_virtual_machine.dc01.id
publisher = "Microsoft.Compute"
type = "CustomScriptExtension"
type_handler_version = "1.9"
protected_settings = <<SETTINGS
{
"commandToExecute": "powershell -command \"[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String('${base64encode(data.template_file.ADDS.rendered)}')) | Out-File -filepath ADDS.ps1\" && powershell -ExecutionPolicy Unrestricted -File ADDS.ps1 -Domain_DNSName ${data.template_file.ADDS.vars.Domain_DNSName} -Domain_NETBIOSName ${data.template_file.ADDS.vars.Domain_NETBIOSName} -SafeModeAdministratorPassword ${data.template_file.ADDS.vars.SafeModeAdministratorPassword}"
}
SETTINGS
}
#Variable input for the ADDS.ps1 script
data "template_file" "ADDS" {
template = "${file("ADDS.ps1")}"
vars = {
Domain_DNSName = "${var.Domain_DNSName}"
Domain_NETBIOSName = "${var.netbios_name}"
SafeModeAdministratorPassword = "${var.SafeModeAdministratorPassword}"
}
}
#Creates AKS cluster with Windows profile and gMSA enabled, and uses existing vNet
#This depends on the DC01 VM, as we need to set its IP as the primary DNS server for the Windows nodes
resource "azurerm_kubernetes_cluster" "aks" {
name = "ContosoCluster"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = "contosocluster"
default_node_pool {
name = "lin"
node_count = var.node_count_linux
vm_size = "Standard_D2_v2"
vnet_subnet_id = azurerm_subnet.gmsasubnet.id
}
windows_profile {
admin_username = var.win_username
admin_password = var.win_userpass
gmsa {
dns_server = "10.0.0.4"
root_domain = var.Domain_DNSName
}
}
network_profile {
network_plugin = "azure"
service_cidr = "10.240.0.0/16"
dns_service_ip = "10.240.0.10"
}
identity {
type = "SystemAssigned"
}
depends_on = [
azurerm_windows_virtual_machine.dc01
]
}
#Creates Windows node pool on AKS cluster
resource "azurerm_kubernetes_cluster_node_pool" "win" {
name = "wspool"
kubernetes_cluster_id = azurerm_kubernetes_cluster.aks.id
vm_size = "Standard_D4s_v3"
node_count = var.node_count_windows
os_type = "Windows"
depends_on = [
azurerm_virtual_machine_extension.install_ad
]
}
output "kube_config" {
value = azurerm_kubernetes_cluster.aks.kube_config_raw
sensitive = true
}
#Assigns the User assigned Managed Identity to the Windows node pool
resource "null_resource" "identity_assign" {
provisioner "local-exec" {
command = "az vmss identity assign -g MC_${azurerm_resource_group.rg.name}_${azurerm_kubernetes_cluster.aks.name}_${azurerm_resource_group.rg.location} -n aks${azurerm_kubernetes_cluster_node_pool.win.name} --identities /subscriptions/${data.azurerm_subscription.current.subscription_id}/resourcegroups/${azurerm_resource_group.rg.name}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/${azurerm_user_assigned_identity.managed_identity.name}"
}
depends_on = [
azurerm_kubernetes_cluster_node_pool.win
]
}
#Update the VMSS instances
resource "null_resource" "vmss_update" {
provisioner "local-exec" {
command = "az vmss update-instances -g MC_${azurerm_resource_group.rg.name}_${azurerm_kubernetes_cluster.aks.name}_${azurerm_resource_group.rg.location} -n aks${azurerm_kubernetes_cluster_node_pool.win.name} --instance-ids *"
}
depends_on = [
null_resource.identity_assign
]
}
#Optional: Creates a public IP address for the Azure Bastion host
resource "azurerm_public_ip" "bastion_ip" {
name = "bastionip"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
allocation_method = "Static"
sku = "Standard"
}
#Optional: Creates a Bastion Host to connect to the DC VM via RDP
resource "azurerm_bastion_host" "gmsa_dc_bastion" {
name = "gmsabastion"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
ip_configuration {
name = "configuration"
subnet_id = azurerm_subnet.AzureBastionSubnet.id
public_ip_address_id = azurerm_public_ip.bastion_ip.id
}
}
Here is the variables.tf file:
variable "resource_group" {
type = string
description = "Resource group name"
default = "TestgMSARG"
}
variable "location" {
type = string
description = "RG and resources location"
default = "East US"
}
variable "node_count_linux" {
type = number
description = "Linux nodes count"
default = 1
}
variable "node_count_windows" {
type = number
description = "Windows nodes count"
default = 2
}
variable "win_username" {
description = "Windows node username"
type = string
sensitive = false
}
variable "win_userpass" {
description = "Windows node password"
type = string
sensitive = true
}
variable "Domain_DNSName" {
description = "FQDN for the Active Directory forest root domain"
type = string
sensitive = false
}
variable "netbios_name" {
description = "NETBIOS name for the AD domain"
type = string
sensitive = false
}
variable "SafeModeAdministratorPassword" {
description = "Password for AD Safe Mode recovery"
type = string
sensitive = true
}
variable "gmsa_username" {
description = "Username for the standard domain account"
type = string
sensitive = false
}
variable "gmsa_userpassword" {
description = "Password for standard domain account"
type = string
sensitive = true
}
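Since several of these variables have no default and some are sensitive, one convenient option is to supply them in a terraform.tfvars file (or via TF_VAR_ environment variables). The values below are placeholders for illustration only; replace them with your own and keep the file out of source control, since it contains passwords:

```
# terraform.tfvars - sample values only
win_username                  = "azureadmin"
win_userpass                  = "<VM and node admin password>"
Domain_DNSName                = "contoso.local"
netbios_name                  = "CONTOSO"
SafeModeAdministratorPassword = "<DSRM recovery password>"
gmsa_username                 = "gmsauser"
gmsa_userpassword             = "<standard domain account password>"
```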
Update: You will also need the ADDS.ps1 file, which is executed in the DC01 VM:
[CmdletBinding()]
param
(
[Parameter(ValuefromPipeline=$true,Mandatory=$true)] [string]$Domain_DNSName,
[Parameter(ValuefromPipeline=$true,Mandatory=$true)] [string]$Domain_NETBIOSName,
[Parameter(ValuefromPipeline=$true,Mandatory=$true)] [String]$SafeModeAdministratorPassword
)
$SMAP = ConvertTo-SecureString -AsPlainText $SafeModeAdministratorPassword -Force
Install-WindowsFeature -Name AD-Domain-Services -IncludeManagementTools
Install-ADDSForest -CreateDnsDelegation:$false -DatabasePath "C:\Windows\NTDS" -DomainMode "WinThreshold" -DomainName $Domain_DNSName -DomainNetbiosName $Domain_NETBIOSName -ForestMode "WinThreshold" -InstallDns:$true -LogPath "C:\Windows\NTDS" -NoRebootOnCompletion:$false -SysvolPath "C:\Windows\SYSVOL" -Force:$true -SkipPreChecks -SafeModeAdministratorPassword $SMAP
With the files in the same folder, you can run:
az login
az account set --subscription <subscription ID>
terraform init
terraform apply
I did not include the -auto-approve flag, as you probably want to confirm that everything will run as expected. Once you have reviewed the plan for the deployment, type yes to continue.
Now, let me go over the details of this template:
We start by creating a resource group. The name and location for the RG are defined in the variables.tf file.
Next, we create the auxiliary Azure services (Key Vault and a user-assigned managed identity). You could reuse the AKS cluster’s own identity once it’s deployed; I decided to go with a new one for testing and learning purposes. We then assign the managed identity the Reader role on the Azure Key Vault and give it the “Get” permission for secrets. This is what allows the managed identity to read the standard user account credentials used to connect to AD. We then create the secret on Key Vault. Note that we also give the Terraform session itself the “Get”, “List”, and “Set” permissions on the Key Vault, so it can write the value of the standard user account into that Key Vault’s secrets.
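For reference, the interpolation in the azurerm_key_vault_secret resource stores the credentials in the domain\user:password format. With hypothetical values of CONTOSO for netbios_name and gmsauser for gmsa_username, the stored secret value would look like:

```
CONTOSO\gmsauser:<gmsa_userpassword value>
```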
Moving on, we create the Azure virtual network and two subnets: one for the AKS cluster and the Domain Controller VM, and another for Azure Bastion. The latter is optional, as you might not need it, but I added it just in case.
To create the Domain Controller VM, we create a network interface associated with the gMSA subnet, and then create the Windows VM on Azure with that vNIC attached. Here you can change the size and disk of the VM, depending on your environment and cost limitations. The image used is Windows Server 2022; while that’s the recommended version, this deployment would also work with Windows Server 2019. Update: The template now deploys AD DS into the VM, but you still need to RDP/connect into the VM to finish the Active Directory configuration, which is outside the scope of this template.
We then finally create the AKS cluster. This is a standard AKS cluster with a simple default node pool of Linux nodes. Note that the subnet associated with it is the gMSA subnet created earlier. We also use a Windows profile for this cluster and configure gMSA right away. IMPORTANT: At this point, you must indicate the DNS server for gMSA and the FQDN of the AD root domain. If you have an existing DC that is also a DNS server, you should pass in the internal IP address of that machine, just like adding a primary (and secondary) DNS server in the IP configuration of a Windows instance. However, if you are using this template to deploy your DC, do not change the DNS server here: since the DC VM was the first to be created in the subnet, it gets the first available IP address, which in this case is 10.0.0.4, hence the value in the template. For that ordering to work, I set the depends_on argument on this resource, so the AKS cluster is created after the DC VM. Next, the Windows node pool is created with standard configurations. Here you can change the number of Windows nodes and the VM size.
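If you already have a domain controller that serves DNS, only the gmsa block inside windows_profile needs to change (and the depends_on on the AKS resource can be dropped along with the DC01 resources). A sketch, assuming a hypothetical existing DC at 10.2.0.10:

```
windows_profile {
  admin_username = var.win_username
  admin_password = var.win_userpass
  gmsa {
    dns_server  = "10.2.0.10"        # placeholder: internal IP of your existing DC/DNS server
    root_domain = var.Domain_DNSName # FQDN of your AD root domain
  }
}
```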
The final steps in the template are to assign the managed identity to the Virtual Machine Scale Set (VMSS) of the Windows node pool and then update it. Since the managed identity has access to the Azure Key Vault, and we’re associating the managed identity to the VMSS, all nodes in that VMSS will be able to access the secret and authenticate with AD.
Post installation steps
The template does the heavy lifting of creating the Azure resources needed for gMSA to work. As mentioned before, there are additional steps you still need to perform manually; the main one is connecting to the DC01 VM to finish the Active Directory configuration, including setting up the gMSA and the standard domain account whose credentials are stored in Key Vault.
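As a rough sketch of what finishing the AD side can look like, run something like the following inside DC01. All names here are placeholders, and this assumes your forest does not yet have a KDS root key (the backdated EffectiveTime is a lab-only shortcut); adjust accounts and group membership to your environment:

```powershell
# Create the KDS root key (backdated so it is usable immediately - lab only).
Add-KdsRootKey -EffectiveTime ((Get-Date).AddHours(-10))

# Security group whose members may retrieve the gMSA password (placeholder name).
New-ADGroup -Name "WebApp01Hosts" -GroupScope Global

# Create the gMSA itself (placeholder names).
New-ADServiceAccount -Name "WebApp01" `
    -DnsHostName "WebApp01.contoso.local" `
    -PrincipalsAllowedToRetrieveManagedPassword "WebApp01Hosts"

# Create the standard domain user whose credentials the template stored in Key Vault.
New-ADUser -Name "gmsauser" -AccountPassword (Read-Host -AsSecureString) -Enabled $true

# Allow that user to retrieve the gMSA password via the group above.
Add-ADGroupMember -Identity "WebApp01Hosts" -Members "gmsauser"
```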
Conclusion
It is possible to deploy a gMSA application on Windows containers on an AKS cluster. Automating this process reduces the chance of errors and allows you to set up a CI/CD pipeline. This blog post covered the Terraform deployment of the Azure resources needed for gMSA on AKS to work. It deploys all the Azure resources and configures them, while some environment-specific actions are still needed.
I hope this is helpful. No doubt you’ll need to modify the template for your environment. Luckily, you can leverage the ITOpsTalk repo to do that, and even submit a PR if you have any feedback. Let us know what you think!