Terraform Modules

Terraform modules are a powerful and extremely useful concept. Here I will discuss some of the ways in which I've been using modules and the impact they can bring.

TL;DR

Used correctly, Terraform modules can be a powerful tool for dev teams to get rapidly up and running with infrastructure.

Pros:

  • Modules encapsulate infrastructure, abstracting away implementation details.
  • They can reduce repetition and produce more consistent results.

Cons:

  • Used in this way, modules could restrict the infrastructure available to teams.

Modules antipattern

When you first start getting into Terraform it's quite easy to treat the language like a script: you could create a big main.tf and just shove everything you need into the one file. As you start adding resources, though, this can quickly become difficult to keep track of. In other words, main.tf becomes that "God Class" that is responsible for everything in a system.

This is how my introduction to Terraform started (see my earlier posts). The decision was then taken to move the large monolithic code into separate Terraform modules centred around service boundaries.

This is a bit of an anti-pattern. Whilst you do get some degree of separation of concerns across the various services, inconsistencies can still exist between the services and their infrastructure. This usually happens when the knowledge of how and why certain infrastructure is configured isn't readily made available (or when there is a lack of testing for these things!)

The other issue with this pattern can be seen when we want to add a "global" feature to multiple resources. If your security team are seeking to integrate a 3rd party analysis tool with existing infrastructure then, depending on how much infrastructure you have, this can become a large piece of unplanned work. It's like going back through a code base where values have been hard coded across several parts of a system and expecting to find them all and get consistent results.

Things like this have a negative impact on the development lifecycle, increasing the amount of work that needs to be done and soaking up time that could otherwise be spent on features.

A better way

The power of the module lies in its ability to encapsulate a piece of infrastructure whilst abstracting the implementation details away from a consumer. When dev teams have to "build it, run it", it also helps to reduce their cognitive load.

In the following example I'll use Azure resources, but the pattern is the same across any cloud provider.

Example Service

Imagine that your team has been asked to create a Pantry Service. The service needs to return the available food items in the pantry, so it will need an App Service for the API, a SQL database to store the food item data, and a Key Vault to store secrets (and stop bad actors finding out how much chocolate is available!). The team will also need to attach the new API to an existing API Management instance.

A simple main.tf could look something like this:

resource "azurerm_resource_group" "pantry" { 
    name = "pantry-service" 
    location = "UK West" 
} 

resource "azurerm_app_service_plan" "pantry" { 
  name                = "pantry-appserviceplan" 
  location            = azurerm_resource_group.pantry.location 
  resource_group_name = azurerm_resource_group.pantry.name 

  sku { 
    tier = "Standard" 
    size = "S1" 
  } 
} 

resource "azurerm_app_service" "pantry" { 
  name                = "pantry-app-service" 
  location            = azurerm_resource_group.pantry.location 
  resource_group_name = azurerm_resource_group.pantry.name 
  app_service_plan_id = azurerm_app_service_plan.pantry.id 

  app_settings = { 
    "SECRET_KEY" = "@Microsoft.KeyVault(VaultName=${azurerm_key_vault.pantry.name};SecretName=mysecret)" 
  }

  connection_string { 
    name  = "PantryDatabase" 
    type  = "SQLServer" 
    value = "Server=some-server.mydomain.com;Integrated Security=SSPI" 
  } 
} 

data "azurerm_client_config" "current" {} 

resource "azurerm_key_vault" "pantry" { 
  name                        = "pantrykeyvault" 
  location                    = azurerm_resource_group.pantry.location 
  resource_group_name         = azurerm_resource_group.pantry.name 
  enabled_for_disk_encryption = true 
  tenant_id                   = data.azurerm_client_config.current.tenant_id 
  soft_delete_retention_days  = 7 
  purge_protection_enabled    = false 

  sku_name = "standard" 

  access_policy { 
    tenant_id = data.azurerm_client_config.current.tenant_id 
    object_id = data.azurerm_client_config.current.object_id 

    key_permissions = [ 
      "Get", 
    ] 

    secret_permissions = [ 
      "Get", 
    ] 

    storage_permissions = [ 
      "Get", 
    ] 
  } 
}

resource "random_password" "password" { 
  length           = 16 
  special          = true 
  override_special = "_%@" 
} 

resource "azurerm_sql_server" "pantry" { 
  name                         = "pantrysqlserver" 
  resource_group_name          = azurerm_resource_group.pantry.name 
  location                     = azurerm_resource_group.pantry.location 
  version                      = "12.0" 
  administrator_login          = "mydomainadmin" 
  administrator_login_password = random_password.password.result 

  tags = { 
    environment = "production" 
  }
}

resource "azurerm_storage_account" "pantry" { 
  name                     = "pantrydbsa" 
  resource_group_name      = azurerm_resource_group.pantry.name 
  location                 = azurerm_resource_group.pantry.location 
  account_tier             = "Standard" 
  account_replication_type = "LRS" 
} 

resource "azurerm_sql_database" "pantry" { 
  name                = "PantryDatabase" 
  resource_group_name = azurerm_resource_group.pantry.name 
  location            = azurerm_resource_group.pantry.location
  server_name         = azurerm_sql_server.pantry.name 

  extended_auditing_policy { 
    storage_endpoint                        = azurerm_storage_account.pantry.primary_blob_endpoint 
    storage_account_access_key              = azurerm_storage_account.pantry.primary_access_key 
    storage_account_access_key_is_secondary = true 
    retention_in_days                       = 6 
  } 

  tags = { 
    environment = "production" 
  }
}

resource "azurerm_api_management_api" "pantry" { 
  name                = "pantry-api" 
  resource_group_name = azurerm_resource_group.pantry.name 
  api_management_name = "example-apimanagement" 
  revision            = "1" 
  display_name        = "pantry API" 
  path                = "pantry" 
  protocols           = ["https"] 
} 

NOTE: The above is purely an example and hasn't been tested in an environment; it's for demonstration purposes only!

The above will give the team the infrastructure they need to get up and running; a few lines of application code and a couple of pipeline runs later, we'll have the service live.

What if the team next creates a Menu Service for creating meals based off of items in the pantry? It becomes easy to copy and paste the above and simply change the names. Maybe following that, the "business" needs a way to order menu items in the kitchen, and those orders in turn need to deduct or keep track of pantry items. Again, some elements of the above may well be copied and pasted.

If the team has elected to go down the single main.tf file route then that's a lot of lines to read simply to find out what's going on. Maybe the team elects to make a module per service, something like the below:

Module Per Service Folder Structure
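
Something along these lines (an illustrative sketch of the layout, not the exact repository):

.
├── main.tf
└── modules
    ├── pantry_service
    │   └── main.tf
    ├── menu_service
    │   └── main.tf
    └── orders_service
        └── main.tf

Each service module carries its own copy of the App Service, SQL, Key Vault and API Management resources.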

At first this can be a way to rapidly get something in place, but it comes with some points worth discussing:

  • Increasing the number of services can become a maintenance nightmare.
  • Retrospectively changing resources, e.g. updating every Key Vault to have the same feature throughout the estate, becomes difficult (the same change in multiple places).
  • If a different product team takes ownership of a service, then knowledge and "standards" can drift; hell, this can even happen within a single team if the person writing the Terraform that day simply copies stuff without truly understanding it!

These are some of the main problems I've seen with this approach. Blind copy and paste, with no clear knowledge or examples of why or how things need to be set up in a certain way, results in inconsistent, possibly even insecure, infrastructure. And trying to apply updates to resources when you have 15-20 of them across a handful of services is really difficult and takes some planning (see my point above on the impact of unplanned work on feature development).

The answer, as far as I can tell, points to modules.

Getting better with Modules

Terraform and HCL are designed to be modular by default; the mere fact that you can plug into and control / design systems from many, many cloud providers makes that clear. Modularity is important. I even stumbled across the module testing experiment recently, which hints that modules are a key part of the Terraform design.

I started to look at the modularity in finer detail when I asked myself the question: "what is the bare minimum I need to recreate and deploy a service?"

I had a goal in mind: given a limited set of inputs, what was the maximum I could make Terraform achieve? For the purpose of the exploration I decided that a service had to consist of an API endpoint that sat behind API Management and stored data in a SQL database, along with an App Insights instance and a Key Vault.

I wanted something that meant I could easily and rapidly deploy the next service and know the outcome to a high degree of certainty: an API in API Management; a secure backend linked to API Management via Managed Identity and IP restrictions; and a SQL server and database ready for use, with the connection strings and passwords all stored as secrets in a Key Vault. The secondary goal was to do this in Terraform in such a way that, as the developer, I would never have to know a password or connection string.

The answer I found was modules. (NOTE: I've changed the service names from the original to tie in with the hypothetical example from before.)

module "pantry_service" { 
    source = "./modules/app_service_api_example" 
    capability_name = "pantry" 
    location = local.location 
    location_prefix = local.location_prefix 
    terraform_azure_client_id = var.azure-clientId 
    api_ip_restrictions = module.api_management.public_ip_addresses 
    apim_name = "exemplar-apim" 
    api_display_name = "pantry API" 
    api_path = "pantry" 
    apim_resource_group_name = azurerm_resource_group.apim_resource_group.name 
    product_display_name = "Example API Product" 
    openapi_spec_json = templatefile("./open_api_spec/example_spec.json.tmpl",  {url_and_path = "https://API_M_HOST/pantry"}) 
    depends_on = [ module.api_management] 
} 

I'll quickly explain some of the above and hopefully shed some light on what is going on here.

Firstly, module is a keyword, and the string "pantry_service" is simply the name of this module instance. The interesting stuff is in the braces.

The first line, source = "./modules/app_service_api_example", simply tells Terraform where this module is defined; the following 12 lines are inputs to that module. At this point there isn't really anything particularly interesting, though it's worth mentioning that this is an extract from the main.tf file in the root of the project.

The interesting bit happens within the module located at the source directory:

resource "azurerm_resource_group" "capability_resource_group" { 
  name     = "rg-${var.capability_name}-01" 
  location = var.location 
} 

resource "azuread_application" "basic_ad_app_registration" { 
  display_name                       = "${var.location_prefix}-app-reg-${var.capability_name}-01" 
  available_to_other_tenants = false 
  oauth2_allow_implicit_flow = false 
} 

locals {
  sql_connection_name = "sql-connection" 
  azure_ad_app_reg_id = azuread_application.basic_ad_app_registration.application_id 
} 

module "key_vault" { 
  source = "../common/keyvault" 
  name   = "${var.location_prefix}-kv-${var.capability_name}-01" 
  resource_group_location = azurerm_resource_group.capability_resource_group.location 
  resource_group_name = azurerm_resource_group.capability_resource_group.name 
  azure_tenant_id = var.azure_tenant_id 
  terraform_azure_client_id = var.terraform_azure_client_id 
  alert_logic_id = var.alert_logic_id 
  network_acls_ip_rules = formatlist("%s/32", concat([chomp(file("./ipaddress.txt"))], var.kv_ip_restriction, module.app_service.possible_output_ip_addresses))

  secrets = { 
    sql_connection = { 
      name = local.sql_connection_name 
      value = module.sql_database.sql_connection_string 
    }, 

    apim_user = { 
      name = "apim-user" 
      value = module.apim_api_instance.apim_user_password 
    } 
  } 

  policies = { 
    app_service = { 
      tenant_id           = var.azure_tenant_id 
      object_id           = module.app_service.app_service_identity_principal_id 
      key_permissions     = [ "Get", "List" ] 
      secret_permissions  = [ "Get", "List" ] 
      storage_permissions = [ "Get", "List" ] 
      certificate_permissions = []     
    } 
  } 
}

module "sql_database" { 
  source = "../common/sql_database" 
  capability_name = var.capability_name 
  location_prefix = var.location_prefix 
  resource_group_location = azurerm_resource_group.capability_resource_group.location 
  resource_group_name = azurerm_resource_group.capability_resource_group.name 
  sql_admin_account = "sqladmin01" 
} 

module "app_service" { 
  source = "../common/app_service" 
  capability_name = var.capability_name 
  location_prefix = var.location_prefix 
  resource_group_location = azurerm_resource_group.capability_resource_group.location 
  resource_group_name = azurerm_resource_group.capability_resource_group.name 
  sql_connection_string_name = local.sql_connection_name 
  api_ip_restrictions = var.api_ip_restrictions 
  app_insights_key = module.app_insights.app_insights_key 
  key_vault_name = "fa-${var.location_prefix}-kv-${var.capability_name}-01" 
  azure_ad_application_id = local.azure_ad_app_reg_id 
  azure_tenant_id = var.azure_tenant_id 
}

module "app_insights" { 
  source = "../common/app_insights" 
  capability_name = var.capability_name 
  location_prefix = var.location_prefix 
  app_insights_location = "North Europe" 
  resource_group_name = azurerm_resource_group.capability_resource_group.name 
} 

module "apim_api_instance" { 
  source = "../common/api_management_api" 
  apim_name = var.apim_name 
  capability_name = var.capability_name 
  apim_resource_group_name = var.apim_resource_group_name 
  display_name = var.api_display_name 
  path = var.api_path 
  azure_ad_application_id = local.azure_ad_app_reg_id 
  capability_ai_key = module.app_insights.app_insights_key 
  product_display_name = var.product_display_name 
  openapi_spec_json = var.openapi_spec_json 
  backend_url = "${module.app_service.app_service_url}/${var.api_path}"

  depends_on = [ 
    module.app_service 
  ] 
} 

Now there are quite a lot of lines of code here, and in terms of folder structure it looks like the below:

Folder structure of modules when used as combinations
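
As a rough sketch, based on the module source paths used above:

.
├── main.tf
└── modules
    ├── app_service_api_example
    │   ├── main.tf
    │   └── variables.tf
    └── common
        ├── api_management_api
        ├── app_insights
        ├── app_service
        ├── keyvault
        └── sql_database

with each common module containing its own main.tf, variables.tf and outputs.tf.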

So let's try and analyse the above.

The short of it is that I've got a set of modules I've put into a common folder; in this folder I've split out each individual resource, or combination of resources, required to form the overall service.

I'll use the sql_database module as an example. This is the module instance:

module "sql_database" { 
  source = "../common/sql_database" 
  capability_name = var.capability_name 
  location_prefix = var.location_prefix 
  resource_group_location = azurerm_resource_group.capability_resource_group.location 
  resource_group_name = azurerm_resource_group.capability_resource_group.name 
  sql_admin_account = "sqladmin01" 
} 

Hopefully you can see that this is similar in structure to the first module instance we looked at before.

I've got the module keyword, the module instance name sql_database, and then inside the braces the source directory followed by five lines of configuration: the capability name, location prefix, resource group location, resource group name, and a SQL admin account name.

These are the inputs required in order to execute the module instance. Let's move on to that module definition next.

resource "random_password" "password" { 
  length           = 16 
  special          = true 
  upper            = true 
  override_special = "_%@" 
} 

resource "azurerm_storage_account" "sql_storage_account" { 
  name                     = "sqlserverstrg${var.capability_name}" 
  resource_group_name      = var.resource_group_name 
  location                 = var.resource_group_location 
  account_tier             = "Standard" 
  account_replication_type = "LRS" 
} 

resource "azurerm_mssql_server" "basic_ms_sql_server" { 
  name                         = "fa-${var.location_prefix}-${var.capability_name}-sql" 
  resource_group_name          = var.resource_group_name 
  location                     = var.resource_group_location 
  version                      = "12.0" 
  administrator_login          = var.sql_admin_account 
  administrator_login_password = random_password.password.result 

  tags = { 
    environment = "dev" 
  } 
}

resource "azurerm_mssql_database" "basic_sql_database" { 
  name      = "${var.capability_name}-db" 
  server_id = azurerm_mssql_server.basic_ms_sql_server.id 
} 

# SQL Firewall Rule 
resource "azurerm_mssql_firewall_rule" "base_sqlserver_allow_all_azure_ips" { 
  name                = "AllowAllAzureIps" 
  server_id           = azurerm_mssql_server.basic_ms_sql_server.id  
  start_ip_address    = "0.0.0.0" 
  end_ip_address      = "0.0.0.0" 
} 

resource "azurerm_mssql_database_extended_auditing_policy" "basic_sql_auditing_policy" { 
  database_id                             = azurerm_mssql_database.basic_sql_database.id 
  storage_endpoint                        = azurerm_storage_account.sql_storage_account.primary_blob_endpoint 
  storage_account_access_key              = azurerm_storage_account.sql_storage_account.primary_access_key 
  storage_account_access_key_is_secondary = false 
  retention_in_days                       = 6 
} 

In the above I've got six separate resources that together build out the SQL server and associated database. At the top is a bit of code to generate a password for the SQL server; this is followed by a storage account, the server, the database and so on. The details aren't really the thing I want to focus on, but I'd like to draw attention to the lines resource_group_name = var.resource_group_name and location = var.resource_group_location. Here I have var.resource_group_name which, hopefully you can see, is the same input we saw on the module instance before.

Modules work by taking inputs, called variables in Terraform. For this particular module they look like this:

variable "capability_name" { 
  type = string 
  description = "The name of the capability this resource belongs to" 
}

variable "resource_group_name" { 
  type        = string 
  description = "Resource group name for the capability the SQL server belongs to" 
}

variable "resource_group_location" { 
  type        = string 
  description = "Resource group location for the capability the SQL server belongs to" 
} 

variable "location_prefix" { 
  type = string
  description = "The location prefix used on resources within this capability" 
} 

variable "sql_admin_account" { 
  type = string 
  description = "The admin account name for the provisioned SQL server" 
} 

In the above we use the variable keyword, then the variable name, along with a type and a description. We could also include default values if we wanted (in those instances the variable isn't required, but if provided it overrides the default).

The above is quite simple, only taking strings, but we can also pass objects and lists as variables.
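
For example, the secrets input that the key_vault module consumed earlier could be declared with an object type, and the IP restrictions as a list. A sketch (the exact shapes are up to the module author):

variable "secrets" {
  # Map of objects, matching the secrets = { sql_connection = {...} } usage above
  type = map(object({
    name  = string
    value = string
  }))
  description = "Map of secrets to create in the key vault"
  default     = {}
}

variable "api_ip_restrictions" {
  type        = list(string)
  description = "IP addresses allowed to reach the app service"
  default     = []
}

Terraform will then type-check what consumers pass in, catching malformed inputs at plan time rather than apply time.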

The other part to mention with modules is the concept of outputs. For the SQL database the thing of interest is the connection string, and that is defined in an outputs.tf file like so:

output "sql_connection_string" { 
    value = "Server=tcp:${azurerm_mssql_server.basic_ms_sql_server.fully_qualified_domain_name},1433;Initial Catalog=${azurerm_mssql_database.basic_sql_database.name};Persist Security Info=False;User ID=${azurerm_mssql_server.basic_ms_sql_server.administrator_login};Password=${azurerm_mssql_server.basic_ms_sql_server.administrator_login_password};MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;" 
    sensitive = true 
    description = "The SQL database connection string" 
} 

Are you with me so far?

Let's go through what we have looked at so far. I promise there is a point to this!

  • A module called app_service_api_example, whose main file contains several instances of modules from the common folder.
  • Each "combined resource" has a definition in the common folder; above we looked at the sql_database module.
  • Within the sql_database module is a small set of resources used to build the database; it relies on a small set of inputs and gives a single output.

The above is a layering of modules. The module I've called app_service_api_example simply combines the common modules I've created to spin up a service. It's extremely simplistic and makes many assumptions, like each API needing a SQL database.

Okay, so far so good? Cool.

So, let's bring this back to the main points. The antipattern of copying and pasting stuff over and over meant there was room for drift between services, harder work to retrospectively alter requirements or resource configurations, and ultimately increased cognitive load on a developer.

The layered modules I've demonstrated present something I think is wonderful.

module "menu_service" { 
    source = "./modules/app_service_api_example" 
    capability_name = "menu" 
    location = local.location 
    location_prefix = local.location_prefix 
    terraform_azure_client_id = var.azure-clientId 
    api_ip_restrictions = module.api_management.public_ip_addresses 
    apim_name = "exemplar-apim" 
    api_display_name = "menu API" 
    api_path = "menu" 
    apim_resource_group_name = azurerm_resource_group.apim_resource_group.name 
    product_display_name = "menu API Product" 
    openapi_spec_json = templatefile("./open_api_spec/menu_spec.json.tmpl", {url_and_path = "https://API_M_HOST/menu"})
    depends_on = [ module.api_management] 
} 

Within 15 lines of code I can easily spin up another service, this time called the Menu Service. I can rapidly deploy a new service but, in contrast to the copy and paste antipattern, the above example gives me a service that is identical in every aspect but name to the pantry_service.

At its heart this is simply good programming.

The "layered modules" means that I can abstract away the implementation details away from a consumer, present them with a contract/interface reducing that cognitive load on that team, they know that they get all of these things out of the box for very little input. It also means that the team can get good security, backups, disaster recovery etc out of the box, if the modules used to build up a service level module where designed in that way.

With this approach we can also readily introduce updates to a single resource and have them rolled out across the estate. For example, I could amend the SQL database to a smaller size and then allow the consumer to control the sizing with a variable.
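
As a sketch, the sql_database module could take a hypothetical sql_sku_name variable with a sensible default, so existing consumers are unaffected but can override it if they need to:

# Hypothetical addition to the sql_database module
variable "sql_sku_name" {
  type        = string
  description = "The SKU of the capability database, e.g. Basic, S0, P1"
  default     = "S0"
}

resource "azurerm_mssql_database" "basic_sql_database" {
  name      = "${var.capability_name}-db"
  server_id = azurerm_mssql_server.basic_ms_sql_server.id
  sku_name  = var.sql_sku_name
}

A single change to the common module then rolls out to every service that consumes it on the next apply.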

Conclusion

I think that I've covered a lot, but probably not in great detail, so I'll be back to explain in further detail how some of this works and even to explore extensions to this approach.

The above should, however, demonstrate that modules are a powerful tool that can be used to create more consistent infrastructure whilst simultaneously removing cognitive load from developers and teams.

I hope that this post also showed how antipatterns such as module-per-service can be fraught with sticking points that undermine one of the key reasons to use IaC in the first place. If you enjoyed this, or if you want a better explanation, then please reach out.