Month: April 2025

Create an IaaS Windows VM with a Bicep Template

Today's post will be the first technical post of this series and will focus on the script you need to write to generate an Infrastructure as a Service (IaaS) virtual machine (VM) in Azure. I created this Bicep template to speed up our process of making IaaS VMs that host SQL Server instances, for our applications that cannot utilize a Platform as a Service (PaaS) version of SQL Server.

Since the post is very detailed and long, please use the table of contents below to jump forward to specific sections if you would like.

What's in this post

  • What the template does
  • Getting the entire template file
  • How to know what configurations to choose
  • Writing the template
  • Summary

What the template does

In the template I am about to walk through, I first create the Azure IaaS virtual machine resource. I then create the minimum required networking resources: a network interface and a network security group, which sets up access rules for the server. I then create a resource to install the SQL IaaS extension on the VM, which connects the main virtual machine resource with the SQL virtual machine resource in the portal. The final step of the template is to create a Custom Script Extension resource, which lets me run a PowerShell script on the VM after it's created to finalize the setup I need for my machine. All these steps work together to meet the bare minimum requirements of the type of VM I need to create.

Getting the entire template file

I have broken a full Bicep template down into its parts in the sections below. If you would like to see the template in its entirety for an easier overview, you can find it on my GitHub here.

How to know what configurations to choose

When I first started writing this template, I had no idea what configuration values to choose. Looking at the Microsoft documentation for the VM resource type in Bicep, there seemed to be endless configuration options for the machine, which was overwhelming at first. I then had the idea to compare the list of possible values in the documentation for the resource type against the settings of an existing resource that was configured similarly to what I wanted for my new machine.

That effort originally started with me looking at the normal portal view of the resource, but I didn’t stick with that for long. I quickly realized that the portal view of settings doesn’t show a lot of values that I was looking for. But I figured out that you can view all the setup information for a given resource in the portal in a JSON format, which is very similar to the Bicep formatting I needed. I believe this JSON document is likely what would be used by Azure Resource Manager (ARM) to create the same resource, which is why it’s available for every resource in the portal.

To view the JSON version of the resource settings, navigate to the resource in the portal, then near the top right corner of the Overview page, you will have a link to “JSON View”.

Screenshot of the Azure portal showing how you can locate the JSON view for any resource

When you open that pane, you will see something like the following, with all pertinent details of the resource, which you can then use to help you create your Bicep template.

Screenshot of an example JSON view of an Azure VM
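If you prefer the command line over the portal, the Azure CLI can show you the same JSON representation of an existing resource. For a VM, a command along these lines works (the resource group and VM names here are placeholders):

az vm show --resource-group MyResourceGroup --name MyExistingVm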

Writing the template

Parameters

When creating a new Bicep template, the first thing you'll need to decide (apart from what specific resources you need to create) is what parameters you will need to input into the template; these will usually be fed in from a pipeline that deploys the template. For my template, I created the following parameters, declared as shown in the sketch after this list:

  • vmName (string): The name I want to give to the VM that will be created
  • dataDiskNameBase (string): A base string that I will use to name the data disks that I will create and attach to the VM. This base name follows my team’s standard naming strategy for disks.
  • adminPassword (string): The password that should be assigned to the local administrator account the VM will be created with
  • adminUsername (string): The username that should be given to that local administrator account
  • domainAdminPassword (string): This password is used in the PowerShell script I run after the VM is created
  • storageAcctKey (string): The key associated with the storage account where my PowerShell script (used by the custom script extension resource) is stored
  • storageAcctName (string): The name of the storage account where my PowerShell script is stored
  • resourceTags (object): A list of key-value pairs for the resource tags I want assigned to the VM after it’s created
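Here is a minimal sketch of how those parameters might be declared at the top of the template. The @secure() decorators are my own addition in this sketch: Bicep lets you mark sensitive inputs such as passwords and keys so their values are not logged during deployment.

param vmName string
param dataDiskNameBase string
param adminUsername string

@secure()
param adminPassword string

@secure()
param domainAdminPassword string

param storageAcctName string

@secure()
param storageAcctKey string

param resourceTags object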

Variables

Variables in Bicep work like variables in other scripting and programming languages: they save you from typing the same values over and over. For my Bicep template, I created the following variables, declared as shown in the sketch after this list:

  • sqlExtensionName: This variable is used to give a name to the resource for the SQL IaaS extension
  • customScriptExtName: Used to store the name I want to give to the custom script extension resource
  • initializeDisksScript: The URL to the PowerShell file that I have saved in a storage account, which will be run by the custom script extension resource
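As a sketch, those declarations might look something like the following. The values here are placeholders rather than my real ones; the script file name comes from the commandToExecute command shown later in this post.

var sqlExtensionName = 'SqlIaasExtension'
var customScriptExtName = 'ConfigureWindowsVM'
var initializeDisksScript = 'https://${storageAcctName}.blob.core.windows.net/scripts/SetUpWindowsVM.ps1'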

Resource: Virtual Machine

Once you have your parameters and variables defined, the next step is to define the main virtual machine resource. Depending on the settings and configurations you need for your VM, you may need to define this resource differently. In my template, I have the following definition for the VM resource:

resource sqlVM 'Microsoft.Compute/virtualMachines@2024-07-01'= {
  location:'westus2'
  name: vmName
  properties:{
    hardwareProfile: {
      vmSize:'Standard_D4s_v3'
    }
    additionalCapabilities:{
      hibernationEnabled:false
    }
    storageProfile:{
      imageReference: {
        publisher: 'MicrosoftWindowsServer'
        offer: 'WindowsServer'
        sku: '2022-datacenter-azure-edition-hotpatch'
        version:'latest'
      }
      osDisk:{
        osType: 'Windows'
        name: '${vmName}_OsDisk_1_${uniqueString(vmName)}'
        createOption:'FromImage'
        caching:'ReadWrite'
        deleteOption:'Delete'
        diskSizeGB: 127
      }
      dataDisks: [for i in range(0,3):{
        lun: i
        name: '${dataDiskNameBase}${i}'
        createOption:'Empty'
        caching:'None'
        writeAcceleratorEnabled: false
        deleteOption:'Detach'
        diskSizeGB: 256 // all three data disks get 256 GB
      }
      ]
      diskControllerType:'SCSI'
    }
    osProfile: {
      computerName:vmName
      adminUsername: adminUsername
      adminPassword: adminPassword
      windowsConfiguration:{
        provisionVMAgent: true
        enableAutomaticUpdates: true
        patchSettings:{
          patchMode:'AutomaticByPlatform'
          automaticByPlatformSettings:{
            rebootSetting:'IfRequired'
            bypassPlatformSafetyChecksOnUserSchedule: false
          }
          assessmentMode:'ImageDefault'
          enableHotpatching:true
        }
      }
      secrets: []
      allowExtensionOperations: true
    }
    networkProfile:{
      networkInterfaces:[
        {
          id:netInterface.id
        }
      ]
    }
    diagnosticsProfile:{
      bootDiagnostics:{
        enabled:true
      }
    }
  }
  tags:resourceTags
  identity:{type:'SystemAssigned'}
}

That is a lot of script to look through, so I will break it down into its components and explain what each does.

Location and Name

Location and name are pretty self-explanatory: the Azure region you want the resource deployed in and the name you want to give to the resource. For the location, I specified 'westus2', and for the name, I used the vmName parameter, whose value is supplied when the template is deployed.

Hardware Profile

Next in the resource definition is the specification of all properties for the VM, which is the meat of the resource definition, so I'll walk through it section by section. The first property set is the "hardware profile", which specifies the VM hardware size/type. I chose a standard general-purpose size, Standard_D4s_v3.

hardwareProfile: {
      vmSize:'Standard_D4s_v3'
    }

Additional Capabilities

Next is to specify “additional capabilities” for the VM, which I only use to set the hibernation setting to off (false).

additionalCapabilities:{
      hibernationEnabled:false
    }

Storage Profile

The next section is much longer, which is where we specify everything needed for the storage that should be attached to the VM, using the storageProfile property.

storageProfile:{
      imageReference: {
        publisher: 'MicrosoftWindowsServer'
        offer: 'WindowsServer'
        sku: '2022-datacenter-azure-edition-hotpatch'
        version:'latest'
      }
      osDisk:{
        osType: 'Windows'
        name: '${vmName}_OsDisk_1_${uniqueString(vmName)}'
        createOption:'FromImage'
        caching:'ReadWrite'
        deleteOption:'Delete'
        diskSizeGB: 127
      }
      dataDisks: [for i in range(0,3):{
        lun: i
        name: '${dataDiskNameBase}${i}'
        createOption:'Empty'
        caching:'None'
        writeAcceleratorEnabled: false
        deleteOption:'Detach'
        diskSizeGB: 256
      }
      ]
      diskControllerType:'SCSI'
    }

Image Reference

The “imageReference” object within the “storageProfile” property is where you specify the type of operating system you want to install on the VM, which you do by choosing a standard Azure VM image that should be installed on the machine. You could also specify a Linux-based image or even your own custom image if you have one already. In the future, I will be updating this template to use a custom image that already has SQL Server and SSMS installed, so I no longer have to do those steps manually as well.

imageReference: {
        publisher: 'MicrosoftWindowsServer'
        offer: 'WindowsServer'
        sku: '2022-datacenter-azure-edition-hotpatch'
        version:'latest'
      }
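When I eventually switch to a custom image, the imageReference would instead point at the image's resource ID rather than a marketplace publisher/offer/SKU. A sketch of that shape, with the full path as a placeholder:

imageReference: {
  id: '/subscriptions/<subscription>/resourceGroups/<resourceGroup>/providers/Microsoft.Compute/galleries/<gallery>/images/<imageDefinition>/versions/<version>'
}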

OS Disk

This object is where you specify the operating system information for the VM, as well as the name of the disk the OS will be installed on and how big that disk should be.

osDisk:{
        osType: 'Windows'
        name: '${vmName}_OsDisk_1_${uniqueString(vmName)}'
        createOption:'FromImage'
        caching:'ReadWrite'
        deleteOption:'Delete'
        diskSizeGB: 127
      }

Data Disks

The next step of defining the storage profile for the VM is to create any data disks that you want the VM to have. In my case, I need to create 3 separate data disks, because that's what my team requires for our SQL Server setup. To create the data disks with fewer lines of code and less manual specification, I used a loop in the dataDisks specification that iterates 3 times to create 3 different disks.

dataDisks: [for i in range(0,3):{
        lun: i
        name: '${dataDiskNameBase}${i}'
        createOption:'Empty'
        caching:'None'
        writeAcceleratorEnabled: false
        deleteOption:'Detach'
        diskSizeGB: 256
      }
      ]

To give each of the 3 disks a unique but still useful name, I build the disk names from the dataDiskNameBase parameter (passed in when deploying the template) plus the loop index. I have all my disks created as Empty, since that's the only way I could get the template to deploy successfully. Each disk is created with 256 GB of storage.
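As an aside, if your disks needed different sizes, the loop index makes that straightforward. A sketch, assuming a hypothetical requirement that the last of the three disks be smaller:

dataDisks: [for i in range(0,3):{
  lun: i
  name: '${dataDiskNameBase}${i}'
  createOption:'Empty'
  caching:'None'
  deleteOption:'Detach'
  diskSizeGB: i == 2 ? 128 : 256 // the first two disks get 256 GB, the last gets 128 GB
}]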

Disk Controller Type

The final setting for the storageProfile property is the diskControllerType, which determines how the VM connects to its disks. I set it to "SCSI" because that is what our existing VMs were using.

diskControllerType:'SCSI'

OS Profile

The next property that needs to be specified for the VM resource is the osProfile, which gives more specific settings that should be used for the Operating System setup.

osProfile: {
      computerName:vmName
      adminUsername: adminUsername
      adminPassword: adminPassword
      windowsConfiguration:{
        provisionVMAgent: true
        enableAutomaticUpdates: true
        patchSettings:{
          patchMode:'AutomaticByPlatform'
          automaticByPlatformSettings:{
            rebootSetting:'IfRequired'
            bypassPlatformSafetyChecksOnUserSchedule: false
          }
          assessmentMode:'ImageDefault'
          enableHotpatching:true
        }
      }
      secrets: []
      allowExtensionOperations: true
    }

The computer name for the OS is set to the same vmName parameter I used for the general resource name above, and the adminUsername and adminPassword are set to the input parameters with those same names. This admin account is the local administrator the VM is created with: it is what you use to initially set up the computer, and it lets you log in even before the VM is joined to your domain and your normal login process works.

Next, you need to specify the configurations particular to Windows, and I pretty much set these all to what appear to be default values.

Finally, you can specify any necessary secrets, but I had none, and then you can choose to allow extensions or not, which I allowed.

Network Interface

When creating an IaaS VM, you will need to specify a network interface resource that connects network security groups to the VM. You do this by specifying the interface through the networkProfile property.

networkProfile:{
      networkInterfaces:[
        {
          id:netInterface.id
        }
      ]
    }

This section is very short. The only thing I specified within the networkProfile is the ID of the network interface resource I create later in the template, which I reference by using the semantic name of the resource (netInterface) and adding .id to get that resource's ID value. Bicep is smart enough to know that the network interface object should be created before the VM, so referencing the ID this way will not cause any deployment issues.

Diagnostics, tags, identity

The final things I specify for the VM resource definition are the diagnosticsProfile, where I indicate that I want boot diagnostics to be saved; the tags that should be assigned to the resource after it has been created; and the identity the resource should be created with, which I set to the system-assigned managed identity (not a custom identity).

Resource: Network interface

Next up, we need to specify the creation of a network interface, which is required to connect the VM (referenced through the networkProfile in the resource script above) to a network security group. The network interface resource specification is much simpler than that of the VM.

resource netInterface 'Microsoft.Network/networkInterfaces@2024-05-01' = {
  name:'${vmName}-${uniqueString(vmName)}'
  location: 'westus2'
  tags:resourceTags
  properties:{ 
    ipConfigurations:[
      { 
        name:'ipconfig1'
        type:'Microsoft.Network/networkInterfaces/ipConfigurations'
        properties:{ 
          privateIPAllocationMethod:'Dynamic'
          subnet:{ 
            id:resourceId('USWest_2_Network_Services','Microsoft.Network/virtualNetworks/subnets','Subnet_1_Name','Subnet_2_Name')
          }
          primary:true
          privateIPAddressVersion:'IPv4'
        }
      }
    ]
    enableAcceleratedNetworking:true
    enableIPForwarding:false
    disableTcpStateTracking:false
    networkSecurityGroup:{
      id:nsg.id
    }
    nicType:'Standard'
  }
}

The first specification of that resource is the name, which I create by combining the name of the VM it is associated with, plus a unique string (using the uniqueString() function) to make the resource name unique but still meaningful. Next, I provide the location the resource should be deployed to, the tags I want to apply to the resource using the input parameter, and finally get into the properties and detailed configurations of the resource.

For the properties of the network interface, we need to create a sub-resource of type "IP Configuration", which is used to specify the subnet the network interface should be deployed to. To get the ID of that subnet for the IP configuration, I used a function called resourceId(), which takes the resource group, the resource type, and the segment names of the resource (here, the virtual network name and the subnet name) and returns the full ID of that subnet.
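In general form, that function call looks like this; the angle-bracket values are placeholders for your own resource group, virtual network, and subnet names:

resourceId('<resourceGroupName>', 'Microsoft.Network/virtualNetworks/subnets', '<virtualNetworkName>', '<subnetName>')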

Then, under the properties of the IP configuration, I specified that the private IP address of the resource should be assigned dynamically, that this IP configuration is the primary one, and that the IP address version should be IPv4, which closes out the creation of that sub-resource.

Continuing with the rest of the properties for the main network interface setup, I specified that I want to enable accelerated networking, that I do not want to utilize IP forwarding, and that I do not want to disable state tracking for TCP. Finally, we come to the most important part of the creation of the network interface, which is specifying the ID of the network security group (NSG) that we want to associate with the interface. To make that link, you reference the semantic name of the NSG resource plus ".id" to retrieve the ID that will be generated when it's deployed.

The final configuration I specify in the network interface properties is the type, which is Standard.

Resource: Network Security Group

The further we get into the Bicep template, the shorter the resource definitions get, thankfully. Our next resource specification is for the Network Security Group (NSG), which dictates what network traffic will be allowed into and out of the VM we’re creating.

resource nsg 'Microsoft.Network/networkSecurityGroups@2024-05-01' = { 
  name:'${vmName}-nsg'
  location:'westus2'
  tags:resourceTags
  properties:{ 
    securityRules:[ 
      {
        name:'SQL1433'
        properties:{ 
          protocol:'Tcp'
          sourcePortRange:'*'
          destinationPortRange:'1433'
          sourceAddressPrefix:'10.0.0.0/8'
          destinationAddressPrefix:'*'
          access:'Allow'
          priority:1020
          direction:'Inbound'
        }
      }
    ]
  }
}

Like with the previous resources, we first specify the name, location, and tags before getting into the property specifications. The properties for this type of resource are fairly simple: just the inbound and/or outbound network access rules. For my resource, I created only a single rule, which allows the inbound TCP traffic required for a SQL Server instance, from any source address in the 10.0.0.0/8 private range to port 1433.

Resource: SQL IaaS Extension

The next resource I create in the template is one that installs the SqlIaaSExtension onto the virtual machine after it's been created. This extension is what allows you to manage a SQL Server instance installed on an IaaS virtual machine through the Azure portal, connecting the SQL virtual machine resource to the virtual machine resource itself. The resource type for this is extensions under the virtualMachines resource type.

resource iaasExtension 'Microsoft.Compute/virtualMachines/extensions@2024-07-01' = {
  name: sqlExtensionName
  parent: sqlVM
  location: 'westus2'
  tags:resourceTags
  properties:{ 
    autoUpgradeMinorVersion: true
    enableAutomaticUpgrade: true
    publisher: 'Microsoft.SqlServer.Management'
    type:'SqlIaaSAgent'
    typeHandlerVersion:'2.0'
  }
}

Most of the settings I specified for this extension resource are fairly standard, since there’s not much to change. I gave it the name, location, and tags that I wanted, just like with the rest of the resources. One key property of the resource specification to note is the “parent” value, which is how we tell Bicep/ARM that this resource is related to another we previously specified, so the other resource should be created before this one is. Bicep is usually good at understanding relationships between resources and which should be created first, but for this resource type, you need to specify it explicitly.

Resource: Custom Script Extension for PowerShell

The final resource I specify the creation of in my Bicep template for my IaaS VM is a CustomScriptExtension resource, which allows me to run a custom PowerShell script on the VM I create after it has been created. In my PowerShell script, I run commands to initialize the disks I attach to the VM, since that is not done for you if you create a VM using a Bicep template.

resource customScriptExtension 'Microsoft.Compute/virtualMachines/extensions@2024-07-01' = { 
  name: '${vmName}/${customScriptExtName}' // extensions are child resources, so the name must include the parent VM's name
  location:'westus2'
  dependsOn:[sqlVM]
  properties:{ 
    publisher:'Microsoft.Compute'
    type:'CustomScriptExtension'
    typeHandlerVersion:'1.2'
    settings:{ 
      fileUris:[
        initializeDisksScript
      ]
      commandToExecute:'powershell -file .\\SetUpWindowsVM.ps1 -password "${domainAdminPassword}"'
    }
    protectedSettings:{
      storageAccountName:storageAcctName
      storageAccountKey:storageAcctKey
    }
  }
}

As with the previous resources, I specify the semantic name and location of the resource. I then also must specify the dependsOn value to tell Bicep/ARM that this resource should be deployed after the previously defined resource called ‘sqlVM’. That means that the custom extension resource will only be created after the main VM has been created, as it should be.

The properties object of this resource is where the magic happens: it tells Bicep that the "extension" you're creating is one you've made yourself to run a script. The publisher value is still Microsoft, but the type is "CustomScriptExtension". The settings object is where you specify everything the extension needs to know to run your script. In my case, the fileUris list contains only a single item, the variable holding the URL of my PowerShell file in a Storage Account. Then the commandToExecute string contains the exact command that needs to be run on the VM to execute the PowerShell script I want, along with the required parameters of the script (which in this case is only the "password" parameter).

A cool feature of the custom script extension capability is the protectedSettings object: any secret values you put into it will be kept hidden by Bicep/ARM, even in the log output of the template deployment. If you put those same values into the settings object instead, they would be displayed in the deployment log, which isn't good for keeping secrets. For my protectedSettings object, I passed in the storageAccountName where the PowerShell script is saved and the storageAccountKey to provide access to that Storage Account. While the name isn't secret, I put it in protectedSettings to get rid of a warning in the VS Code editor.

Summary

Whew! That was a lot to get through! I hope my breakdown above of all the resource creation steps needed for a Bicep template creating an IaaS virtual machine proved helpful to you. There are so many little pieces that go into creating a Virtual Machine through a Bicep template that you never really see when creating the same resource through the portal. If there is anything that I didn't clarify well enough, please let me know in the comments, and I will clarify the best I can. A lot of lines go into making this type of Bicep template, and writing it felt like a big time suck at the beginning, but the effort pays off every time you want to quickly deploy this resource type in the future.

An Introduction to Bicep

I am starting a multi-week series of posts covering the topic of Bicep, Microsoft’s Infrastructure as Code scripting language, since I have been working with this tool heavily for the past several months. You may wonder why a database developer is using Infrastructure as Code, since DevOps engineers typically do that. However, I decided to learn Bicep as a proof of concept for my team since we repeatedly find ourselves creating the same types of resources for our projects. We would like to reduce the amount of time and effort we put into creating our infrastructure so we can spend more time developing databases and pipelines to get our users the things they care about.

Stay tuned in the coming weeks for more details on how I have started integrating Bicep templates into my database development lifecycle.

What's in this post

  • What is Bicep?
  • Use cases for Bicep
  • What I use Bicep for
  • My thoughts on Bicep
  • Best resources for learning Bicep
  • Summary

What is Bicep?


Bicep is a declarative scripting language created by Microsoft as a successor to, and an addition on top of, their existing Azure Resource Manager (ARM) templates. (Yes, the new templating language is called Bicep because the original was called ARM.) It allows you to easily and quickly write declarative scripts, called templates, that define how you want Azure resources to be created. These templates are used for Infrastructure as Code, which means you generate your infrastructure of cloud objects using code instead of provisioning them manually through a portal. The main competitor for Azure Bicep is Terraform, which is a multi-platform tool that essentially does the same thing. But Bicep is developed by Microsoft to work only with the Azure cloud, so it is better suited for deploying Azure resources than Terraform is.
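To make that concrete, here is a minimal sketch of a complete Bicep template that declares a single hypothetical storage account; the names and values are placeholders, not a recommendation:

param location string = 'westus2'

// One resource declaration: type and API version, then the desired configuration
resource demoStorage 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'demostorageacct123'
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2'
}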

After you have created your Bicep script defining the resources you want to deploy to your Azure cloud environment, you deploy that template either manually or through a pipeline, and Azure gets to work creating the specified resources. As the deployment begins, the first thing that happens is that the Bicep tooling transpiles your Bicep script into the traditional JSON used by ARM templates, since Bicep is essentially a nicer skin built on top of ARM and Azure Resource Manager only understands the JSON format. Once the template has been converted to JSON, Azure Resource Manager parses all the resource requests and sends them to the correct ARM APIs to create those resources.
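If you want to see or run those steps yourself, the Azure CLI exposes both. A sketch, with placeholder names:

# Transpile the Bicep file to the ARM JSON that Azure actually receives
az bicep build --file main.bicep

# Deploy the template to a resource group
az deployment group create --resource-group MyResourceGroup --template-file main.bicep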

If your template is set up correctly, all your requested resources will deploy to your cloud environment, and the time it takes to deploy depends on the resources you are creating.

Use cases for Bicep

Bicep templates are great for any Azure cloud users who find themselves creating the same types of resources over and over again, or for anyone who wants to make their environment re-deployable. For example, if you need to recreate your development environment from production after each deployment cycle, you could write Bicep templates to do that work for you. Essentially, if you are doing resource deployment on a regular interval and you would like to stop doing all that work manually, to reduce human error or to make the process more efficient, Bicep may be the tool for you.

What I use Bicep for

I started using Bicep as a proof of concept for my team to determine if it would be suitable for our needs for creating Azure Infrastructure as a Service (IaaS) Windows servers which we install SQL Server on. As of the time I am writing this, I still haven’t figured out 100% how to generate those servers exactly as we need them through Bicep, but I’ve gotten a good way through writing a template to provision what we need. I have proven that Bicep will work for that, but there are some final technical needs, such as using a custom virtual machine (VM) image with SQL Server already installed with our required settings, which I haven’t quite figured out yet.

Another use case that I have completed a proof of concept for is to generate all required minimum resources for our Azure-based AI chatbots with one template, which has already been immensely useful for me and my team. Creating fully cloud-based resources with Bicep has proven to be much easier to accomplish than creating the IaaS servers I started with, because all specifications and settings for those resources are managed by Azure, so Bicep handles them easily. It took me only a single afternoon to write my AI chatbot backend Bicep template, whereas it took me over a week to get my IaaS Bicep template partially working.

This new AI backend template creates an Azure SQL Server, Azure SQL Database, firewall rules, key vault, key vault role assignment, storage account, auditing on the SQL Server, and a Search Service for managing the document indexes used by the bots we’re developing for our business users. When I was doing this all manually, it would take me about an hour to work through the portal and deploy each of these resources. But now that I’ve turned it all into a Bicep template, it takes only a few minutes to deploy it all.

My thoughts on Bicep


Overall, I think Bicep is a cool and useful tool for creating your Azure resources from a single script, which can be deployed over and over again. It really speeds up the creation of resources and reduces the chance of manual error compared to clicking through the portal setup of the same objects. It has some flaws, such as not always understanding the hierarchy and order of deployment required for related resources as easily as it claims to, but that isn't an insurmountable challenge.

As with most scripting or programming languages, I wish there was more and better documentation from Microsoft to help us newbies figure things out faster, but it’s not impossible to learn. I would recommend that anyone who is tired of manually deploying the same or similar Azure resources repeatedly look into using Bicep to speed up their infrastructure work.

Best resources for learning Bicep

I learned most of what I know about Bicep from a Microsoft Learn workshop that is free online. The first learning path I worked through was the Fundamentals of Bicep, which I would recommend for any developer who is new to Bicep. After I completed that one to get a general understanding of writing Bicep, I then completed a few separate modules from the intermediate and advanced Bicep learning paths. Then there were many things I just had to learn on my own through other resources outside of Microsoft or through my own trial and error.

Summary

Bicep is a great tool that all types of developers working in the Azure cloud can likely use to speed up their development lifecycles. If you’re interested in learning more about Bicep, stay tuned for the next few weeks as I take a deeper dive into how Bicep works using detailed real-world examples.

Reseeding Table Identity from Synapse Pipeline

A task I was requested to implement a while ago in an Azure Synapse pipeline was to have the pipeline reseed a table identity column to prepare it for new data that will be calculated and inserted into the table as part of an optimizer solution someone is writing. This request challenged me to learn about identity reseeding since I had never even considered that would be something I would ever need to do, and it also challenged me to make that reseeding query work in a Synapse pipeline.

The process isn’t difficult, but it was a little finicky since a higher level of permissions is needed to run the reseeding command, and some people, like me, might not want to give their Synapse pipeline sysadmin permissions. This post will cover general information about what it means to reseed an identity column and then how you can do that reseeding from a pipeline.

What's in this post

  • Background
  • What is an identity column?
  • How do I know if a column is an Identity?
  • What is reseeding?
  • How to manually reseed a column
  • How to reseed an identity column from an Azure Synapse pipeline
  • Summary
  • Resources

Background

I completed a project where we upgraded the database servers for a business group, putting their main database onto an Azure Platform as a Service (PaaS) database instead of an old Azure virtual machine. Part of the project was to see if we could rewrite their optimizer into a new Python script instead of their current, very dated, C-based program that requires a VM to run on. Unfortunately, the business group decided to not go with my new script (as I mentioned in a previous post), so we had to come up with a way for them to continue running their current DLL program in the updated environment. For those who don’t know, the PaaS Azure databases no longer have the ability to run DLLs natively like on-prem or Infrastructure as a Service (IaaS) instances do, which meant that we had to find a way to continue running their program without the ability to run it natively on their upgraded database server.

The solution we came up with is to create a new and updated Azure IaaS VM that can run SQL Server and the DLL optimizer program. To get that to work and be cost-effective, when the time comes each month for their program to run, we are going to copy the relevant data for the optimizer from the main PaaS database into this VM’s database, the DLL will be executed which will calculate and load data into a couple of tables, then we will copy that calculated data back to the main server and power down the VM for the rest of the month.

So what does that have to do with identity columns?

Flow diagram that gives an overview of the process requiring me to reseed an identity every month

One of the tables that will have its data generated by the DLL, which will then be copied back to the main table on the main server, has an identity column. On Server A, the table contains customer data up to the beginning of the current month, and the maximum identity column value is something like 128166809. On Server B, where the new data for this month will be calculated and prepared to be loaded back to Server A, the table has that same identity column, but the table is truncated each month before calculating the new data. That means that if we did nothing about the seed value for the table on Server B, the identity column value would restart at 1 and work its way up as new data was loaded, which would cause primary key conflicts when the data is loaded back to Server A. We need to prevent that issue and start the current month's records at the identity value where we left off on Server A, so I needed to find a way to update the identity seed value for Server B's table when the pipeline runs each month.

What is an identity column?

An Identity Column is a column in a SQL Server database table that has its value automatically generated when a new record is inserted into the table. These columns are frequently used for primary keys (PKs) because an Identity Column must be unique, which is perfect for PKs. When you create an identity column on a table, the default is to have the value of that column start at 1 and increment by 1 every time a new record is inserted. Those default values can be changed to fit your needs: the value you want the column to start at is called the "Seed", and the value you want the column to increment by is called the "Increment".

For example, if you want a column that is automatically generated for you but you only want those values to be even numbers, you can set the Seed for the column to 2 and the Increment to 2, so the first value will be 2, the second will be 4, the third will be 6, and so on. But traditionally, I've only ever seen an Increment of 1 used, and I've never needed to set a different Seed value before this project.
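In T-SQL, that even-numbers example would look like the following, where the two arguments to IDENTITY are the Seed and the Increment (the table itself is hypothetical):

CREATE TABLE dbo.EvenNumbersExample
(
    Id int IDENTITY(2, 2) PRIMARY KEY, -- Seed = 2, Increment = 2: values will be 2, 4, 6, ...
    SomeValue varchar(50)
);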

How do I know if a column is an Identity?

Viewing the Column Properties

There are several ways to find out if a column is an Identity, but the method I use is to look at the properties of the column using the Object Explorer, whenever possible. In the Object Explorer for your server connection, navigate to the table you want to check for an Identity, expand the table, then expand “Columns”. Once you have the list of columns opened, you can right-click on any column and select “Properties” which is where you’ll find if it’s an identity or not.

Note: This method will not work with Azure PaaS servers, which severely limit the information you can see from context menus and dialog windows.

Screenshot of SSMS showing how you can access the Properties menu of a column to see if it’s an identity or not
Screenshot of the Properties window of a column showing that the column is an identity

In the above screenshot showing the properties for my selected column, you can see that the value for the property “Identity” is set to “True” indicating that the column is an identity column. Then below that, you can see that the “Identity Seed” is set to 1 and the “Identity Increment” is also set to 1. Those two values mean that the first value that will be assigned to a row inserted into the table will be 1, then the value will go up by 1 for every new record inserted.

Scripting the Table

Another way to quickly learn whether a table has an Identity column, and what the settings for that column are, is to right-click on the table in the Object Explorer and script the table to a new query window.

Script out the table you want to check for an identity column

Once the script has been generated, you will easily be able to tell if a column is an identity because it will be included in the SQL query to generate the table.

Note: The values for Identity Seed and Identity Increment may not be accurate! In the screenshots above where I show how to find out if a column is an identity or not, both the Seed and Increment values show as being set to 1, but I know for sure that the Seed has been changed to a much larger value. The properties of the column may not reflect the actual values.

Finding identity columns on Azure SQL Databases

As I said in a note above, you cannot simply right-click on a column in the Object Explorer for an Azure SQL Database (PaaS database) and view the properties for the column like you can with IaaS or on-prem versions of SQL Server. To see whether a table has an identity column on Azure SQL Databases, you will either need to script out the table like the previous section said or you can use the following T-SQL query.

SELECT [object_id], 
	[name], 
	column_id, 
	system_type_id, 
	max_length, 
	[precision], 
	scale, 
	is_identity, 
	seed_value, 
	increment_value, 
	last_value
FROM sys.identity_columns
WHERE OBJECT_NAME(object_id) = 'TableName';
Data you can find about a table’s identity column by using the identity_columns view

When you run that query, the results will show which column, if any, is the identity for the table, along with its seed, increment, and last value. If the table doesn't have an identity column, no results will be returned by the query.

Checking Identity Properties with DBCC CHECKIDENT

As I mentioned in a Note above, you can’t always trust that the values for Identity Seed and Identity Increment are correct in the Properties dialog or how they’re scripted out when you script out a table. So how do you know what the true values are? You can use the query on sys.identity_columns above or you can use the DBCC CHECKIDENT command.

DBCC CHECKIDENT ('dbo.TableName', NORESEED)

Note: Be very careful with the formatting of that `DBCC CHECKIDENT` command, because changing the `NORESEED` value to anything else and running it could reset or change the seed value inadvertently. Use caution when using this command and make sure you have it set exactly as you want. See the resources section at the end of this post for more documentation about this command to familiarize yourself with it.

When you run that command, it will output a message telling you what the Identity Seed value is currently set to, as well as the highest current value of the column (for when the Identity value has been incremented above the Seed value). In my example, I have only added a single record to the table, so the current column value is the same as the seed.

The output of the DBCC CHECKIDENT command when you specify the NORESEED option

What is reseeding?

Reseeding is changing the value of the Identity Seed value for the Identity column so that it starts the value of the column at a number other than 1 or whatever it was originally set to. For my case, I need to retrieve the maximum value of the Identity column from Server A’s copy of the table, then set the Seed for Server B’s table to that value + 1 (so it starts at the next value above the current identity value in the source table). That means that I need to change the Seed value for my table on Server B to 128166810 instead of 1.

How to manually reseed a column

If you want to manually reseed an Identity column using a SQL command directly on the database, you can do so by using the following command:

DBCC CHECKIDENT ('dbo.TableName', RESEED, <NewSeedValue>);

You can use variables with this command as well:

DECLARE @MaxIdentity int = 128166809;
DECLARE @ReseedValue int = @MaxIdentity + 1;

DBCC CHECKIDENT ('dbo.LengthAnalysisResultsHeader', RESEED, @ReseedValue);

Permissions Needed to Run Reseed Query

According to the Microsoft documentation (linked in the Resources section at the bottom of this post), one of the following permissions needs to be assigned to the entity that is running the DBCC CHECKIDENT command:

  • sysadmin server role
  • db_owner database role
  • db_ddladmin database role

But that document also specifically mentions that Azure Synapse requires db_owner.
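For example, granting the least-privileged of those roles to whatever database user your pipeline connects as would look like this (the user name here is a placeholder):

ALTER ROLE db_ddladmin ADD MEMBER [YourPipelineUser];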

How to reseed an identity column from an Azure Synapse pipeline

For my scenario, I don’t want to run the reseed command manually because the Seed value will be changing every month when our processing runs and generates new data. I have added the DBCC CHECKIDENT command to my Synapse pipeline that’s already loading the data between my two servers so that it’s reseeded automatically right when it needs to be.

In a Synapse pipeline, the reseeding based on a value from another server can be completed with two Lookup activities:

You only need two Lookup activities in your Synapse pipeline to programmatically reseed an identity column

First Lookup Activity – Get the Identity value

The first Lookup will query the source database, in my scenario Server A, to get the maximum value for the column that is the Identity.
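The query behind that first Lookup might look like the following; the table and column names are placeholders. The alias ReseedValue matters, because the second activity references that output column by name:

SELECT MAX(IdentityColumnName) AS ReseedValue
FROM dbo.TableName;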

Second Lookup Activity – Set the Identity value

The second Lookup will run the DBCC CHECKIDENT command, which doesn’t return any results, and will then run a SELECT 1; to satisfy the requirements of the Lookup activity that something should be returned:

DECLARE @MaxIdentity int = @{activity('Get max identity value').output.firstRow.ReseedValue};
DECLARE @ReseedValue int = @MaxIdentity + 1;
DBCC CHECKIDENT ('dbo.TableName', RESEED, @ReseedValue);
SELECT 1;

Let’s break that reseed query down a little more. The first line is creating a SQL variable called “MaxIdentity” which is an integer, and then setting the value of that variable to the output from the previous Lookup activity, which was called “Get max identity value”. To get the specific value from that preceding activity, we need to add .output after the activity call, then .firstRow to specify we want to get the value from the first row of the output of that activity, then finally add .ReseedValue which is the column name returned from the previous lookup activity.

DECLARE @MaxIdentity int = @{activity('Get max identity value').output.firstRow.ReseedValue};

Summary

Reseeding the Identity column of a SQL Server table manually is a piece of cake because it only requires one small SQL statement. Doing the same process in an automated and repeatable manner is a different story. It's not overly complicated to create a Synapse pipeline that reseeds a table's Identity column on demand, but there are some quirks to getting the expressions right, which involved a lot of guessing and checking for me. I am still new to working with Synapse pipelines in-depth though, so perhaps this wouldn't be such a difficult task for more experienced Synapse developers. 🙂

If you run into any issues while setting up the above process yourself, I would love to help as much as I can, so please leave a comment below!

Resources
