Are you and your company keeping track of the security of the artificial intelligence (AI) tools your employees are using? Are you aware that AI is not magically more secure than other software, and may in fact be more prone to attack due to its newness and the speed of its development? If not, you need to start watching the news for AI-related cyber attacks. These attacks aren’t exclusive to the new AI startups making moves in the industry; even tech giants like Google have been found to have major flaws in their AI tools.
I am not a cybersecurity expert, so I won’t go into detail attempting to cover the vulnerabilities that have been found, but I highly encourage you to read through these two articles I found recently that covered the exploits.
The first is about three major vulnerabilities discovered in Google’s Gemini AI assistant, three different issues spread across different facets of the tool. I expect better of Google.
Standard cybersecurity practices are more important now than ever. Never blindly trust the software or code you are using. Don’t put your most sensitive company data into tools managed by people outside your company whom you don’t trust 100%. Due diligence is always worthwhile. AI tools are proliferating faster than ever, and it’s guaranteed that many of them won’t be following security best practices. Protect yourself as much as you can using common sense, and keep on top of newly announced exploits through trusted news sources.
This morning I ran into the strangest data issue I’ve seen with Azure Synapse Serverless SQL Pools: a view we have seemed to be caching data, and the cached data was very weird, wrong for the column’s data type, and causing issues with an application query. I still have no idea how or why this happened, since serverless SQL pools aren’t supposed to cache data the way the dedicated SQL pools do, but I thought I would share what I saw to maybe help others (and to see if anyone else knows why this happened).
I was able to correct the issue and get the app query working again; I just won’t ever know why it happened (unless you know why, in which case you should share in the comments below!).
I first learned of this issue through an application developer on another team, who emailed me saying that they were running the same query on our serverless and dedicated SQL pools in Synapse, and that the query was working as it normally does on the dedicated pool but was giving the following error on the serverless SQL pool:
Msg 241, Level 16, State 1, Line 3
Conversion failed when converting date and/or time from character string.
The query they were trying to run is one that hasn’t had any issues in the past.
In our setup, we load the same data into both our serverless and dedicated SQL pools for different use cases, and both are loaded from the same set of Delta Parquet files in a Data Lake on an Azure Storage Account. That means there should be no reason a query would run on one pool but fail on the other.
The first step of my troubleshooting process was to replicate the error that the developer sent me, and I was able to do that. I then reviewed the very long query for anything that was doing a CAST or CONVERT on a date field, which helped me narrow down the problem to a single view on our serverless pool.
The Weird Data
After I identified the one view that seemed to have the bad data on a date column, I ran a DISTINCT query on that one column and received the most confusing results.
None of those values look like dates to me. That made me question myself and the data type of that “order_entry_dt” column, since a date column couldn’t possibly hold string values like this. But no, the column’s data type does say it’s a “datetime2”.
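One quick way to double-check a column’s type in a serverless pool is through the system catalog views; a sketch like the following (the view name here is a placeholder, not our actual object) confirms what the metadata claims:

```sql
-- Check the declared data type of the suspicious column
SELECT c.name AS column_name,
       t.name AS type_name
FROM sys.columns AS c
JOIN sys.types AS t
    ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.vw_orders')
  AND c.name = 'order_entry_dt';
```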
At that point, I started downloading and reviewing individual parquet files from the Data Lake to see if any of them had that weird data, and none did.
The next step I took was to look at the view definition again, to make sure I was looking at the correct source data, and that all looked good. I decided to run my distinct query on the view itself, and then also run the exact same query using the SELECT statement that defines the view, to see what both returned. And that made things even weirder, because it confirmed that the source data was correct, but the view was somehow wrong.
If the view definition uses that exact same “OPENROWSET” query as its definition, why does the view return different data than the “OPENROWSET” query??
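To illustrate, the two queries I compared looked roughly like this (the view name and Data Lake path are placeholders for our actual objects):

```sql
-- Query 1: against the view, which returned the garbage values
SELECT DISTINCT order_entry_dt
FROM dbo.vw_orders;

-- Query 2: the same SELECT that defines the view, run directly,
-- which returned the correct datetime2 values
SELECT DISTINCT order_entry_dt
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/data/orders/*.parquet',
    FORMAT = 'PARQUET'
) AS [result];
```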
Correcting the Problem
Even if I couldn’t figure out why or how the problem data happened, I still needed to figure out how to resolve the issue so that the app query would start working again. What I decided to do was drop and recreate the view, which is easy given how simple the view definition is.
Once I dropped and recreated the view, I ran the exact same queries as above again, yet this time the view started to return data that I expected.
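The fix itself was nothing more than a drop and recreate, something like this sketch (again with placeholder names and paths):

```sql
DROP VIEW dbo.vw_orders;
GO

-- Recreate the view with the same OPENROWSET definition as before
CREATE VIEW dbo.vw_orders AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/data/orders/*.parquet',
    FORMAT = 'PARQUET'
) AS [result];
```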
So why was the view data weird?
While writing this post, I decided that I couldn’t let this go without knowing why the data got weird, so I decided to test out some theories for myself. (Spoiler: I couldn’t figure it out.)
My first theory was that someone ran an UPDATE query on the view accidentally to change the data. But that is not allowed in serverless SQL Pools, which I even tested out myself. By making a copy of the problem view with a different name and then trying to run an update statement, I was able to prove to myself that this theory was not how the bad data was created.
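The test was simple enough; an update attempt like this one fails on a serverless pool (view name is a placeholder):

```sql
-- Fails on a serverless SQL pool: views over external data are read-only,
-- so an accidental UPDATE could not have produced the bad values
UPDATE dbo.vw_orders_copy
SET order_entry_dt = NULL
WHERE order_entry_dt IS NOT NULL;
```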
Then I thought maybe someone wrote a query to change the underlying parquet files directly, but that doesn’t seem to be possible either.
The only remaining explanation I can think of is that someone accidentally overwrote a source Parquet file, or added a second file that was also picked up as source data, and that file contained the bad values. Individual Parquet files are immutable, so no one could have edited an existing source file in place; replacing or adding a file is the only path I can see for bad data to have gotten into our serverless pool view.
I have learned many things today, but one of those things is not why this issue happened, unfortunately.
But like I said above, if you have ideas, I would love to hear them in the comments below!
Summary
If you run into this same strange issue where a Synapse Serverless SQL Pool view has weird data that you don’t expect, and that data doesn’t match the query that is used as the view data source, drop and recreate the view to fix the problem.
If you use a Hive Linked Service in your Azure Synapse workspace, you have probably been getting a banner at the top of your workspace every time you sign in telling you that you have linked services that are about to be deprecated so you should update them. If, like me, you go ahead and do that upgrade from version 1.0 to version 2.0, which is touted as being as simple as toggling a switch, you may have been disappointed to find that you now have an error with your connector that prevents your pipelines from running. If this is relatable to you, keep reading because I will tell you how you can resolve the issue.
Fortunately or unfortunately, depending on how you look at it, I was a preview tester for Microsoft for the fix to the Hive V2.0 connector issue because the problem had already been reported and had a fix in progress when I created a ticket with Microsoft support. The solution has now been released and is available for you to implement if you need it.
For a couple months, my team was annoyed by a banner every time we opened one of our workspaces that utilizes a Hive linked service, telling us that we had a linked service version that would be deprecated soon. In August, I finally decided I’d had enough of the banner and did the requested upgrade of our Hive Linked Service to version 2.0. Making the change was simple enough, just switching the radio button selected in the settings of the linked service (LS), and I didn’t even have to fix any LS settings after making the switch like I had to do when upgrading our Oracle connectors. All previously saved settings stayed where they were, so I saved the change and merged it.
A couple days later, on the next run of our pipeline that uses that linked service, we got the following error on the “Copy Data” activity which pulls data from the Hive data source into our data lake.
"ErrorCode=FailedToConnectMSDISource,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to connect to the connector. Error code: 'AuthenticationError', message: 'One or more errors occurred., ErrorCode: AuthenticationError',Source=Microsoft.DataTransfer.Connectors.MSDIBridgeAdapter,''Type=Microsoft.DI.Driver.Hive.Exceptions.HiveConnectorException,Message=One or more errors occurred., ErrorCode: AuthenticationError,Source=Microsoft.DI.Driver.Hive,''Type=Apache.Arrow.Adbc.Drivers.Apache.Hive2.HiveServer2Exception,Message=An unexpected error occurred while opening the session. 'Required field 'client_protocol' was not present! Struct: TOpenSessionReq(client_protocol:null)',Source=Apache.Arrow.Adbc.Drivers.Apache,''Type=Thrift.TApplicationException,Message=Required field 'client_protocol' was not present! Struct: TOpenSessionReq(client_protocol:null),Source=Apache.Arrow.Adbc.Drivers.Apache,'",
The important part of that error message is “An unexpected error occurred while opening the session. 'Required field 'client_protocol' was not present!”.
As any developer does, I started googling that error but didn’t find any pages with that same error for Azure Synapse-related Hive issues. I did, however, find several similar errors from other uses of Hive that were caused by version mismatches between client and server; essentially, the client got upgraded beyond the server version, which caused a similar error for others.
For my pipeline, I do not control the Hive data source that we were failing to pull from because that is managed by one of my company’s customers. I tried to get information about their Hive version to see if it was something they could fix, but I quickly hit a dead end and they told me to make a support ticket with Microsoft. Which I did.
Thankfully Microsoft was immediately helpful, and I got to work with a very kind and knowledgeable support engineer, who quickly found that the issue I was having was one the engineering team was already working to resolve. Great! There was a fix already in the works! A couple days after being told I just needed to wait for the fix to be released, I received a message from the support agent asking if I would test their proposed fix: an updated version of the Self-Hosted Integration Runtime (SHIR), the component that connects our systems to our customer’s.
How to Resolve the Problem
The Self-Hosted Integration Runtime (SHIR) build that Microsoft wanted me to test was easy to install on the server where we normally run our SHIR for the workspace I was upgrading. They provided me a download link, so I grabbed the installer, installed the SHIR like you normally would, and it upgraded in place. The version of the SHIR I tested for them was “IntegrationRuntime_5.58.9364.4”.
Once I installed that on our server, I updated the Hive connector again (since I was forced to downgrade while the solution was being developed so I could continue getting our data) and this time, the test connection worked. I then committed the change and ran my pipeline, and it successfully completed the “Copy Data” activity that had been previously failing with the 2.0 version of the connector.
The only problem we have now is that the 2.0 version of the connector seems to have changed how data types are handled for Parquet files in Data Lakes, so now I need to remap data types between our Data Lake and the final data warehouse to get the pipeline completely working again. I had to remove the test SHIR from our production system at that point anyway since they told me we couldn’t use it in production yet, so when I do the official upgrade to the 2.0 Hive connector, I will also need to make that data mapping fix.
As of writing this post, the SHIR that I believe contains the fix is live on the site where you normally download that software, except the version is slightly higher. What’s available on their download site is “IntegrationRuntime_5.58.9377.1”. If you are having the same issue I am with this connector upgrade, give this latest SHIR a try to see if that resolves your issue like the test version did for mine. I will update this post once I have done my final upgrade and have confirmed that the SHIR version available online fixes my problem for good.
Summary
If you started to experience connection failures on your Hive linked service in Synapse after changing the version of the linked service from V1.0 to V2.0, update your Integration Runtime to version 5.58 or later to fix the problem. The linked service upgrade flow makes no mention of needing to upgrade the SHIR unless it is really out of date (when I first tried upgrading the connector, I couldn’t because we were on SHIR 5.52), so you will need to be aware of this version mismatch yourself.
Let me know down in the comments if you also ran into this issue with the upgrade!
Today at TechCon365 in Atlanta, I attended another full-day pre-con session, this time covering everything in the realm of Microsoft Fabric. Before today, I knew hardly anything about Fabric, aside from it being something Microsoft was pushing our team to use for a new project I wasn’t involved with. The workshop, which didn’t have any hands-on work but was still good, was led by John White and Jason Himmelstein. The two speakers did an excellent job of teaching all aspects of Fabric without being boring. This was the introduction to Fabric as a concept that I’ve been needing for months.
Note: The content below is not guaranteed to be accurate; it’s just what I took away from the workshop. In the coming months, I will be learning more about these tools myself to fact-check everything and see if the new tools will work for my team.
For me, the biggest takeaway of things I learned during today’s workshop is that Microsoft intends to get their customers migrated off Synapse as the data movement platform and into Fabric and its OneLake. As of now, Synapse as a product no longer has a product manager (according to the presenters), which means that Microsoft does not intend to make any feature updates to that tool. There is still a maintenance team for the product to fix any issues that may arise and to keep it running, but there are no long term plans for it anymore.
Synapse going away is concerning to me because my company only recently migrated to Synapse off of SSIS. My fear is that if we spend a ton of time and energy converting our ETL systems to Fabric, then as soon as we finish the work, we’ll need to start anew with whatever product line Microsoft releases next, just like we did with SSIS to Synapse. I understand that cycles like this are everyday occurrences in the technology sector, but I also know that my boss and the executives at my company are likely not going to be happy when I tell them what I learned in today’s workshop.
If you’re in the same situation my team is in, fully invested in Synapse and maybe having just wrapped up getting there, and so wary of the next transition to Fabric, don’t worry too much. The presenters did assure us that Synapse will be supported for a while yet; no concrete plans have been announced to retire it. That said, John and Jason recommended that teams stop doing any new development in Synapse and instead start new development in Fabric, to serve as real-world test cases of how useful it can be. I haven’t yet done anything like that, so its usefulness is still unknown to me personally.
Hello Fabric
Fabric is being sold as the one-stop shop for all things data in the Microsoft universe, the future of all data analytics and processing. No longer is Fabric going to be Power BI 2.0; now it will serve as the location for anything you need to do with data. While the sales pitches are great, I am wary of the transition, as I said above. I dislike when companies claim their new tool will solve all your problems, because nothing is ever that great in reality. But here is what I learned today.
OneLake: Put every speck of data here
With each tenant, Microsoft gives you a single OneLake, because it should be your one data lake for your organization, just like there is only a single OneDrive for each user. No more having dozens of storage accounts to act as separate data lakes for different processes. Now you get the one and you will like it. More details I learned:
Parquet files: the reigning champions
A “Shortcut” to the source data
If you want to get data from a source system, like your Oracle finance database for PeopleSoft finance, you can add a “Shortcut” to the data source without having to pull the data into OneLake with a normal ETL process.
This concept is the most intriguing to me, and I really want to see how it plays in real life with real data and systems. Could be too good to be true.
Security redefined
They’ve developed a new way to manage security for the data that is supposed to reduce overhead. They want to have you manage security in only one place, just like you’ll be managing all your data in one place with OneLake.
This feature is still in preview, so it remains to be seen whether it works as well as Microsoft claims.
File Explorer: An old but good way to explore your data
A new plugin for Windows has been created that allows you to view your OneLake data through the Windows File Explorer, as if they’re normal files. Which they are, if they’re Delta Parquet flat files. The new feature is called OneLake File Explorer.
You can also view your OneLake data files through Azure Storage Explorer, which is a good second option as well since my team already uses that for other purposes.
For the OneLake concept, I like that it is trying to prioritize reusing what you already have–tools, data, reporting frameworks–so that you don’t have to keep reinventing the wheel for each new data flow process you need to create. The concepts are good, so I’m eager to use them for myself after the conference.
Lakehouse
This is a new way to store data that is offered exclusively by OneLake in the Microsoft ecosystem. It combines the concepts of data lakes and data warehouses by using Delta Parquet format files. It gives you the ability for ACID compliant transactions on the data files while also still having compressed and cheap data storage like a data lake.
Data Factory
This is the part of Fabric that is replacing Synapse pipelines. In the demos during the workshop, I could see that the Dataflow Gen 2s and the data pipelines look almost exactly like what is currently available in Synapse, which is promising, because then at least I don’t need to learn a new GUI. As with Synapse, the Dataflows are still backed by a Spark engine to allow for parallel processing and high throughput. Both tools allow you to move data from point A to point B (the OneLake) for the scenarios where Shortcuts won’t work and you still need to literally move data between systems.
Fabric Native Databases
If the other new data storage types in OneLake and Fabric weren’t enough for you, there is now also going to be Fabric-native databases available for use. These will be SQL Databases (and some other database types) that you create directly in Fabric instead of creating them independently then joining them to the OneLake with a Shortcut. This change to Fabric is what they are intending to use to take Fabric from a data analytics platform to a full data platform. However, the presenters of today’s workshop did caution that they don’t recommend putting your production systems into these native databases yet, since they’re still new and not well-tested in the real world. But maybe in the near future, we truly will be doing 100% of our data integration engineering in Fabric.
Summary
With this technology, it feels like we could be at the cusp of a new big wave of change for how we operate in the data realm, so I am really interested to see how things play out in the next year or two. Using Fabric for all things data could be the next big thing like Microsoft is trying to sell, or maybe it will simply fade into being just another option in the existing sea of options for data manipulation and analytics in the Microsoft ecosystem.
Do you think Fabric is going to be the next big thing in data technology like Microsoft claims?
This week I am spending every day at the TechCon365 conference in Atlanta, and will be recapping what I learn each day here. Today was technically a pre-con day where I attended an all-day workshop; the main conference starts on Wednesday. I am grateful that I have the opportunity to go to these extra days this year.
Copilot, Copilot, and More Copilot
The workshop I attended all day today was focused on all the ways you can use Microsoft’s Copilots in the Power Platform applications. Besides learning methods of working with Copilot in that suite of tools, I generally learned about the suite of tools itself. I have only personally worked with Power Automate in the past, and the workshop covered Power Apps, Power Automate, Power Automate Desktop, and Power Pages, so I saw a lot of new applications today.
As you are probably aware, Microsoft has shoved Copilots in your face from every direction in their fleet of technologies, and the Power Platform is no exception. Microsoft wants us to believe that the Copilot in every application will someday soon make it so that manually coding simple tools and applications will be a thing of the past, but the workshop today with live demos argues otherwise.
While we did see good examples of how Copilot can help you quickly perform simple and repetitive tasks, we also saw a lot of negative examples of how the tool falls short of replacing your brain, which is good news for developers everywhere.
Room for Growth
The best part of today’s workshop is that the speaker gave us a realistic look at what these tools can do instead of hiding the shortfalls of the Copilot tools behind perfect prompts. In my experience, a lot of seminars showing new technology will have scripts for their demos which make the tools look flawless, despite the tools being less than helpful when a normal person uses them in normal contexts.
Today demonstrated that Copilot doesn’t quite fully understand natural language when it comes to receiving technical directions, and when it does understand, it doesn’t always have the power to execute what has been requested. Copilot seems to be good at making simple changes like setting the background colors of different widgets on a Power App screen or Power Pages website. It isn’t as good at understanding and executing detailed changes to flows in Power Automate. And then if you want to use the Plan Designer in Power Apps to create an entire application and agent environment with a few keystrokes as promised, you will be a bit disappointed when you still need to manually click “Create” for all the new planned resources. Maybe one day the Copilot AI will do all the work of developing for you, but that day has not yet arrived.
Summary
Today I learned much more than I expected from the first workshop of the conference, and I loved it. The speaker was engaging enough to make the 8-hour day go by relatively quickly, and I saw so many new possibilities for using the Power Platform tools. My main takeaway from the class is that AI will not be taking our tech jobs anytime soon, since it can’t even do work that I consider fairly simple. If Copilot can’t do Power Apps well, it certainly won’t be doing complicated ETLs anytime soon. But to hedge my bets for the future, I will still be making an effort to learn how to prompt my way through new development when I can.
A task I completed a while ago that I wanted to share was creating two different pipelines for a business user to load data to and from one of their databases to another server for monthly processing. The business user requested that they receive an email if the pipeline failed, so they would know sooner rather than later that there had been a problem with processing. Unfortunately, there isn’t currently a native way to send an email from a Synapse Analytics pipeline; there is no built-in activity for that (that I’m aware of). To enable the functionality the business users requested, I had to create a custom system to send an email.
Since there isn’t a native activity within Synapse Analytics that sends an email, the workaround I created is to build a Logic App that sends an email when triggered by an HTTP request, then add a Web activity in the pipeline to call the Logic App and feed through error information so it can generate a useful email. Keep reading to learn the details of how to set up such a process.
Creating a Logic App to Send an Email
1. Click on the component “Add a trigger” in the designer. In the search box, type “request” and then select the option that says “When an HTTP Request is received”.
2. Click on the link to “Use sample payload to generate schema”
3. Enter a simple JSON-formatted string containing the values you will need to use to send the email
4. Click “Done” when you’ve included all the parameters you want to include in your incoming JSON values, and your simple JSON will be automatically formatted in the exact way needed for the payload to handle the request.
5. For the METHOD that will be used for the HTTP request, select “POST”
6. Click into the open area of the designer to close the editor for the HTTP request component then click the “+” under the component to add another to the flow
7. In the search box, search for “outlook”, scroll down until you see the section for “Office 365 Outlook” and click the component for “Send an Email (v2)”
8. If you haven’t logged into an email account through the Logic App yet, it will request that you do so in order to start specifying how the email should be sent. Go through the process of signing into an email account. I recommend using a service account email instead of your own email, unless you’re the only person that will ever be working on this process in the future.
9. Click into the “To” box in the properties of the “Send an Email” component, and you’ll see two blue icons pop up to the left of the box that will allow you to dynamically enter information into that text box. Click the top icon that looks like a lightning bolt to pick values from the previous step of the process.
10. In the box that opens, select the “to” value from the HTTP request component to fill into the “To” box of the email properties.
11. Repeat the same steps for the Subject of the email and the Body of the email.
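For step 3 above, the sample payload can be as simple as a flat JSON object listing the values the email will need. Mine looked something like this (the field names are up to you, but note that these same keys are reused later in the pipeline’s Web activity body):

```json
{
    "to": "user@example.com",
    "subject": "Pipeline Failure",
    "emailbody": "Details about the failure go here."
}
```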
Adding the Email Capability to your Synapse Pipeline
After saving the Logic App, you are ready to use it from your Synapse pipeline. Once the app has been saved for the first time, you will see that a URL has been generated for the HTTP Request trigger, which is what you will use to call the Logic App from your pipeline. Copy that URL.
Return to your Synapse pipeline so that we can set it up to call the Logic App to send an email. Since we essentially created an API which we can hit to send an email, we need to add a “Web” component to our pipeline. In my case, I want to send an email to a business user when the pipeline fails since they are the key party interested in the pipeline failing. For that, I have added my “Web” component as the “On Failure” route of my pipeline (red “x”/line).
Open the “Web” activity to set it up to hit your new API. Give the Activity whatever name you would like, in my case it is named “Send Failure Email 1”. Then go to the “Settings” tab and paste the URL into the URL field. Then for “Method”, choose “POST” since we want to send information to the URL from our pipeline.
Next, we need to write the body of text to send to the API. This Body will need to match the formatting that we specified in the Logic App when we configured the JSON Payload. In that payload, I specified the following parameters should be required: To, Subject, and EmailBody. Based on that, I have written my Body for the Web activity as the following:
{
"to":"@{pipeline().parameters.EmailRecipient}",
"subject":"@{pipeline().parameters.EmailSubject}",
"emailbody":"Pipeline failed when trying to copy data for table @{item().TableName}."
}
When working with the expression builder in Synapse, you need to be very particular about formatting when using pipeline parameters or variables in an expression. Pay close attention to the double-quotes, the curly brackets, and the @ symbols. In my HTTP Body, I set the values of two of the parameters I send to the Logic App through pipeline parameters so that I only had to change them in one place and not in every iteration of the email component in my pipeline. If you don’t want to use parameters like that, you could hard-code the values of the key-value pairs in the body to whatever you would like, similar to how I set the value of emailbody.
Click “OK” to close the expression builder panel. The very last thing we need to add to our “Web” activity is a header for the POST HTTP message; we only need one. Click the “+ New” button to add it, then enter the Name as “Content-Type” and the Value as “application/json”.
Testing the Pipeline Email Capability
Once you have finalized the addition of the “Web” activity in your pipeline, you are ready to Debug and test the changes, which you can do however you normally test your Synapse pipelines. I knew my pipeline was set up successfully when I created a false failure scenario (since the email only sends when a component fails), ran a debug, watched the previous component fail, and received an email containing the failure information I wanted. Your testing may look slightly different.
Summary
Although Synapse Analytics doesn’t have a built-in feature for sending an email from a pipeline, you are able to build that functionality yourself fairly easily using a Logic App and a Web activity in your pipeline. I would love to hear if you found this method useful or implemented it yourself. Thanks for reading!
As part of my ongoing series about working with Bicep and SQL Servers and SQL Databases, I am going to cover how you can turn on auditing for a SQL Server instance using Bicep. The auditing I am referring to is what you will find if you go to an Azure SQL Server (Platform as a Service, fully cloud version), then navigate to “Settings” and then “Auditing”. By default, the auditing option will track a few different things that happen on your server, and then you’re able to customize it further if you would like. This post will specifically cover how to enable auditing for a SQL Server using a Storage Account for the destination of the audit logs and the “Storage Access Keys” option for the Storage Authentication Type.
According to the Azure portal, auditing “tracks database events and writes them to an audit log in your Azure Storage account, Log Analytics workspace or Event Hub.” This allows you to keep track of what is happening on your server and its databases in case you ever need to look back and see what changed or if you need to troubleshoot an error in more detail. Auditing gives you a similar level of server and database traceability as the standard SQL Server Logs on a normal instance, just in a slightly different form since it’s 100% in the cloud.
In my organization, we have a requirement for auditing to be turned on for all our SQL Servers as a safety measure, which is why I wanted to learn how to enable it in the portal and then later how to enable it through a Bicep template now that we’re automating our resource creation more.
Note: This Bicep template applies auditing at the SQL Server level, not the SQL Database level. Both are options that you can choose from. Understand that I am not showing how to enable auditing for a specific database, as the template for that is likely slightly different.
In the portal, this is what the Auditing settings page looks like.
Continue reading and I’ll show you how to convert that portal page into a Bicep template that you can deploy without much effort at all.
Full Bicep Template
This post is discussing a subset of a Bicep template that I’ve also covered in part in other posts. If you would like to see the Bicep template in its entirety, you can find it on my GitHub.
Create a Storage Account Resource
Since I have opted to send my SQL Server audit logs to a Storage Account, we first need to create a storage account to send the logs to. If you already have an existing storage account that you would like to send the files to instead, you can skip this step and do an “existing” type reference in the script.
The definition below creates a very standard storage account. The only thing to mention is that it must be a “V2” type of storage in order to work with auditing.
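Since the original code block isn’t reproduced here, below is a minimal sketch of what such a storage account definition might look like. The symbolic name, account name, API version, sku, and the location parameter are all illustrative assumptions; adjust them to your own standards.

```bicep
// Sketch: a standard V2 storage account to hold the audit logs.
// Names, sku, and API version are illustrative assumptions.
param location string = 'westus2'

resource storageAcct 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'sqlauditlogs001' // must be globally unique, 3-24 lowercase alphanumeric characters
  location: location
  sku: {
    name: 'Standard_LRS'
  }
  kind: 'StorageV2' // auditing requires a V2 storage account
}
```

If you’d rather point at an account that already exists, you could instead declare it with the `existing` keyword (for example, `resource storageAcct 'Microsoft.Storage/storageAccounts@2023-01-01' existing = { name: 'yourexistingaccount' }`) and reference it the same way.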
Once you have your storage account created and ready to use, you can then enable auditing on your SQL Server by creating an auditingSettings type resource like what I have below.
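As a sketch of what that resource might look like (assuming a sqlServer resource and a storageAcct resource with those symbolic names are defined earlier in the same template; the API version is an assumption):

```bicep
// Sketch: server-level auditing that writes logs to a storage account.
// Assumes sqlServer and storageAcct are symbolic names defined earlier.
resource sqlAuditSettings 'Microsoft.Sql/servers/auditingSettings@2021-11-01' = {
  parent: sqlServer
  name: 'default' // the auditing settings child resource must be named 'default'
  properties: {
    state: 'Enabled'
    auditActionsAndGroups: [
      'BATCH_COMPLETED_GROUP'
      'SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP'
      'FAILED_DATABASE_AUTHENTICATION_GROUP'
    ]
    retentionDays: 90
    storageEndpoint: storageAcct.properties.primaryEndpoints.blob
    storageAccountAccessKey: storageAcct.listKeys().keys[0].value
  }
}
```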
The first notable thing is that I have specified the parent of the resource as the SQL Server that I created earlier in the template (not shown in this post, but is on GitHub), which is important to tell the auditing resource what you want it to be applied/connected to. If you created the SQL Server in a separate template, you would have to reference the server differently to get the template to understand what you’re trying to apply the auditing to.
Specify what you want to audit
The next notable thing about this resource definition is the list of auditActionsAndGroups, for which I chose the default values. If you go to the Auditing settings in the portal and hover over the (i) icon next to where you enable the auditing feature, you will see which auditing rules the Azure SQL Server gets by default unless you specify otherwise.
The default policies are:
BATCH_COMPLETED_GROUP
SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP
FAILED_DATABASE_AUTHENTICATION_GROUP
In my experience so far, those options have been sufficient for my troubleshooting needs, but you can add others to the list if you would like. To see all the options, review this Microsoft document.
The best part of creating the auditing resource through Bicep is that you are able to specify those additional “action groups”. Otherwise, you would have to use PowerShell to update the list of things you want to be audited, since you aren’t able to make that change in the portal at this time.
Retention length and storage account connection
One of the final things you will need to set in your auditing resource is the number of days you would like to retain the logs, with a maximum value of 100. I chose 90 days, according to my organization’s standards.
Then you need to specify the storage account and the storage account keys to grant access with. I have set the storageEndpoint value using a variable which contains the name of my storage account, which was created earlier in the full template. I then need to provide the “primary” key of the storage account which gives the auditing resource access to save logs to the storage account. To do this, since I had created my storage account in the same template above, I extract the keys using built-in Bicep functionality. storageAcct.listKeys().keys[0].value
The breakdown of that expression is <storage account symbolic name>.<function that lists the account's access keys>.<the keys array, indexed at 0 for the first (primary) key>.<that key's value>.
Summary
With just two resource definitions in a Bicep template, you are able to specify custom auditing for your Azure SQL Server instance, and do extra detailed auditing beyond the defaults which isn’t available through the portal. Let me know in the comments below if you found this helpful. Thanks for reading!
If you’ve been following along in my series on using Bicep templates to deploy database-related resources in Azure, you may have seen my template for creating Azure SQL Servers and Databases on GitHub which contains a section for defining role assignments. I haven’t yet covered the specifics of that type of resource in a post, so I will be going over that today to simplify the information you need to know to assign particular Role-Based Access Control (RBAC) roles to resources you’re creating in Bicep templates. Because why would you want to manually add roles to a resource that you’re automatically deploying? (You don’t!)
The template I am pulling the role assignment section from creates an Azure SQL Server, Azure SQL Database, and then a Key Vault, and you can learn more about that in my previous post. What I didn’t cover previously though, is that in my actual use case, I need to then give my team admin rights on the Key Vault after it’s created so that we have the permissions needed to add secrets to the Key Vault.
The Full Template
If you would like to see the Bicep template I will be covering in today’s post in full, instead of piecing it together from this post, you can find it on my GitHub.
Assigning RBAC Roles to Resources with Bicep
The process of creating role assignments on Azure resources using Bicep templates is relatively simple. The only data you need to supply are the Object ID of the group or user you want to assign the role to and the ID of the specific role you want to assign. The hardest part is finding that role ID in the Azure portal.
Get the Object ID for the user or group
To get to the Object ID value for the user or group to which you want to assign a role, navigate to Entra ID in the Azure Portal, then search for and select the user or group from the dropdown list. When you open the user/group, you will find the Object ID value to copy into your template.
Finding the ID for the Role Assignment
The easiest way to figure out the ID value of the role you want to give to a user or group is to go to an instance of the resource type that will get the assignment (in this case a Key Vault), open the resource’s IAM page in the portal, find the role in the list, and view its JSON definition. If you know of a better way, please let me know in the comments below!
First, navigate to an existing Key Vault in your Azure environment and go to the “Access control (IAM)” page.
Screenshot showing the Access control page for a Key Vault resource in the Azure Portal
Then, click on the “Roles” tab. On that page, search for “key vault”, or whatever keyword will help you find the role you’re looking for. Once you’ve found the role you want to add through your Bicep template, click on the “View” link for the role under the “Details” column.
Screenshot showing how you can find a specific role from the Roles tab of the Access control page for a resource
When you click on “View”, it will open another pane with details about the role. In that pane, select the “JSON” tab and then copy out the GUID value from the end of the string for “id” in the JSON description.
Screenshot of the Azure Portal showing where you locate the ID of a role assignment from the JSON details of the role
That is the value you will need to put into your template, so keep that handy.
Creating the roleAssignment resource
The short definition to assign a role to a user or group via Bicep is like the following:
var roleDefIDKVAdmin = '00482a5a-887f-4fb3-b363-3b7fe8e74483' /* The role ID for "Key Vault Administrator" */
var admin_entra_group = '0000-0000-0000-0000' /* Object ID for our Entra group */
var roleAssignmentName = guid(admin_entra_group, roleDefIDKVAdmin, resourceGroup().id)

resource roleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: roleAssignmentName
  properties: {
    principalId: admin_entra_group
    roleDefinitionId: resourceId('Microsoft.Authorization/roleDefinitions', roleDefIDKVAdmin)
  }
}
Variables
I have 3 variables in my template that simplify the process of specifying the settings for the roleAssignment resource.
roleDefIDKVAdmin: The ID value I retrieved from the Azure Portal for the role I want to assign
admin_entra_group: The Object ID for the Entra group I am assigning the role to
roleAssignmentName: A value I generate to give to the roleAssignment resource based on the other two variables. Important to note: the part “resourceGroup().id” is what ties the role assignment to the key vault, because I am granting the role assignment at the resource group level. If you want to apply the role assignment to a single resource, you would do symbolicName.id instead.
Resource definition
Once you have the 2 values retrieved as I specified above, you simply need to pass the variables into the correct locations of the resource definition. The name of the resource comes from the variable roleAssignmentName. The principalId is the user or group the role should be given to, which is provided through the variable admin_entra_group. And finally, the roleDefinitionId is built with Bicep’s resourceId function, which constructs the full resource ID of the role definition from the GUID we retrieved manually in the portal; Bicep requires this full ID rather than the bare GUID.
Summary
Once you retrieve the ID of the user or group you want to give a role assignment to, get the ID for the specific role assignment you want to give, then tie those together with the Bicep template, it’s very simple to set up the resource definition to assign that role to that group when the template deploys.
This is the fifth in my series of posts about working with Bicep, the declarative scripting language created by Microsoft as the modern facade for Azure Resource Manager templates. Today’s post is focusing on a small bit of information that may be important for some Azure environments.
For those of us who work in Azure and have Azure SQL Servers and Databases, we often want those resources to be available to other resources, such as applications, within the Azure environment. To grant that access through the portal, there is a simple checkbox on the Networking page of the Azure SQL Server that will allow that. But if you’re trying to declaratively create your Azure infrastructure with Bicep like I am, how are you meant to accomplish checking that same box? That is what this post will answer.
For those of you who may not be as familiar with what I was referencing above, what I am referring to is this checkbox that you find on the Networking page of an Azure SQL Server instance in the Azure Portal.
Screenshot showing the checkbox I am talking about, which allows other Azure resources to access the SQL Server resource
The purpose of this checkbox, as you can learn by hovering over the information icon at the end of the line of text for the checkbox, is to configure the firewall settings to allow inbound access from Azure resources. By default, this box is left unchecked.
Screenshot showing the information about the checkbox in the Portal screen
However, if you are in a situation where you have a separate application that is going to be accessing the server and database (which is fairly common), you will also likely need to ensure this networking rule is checked when creating new Azure SQL Servers.
How to check that box in a Bicep template
Understanding what the checkbox does
The heart of what that checkbox is doing in the background when it is checked is creating a firewall rule which indicates to Azure services whether they are allowed to connect to the server and its databases. You can see what it’s doing by querying the system table that contains the list of server firewall rules, which I wrote about in a previous post.
When the box is checked, you can see a firewall rule for the server in the table sys.firewall_rules (run on the master database), called “AllowAllWindowsAzureIps” with start and end IP addresses both set to 0.0.0.0.
Screenshot of SSMS showing query that lists all server firewall rules
Conversely, if that box is not checked on the portal, you will not see a firewall rule in the table with that name or IP address range. Once you understand that the checkbox is setting a specific firewall rule for you in the background, it becomes simple to figure out how you can make that box get checked through a Bicep template.
Bicep setup to check the box
Since the checkbox in the portal is creating a simple firewall rule with a specific name and IP address, we can recreate that easily within a Bicep template by creating a firewallRule resource within the template and tying that to your Azure SQL Server with the parent attribute.
Create the sqlServer resource in the template
First, create the SQL Server resource so that we can tie the firewall rule resource to it. If you would like to see how to define such a server resource, see last week’s post.
Create the firewallRules resource in the template
After you’ve created the Azure SQL Server resource in your script, the next step is to specify a second resource of the type firewallRules, which we will use to check that box on the server’s networking page.
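Here is a sketch of that rule (the symbolic name and API version are my own assumptions; the rule name and IP addresses must be exactly as shown, per the explanation that follows):

```bicep
// Sketch: the special firewall rule behind the "Allow Azure services
// and resources to access this server" checkbox.
// Assumes a sqlServer resource is defined earlier in the same template.
resource allowAzureServices 'Microsoft.Sql/servers/firewallRules@2021-11-01' = {
  parent: sqlServer
  name: 'AllowAllWindowsAzureIps' // the exact name Azure expects for this rule
  properties: {
    startIpAddress: '0.0.0.0' // 0.0.0.0-0.0.0.0 is the reserved "Azure services" range
    endIpAddress: '0.0.0.0'
  }
}
```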
The important parts to note from that resource definition are the following:
name: 'AllowAllWindowsAzureIps': This specifies the name of the firewall rule that is going to be created. In this instance, the rule needs to have this specific name in order to check the settings box in the portal and allow the access
parent: sqlServer: This line is what tells the template that this firewall rule needs to be applied to the server we created in the same template previously, with the symbolic name of sqlServer.
startIpAddress: '0.0.0.0' and endIpAddress: '0.0.0.0': Both addresses must be set to all 0s, which is what Azure looks for when checking whether other Azure resources are allowed to access the server. This “range” is special, reserved for this situation.
Then you can also specify any Azure SQL Databases you want on the server and deploy the template. After the template has been deployed, navigate to your new server in the Azure Portal and review the networking settings to confirm that the “Allow Azure services and resources to access this server” box is successfully checked.
Summary
If you’ve been wondering how you can get the one box on the Azure SQL Server Networking page in the portal checked by using a Bicep template, it’s as simple as adding a firewall rule for the server with a start and end IP address of all 0s. This special firewall rule is used to indicate to Azure that a given resource is available to access through other Azure resources.
Let me know in the comments below if this tip helped you or if you have any questions regarding setting up firewall rules for an Azure SQL Server using Bicep.
Continuing on in my series about doing Infrastructure as Code (IaC) using Bicep templates in the Azure environment, today’s post will cover how to create an Azure SQL Server and Azure SQL Database with a deployable Bicep template. If you haven’t seen it yet, also check out my previous posts about creating an Infrastructure as a Service (IaaS) virtual machine with a Bicep template and using a Custom Script Extension with that type of template.
If you would like to see the Bicep template I will be covering in today’s post in full, instead of piecing it together from this post, you can find it on my GitHub.
What the template does
This template creates 3 different resources, which are the ones my team wants to spin up any time we have a new project of a certain type. The resources it creates are: Azure SQL Server, Azure SQL Database, and a Key Vault.
The Azure SQL Server and Azure SQL Database are the fully cloud, Platform as a Service (PaaS) resources for SQL Server. These resources make it very easy to create a new server and any number of databases on that server without having to manage the infrastructure yourself and to only pay for what you need to use.
The Key Vault is a resource type that allows you to store any secrets, certificates, or keys in a secure location which is accessible to other privileged resources in your subscription (like a pipeline executing code). This resource would be optional for others who are only looking to create the server and database, it’s just something I’ve included because I have it in my production template for my team.
In addition to those 3 resources, there are a few others created in my Bicep template that I will be covering in-depth in future posts to keep things shorter here, so you’ll only see those in my GitHub repo for now.
How to know what configurations to choose
When I first started writing this template, I had no idea what configuration values to choose. When looking at the Microsoft documentation for the resource type in Bicep, it seemed like there were endless possibilities for what I could choose to configure the resources. That was overwhelming to me at first. I then had the idea that I would compare the list of possible values in the documentation for the resource type to the settings of an existing resource with settings similar to what I wanted for my new machine.
That effort originally started with me looking at the normal portal view of the resource, but I didn’t stick with that for long. I quickly realized that the portal view of settings doesn’t show a lot of the values I was looking for. But I figured out that you can view all the setup information for a given resource in the portal in a JSON format, which is very similar to the Bicep formatting I needed. I believe this JSON document is likely what Azure Resource Manager (ARM) would use to create the same resource, which is why it’s available for every resource in the portal.
To view the JSON version of the resource settings, navigate to the resource in the portal, then near the top right corner of the Overview page, you will have a link to “JSON View”.
Screenshot showing where you can find the JSON view of a resource definition in the Azure Portal
When you open that pane, you will see something like the following, with all pertinent details of the resource, which you can then use to help you create your Bicep template.
Screenshot showing the JSON definition for an Azure SQL Server resource
Creating the template
Parameters
When creating a new Bicep template, the first thing you’ll need to decide (apart from what specific resources you need to create) are what parameters you will need to input into the template, which will usually be fed in from a pipeline which deploys the template. For my template, I created the following parameters:
serverName (string): The name you want to give to the Azure SQL Server that the template is going to deploy. You may want to add minLength and maxLength settings to this parameter if you are worried your template deployers might choose names outside the length requirements Azure imposes.
databaseName (string): The name you want to give to the Azure SQL Database that the template is going to deploy. The same minLength and maxLength note applies here.
keyVaultName (string): The name you want to give to the Azure Key Vault that the template is going to deploy.
location (string): The region you want the resources deployed in. If you are always going to want to use the same region, you can change this to a variable instead. I have set a default for this of “westus2”.
saLoginName (string): The login name that you want to have created for the system admin (SA) user for the SQL Server. When you create an Azure SQL Server resource, you are required to provide an SA, so I am setting that value by passing it in as a parameter to the template at runtime.
saLoginPassword (secure string): The password you want to give to the SA login the server will be created with.
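The parameter list above can be sketched in Bicep roughly as follows; the specific length constraints are illustrative assumptions, not requirements of my template:

```bicep
// Sketch of the parameters described above.
// The minLength/maxLength values are illustrative assumptions.
@minLength(1)
@maxLength(63)
param serverName string

@minLength(1)
@maxLength(128)
param databaseName string

param keyVaultName string
param location string = 'westus2'
param saLoginName string

@secure() // marks the password so it is not logged or echoed in deployment output
param saLoginPassword string
```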
Variables
For the three resources I’m discussing the creation of in this post, there are no variables needed.
Resource: Azure SQL Server
Once you have your parameters and variables defined for the template, the next step is to start defining the main resources you want to create. Depending on the settings and configurations you need for your own SQL Server, you may need to define this resource differently. In my template, I have the following definition for the Azure SQL Server resource:
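Since the code block isn’t reproduced here, below is a sketch of what such a server definition might look like; the Entra group name, the placeholder sid, and the API version are assumptions you would replace with your own values.

```bicep
// Sketch of an Azure SQL Server resource matching the settings described below.
// The group login name, sid, and API version are placeholder assumptions.
resource sqlServer 'Microsoft.Sql/servers@2021-11-01' = {
  name: serverName
  location: location
  properties: {
    administratorLogin: saLoginName
    administratorLoginPassword: saLoginPassword
    publicNetworkAccess: 'Enabled'
    restrictOutboundNetworkAccess: 'Disabled'
    administrators: {
      administratorType: 'ActiveDirectory'
      principalType: 'Group'
      login: 'MyTeamAdminGroup' // assumption: your Entra group's display name
      sid: '00000000-0000-0000-0000-000000000000' // the group's Object ID from Entra
      tenantId: subscription().tenantId
    }
  }
}
```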
For the name of the server, I pass in the parameter “serverName”. Then, under the properties object, I define the required elements of the resource, which are:
administratorLogin: set to the value of the parameter saLoginName
administratorLoginPassword: set to the value of the parameter saLoginPassword
publicNetworkAccess: I have set this to “enabled” so that we’re able to access this resource through the public internet, which is required for normal usage and administration of the server and databases after they’re created.
administrators: This is a more complicated setting that is its own object. This “administrators” setting configures the Microsoft Entra Admin for the server, which is essentially an Entra user or group that becomes the super user of the server, with complete control over the server and the databases on it. I set this to an Entra group my team uses for ourselves by setting the principalType to “Group”, giving the name of the group for the login, and then supplying the ID value and tenant ID for our group. You can find the “sid” value as the Object ID of the group or user in Entra in the Azure Portal, and you can find your Tenant ID through the portal as well.
restrictOutboundNetworkAccess: I have set this to “disabled” because we want the server and databases on it to be able to send data outside of themselves so that we can use them like normal.
Resource: Azure SQL Database
Depending on the settings and configurations you need for your own SQL Database, you may need to define this resource differently. In my template, I have the following definition for the Azure SQL Database resource:
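Below is a sketch reconstructing a database definition consistent with the settings discussed in the rest of this section; the sku details and API version are assumptions you may need to adjust for your own tier and region.

```bicep
// Sketch of an Azure SQL Database matching the settings described below.
// The sku values and API version are assumptions.
resource sqlDatabase 'Microsoft.Sql/servers/databases@2022-05-01-preview' = {
  parent: sqlServer // the server's symbolic name, so the server deploys first
  name: databaseName
  location: location
  sku: {
    name: 'GP_Gen5' // General Purpose tier, Gen5 hardware
    tier: 'GeneralPurpose'
    family: 'Gen5'
    capacity: 2 // 2 vCores
  }
  properties: {
    collation: 'SQL_Latin1_General_CP1_CI_AS'
    maxSizeBytes: 34359738368 // ~34 GB (32 GiB)
    catalogCollation: 'SQL_Latin1_General_CP1_CI_AS'
    zoneRedundant: false
    licenseType: 'LicenseIncluded'
    readScale: 'Disabled'
    requestedBackupStorageRedundancy: 'Geo'
    isLedgerOn: false
    availabilityZone: 'NoPreference'
  }
}
```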
For a SQL Database resource definition in Bicep, it’s really important to specify the parent value so that Bicep and Azure Resource Manager (ARM) know what server the database should be deployed onto and that the server resource should be created before the database resource. For my template, I made sure to specify the “symbolic name” of the Azure SQL Server resource I created previously in the template, which is “sqlServer”. After specifying the parent, you will also need to specify the name of the resource, which I used a parameter for, and then the region the resource should be deployed in, which I also used a parameter for.
The next thing you’ll need to specify is the sku of the database, which covers the settings that dictate price, performance, and storage. In this case, I have chosen to make a database in the General Purpose tier, which is a little cheaper, and to give it 2 vCores. I think the easiest way to find the specifications for this section is to review an existing resource in your environment and see what values it is set to, or refer to the Microsoft documentation.
Once you’ve set up the sku information, you’ll then need to specify the general properties of the SQL Database through the properties object in the resource definition. For mine, I set the following:
collation: This is the standard collation type you want your database to be setup with. I have chosen the default for North America/US, which will use English and then be case-insensitive and accent-sensitive.
maxSizeBytes: This is the number of bytes you want to be able to store in your database. The number I have set equates to approximately 34 GB, which I believe is the default.
catalogCollation: Same as the normal collation listed above
zoneRedundant: I’ve chosen not to make my database zone redundant, because I want it to be as cheap as possible and won’t be severely impacted if the entire Availability Zone goes down.
licenseType: I chose “LicenseIncluded” which means we will pay for the license and database as we use it, we don’t have an external license key to provide
readScale: I elected to not have the database setup for scaling for reads
requestedBackupStorageRedundancy: I selected “Geo”, which stores the database backups in geo-redundant storage so they are replicated to a paired region. This is the standard my team set for this type of scenario.
isLedgerOn: I do not want this to be a ledger database so I’ve set this value to “false”.
availabilityZone: I’ve set as “NoPreference”
Resource: Key Vault
While most of you probably aren’t here for learning how to create a Key Vault with a Bicep template, I thought I would include it just because I have it in my production template. As with every other type of resource, you may want to change the below definition to match your own needs.
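A sketch of the Key Vault definition, consistent with the description that follows; the API version and the specific property values here are assumptions mirroring common defaults rather than my exact production settings:

```bicep
// Sketch of a Standard-tier Key Vault; property values are assumptions
// mirroring common defaults and can be adjusted to your needs.
resource keyVault 'Microsoft.KeyVault/vaults@2023-02-01' = {
  name: keyVaultName
  location: location
  properties: {
    sku: {
      family: 'A'
      name: 'standard'
    }
    tenantId: subscription().tenantId // the tenant the vault belongs to
    networkAcls: {
      bypass: 'None'
      defaultAction: 'Allow'
    }
    enableSoftDelete: true
    enableRbacAuthorization: true
  }
}
```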
Just like with the other resources defined above, I set the name of the resource and the region it should be deployed in using parameters. And similarly to the definition of the SQL Database resource, for the Key Vault definition I also needed to define a sku, which is the pricing model for the resource. I chose Standard; the only other option is Premium, which adds support for HSM-backed keys.
The tenantId value is the ID for my tenant resource, so the template knows where to deploy the Key Vault. For networkAcls, I chose to not setup any bypass and to “Allow” traffic by default. Then the rest of the values are ones that seem self-explanatory to me and I believe are set to the defaults for all Key Vaults.
Summary
Creating an Azure SQL Server and an Azure SQL Database through Bicep is a fairly quick and simple process, much simpler than defining an IaaS VM like in my previous posts. That is due to the fact that the SQL Server and SQL Database are both fully-cloud resources, they’re not something you are able to access the infrastructure for. Since Azure manages the infrastructure, it is much easier for them to create these resources from a template.