
Recovery Options for Azure Key Vaults

I recently wrote a post about how to manage deleted Azure Key Vaults: how to find them after they’ve been deleted, how to recover them if you didn’t really want them deleted, and how to purge them if you want them permanently removed. In today’s post, I am going to cover the finer details of the recovery options you can set on a key vault when creating it, which dictate the options you will have for recovering it if it gets deleted. My organization has defaults that enabled me to recover a deleted key vault, but you may not get that option unless you configure it yourself, so I will tell you how!


Recovery Options for Existing Key Vaults

As of the end of 2025, when I went into an existing key vault that I’ve already created, and I reviewed the “Properties” of it under the “Settings” tab in navigation, I could see the following settings:

  • Soft delete policy
  • Days to retain deleted vaults
  • Purge Protection policy

In the portal, that looks like this:

Screenshot of the Properties page of a Key Vault in the Azure Portal showing the retention settings available: soft delete, days to retain deleted vaults, purge protection.

For the example key vault shown, Soft Delete was enabled when it was created, which means that when a key vault is deleted, it will no longer show in your main list of KVs, so it appears deleted. Even though it looks deleted, you can still see the vault for the specified retention period if you go into a separate section of the console, which I covered in my previous post.

For this KV, we set the number of days to retain vaults after deletion to 90, which is the default value supplied during setup. The final option you can choose is to enable or disable Purge Protection, a feature that dictates whether or not a KV can be permanently deleted before the end of the retention period. In our case, Purge Protection was set to disabled because this wasn’t an important key vault, and we don’t believe an internal bad actor could hurt us by deleting and then purging this vault.

One thing to note about this already-created KV: most of the recovery options cannot be changed after creation.

Recovery Options When Creating a Key Vault

For the most part, when you create a new key vault, you have total control over the recovery options for that vault. However, starting at the beginning of 2026, Microsoft changed the settings to force Soft-delete to be enabled for all key vaults. At the time I took these screenshots, which was at the end of 2025, I was still able to create a key vault with soft-delete disabled through the Azure CLI, PowerShell, or REST API, as noted by the “i” icon next to the setting during setup, but that ability has likely been removed by the time this is posted:

Screenshot of an information icon on the Soft Delete setting for a new key vault, which says, "The ability to turn off soft delete via the Azure portal has been deprecated. You can create a new key vault with soft delete off for a limited time using CLI / PowerShell / REST API. The ability to create a key vault with soft delete disabled will be fully deprecated by the end of the year."

Besides the soft-delete setting, though, you have total control over the two remaining settings, which are the days to retain deleted vaults and then whether or not Purge Protection is enabled.

Screenshot of the recovery options settings available when creating a new key vault in the azure portal

By default, the value for “Days to retain deleted vaults” is 90, but you can set it to any value between 7 and 90 days. Also by default, the Purge Protection option is set to “Disable”, but you can change that if you would like to prevent people from permanently deleting key vaults after they’ve been soft-deleted.

Depending on the use case, my organization decides whether to enable or disable Purge Protection. If the vault is for something business critical, it gets Purge Protection; if the vault is only something we use for our own testing or development, we likely won’t enable it. There is no additional cost either way, so choose the option that best suits your security and retention requirements.
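If you prefer to script vault creation instead of clicking through the portal, the same settings can be supplied to the Azure CLI. This is a minimal sketch; the vault name, resource group, and region are placeholders for your own values:

```shell
# Create a key vault with a 90-day soft-delete retention window
# and Purge Protection enabled (soft delete itself is always on).
az keyvault create \
  --name my-example-kv \
  --resource-group my-rg \
  --location eastus \
  --retention-days 90 \
  --enable-purge-protection true
```

Remember that `--retention-days` only accepts values from 7 to 90, and passing `--enable-purge-protection true` is a one-way door: you can’t turn it back off later.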

Summary

When creating a new Azure Key Vault, there are only a few retention settings you have control over, and for an existing key vault, there are even fewer you can change. That means you need to ensure you choose the correct settings for you and your organization at key vault creation time, since the only retention option you can change after the vault has been created is enabling Purge Protection (and once enabled, it cannot be disabled).
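If you do decide later that an existing vault deserves Purge Protection, that one setting can be flipped on after the fact. A sketch with a placeholder vault name:

```shell
# Enable Purge Protection on an existing vault.
# Note: this cannot be reversed once set.
az keyvault update \
  --name my-example-kv \
  --enable-purge-protection true
```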

If you’re looking for more information on managing deleted key vaults, hop over here to see my post on that. Interested in learning how to create key vaults with a Bicep template so you don’t have to do it manually? I have a post about that as well!

Limitations of Azure Synapse Analytics SQL Pools

In the past couple of months, I have run into some annoying limitations and unsupported features with T-SQL in our Azure Synapse Dedicated SQL Pools, which is where my team puts our data warehouses for reporting needs across our business. These unsupported features were a bit surprising to me, which is why I’ve decided to share them in this post. This is certainly not going to be an all-inclusive list of things you can’t do in Dedicated SQL Pools that you can in a normal SQL database, but I will keep it updated as I run into more barriers in the future.


No Primary Keys

When you understand that the Dedicated SQL Pools (DSP) are meant to be used as Online Analytical Processing (OLAP) data warehouses for analytics and reporting purposes, it makes more sense that primary keys are not allowed on tables, since enforcing them would affect performance. I did not initially put that concept together until I was trying to create a new table on the DSP and got the following error:

Screenshot showing a T-SQL command to create a Primary Key on a table in SQL Server Management Studio and the corresponding error since the command is not supported on this type of database

The full error message is:

Enforced unique constraints are not supported. To create an unenforced unique constraint you must include the NOT ENFORCED syntax as part of your statement.

Unfortunately, there is no workaround for this limitation. You must accept that you will not be able to enforce uniqueness on your tables in the DSP, unless you want to have “uniqueness” in name only and not have it actually enforced.
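For completeness, the “name only” option the error message hints at looks like the following. This is a sketch with hypothetical table and column names; the constraint is recorded as metadata but duplicates are never actually rejected:

```sql
-- Unenforced primary key: usable by the optimizer and by modeling
-- tools, but duplicate values are NOT blocked on insert.
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable
PRIMARY KEY NONCLUSTERED (Id) NOT ENFORCED;
```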

No Foreign Keys

In the same vein as above, foreign keys are also not allowed in a Dedicated SQL Pool (DSP), since that would also hinder the performance of OLAP queries if they existed. When trying to create a foreign key on a table in a DSP, you get the following error:

Screenshot showing a T-SQL command to create a Foreign Key on a table in SQL Server Management Studio and the corresponding error since the command is not supported on this type of database

The full error text is:

Parse error at line: 3, column: 1: Incorrect syntax near 'FOREIGN'.

This error is different from the error received when trying to create a primary key, because this query simply returns a parsing error; the query processor doesn’t even understand the concept of trying to apply a foreign key constraint to the tables in this database, since it’s a DSP.

There is no workaround for this limitation beyond accepting that you will not be able to maintain relationships between tables in your database for this type of server.

No Default Constraint using a Function

This is the constraint I am most sad about not being able to apply to tables in a Dedicated SQL Pool (DSP), since I think it would be really useful for certain situations. But again, needing to apply a default value function for every row inserted into a DSP table would slow down the processing speed, which is bad for OLAP queries.

When you try to add a default constraint using a function or expression to a table in a DSP, you will get the following error:

Screenshot showing a T-SQL command to create a Default Constraint on a column in SQL Server Management Studio and the corresponding error since the command is not supported on this type of database

The full error message for that is:

An expression cannot be used with a default constraint. Specify only constants for a default constraint.
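The kind of statement that triggers this error looks something like the following (table and column names are hypothetical):

```sql
-- Fails on a dedicated SQL pool: GETDATE() is an expression,
-- and only constants are allowed in default constraints here.
ALTER TABLE dbo.MyTable
ADD CONSTRAINT DF_UpdateDatetime
DEFAULT GETDATE() FOR UpdateDatetime;
```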

However, you are still allowed to add default constraints that have a constant value, like this:

ALTER TABLE dbo.CUSTOM_FORECAST_Raw
ADD CONSTRAINT DF_MyDefault
DEFAULT '1900-01-01 00:00:00' FOR UploadDatetime;

Screenshot showing a T-SQL command to create a default constraint with a constant value on a column in SQL Server Management Studio, completing without error since constant defaults are supported on this type of database

My most common use of a function in a default constraint for a column is to set an update datetime for the row, and that is precisely what I was trying to accomplish when I found this limitation. There is no direct workaround for this limitation. What I did to subvert it was to add the UpdateDatetime value to the column further upstream in the ETL process. Hopefully, that type of workaround will be doable for you as well.

No Multi-Line Inserts

This limitation is one I truly don’t understand, and I was really confused about it when I first ran into it during development. In a normal SQL Server database, you can insert multiple rows into a table at once using the syntax INSERT INTO dbo.MyTable VALUES (value1, value2...), (value3, value4...);. Running that command would insert two different rows into the table at the same time. That syntax is not allowed in Dedicated SQL Pools (DSP).

If you were to run the same command on a DSP, you would get this error:

Screenshot showing a T-SQL command to add multiple rows of data at once in SQL Server Management Studio and the corresponding error since this command format is not supported on this type of database

The exact error message for that scenario is:

Parse error at line: 2, column: 19: Incorrect syntax near ','.

This is another scenario where the query processor doesn’t even know how to approach this code, even though it would work just fine if I instead ran it on a normal SQL Server database.

The workaround for this limitation is to create an INSERT statement for every row of data you want to insert into your table, or use a pipeline in the Synapse workspace to load data instead.

INSERT INTO dbo.MyTable VALUES ('AR',5103);
INSERT INTO dbo.MyTable VALUES ('CA',6111);

I only needed to load a few test rows into a table on my DSP for testing purposes, so the multiple INSERT statements worked in a pinch. I would not want to do that if I were adding more than a few rows to the table.
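Another pattern that I believe works on dedicated SQL pools, if you’d rather avoid repeating the INSERT keyword, is an INSERT from a SELECT built with UNION ALL. A sketch using the same hypothetical table:

```sql
-- INSERT ... SELECT is supported, so several rows can be
-- stitched together with UNION ALL in a single statement.
INSERT INTO dbo.MyTable
SELECT 'AR', 5103
UNION ALL
SELECT 'CA', 6111;
```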

Cannot use sp_rename

For this scenario, I’m not entirely sure of the reasoning behind removing this particular procedure (and likely others like it), but I have learned that in a Dedicated SQL Pool (DSP), you are not able to rename objects in the database using the system stored procedure sp_rename. Instead, you either need to leave the object’s name alone, drop and recreate it, or use a different command called rename object, which has its own syntax.

When you try to run the sp_rename procedure on your DSP, you will get an error like this:

Screenshot showing T-SQL commands to rename tables using the sp_rename procedure in SQL Server Management Studio and the corresponding error since the command is not supported on this type of database

The full error message is:

An insufficient number of arguments were supplied for the procedure or function sp_rename.

This is an odd message, because it leads you to believe you’ve typed something wrong instead of the fact that this procedure is not valid on this type of database. I’m not sure why the procedure is even installed on the database, given that it doesn’t work as it does on other databases.

When I first got that message, I wasted time going to the documentation and verifying that I had the correct parameters for the procedure. The following message is displayed at the top of the documentation page, but it isn’t specific to any given T-SQL command, like the one documented on the current page. Plus, the notice only warns about serverless SQL pools, not the dedicated SQL pools I am working with.

Screenshot of a note in a Microsoft document about the sp_rename procedure, saying that some T-SQL commands are not supported for Azure Synapse Analytics serverless SQL pools

Mostly, when it comes to this system procedure not working, I am annoyed that, from the documentation of the procedure, it seems like it should work in this scenario. That includes the existence of the green checkmark next to “Azure Synapse Analytics” at the top of the page!

Screenshot of the top of the Microsoft document for the sp_rename procedure showing the supported list of database types the page applies to, which includes Azure Synapse Analytics

How to Rename Tables on a Dedicated SQL Pool

If the main system stored procedure for renaming objects doesn’t work for DSPs, how then are we supposed to rename objects as needed? Or are we out of luck and have to drop and recreate the table, or leave its name alone? At first glance, it does seem like we may be out of luck, because even the “Rename” option in the right-click menu on a table is grayed out and won’t let us change it from there:

Screenshot of the right-click menu of a table in the Object Explorer of SQL Server Management Studio showing that the Rename option is disabled and grayed out.

There is a way to rename, it’s just a little harder to find! With this type of database, there is a new command rename object that will let you rename a table. The syntax for that looks like:

rename object MySchema.MyTable to NewTableName;

That command worked successfully for me in my scenario of renaming a table that I wanted to mark for future deletion, as shown by this screenshot:

Screenshot showing a T-SQL statement to rename a table using the "rename object" command in SQL Server Management Studio with no error messages since it is supported for this type of database

Reading further in the sp_rename procedure documentation, if you scroll all the way down to the Examples section, there is another note that sp_rename is currently only in “preview” for Azure Synapse Analytics and is therefore only available for objects in the dbo schema. Honestly, that’s a little silly to me. It’s also silly that you wouldn’t find that out unless you looked at the examples; the information is not up in the main section of the documentation, so it isn’t easy to find.

Screenshot of the Microsoft document for the sp_rename procedure saying that the procedure is still in preview for Azure Synapse Analytics so is only supported for the dbo schema and no custom schemas

If you would like to read further on the new syntax for renaming tables in Dedicated SQL Pools, you can review the document for the rename object command.

Summary

SQL Pools in Azure Synapse Analytics give you a lot of great performance and features that are specifically tuned for use in data warehouse situations, but those performance improvements do come with a cost: not being able to do things exactly as you are used to with standard SQL Server databases. I’ve struggled quite a bit with database development on this style of database, since it seems like every time I want to do something, I hit a wall of differences that I didn’t expect. I have found an explanation for or a workaround to most of these problems, as evidenced by the information above, but it’s not a welcome interruption to my workflow. I hope that you will hit fewer walls in your own database development now that you’ve read through this post.


Managing Deleted Key Vaults

Having delete protection on shared cloud resources is usually a very nice and beneficial feature to enable, since it protects you and your organization from the disaster of someone accidentally deleting a resource, by keeping the resource available in the background for restore after deletion. My team has the feature enabled on our storage accounts and some other resources, which I knew about, but I did not know that our key vaults also had the same feature enabled, until I tried to create a new key vault with the same name as one I had already deleted and got an error saying a key vault with that name already existed.

In this post I will show how to find and manage deleted key vaults and how to permanently delete them if you want to. You can use this process to find and recover a key vault that was accidentally deleted, or to do what I did and get rid of one permanently so you can recreate it.


Finding Deleted Key Vaults

When running a Bicep template, which was creating a new version of a key vault I had deleted moments before, I got an error that the key vault couldn’t be created because one with the same name already existed. Confused, since I knew I had already deleted the resource, I went back out to the Azure portal and searched for the key vault the template error indicated, which was called “biceptest”. As you can see in the screenshot below, searching for that name returned no results.

Screenshot of the Azure Portal page for Key Vault resources showing that one named "biceptest" does not appear when searched for, since it has been deleted.

As I mentioned above, key vaults can be set to not permanently delete immediately, and instead stay alive in the background for a set amount of time so they can be restored if needed. To find any deleted key vaults that are still available for restore, you can click the “Manage deleted vaults” button on the top menu of the key vault list.

Screenshot of the Azure Portal page for Key Vault resources showing where to locate the "Manage deleted vaults" button

When you click that, a new pane will pop up that will let you filter and view deleted key vaults by Subscription. Choose your subscription from the dropdown menu, and you will then be given a list of deleted key vaults that are still available for restore.

Screenshot of the Azure Portal page for Key Vault resources showing the "Manage deleted vaults" pane which lists recently deleted vaults that have not yet been permanently purged

Notice in the above screenshot that the deleted vaults list shows the date it was deleted and then the date it is set to be permanently removed from Azure. In my case, I had 90 days to recover a deleted vault.

Recover a Deleted Key Vault

To recover a deleted key vault, you need to check the box next to it in the pane showing a list of deleted vaults for a subscription, then click the “Recover” button at the bottom of the screen:

Screenshot of the Azure Portal page for Key Vault resources showing the "Manage deleted vaults" pane where you can click the "Recover" button to undelete the resource.

Permanently Delete a Deleted Key Vault

If you would like to permanently get rid of a deleted key vault, perhaps to create a new vault with the same name without getting an error, you will need to click the “Purge” button at the bottom of the screen after checking the box next to the vault you want to permanently delete.

Screenshot of the Azure Portal page for Key Vault resources showing the "Manage deleted vaults" pane where you can click the "Purge" button to permanently delete the resource

Note: If the key vault was set up with purge protection enabled, you will not be able to purge/permanently delete the vault. In that case, the vault will only be permanently deleted once the preset retention period has elapsed.
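The same find/recover/purge workflow is also available from the Azure CLI, which can be handy when the portal pane is slow to load. A sketch, with the vault name as a placeholder:

```shell
# List soft-deleted vaults still within their retention period.
az keyvault list-deleted

# Undelete a soft-deleted vault.
az keyvault recover --name biceptest

# Permanently delete a soft-deleted vault
# (this will fail if purge protection is enabled).
az keyvault purge --name biceptest
```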

Summary

Choosing to delete a key vault through the Azure portal does not guarantee that the vault has been completely deleted from your system. If the vault was set up with delete protection enabled, you may be able to recover the deleted vault for a set amount of time after it was deleted. If you want to permanently delete a vault that had delete protection enabled, you will need to go into “Manage Deleted Vaults”, choose the vault you want to completely remove, then click the option to “Purge”. Once you have done that, the key vault will be 100% gone and you will be able to create a new one with the same name if you choose to do so.


Return the True URL for a Document in SharePoint Online Indexer for Azure Search

I am going to keep today’s post short and sweet, covering a quick change I needed to make to my SharePoint Online Indexer (still in preview, but we’re using it for our custom chat bots) so that the index of a SharePoint library returns the true URL of the source document. We feed that URL back to users so they can validate the chat bot’s answers.

It took me longer than I would like to admit to figure out how to do this, even though the metadata item is listed in the one and only Microsoft document for this tool. I was looking for the URL, but the documentation only mentioned a URI, didn’t explain what it meant by that, and gave no examples of it being used.

This index is specifically used for custom chat bots created through Azure AI Foundry, and not for those created with other AI or cognitive services within Azure. I saw a lot of documentation and forum posts about those versions of indexers, but didn’t see anything covering this topic specifically which is why I wanted to write this post.


Adding SharePoint document URL to index results

For creating my Azure Search data sources, indexes, and indexers, I have used Postman to run the API calls needed to hit the SharePoint Online (SPO) indexer service, since that is the only way to create this type of indexer (can’t use the Azure portal wizard).

It is very simple to return the document URL in your index: you only need to add this line to your index definition, and then rename/map it in the indexer definition if you want. I didn’t want to rename it, so I only changed the index definition.

{ "name": "metadata_spo_item_weburi", "type": "Edm.String", "key": false, "searchable": false, "filterable": false, "sortable": false, "facetable": false },

Once you add that to your index definition, make sure to send the API request through again, then reset and run the indexer related to the index. At that point, you should be able to query your index through the console and see that URL included in the results of the index.
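You can also spot-check the field outside the portal with a quick REST query against the index. This is a sketch; the service name, index name, and API version are placeholders, and the admin key is assumed to be in an environment variable:

```shell
# Query the index and return only the SharePoint document URL field.
# "$select" limits the fields in the response; "search=*" matches everything.
curl -H "api-key: $SEARCH_ADMIN_KEY" \
  "https://my-search-service.search.windows.net/indexes/spo-index/docs?api-version=2023-11-01&search=*&\$select=metadata_spo_item_weburi"
```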

Screenshot of an Azure AI Search Index resource test query demonstrating what the full URL of a SharePoint document looks like in the index

Summary

If you are using the SharePoint Online Indexer for Azure AI Search (with Azure AI Foundry) and you would like to return the full URL of a source document for a chat bot response, you can do so by adding the “metadata_spo_item_weburi” metadata field to your index definition.


Troubleshoot ALL Layers of Your Data Flows

This is going to be a short post, which I am writing as a reminder for my future self as well as any of you reading out there.

If you are troubleshooting an error with your pipeline, especially if you work in a database system that has layered views, make sure you fully dig down through all layers of your data setup before you open a ticket with Microsoft. I learned this the annoying way, spending a work week on a ticket with Microsoft for a very strange issue in a pipeline that uses a Synapse Serverless SQL Pool. We checked so many things that week without changing the outcome of the pipeline runs, and then the pipeline just went back to working when I ran it outside of its schedule.


The Error

The error I was seeing made it look like the Serverless SQL Pool views, which use an OPENROWSET call to a parquet file, were referencing the wrong source file even though I confirmed multiple times that the view definitions were correct. For example, the view was written to use parquet files under TestDatabase/MyTable/** as the source files, but the error was making it seem like they were instead pulling data from TestDatabase/OtherRandomTable/** which was confusing to say the least. I thought that the Serverless node was broken or had a bug that was making the views look at “OtherRandomTable” files instead of the correct files.
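For readers who haven’t worked with serverless pools, the views in question are roughly shaped like this. A sketch with hypothetical view, path, and data source names:

```sql
-- Serverless SQL pool view reading parquet files directly
-- from a data lake via OPENROWSET.
CREATE VIEW dbo.MyTableView AS
SELECT *
FROM OPENROWSET(
    BULK 'TestDatabase/MyTable/**',
    DATA_SOURCE = 'MyDataLake',   -- external data source pointing at the lake
    FORMAT = 'PARQUET'
) AS [result];
```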

The Cause

The error happened because multiple views used a CROSS APPLY to another view tied to a parquet file in a data lake, and that parquet file was being deleted and recreated by a parallel pipeline. When the failing pipeline tried to reference its views, it couldn’t find that base view because the source file had not yet been recreated by the parallel pipeline. Makes sense and is so obvious in hindsight, but it took Microsoft support directly asking me to make me realize that I had a view referencing another view, which I needed to check the definition of.

The change I needed to make was to update the pipeline triggers so that the process deleting and recreating the base view would be done making the parquet files before the second pipeline ran and tried to use those files.

If I had done my due diligence and dug through every layer of the data environment, which I am normally good at in other scenarios, I would have quickly and easily discovered the issue myself. But sometimes we need to learn the hard way because our brains aren’t running at full capacity. (It also helped that I finally had dedicated time set aside for this problem, so I wasn’t trying to multitask multiple work items at once.)

Summary

If you are troubleshooting ETL failures of any kind, make sure you dig down through all layers of the process to ensure you have checked everything possible related to your failure before reaching out to support. They’ll happily help you find what you missed, but it will save everyone time if you can figure it out yourself first.


Azure SQL Database – Removing Replicas Doesn’t Delete the Replica DB

A couple months ago, I was asked to add geo-replication for all our Azure SQL Databases to align with our recovery strategy in case of disaster. A few weeks ago, when upper management finally realized the full cost of that replication for all our databases, they requested that we remove replication from anything that isn’t business critical and doesn’t need to be recovered immediately in case of a disaster to reduce the shocking cost of replication.

I mistakenly didn’t do research before doing what I thought was fully removing the replicas I had previously created, which was only removing the replica link from the primary databases. While reviewing resources for another task, I recently realized that those replica databases were still alive and well, charging us money we thought we were already saving. Keep reading to learn how to do better and fully get rid of the replicas you no longer need.


What is a replica for an Azure SQL Database?

A replica for an Azure SQL Database is a secondary copy of your database on a separate logical SQL Server in a different region that you keep available to fail over to in case of a full region outage in Azure. Although this scenario is rare, it has happened in the past, and most companies do not want to be caught without their vital resources for hours while Microsoft troubleshoots an outage. In such a case, having a geo-replica means you can immediately fail over to an exact copy of your database in a different region and keep your business running.

How to Remove Replica Link from Primary

Getting rid of a replica of an Azure SQL Database is a two-step process. The first step is to remove the replication link between the primary and secondary databases, which I will cover here; the second step is to delete the secondary database itself, which I cover in the section below.

Removing the replication link between primary and secondary is as simple as the click of a button. Navigate to the primary database for which you want to remove the replica, and go to the “Replicas” page under “Data Management” in the menu.

On that page, you will see the primary database listed first, then in the section below that, any and all replica databases.

To remove the replica, you will click on the ellipses menu on the right side of the replica database, then choose “Stop Replication”.

At first I was confused as to why this said it was going to stop replication, because I assumed I would be able to delete the replication and the replica in one step. But now I better understand that this is a two-step process.

After you have chosen to “Stop Replication”, you will get a prompt to confirm that you want to remove the replica. It also clearly points out what happens when you choose to do this, but I just didn’t understand what it meant: “This will remove ‘server/MySecondaryDatabase’ from the replication relationship immediately and make it a stand-alone database.” When I read that, I thought it meant that removing the replication would revert the primary database to a standalone database, but now I know it means what it says: the secondary database becomes a standalone database that you will later have to deal with.

Click “Yes” to remove the replication relationship.

You will get a notification that replication is being removed.

After a few minutes, you will be able to refresh the page and see that no replica link exists for the primary database anymore.

However, if you search for the name of the database that you previously had a replica for, you will see that the replica still exists; it’s just no longer linked to the primary through a replication process.

Fully Removing Replica (so that it’s like it never happened)

To get rid of the replica you no longer want so you can stop being charged for it, you will need to navigate to that former-replica database in the portal and then delete it like you would any other database. Before deleting, ensure that this is the database that you really want to get rid of since the deletion cannot be undone.

Once you have deleted the Azure SQL Database resource for the replica, you are finally done with removing your replica.
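Both steps can also be scripted, which is useful if you have many replicas to clean up. A sketch with placeholder database, server, and resource group names:

```shell
# Step 1: break the replication link (run against the primary server).
az sql db replica delete-link \
  --name MyDatabase \
  --resource-group my-rg \
  --server primary-server \
  --partner-server secondary-server

# Step 2: delete the now-standalone former replica on the secondary
# server; this is the part that actually stops the charges.
az sql db delete \
  --name MyDatabase \
  --resource-group my-rg \
  --server secondary-server \
  --yes
```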

Summary

If you want to remove a geo-replica database from an Azure SQL Database to save money (or for any other reason), you will need to complete the two-step process to do so. First, remove the replication relationship between the primary and the secondary through the “Replicas” page under the primary resource. Once that is complete, navigate to the former-secondary database in the portal and delete the resource. Removing the replica relationship alone won’t delete the database, and you will keep getting charged for that secondary database until you fully delete it.


Synapse Serverless SQL Pool View Seemingly Caching Data

This morning I ran into the strangest data issue I’ve seen with Azure Synapse Serverless SQL Pools: a view we have seemed to be caching data, and that data was very weird, wrong for the column’s data type, and causing issues with an application query. I still have no idea how or why this happened, since serverless SQL pools aren’t supposed to cache data like the dedicated SQL pools do, but I thought I would share what I saw to maybe help others (and to see if anyone else knows why this happened).

I was able to correct the issue to get the app query working again, I just won’t ever know why (unless you know why, in which case you should share in the comments below!).

What’s in this post

Background

I first learned of this issue through an application developer on another team emailing me to say that they were running the same query on our serverless and dedicated SQL pools in Synapse, and the query was working as it normally does on the dedicated pool but was giving the following error on the serverless SQL pool:

Msg 241, Level 16, State 1, Line 3

Conversion failed when converting date and/or time from character string.

The query they were trying to run is one that hasn’t had any issues in the past.

In our setup, we load the same data into both our serverless and dedicated SQL pools for different use cases, and both are loaded from the same set of delta parquet files in a Data Lake on an Azure Storage Account, which means there should be no reason the same query would succeed on one and fail on the other.

The first step of my troubleshooting process was to replicate the error that the developer sent me, and I was able to do that. I then reviewed the very long query for anything that was doing a CAST or CONVERT on a date field, which helped me narrow down the problem to a single view on our serverless pool.
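One trick that helps with this kind of narrowing is TRY_CAST, which returns NULL instead of raising Msg 241 when a value can’t be converted, so you can swap it in for the plain CAST/CONVERT in a failing query to surface the bad rows. A minimal sketch, with hypothetical view and column names standing in for the real ones in the problem query:

```sql
-- dbo.vw_Orders and order_entry_dt are hypothetical names.
-- Any row where TRY_CAST returns NULL holds a value that would have
-- made a plain CAST/CONVERT throw "Conversion failed..." (Msg 241).
SELECT TOP (100)
       order_entry_dt,
       TRY_CAST(order_entry_dt AS datetime2) AS converted_dt
FROM dbo.vw_Orders
WHERE TRY_CAST(order_entry_dt AS datetime2) IS NULL;
```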

The Weird Data

After I identified the one view that seemed to have the bad data on a date column, I ran a DISTINCT query on that one column and received the most confusing results.

Screenshot of SQL query in SSMS demonstrating the bad data contained in the date column of my serverless SQL pool view.

None of those values look like dates to me. Which made me question myself and the data type for that column, “order_entry_dt”, since a date column couldn’t possibly hold string values like this. But no, the column data type does say it’s a “datetime2”.

Screenshot of the problematic view column and its data type

At that point, I started downloading and reviewing individual parquet files from the Data Lake to see if any of them had that weird data, and none did.

The next step I took was to look at the view definition to see what it was doing again, to make sure I was looking at the correct source data, and that all looked good. I decided that I would run my distinct query on the view itself, but then also run the exact same query using the SELECT statement that defines the view, to see what both returned. And that made things even more weird, because it confirmed that the source data should be correct, but the view is somehow wrong.

Screenshot of two SQL queries in SSMS showing that selecting from my view and running the select contained in the view definition return different results

If the view uses that exact same “OPENROWSET” query as its definition, why does the view return different data than the “OPENROWSET” query??

Correcting the Problem

Even if I couldn’t figure out why or how the problem data happened, I still needed to figure out how to resolve the issue so that the app query would start working again. What I decided to do was drop and recreate the view, which is easy given how simple the view definition is.
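For a serverless view this simple, the fix really is a quick drop and recreate. A hedged sketch of what that looks like for a view over Delta files in a Data Lake; the view name, storage path, and format here are placeholders, not my real ones:

```sql
-- Names and paths below are hypothetical placeholders.
DROP VIEW IF EXISTS dbo.vw_Orders;
GO

CREATE VIEW dbo.vw_Orders
AS
SELECT *
FROM OPENROWSET(
        BULK 'https://mystorageacct.dfs.core.windows.net/datalake/orders/',
        FORMAT = 'DELTA'
     ) AS src;
GO
```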

Once I dropped and recreated the view, I ran the exact same queries as above again, yet this time the view started to return data that I expected.

Screenshot of two SQL queries in SSMS showing correct data being returned from selecting from my view and run the query that is the view definition

So why was the view data weird?

While writing this post, I decided that I couldn’t let this go without knowing why the data got weird, so I decided to test out some theories for myself. (Spoiler: I couldn’t figure it out.)

My first theory was that someone ran an UPDATE query on the view accidentally to change the data. But that is not allowed in serverless SQL Pools, which I even tested out myself. By making a copy of the problem view with a different name and then trying to run an update statement, I was able to prove to myself that this theory was not how the bad data was created.

Screenshot of a SQL query in SSMS where I attempted to update data in a column in the view, which pulls data from a parquet file so thus cannot be updated

Then I thought maybe someone wrote a query to change the underlying parquet files directly, but that doesn’t seem to be possible either.

The only thing I can now think of is that someone accidentally overwrote the source parquet file, or created a second file that was also picked up as source data, which contained the bad data. But parquet files are immutable, so there should have been no way for someone to update a parquet source file with bad data to get bad data into our serverless pool view.

I have learned many things today, but one of those things is not why this issue happened, unfortunately.

But like I said above, if you have ideas, I would love to hear them in the comments below!

Summary

If you run into this same strange issue where a Synapse Serverless SQL Pool view has weird data that you don’t expect, and that data doesn’t match the query that is used as the view data source, drop and recreate the view to fix the problem.

Related Posts

Azure Synapse Hive Connector V2.0 Issues

If you use a Hive Linked Service in your Azure Synapse workspace, you have probably been getting a banner at the top of your workspace every time you sign in telling you that you have linked services that are about to be deprecated so you should update them. If, like me, you go ahead and do that upgrade from version 1.0 to version 2.0, which is touted as being as simple as toggling a switch, you may have been disappointed to find that you now have an error with your connector that prevents your pipelines from running. If this is relatable to you, keep reading because I will tell you how you can resolve the issue.

Fortunately or unfortunately, depending on how you look at it, I was a preview tester for Microsoft for the fix to the Hive V2.0 connector issue because the problem had already been reported and had a fix in progress when I created a ticket with Microsoft support. The solution has now been released and is available for you to implement if you need it.

What’s in this Post

The Problem with the Hive 2.0 Connector

For a couple of months, my team was annoyed by a banner every time we opened one of our workspaces that utilizes a Hive linked service, telling us that we have a linked service version that is going to be deprecated soon. In August, I finally decided I had had enough of the banner and did the requested upgrade of our Hive Linked Service to version 2.0. Making the change was simple enough, just changing the radio button selected in the settings of the linked service (LS), and I didn’t even have to fix any settings of the LS after making the switch like I had to do with an upgrade to our Oracle connectors. All previously saved settings stayed where they were, so I saved the change and merged it.

A couple days later, on the next run of our pipeline that uses that linked service, we got the following error on the “Copy Data” activity which pulls data from the Hive data source into our data lake.

"ErrorCode=FailedToConnectMSDISource,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to connect to the connector. Error code: 'AuthenticationError', message: 'One or more errors occurred., ErrorCode: AuthenticationError',Source=Microsoft.DataTransfer.Connectors.MSDIBridgeAdapter,''Type=Microsoft.DI.Driver.Hive.Exceptions.HiveConnectorException,Message=One or more errors occurred., ErrorCode: AuthenticationError,Source=Microsoft.DI.Driver.Hive,''Type=Apache.Arrow.Adbc.Drivers.Apache.Hive2.HiveServer2Exception,Message=An unexpected error occurred while opening the session. 'Required field 'client_protocol' was not present! Struct: TOpenSessionReq(client_protocol:null)',Source=Apache.Arrow.Adbc.Drivers.Apache,''Type=Thrift.TApplicationException,Message=Required field 'client_protocol' was not present! Struct: TOpenSessionReq(client_protocol:null),Source=Apache.Arrow.Adbc.Drivers.Apache,'",

The important part of that error message is An unexpected error occurred while opening the session. 'Required field 'client_protocol' was not present!.

As any developer then does, I started googling that error but didn’t find any pages with that same error for Azure Synapse related Hive issues. I did find several similar errors with other uses of Hive that were caused by version mismatches between the client and the server; essentially, the client had been upgraded beyond the server version, which caused a similar error for others.

For my pipeline, I do not control the Hive data source that we were failing to pull from because that is managed by one of my company’s customers. I tried to get information about their Hive version to see if it was something they could fix, but I quickly hit a dead end and they told me to make a support ticket with Microsoft. Which I did.

Thankfully Microsoft was immediately helpful, and I got to work with a very kind and knowledgeable support engineer, who quickly found that the issue I was having was one that the engineering team was already working on resolving. Great! There was a fix already in the works! And a couple days after being told I just needed to wait for the fix to be sent out, I received a message from the support agent asking if I would test out their proposed fix to the problem, which was an updated version of the Self-Hosted Integration Runtime (SHIR) that allows us to connect our systems to our customer’s.

How to Resolve the Problem

The Self-Hosted Integration Runtime (SHIR) that Microsoft wanted me to test was easy to install on the server where we normally run our SHIR for the workspace I was upgrading. They provided a download link to me, so I grabbed that and installed the SHIR like you normally would, and it upgraded in place. The version of the SHIR I tested for them was “IntegrationRuntime_5.58.9364.4”.

Once I installed that on our server, I updated the Hive connector again (since I was forced to downgrade while the solution was being developed so I could continue getting our data) and this time, the test connection worked. I then committed the change and ran my pipeline, and it successfully completed the “Copy Data” activity that had been previously failing with the 2.0 version of the connector.

The only problem we have now is that the 2.0 version of the connector seems to have changed how data types are handled for Parquet files in Data Lakes, so now I need to remap data types between our Data Lake and the final data warehouse to get the pipeline completely working again. I had to remove the test SHIR from our production system at that point anyway since they told me we couldn’t use it in production yet, so when I do the official upgrade to the 2.0 Hive connector, I will also need to make that data mapping fix.

As of writing this post, the SHIR that I believe contains the fix is live on the site where you normally download that software, except the version is slightly higher. What’s available on their download site is “IntegrationRuntime_5.58.9377.1”. If you are having the same issue I am with this connector upgrade, give this latest SHIR a try to see if that resolves your issue like the test version did for mine. I will update this post once I have done my final upgrade and have confirmed that the SHIR version available online fixes my problem for good.

Screenshot of the Microsoft webpage listing the available Integration Runtime downloads

Summary

If you started to experience connection failures on your Hive linked service in Synapse after changing the version of the linked service from V1.0 to V2.0, update your Integration Runtime to version 5.58 or later to get the problem fixed. When updating the linked service, there is no mention of needing to upgrade the SHIR unless it is really out of date (when I first tried upgrading the connector I couldn’t because we were on SHIR 5.52), so you will need to be aware of this mismatch yourself.

Let me know down in the comments if you also ran into this issue with the upgrade!

Related Posts

TechCon365 Day 2 Recap

Today at TechCon365 in Atlanta, I did another full day session of the pre-con, this time covering everything in the realm of Microsoft Fabric. Before today, I knew hardly anything about Fabric, aside from it being something that Microsoft was pushing our team to use for a new project I wasn’t involved with. The workshop, which didn’t have any hands-on work but was still good, was led by John White and Jason Himmelstein. The two speakers did an excellent job at teaching all aspects of Fabric without being boring. This was the introduction to Fabric as a concept that I’ve been needing for months.

Note: The content below is not guaranteed to be accurate, it’s just what I took away from the workshop. In the coming months, I will be learning more about these tools myself to fact check everything and see if the new tools will work for my team.

What’s in this post

Goodbye Synapse

For me, the biggest takeaway of things I learned during today’s workshop is that Microsoft intends to get their customers migrated off Synapse as the data movement platform and into Fabric and its OneLake. As of now, Synapse as a product no longer has a product manager (according to the presenters), which means that Microsoft does not intend to make any feature updates to that tool. There is still a maintenance team for the product to fix any issues that may arise and to keep it running, but there are no long term plans for it anymore.

Synapse going away is concerning to me because my company only recently migrated to Synapse off of SSIS. My fear is that if we spend a ton of time and energy converting our ETL systems into Fabric, as soon as we get the work done, we’ll need to start anew with whatever product line Microsoft releases next, just like we just did with SSIS to Synapse. I understand that cycles like this are everyday occurrences in the technology sector, but I also know that my boss and the executives at my company are likely not going to be happy when I tell them what I learned in today’s workshop.

If you’re in the same situation my team is in (fully on Synapse, and maybe just wrapped up getting there, so wary of the next transition to Fabric), don’t worry too much. The presenters did assure us that Synapse will be supported for a while yet; no concrete plans have been announced to retire it. Based on that, John and Jason recommended that teams stop doing any new development in Synapse and instead start new development in Fabric, to serve as real-world test cases of how useful it can be. I haven’t yet done anything like that, so the usefulness is still unknown to me personally.

Hello Fabric

Fabric is being sold as the one stop shop for all things data in the Microsoft universe, the future of all data analytics and processing. No longer is Fabric going to be Power BI 2.0; now it will serve as the location for anything you need to do with data. While the sales pitches are great, I am wary of the transition, like I said above already. I dislike when companies claim that their new tool is going to solve all your problems, because nothing is ever that great in reality. But here is what I learned today.

OneLake: Put every speck of data here

With each tenant, Microsoft gives you a single OneLake, because it should be your one data lake for your organization, just like there is only a single OneDrive for each user. No more having dozens of storage accounts to act as separate data lakes for different processes. Now you get the one and you will like it. More details I learned:

  • Parquet files: the reigning champions
  • A “Shortcut” to the source data
    • If you want to get data from a source system, like your Oracle finance database for PeopleSoft finance, you can add a “Shortcut” to the data source without having to pull the data into OneLake with a normal ETL process.
    • This concept is the most intriguing to me, and I really want to see how it plays in real life with real data and systems. Could be too good to be true.
  • Security redefined
    • They’ve developed a new way to manage security for the data that is supposed to reduce overhead. They want to have you manage security in only one place, just like you’ll be managing all your data in one place with OneLake.
    • This feature is still in preview, so remains to be seen if it works as well as Microsoft claims
  • File Explorer: An old but good way to explore your data
    • A new plugin for Windows has been created that will allow you to view your OneLake data through the Windows File Explorer, like they’re normal files. Which they are, if they’re Delta-Parquet flat files. The new feature is OneLake File Explorer.
    • You can also view your OneLake data files through Azure Storage Explorer, which is a good second option as well since my team already uses that for other purposes.

For the OneLake concept, I like that it is trying to prioritize reusing what you already have–tools, data, reporting frameworks–so that you don’t have to keep reinventing the wheel for each new data flow process you need to create. The concepts are good, so I’m eager to use them for myself after the conference.

Lakehouse

This is a new way to store data that is offered exclusively by OneLake in the Microsoft ecosystem. It combines the concepts of data lakes and data warehouses by using Delta Parquet format files. It gives you the ability for ACID compliant transactions on the data files while also still having compressed and cheap data storage like a data lake.

Data Factory

This is the part of Fabric that is replacing Synapse pipelines. In the demos during the workshop, I could see that the Dataflow Gen 2s and the data pipelines look almost exactly like what is currently available in Synapse, which is promising, because then at least I don’t need to learn a new GUI. Like in Synapse, the Dataflows are still backed by the Spark engine to allow for parallel processing and high throughput. Both tools allow you to move data from point A to point B (the OneLake) for the scenarios where Shortcuts won’t work and you still need to literally move data between systems.

Fabric Native Databases

If the other new data storage types in OneLake and Fabric weren’t enough for you, there is now also going to be Fabric-native databases available for use. These will be SQL Databases (and some other database types) that you create directly in Fabric instead of creating them independently then joining them to the OneLake with a Shortcut. This change to Fabric is what they are intending to use to take Fabric from a data analytics platform to a full data platform. However, the presenters of today’s workshop did caution that they don’t recommend putting your production systems into these native databases yet, since they’re still new and not well-tested in the real world. But maybe in the near future, we truly will be doing 100% of our data integration engineering in Fabric.

Summary

With this technology, it feels like we could be at the cusp of a new big wave of change for how we operate in the data realm, so I am really interested to see how things play out in the next year or two. Using Fabric for all things data could be the next big thing like Microsoft is trying to sell, or maybe it will simply fade into being just another option in the existing sea of options for data manipulation and analytics in the Microsoft ecosystem.

Do you think Fabric is going to be the next big thing in data technology like Microsoft claims?

Related Posts