
Synapse Serverless SQL Pool View Seemingly Caching Data

This morning I ran into the strangest data issue I’ve seen with Azure Synapse Serverless SQL Pools: a view we have seemed to be caching data, and that cached data was very weird, wrong for the column’s data type, and breaking an application query. I still have no idea how or why this happened, since serverless SQL pools aren’t supposed to cache data the way dedicated SQL pools do, but I thought I would share what I saw to maybe help others (and to see if anyone else knows why this happened).

I was able to correct the issue and get the app query working again; I just won’t ever know why it happened (unless you know why, in which case you should share in the comments below!).


Background

I first learned of this issue from an application developer on another team, who emailed me saying that they were running the same query on our serverless and dedicated SQL pools in Synapse; the query was working as it normally does on the dedicated pool but was giving the following error on the serverless SQL pool:

Msg 241, Level 16, State 1, Line 3

Conversion failed when converting date and/or time from character string.

The query they were trying to run is one that hasn’t had any issues in the past.

In our setup, we load the same data into both our serverless and dedicated SQL pools for different use cases, and both are loaded from the same set of delta parquet files in a Data Lake on an Azure Storage Account. That means there should be no reason the query would run on one pool but fail on the other.

The first step of my troubleshooting process was to replicate the error that the developer sent me, and I was able to do that. I then reviewed the very long query for anything that was doing a CAST or CONVERT on a date field, which helped me narrow down the problem to a single view on our serverless pool.

The Weird Data

After I identified the one view that seemed to have the bad data on a date column, I ran a DISTINCT query on that one column and received the most confusing results.
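The check itself was just a DISTINCT over the problem column. What I ran was roughly the following; the view name here is a placeholder, but “order_entry_dt” is the real column:

-- Hypothetical view name; order_entry_dt is the datetime2 column the app query converts
SELECT DISTINCT order_entry_dt
FROM dbo.vw_orders;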

Screenshot of SQL query in SSMS demonstrating the bad data contained in the date column of my serverless SQL pool view.

None of those values look like dates to me. That made me question myself and the data type of that column, “order_entry_dt”, since a date column couldn’t possibly hold string values like this. But no, the column data type does say it’s a “datetime2”.

Screenshot of the problematic view column and its data type
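If you want to double-check a view column’s declared data type yourself, a metadata query like this one works against a serverless SQL pool database (the view name is again a placeholder):

-- Confirm the data type the view reports for the column
SELECT COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'vw_orders'
  AND COLUMN_NAME = 'order_entry_dt';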

At that point, I started downloading and reviewing individual parquet files from the Data Lake to see if any of them had that weird data, and none did.
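If you would rather not download the files, the same spot check can be done from the serverless pool itself by querying the parquet files directly and using filepath() to see which file each value came from. A sketch, with a placeholder storage path that would need to match where your parquet files actually live:

-- Look for the weird values in the raw parquet files and report which file contains them
SELECT DISTINCT r.filepath() AS source_file, r.order_entry_dt
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/orders/*.parquet',
    FORMAT = 'PARQUET'
) AS r;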

The next step I took was to look at the view definition again, to make sure I was looking at the correct source data, and that all looked good. I decided to run my DISTINCT query on the view itself, but then also run the exact same query using the SELECT statement that defines the view, to see what both returned. That made things even weirder, because it confirmed that the source data should be correct, but the view is somehow wrong.

Screenshot of two SQL queries in SSMS showing that selecting from my view and running the select contained in the view definition return different results
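In case the screenshot is hard to read, the comparison was essentially this pair of queries; the storage path and view name are placeholders for our real ones:

-- 1) Query the view itself
SELECT DISTINCT order_entry_dt
FROM dbo.vw_orders;

-- 2) Run the same SELECT the view definition contains, directly against the delta files
SELECT DISTINCT r.order_entry_dt
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/orders/',
    FORMAT = 'DELTA'
) AS r;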

If the view uses that exact same “OPENROWSET” query as its definition, why does the view return different data than the “OPENROWSET” query?

Correcting the Problem

Even if I couldn’t figure out why or how the problem data appeared, I still needed to figure out how to resolve the issue so that the app query would start working again. What I decided to do was drop and recreate the view, which is easy given how simple the view definition is.
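Since the view is just a thin wrapper around an OPENROWSET query, the fix is only a few lines of T-SQL. A sketch of what the drop and recreate looks like, again with a placeholder view name and storage path:

-- Drop the view that is returning the bad data
DROP VIEW IF EXISTS dbo.vw_orders;
GO

-- Recreate it with the same OPENROWSET definition pointing at the delta files
CREATE VIEW dbo.vw_orders AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/orders/',
    FORMAT = 'DELTA'
) AS r;
GO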

Once I dropped and recreated the view, I ran the exact same queries as above again, and this time the view returned the data I expected.

Screenshot of two SQL queries in SSMS showing correct data being returned from selecting from my view and running the query that is the view definition

So why was the view data weird?

While writing this post, I decided that I couldn’t let this go without knowing why the data got weird, so I decided to test out some theories for myself. (Spoiler: I couldn’t figure it out.)

My first theory was that someone had accidentally run an UPDATE query on the view to change the data. But that is not allowed in serverless SQL pools, which I even tested out myself: by making a copy of the problem view with a different name and then trying to run an update statement against it, I was able to prove to myself that this theory was not how the bad data was created.

Screenshot of a SQL query in SSMS where I attempted to update data in a column of the view, which pulls data from a parquet file and thus cannot be updated
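For reference, the test was essentially the following (placeholder names again). The UPDATE is rejected because serverless SQL pools can only read the files in the lake, not modify them:

-- Make a throwaway copy of the problem view under a different name
CREATE VIEW dbo.vw_orders_copy AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/orders/',
    FORMAT = 'DELTA'
) AS r;
GO

-- This fails: serverless SQL pools do not support UPDATE against lake data
UPDATE dbo.vw_orders_copy
SET order_entry_dt = '2025-09-01'
WHERE order_entry_dt IS NULL;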

Then I thought maybe someone wrote a query to change the underlying parquet files directly, but that doesn’t seem to be possible either.

The only remaining explanation I can think of is that someone accidentally overwrote the source parquet file, or created a second file that was also picked up as source data, and that file contained the bad data. The parquet files themselves are immutable, so there should have been no way for someone to update an existing parquet source file in place to get bad data into our serverless pool view.

I have learned many things today, but one of those things is not why this issue happened, unfortunately.

But like I said above, if you have ideas, I would love to hear them in the comments below!

Summary

If you run into this same strange issue where a Synapse Serverless SQL Pool view has weird data that you don’t expect, and that data doesn’t match the query that is used as the view data source, drop and recreate the view to fix the problem.

Related Posts

Azure Synapse Hive Connector V2.0 Issues

If you use a Hive Linked Service in your Azure Synapse workspace, you have probably been getting a banner at the top of your workspace every time you sign in, telling you that you have linked services that are about to be deprecated and should be updated. If, like me, you went ahead and did that upgrade from version 1.0 to version 2.0, which is touted as being as simple as toggling a switch, you may have been disappointed to find that you now have an error with your connector that prevents your pipelines from running. If that sounds familiar, keep reading, because I will tell you how you can resolve the issue.

Fortunately or unfortunately, depending on how you look at it, I was a preview tester for Microsoft for the fix to the Hive V2.0 connector issue because the problem had already been reported and had a fix in progress when I created a ticket with Microsoft support. The solution has now been released and is available for you to implement if you need it.


The Problem with the Hive 2.0 Connector

For a couple of months, my team had been annoyed by a banner every time we opened one of our workspaces that uses a Hive linked service, telling us that we have a linked service version that is going to be deprecated soon. In August, I finally decided I’d had enough of the banner and did the requested upgrade of our Hive Linked Service to version 2.0. Making the change was simple enough, just switching the radio button selected in the settings of the linked service (LS), and I didn’t even have to fix any settings of the LS after making the switch like I had to do with an upgrade to our Oracle connectors. All previously saved settings stayed where they were, so I saved the change and merged it.

A couple days later, on the next run of our pipeline that uses that linked service, we got the following error on the “Copy Data” activity which pulls data from the Hive data source into our data lake.

"ErrorCode=FailedToConnectMSDISource,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to connect to the connector. Error code: 'AuthenticationError', message: 'One or more errors occurred., ErrorCode: AuthenticationError',Source=Microsoft.DataTransfer.Connectors.MSDIBridgeAdapter,''Type=Microsoft.DI.Driver.Hive.Exceptions.HiveConnectorException,Message=One or more errors occurred., ErrorCode: AuthenticationError,Source=Microsoft.DI.Driver.Hive,''Type=Apache.Arrow.Adbc.Drivers.Apache.Hive2.HiveServer2Exception,Message=An unexpected error occurred while opening the session. 'Required field 'client_protocol' was not present! Struct: TOpenSessionReq(client_protocol:null)',Source=Apache.Arrow.Adbc.Drivers.Apache,''Type=Thrift.TApplicationException,Message=Required field 'client_protocol' was not present! Struct: TOpenSessionReq(client_protocol:null),Source=Apache.Arrow.Adbc.Drivers.Apache,'",

The important part of that error message is “An unexpected error occurred while opening the session. 'Required field 'client_protocol' was not present!”.

As any developer does, I started googling that error but didn’t find any pages with that same error for Azure Synapse-related Hive issues. I did find several similar errors with different uses of Hive that were caused by version mismatches between the client and the server; essentially, the client had been upgraded beyond what the server supported, which caused a similar error for others.

For my pipeline, I do not control the Hive data source that we were failing to pull from because that is managed by one of my company’s customers. I tried to get information about their Hive version to see if it was something they could fix, but I quickly hit a dead end and they told me to make a support ticket with Microsoft. Which I did.

Thankfully, Microsoft was immediately helpful and I got to work with a very kind and knowledgeable support engineer, who quickly tracked down that the issue I was having was one the engineering team was already working on resolving. Great! There was a fix already in the works! A couple of days after being told I just needed to wait for the fix to be released, I received a message from the support agent asking if I would test out their proposed fix to the problem, which was an updated version of the Self-Hosted Integration Runtime (SHIR) that allows us to connect our systems to our customer’s.

How to Resolve the Problem

The Self-Hosted Integration Runtime (SHIR) build that Microsoft wanted me to test was easy to install on the server where we normally run our SHIR for the workspace I was upgrading. They provided a download link to me, so I downloaded it and installed the SHIR like you normally would, and it upgraded in place. The version of the SHIR I tested for them was “IntegrationRuntime_5.58.9364.4”.

Once I installed that on our server, I updated the Hive connector again (since I was forced to downgrade while the solution was being developed so I could continue getting our data) and this time, the test connection worked. I then committed the change and ran my pipeline, and it successfully completed the “Copy Data” activity that had been previously failing with the 2.0 version of the connector.

The only problem we have now is that the 2.0 version of the connector seems to have changed how data types are handled for Parquet files in Data Lakes, so I need to remap data types between our Data Lake and the final data warehouse to get the pipeline completely working again. I had to remove the test SHIR from our production system anyway, since Microsoft told me we couldn’t use it in production yet, so when I do the official upgrade to the 2.0 Hive connector, I will also need to make that data mapping fix.

As of writing this post, the SHIR that I believe contains the fix is live on the site where you normally download that software, except the version is slightly higher. What’s available on their download site is “IntegrationRuntime_5.58.9377.1”. If you are having the same issue I am with this connector upgrade, give this latest SHIR a try to see if that resolves your issue like the test version did for mine. I will update this post once I have done my final upgrade and have confirmed that the SHIR version available online fixes my problem for good.

Screenshot of the Microsoft webpage listing the available Integration Runtime downloads

Summary

If you started experiencing connection failures on your Hive linked service in Synapse after changing the linked service version from V1.0 to V2.0, update your Integration Runtime to version 5.58 or later to fix the problem. When updating the linked service, there is no mention of needing to upgrade the SHIR unless it is really out of date (when I first tried upgrading the connector, I couldn’t because we were on SHIR 5.52), so you will need to be aware of this mismatch yourself.

Let me know down in the comments if you also ran into this issue with the upgrade!
