This morning I ran into the strangest data issue I’ve seen with Azure Synapse Serverless SQL Pools: a view of ours seemed to be caching data, and that data was very weird, wrong for the column’s data type, and breaking an application query. I still have no idea how or why this happened, since serverless SQL pools aren’t supposed to cache data the way dedicated SQL pools do, but I thought I would share what I saw to maybe help others (and to see if anyone else knows why this was happening).
I was able to correct the issue and get the app query working again; I just won’t ever know why it happened (unless you know why, in which case you should share in the comments below!).
Background
I first learned of this issue when an application developer on another team emailed me to say they were running the same query on our serverless and dedicated SQL pools in Synapse, and the query was working as it normally does on the dedicated pool but was giving the following error on the serverless SQL pool:
Msg 241, Level 16, State 1, Line 3
Conversion failed when converting date and/or time from character string.
The query they were trying to run is one that hasn’t had any issues in the past.
In our setup, we load the same data into both our serverless and dedicated SQL pools for different use cases, and both are loaded from the same set of delta parquet files in a Data Lake on an Azure Storage Account, which means there should be no reason for the query to run on one and fail on the other.
The first step of my troubleshooting process was to replicate the error that the developer sent me, and I was able to do that. I then reviewed the very long query for anything that was doing a CAST or CONVERT on a date field, which helped me narrow down the problem to a single view on our serverless pool.
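For context, Msg 241 is what SQL Server raises whenever a string that isn’t a valid date gets converted to a date or time type. A trivial repro, completely unrelated to our actual data, just to show where the error comes from:

SELECT CAST('not a date' AS datetime2);
-- Msg 241: Conversion failed when converting date and/or time from character string.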
The Weird Data
After I identified the one view that seemed to have the bad data on a date column, I ran a DISTINCT query on that one column and received the most confusing results.
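The query was nothing fancy; the view name below is a placeholder rather than our real object name, but it was essentially this:

-- Placeholder view name; checking what values the date column actually holds
SELECT DISTINCT order_entry_dt
FROM dbo.vw_Orders
ORDER BY order_entry_dt;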

None of those values look like dates to me. Which made me question myself and the data type for that column, “order_entry_dt”, since a date column couldn’t possibly hold string values like this. But no, the column data type does say it’s a “datetime2”.
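If you want to double-check a view column’s data type for yourself, this is one way to do it in a serverless SQL pool (the view name is again a placeholder):

-- Look up the data type Synapse reports for the view's column
SELECT c.name AS column_name, t.name AS data_type
FROM sys.columns AS c
JOIN sys.types AS t ON c.user_type_id = t.user_type_id
WHERE c.object_id = OBJECT_ID('dbo.vw_Orders')
  AND c.name = 'order_entry_dt';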

At that point, I started downloading and reviewing individual parquet files from the Data Lake to see if any of them had that weird data, and none did.
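If you’d rather not download the files, you can also point OPENROWSET at a single parquet file to spot-check it straight from the serverless pool; the storage path here is made up, not our real one:

-- Spot-check a single parquet file directly, bypassing the view
SELECT TOP 100 order_entry_dt
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/orders/part-00000.parquet',
    FORMAT = 'PARQUET'
) AS src;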
The next step I took was to look at the view definition again, to make sure I was looking at the correct source data, and that all looked good. I decided to run my DISTINCT query on the view itself, and then run the exact same query using the SELECT statement that defines the view, to see what each returned. That made things even weirder, because it confirmed that the source data was correct but the view was somehow wrong.
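In other words, I ran both of these back to back (placeholder names and path again) and expected identical results:

-- 1. Query through the view
SELECT DISTINCT order_entry_dt
FROM dbo.vw_Orders;

-- 2. The same SELECT the view is defined with, run directly
SELECT DISTINCT order_entry_dt
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/orders/*.parquet',
    FORMAT = 'PARQUET'
) AS src;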

If the view uses that exact same “OPENROWSET” query as its definition, why does the view return different data than the “OPENROWSET” query??
Correcting the Problem
Even if I couldn’t figure out how or why the problem data happened, I still needed to resolve the issue so that the app query would start working again. What I decided to do was drop and recreate the view, which is easy given how simple the view definition is.
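The fix was essentially just this; the names and storage path are placeholders, and since our real view reads delta parquet files, your definition may use FORMAT = 'DELTA' instead:

-- Drop the misbehaving view and recreate it with the same definition
DROP VIEW IF EXISTS dbo.vw_Orders;
GO

CREATE VIEW dbo.vw_Orders
AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mystorageaccount.dfs.core.windows.net/datalake/orders/*.parquet',
    FORMAT = 'PARQUET'
) AS src;
GO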
Once I dropped and recreated the view, I ran the exact same queries as above again, and this time the view returned the data I expected.

So why was the view data weird?
While writing this post, I decided that I couldn’t let this go without knowing why the data got weird, so I decided to test out some theories for myself. (Spoiler: I couldn’t figure it out.)
My first theory was that someone ran an UPDATE query on the view accidentally to change the data. But that is not allowed in serverless SQL Pools, which I even tested out myself. By making a copy of the problem view with a different name and then trying to run an update statement, I was able to prove to myself that this theory was not how the bad data was created.
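The test was as simple as the statement below; serverless SQL pools are query-only over the lake and don’t support DML, so anything like this gets rejected (the view name is a placeholder for my copy of the problem view):

-- Fails in a serverless SQL pool: UPDATE isn't a supported operation there
UPDATE dbo.vw_Orders_copy
SET order_entry_dt = '2023-01-01'
WHERE order_entry_dt IS NOT NULL;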

Then I thought maybe someone wrote a query to change the underlying parquet files directly, but that doesn’t seem to be possible either.
The only thing I can think of now is that someone accidentally overwrote a source parquet file, or created a second file that was also picked up as source data, and that file contained the bad data. Parquet files are immutable, though, so there should have been no way for someone to update an existing parquet source file in place to get bad data into our serverless pool view.
I have learned many things today, but one of those things is not why this issue happened, unfortunately.
But like I said above, if you have ideas, I would love to hear them in the comments below!
Summary
If you run into this same strange issue, where a Synapse Serverless SQL Pool view returns weird data that you don’t expect and that doesn’t match the query used as the view’s data source, drop and recreate the view to fix the problem.
