Month: October 2025

Troubleshoot ALL Layers of Your Data Flows

This is going to be a short post, which I am writing as a reminder for my future self as well as any of you reading out there.

If you are troubleshooting an error in your pipeline, especially in a database system that has layered views, make sure you dig down through every layer of your data setup before you open a ticket with Microsoft. I learned this the annoying way over the course of a work week spent working a Microsoft support ticket for a very strange issue in a pipeline that uses a Synapse Serverless SQL Pool. We checked so many things that week with no change in how the pipeline ran, and then the pipeline simply went back to working when I ran it outside of its schedule.


The Error

The error made it look like the Serverless SQL Pool views, which use an OPENROWSET call to read parquet files, were referencing the wrong source files, even though I confirmed multiple times that the view definitions were correct. For example, a view was written to use the parquet files under TestDatabase/MyTable/** as its source, but the error made it seem like it was instead pulling data from TestDatabase/OtherRandomTable/**, which was confusing to say the least. I thought the Serverless node was broken, or had a bug that was making the views look at the “OtherRandomTable” files instead of the correct files.

The Cause

The error happened because multiple views used a CROSS APPLY to another view tied to a parquet file in a data lake, and that parquet file was being deleted and recreated by a parallel pipeline. When the failing pipeline tried to reference its views, it couldn’t find that base view because the source file had not yet been recreated by the parallel pipeline. It makes sense and is so obvious in hindsight, but it took Microsoft support directly asking before I realized that I had a view referencing another view, whose definition I also needed to check.
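A hedged sketch of this kind of layered view setup in a Synapse Serverless SQL Pool (all database, path, and column names here are invented for illustration). The hidden dependency is that the outer view only mentions other views, while the actual parquet files sit two layers down:

```sql
-- Base views read parquet files directly from the data lake via OPENROWSET.
CREATE VIEW dbo.vw_MyTable AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/container/TestDatabase/MyTable/**',
    FORMAT = 'PARQUET'
) AS src;
GO
CREATE VIEW dbo.vw_OtherRandomTable AS
SELECT *
FROM OPENROWSET(
    BULK 'https://mydatalake.dfs.core.windows.net/container/TestDatabase/OtherRandomTable/**',
    FORMAT = 'PARQUET'
) AS src;
GO
-- The reporting view layers on top of a base view with CROSS APPLY.
-- If the parquet files behind vw_OtherRandomTable are mid-recreation,
-- querying vw_Report fails even though its own definition is correct.
CREATE VIEW dbo.vw_Report AS
SELECT b.*, x.SomeValue
FROM dbo.vw_MyTable AS b
CROSS APPLY (
    SELECT TOP 1 o.SomeValue
    FROM dbo.vw_OtherRandomTable AS o
    WHERE o.SomeKey = b.SomeKey
) AS x;
```

Because the error surfaces at the outermost view, it can easily look like that view is pointing at the wrong files, when the real problem is a missing file underneath a dependent view.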

The change I needed to make was to update the pipeline triggers so that the process deleting and recreating the base view’s parquet files would be finished before the second pipeline ran and tried to use them.
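As a defensive alternative (or addition) to retiming the triggers, the downstream process could verify that the source files have actually been rewritten before querying the views. A minimal sketch in Python, assuming the parquet files land in a locally accessible folder; against a real data lake you would use a storage SDK instead, but the polling pattern is the same:

```python
import time
from pathlib import Path


def wait_for_fresh_files(folder: Path, started_after: float,
                         timeout_s: float = 600.0, poll_s: float = 5.0) -> bool:
    """Poll `folder` until it contains at least one parquet file, all of
    which were modified at or after `started_after` (epoch seconds).
    Returns True once fresh files are seen, False if `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        files = list(folder.glob("*.parquet"))
        if files and all(f.stat().st_mtime >= started_after for f in files):
            return True
        time.sleep(poll_s)
    return False
```

The downstream pipeline would call this with the upstream pipeline's start time and only run its queries once it returns True, failing loudly on a timeout instead of producing a confusing mid-recreation error.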

If I had done my due diligence and dug through every layer of the data environment, which I am normally good at in other scenarios, I would have quickly and easily discovered the issue myself. But sometimes we need to learn the hard way because our brains aren’t running at full capacity. (It also helped that I finally had dedicated time set aside for this problem and so wasn’t trying to multitask multiple work items at once.)

Summary

If you are troubleshooting ETL failures of any kind, make sure you dig down through all layers of the process to ensure you have checked everything possible related to your failure before reaching out to support. They’ll happily help you find what you missed, but it will save everyone time if you can figure it out yourself first.


Azure SQL Database – Removing Replicas Doesn’t Delete the Replica DB

A couple of months ago, I was asked to add geo-replication for all our Azure SQL Databases to align with our disaster recovery strategy. A few weeks ago, when upper management finally realized the full cost of replicating all of those databases, they requested that we remove replication from anything that isn’t business critical and doesn’t need to be recovered immediately in a disaster, to reduce the shocking cost of replication.

I mistakenly didn’t do my research before doing what I thought was fully removing the replicas I had previously created, which was just removing the replica link from the primary databases. Only recently, while reviewing resources for another task, did I realize that those replica databases were still alive and well, and charging us money we thought we were already saving. Keep reading to learn how to do better and fully get rid of the replicas you no longer need.


What is a replica for an Azure SQL Database?

A replica for an Azure SQL Database is a secondary copy of your database on a separate logical SQL Server in a different region, kept available to fail over to in case of a full region outage in Azure. Although this scenario is rare, it has happened in the past, and most companies do not want to be caught without their vital resources for hours while Microsoft troubleshoots an outage. In such a case, having a geo-replica means that you can immediately fail over to an exact copy of your database in a different region and keep your business running.

How to Remove Replica Link from Primary

Getting rid of a replica of an Azure SQL Database is a two-step process. The first step is to remove the replication link between the primary and secondary databases, which I will cover here; the second step is to delete the secondary database itself, which I will cover in the section below.

Removing the replication link between primary and secondary is as simple as the click of a button. Navigate to the primary database for which you want to remove the replica, and go to the “Replicas” page under “Data Management” in the menu.

On that page, you will see the primary database listed first, then in the section below that, any and all replica databases.

To remove the replica, click the ellipsis menu on the right side of the replica database, then choose “Stop Replication”.

At first, I was confused about why this said it was going to “stop replication,” because I assumed I would be able to remove the replication and delete the replica in one step. Now I better understand that this is a two-step process.

After you have chosen “Stop Replication”, you will get a prompt asking you to confirm that you want to remove the replica. It also clearly points out what happens when you do, but I just didn’t understand what it meant: “This will remove server/MySecondaryDatabase from replication relationship immediately and make it a stand-alone database.” When I read that, I thought it meant that removing the replication would revert the primary database to a standalone database, but now I know that it means exactly what it says: the secondary database will become a standalone database that you will later have to deal with.

Click “Yes” to remove the replication relationship.

You will get a notification that replication is being removed.

After a few minutes, you will be able to refresh the page and see that no replica link exists for the primary database anymore.

However, if you search for the name of the database that you previously had as a replica, you will see that the replica still exists; it’s just no longer linked to the primary through a replication process.

Fully Removing Replica (so that it’s like it never happened)

To get rid of the replica you no longer want so you can stop being charged for it, you will need to navigate to that former-replica database in the portal and then delete it like you would any other database. Before deleting, ensure that this is the database that you really want to get rid of since the deletion cannot be undone.

Once you have deleted the Azure SQL Database resource for the replica, you are finally done with removing your replica.

Summary

If you want to remove a geo-replica of an Azure SQL Database to save money (or for any other reason), you will need to complete a two-step process. First, remove the replication relationship between the primary and the secondary through the “Replicas” page on the primary resource. Once that is complete, navigate to the former-secondary database in the portal and delete the resource. Removing the replica relationship alone won’t delete the database, and you will keep getting charged for that secondary database until you fully delete it.
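The same two-step process can also be scripted with the Azure CLI instead of the portal. A hedged sketch, with all resource group, server, and database names as placeholders for your own:

```shell
# Step 1: break the replication link. Run against the PRIMARY server;
# this is the CLI equivalent of "Stop Replication" in the portal.
az sql db replica delete-link \
    --resource-group MyResourceGroup \
    --server my-primary-server \
    --name MyDatabase \
    --partner-server my-secondary-server

# Step 2: the former secondary is now a standalone database that still
# bills you. Delete it explicitly on the secondary server.
az sql db delete \
    --resource-group MySecondaryResourceGroup \
    --server my-secondary-server \
    --name MyDatabase \
    --yes
```

Scripting both steps together makes it harder to forget the second one, which is exactly the mistake described above.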


Recent Security Issues with AI

Are you and your company keeping track of the security of the artificial intelligence (AI) tools your employees are using? Are you aware that AI is not magically more secure than other software, and may in fact be more prone to attack due to its newness and speedy development? If not, you need to start watching the news for cyber attacks related to AI. These attacks aren’t exclusive to the new AI startups making moves in the industry; even tech giants like Google have been found to have major flaws in their AI tools.

I am not a cybersecurity expert, so I won’t go into detail attempting to cover the vulnerabilities that have been found, but I highly encourage you to read through these two articles I found recently that covered the exploits.

The first is about three major vulnerabilities discovered in Google’s Gemini AI assistant: three different issues spread across different facets of the tool. I expect better of Google.

https://www.darkreading.com/vulnerabilities-threats/trifecta-google-gemini-flaws-ai-attack-vehicle

The second article is about a much more niche AI tool, a “Model Context Protocol” (MCP) server package that had the most ridiculously simple exploit.

https://www.darkreading.com/application-security/malicious-mcp-server-exfiltrates-secrets-bcc

Standard cybersecurity practices are more important now than ever. Never blindly trust the software or code you are using. Don’t put your most sensitive company data into tools managed by people outside your company whom you don’t trust 100%. Due diligence is always useful. AI is proliferating faster than ever, and it’s guaranteed that many of the tools won’t be following security best practices. Protect yourself as much as you can using common sense, and keep on top of recently announced exploits through trusted news sources.