Today at TechCon365 in Atlanta, I did another full-day pre-con session, this time covering everything in the realm of Microsoft Fabric. Before today, I knew hardly anything about Fabric, aside from it being something Microsoft was pushing our team to use for a new project I wasn’t involved with. The workshop, led by John White and Jason Himmelstein, didn’t have any hands-on work but was still good. The two speakers did an excellent job of teaching all aspects of Fabric without being boring. This was the introduction to Fabric as a concept that I’ve been needing for months.
Note: The content below is not guaranteed to be accurate; it’s just what I took away from the workshop. In the coming months, I will be learning more about these tools myself to fact-check everything and see if the new tools will work for my team.
Goodbye Synapse
For me, the biggest takeaway from today’s workshop is that Microsoft intends to move its customers off Synapse as the data movement platform and onto Fabric and its OneLake. As of now, Synapse as a product no longer has a product manager (according to the presenters), which means Microsoft does not intend to make any feature updates to the tool. There is still a maintenance team to fix any issues that arise and keep it running, but there are no long-term plans for it anymore.
Synapse going away is concerning to me because my company only recently migrated off SSIS and onto Synapse. My fear is that if we spend a ton of time and energy converting our ETL systems to Fabric, then as soon as we finish the work, we’ll need to start anew with whatever product line Microsoft releases next, just like we just went through with the move from SSIS to Synapse. I understand that cycles like this are everyday occurrences in the technology sector, but I also know that my boss and the executives at my company are likely not going to be happy when I tell them what I learned in today’s workshop.
If you’re in the same situation as my team, fully invested in Synapse, maybe having just wrapped up getting there, and wary of yet another transition to Fabric, don’t worry too much. The presenters did assure us that Synapse will be supported for a while yet; no concrete plans have been announced to retire it. Even so, John and Jason recommended that teams stop doing new development in Synapse and instead start doing new development in Fabric, to serve as real-world test cases of how useful it can be. I haven’t done anything like that yet, so the usefulness is still unknown to me personally.
Hello Fabric
Fabric is being sold as the one-stop shop for all things data in the Microsoft universe, the future of all data analytics and processing. No longer is Fabric going to be just Power BI 2.0; now it will serve as the place for anything you need to do with data. While the sales pitches are great, I am wary of the transition, as I said above. I dislike when companies claim that their new tool is going to solve all your problems, because nothing is ever that great in reality. But here is what I learned today.
OneLake: Put every speck of data here
Each tenant gets a single OneLake from Microsoft, because it is meant to be the one data lake for your organization, just like there is only a single OneDrive for each user. No more having dozens of storage accounts acting as separate data lakes for different processes. Now you get the one, and you will like it. More details I learned:
- Parquet files: the reigning champions
- A “Shortcut” to the source data
  - If you want to get data from a source system, like the Oracle database behind your PeopleSoft finance system, you can add a “Shortcut” to the data source without having to pull the data into OneLake with a normal ETL process.
  - This concept is the most intriguing to me, and I really want to see how it plays out in real life with real data and systems. It could be too good to be true. (See the sketch at the end of this section.)
- Security redefined
  - Microsoft has developed a new way to manage security for the data that is supposed to reduce overhead: you manage security in only one place, just like you manage all your data in one place with OneLake.
  - This feature is still in preview, so it remains to be seen whether it works as well as Microsoft claims.
- File Explorer: An old but good way to explore your data
  - A new Windows plugin, OneLake File Explorer, lets you browse your OneLake data through Windows File Explorer like it’s made up of normal files. Which it is, if the data is stored as Delta Parquet flat files.
  - You can also view your OneLake data files through Azure Storage Explorer, which is a nice second option since my team already uses it for other purposes.
For the OneLake concept, I like that it is trying to prioritize reusing what you already have–tools, data, reporting frameworks–so that you don’t have to keep reinventing the wheel for each new data flow process you need to create. The concepts are good, so I’m eager to use them for myself after the conference.
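To make the Shortcut idea concrete for myself, here is a minimal sketch of what I understand the payoff to be: once a shortcut exists in a lakehouse, the remote data is just another path a notebook can read. The shortcut name, table, and columns below are made up for illustration, so treat this as my guess at the workflow rather than something I’ve actually run.

```python
# Hypothetical: querying data exposed through a OneLake Shortcut from a Fabric notebook.
# Assumes a table shortcut named "peoplesoft_gl" already exists in the attached
# lakehouse; the name and columns are placeholders, not a real system of mine.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already provided for you in a Fabric notebook

# The shortcut shows up as a normal Delta table path, no copy job required.
gl = spark.read.format("delta").load("Tables/peoplesoft_gl")
gl.groupBy("fiscal_year").count().show()
```

If it really works like that against live source systems, it would remove a whole class of staging pipelines we maintain today.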
Lakehouse
This is a new way to store data that, within the Microsoft ecosystem, is offered exclusively through OneLake. It combines the concepts of data lakes and data warehouses by storing everything as Delta Parquet files, which gives you ACID-compliant transactions on the data files while keeping the compressed, cheap storage of a data lake.
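To check my understanding of the Delta Parquet claim, here is a small sketch of the kind of behavior that format is supposed to give you in a lakehouse. The table and column names are throwaway placeholders, and I’m assuming the standard Delta Lake behavior that Fabric notebooks expose; I haven’t verified any of this in Fabric myself yet.

```python
# A minimal sketch of Delta Parquet's transactional behavior, assuming a Fabric
# notebook with an attached lakehouse. Table and column names are placeholders.
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in a Fabric notebook

orders = spark.createDataFrame([
    Row(order_id=1, amount=120.50),
    Row(order_id=2, amount=75.00),
])

# Each write is a transaction against the Delta log: readers see the old version
# of the table or the new one, never a half-written state.
orders.write.format("delta").mode("overwrite").saveAsTable("demo_orders")

spark.createDataFrame([Row(order_id=3, amount=42.00)]) \
    .write.format("delta").mode("append").saveAsTable("demo_orders")

# Old versions stay queryable ("time travel"), which plain Parquet files sitting
# in a data lake don't give you.
spark.sql("SELECT * FROM demo_orders VERSION AS OF 0").show()
```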
Data Factory
This is the part of Fabric that is replacing Synapse pipelines. In the demos during the workshop, I could see that the Dataflow Gen 2s and the data pipelines look almost exactly like what is currently available in Synapse, which is promising, because then at least I don’t need to learn a new GUI. Like with Synapse, the Dataflows are still backed by Spark engine to allow for parallel processing and high throughput. Both tools allow you to move data from point A to point B (the OneLake) for the scenarios where Shortcuts won’t work and you still need to literally move data between systems.
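For my own mental model, here’s a rough notebook-style sketch of the “point A to point B” copy that a pipeline or Dataflow would do through the GUI. The server, table, and credential values are invented placeholders; in Data Factory you would configure this in the designer rather than write it by hand, so this is just me translating the concept into code I understand.

```python
# Not Data Factory itself, just a sketch of the equivalent copy in a notebook.
# The JDBC URL, table names, and credentials below are made-up placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created in a Fabric notebook

source = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://source-server.example.com:1433;databaseName=Finance")
    .option("dbtable", "dbo.Invoices")
    .option("user", "etl_reader")
    .option("password", "<pull-from-a-key-vault-in-real-life>")
    .load()
)

# Land the rows in OneLake as a Delta table so other Fabric items can use them.
source.write.format("delta").mode("overwrite").saveAsTable("staging_invoices")
```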
Fabric Native Databases
If the other new data storage types in OneLake and Fabric weren’t enough for you, there are now also going to be Fabric-native databases available for use. These will be SQL databases (and some other database types) that you create directly in Fabric instead of creating them independently and then joining them to OneLake with a Shortcut. This is the change they intend to use to take Fabric from a data analytics platform to a full data platform. However, the presenters of today’s workshop did caution that they don’t recommend putting your production systems into these native databases yet, since they’re still new and not well tested in the real world. But maybe in the near future, we truly will be doing 100% of our data integration engineering in Fabric.
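Since the presenters described these as regular SQL databases that just happen to live in Fabric, I’d expect to be able to connect to one with ordinary SQL Server tooling. That’s an assumption on my part; the sketch below uses made-up server and database names and assumes the endpoint accepts a standard ODBC connection with Entra ID sign-in.

```python
# A hedged sketch of connecting to a Fabric-native SQL database from outside Fabric.
# The server and database names are placeholders; assumes a standard TDS endpoint
# with Entra ID (ActiveDirectoryInteractive) authentication.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-fabric-sql-endpoint>.fabric.microsoft.com;"
    "Database=demo_fabric_db;"
    "Authentication=ActiveDirectoryInteractive;"  # interactive Entra ID sign-in
    "Encrypt=yes;"
)

cur = conn.cursor()
cur.execute("SELECT TOP 5 name, create_date FROM sys.tables ORDER BY create_date DESC;")
for row in cur.fetchall():
    print(row.name, row.create_date)

cur.close()
conn.close()
```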
Summary
With this technology, it feels like we could be on the cusp of a big new wave of change in how we operate in the data realm, so I am really interested to see how things play out over the next year or two. Using Fabric for all things data could be the next big thing, like Microsoft is trying to sell, or it could simply fade into being just another option in the existing sea of options for data manipulation and analytics in the Microsoft ecosystem.
Do you think Fabric is going to be the next big thing in data technology like Microsoft claims?