Tag: PostgreSQL

Updatable Views

Welcome to a coffee break post where I quickly write up something on my mind that can be written and read in less time than a coffee break takes.

Recently, while exploring possible options for converting our existing ETLs to work with Postgres, I found myself asking, "Can you update a table using a view that has been created on that table?" After doing some research, I learned the answer is yes for both Postgres and SQL Server.

In Postgres, there is a concept of an updatable view, which is essentially a very simple view: typically one that selects from a single table, with no join to a second (or third or more) table. There are other qualifications for what makes a view automatically updatable (no aggregates, DISTINCT, GROUP BY, and so on), but the gist is that the query defining the view has to be simple. If a view meets these criteria, you can in fact update the base table by running an UPDATE against the view.

For example, say you have a table called employee, and you create a view on top of it called v_employee that selects most (but not all) of the columns from the base table. You can then run this query against the view, and it will update the data in employee:

UPDATE v_employee
SET full_name = 'George Smith'
WHERE emp_id = 15;
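
For context, here's a minimal sketch of the kind of table and view that query assumes (the column definitions are hypothetical); a view this simple qualifies as automatically updatable:

CREATE TABLE employee (
    emp_id    INT PRIMARY KEY,
    full_name TEXT,
    hire_date DATE
);

CREATE VIEW v_employee AS
SELECT emp_id, full_name
FROM employee;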

This concept is intriguing to me, and could also be really useful in updating our ETLs more quickly after we convert to Postgres, so I thought I would share this fun little fact with everyone.

Postgres vs. SQL Server, Part 3

If you haven’t seen them already, I have two previous posts listing differences I’ve found between Postgres and SQL Server while learning to develop in the Postgres environment. As I continue to learn, I keep finding more differences that I think are interesting, so I’m sharing them below.

What’s in this post:

  • Creating a temporary table
  • General differences between temp tables in each engine
  • Getting the current UTC time
  • Updating a table with a JOIN
  • Making a column nullable
  • Data type for automatically updated binary numbers

Creating a temporary table

The difference between SQL Server and Postgres for creating a temp table is small but significant, mostly because my company uses temp tables frequently in our stored procedure and ETL code; as we migrate off of SQL Server, we're going to have to update a lot of temp table statements for Postgres. The SQL Server syntax puts a hashtag/pound/number symbol in front of the table name, or a double-pound symbol for a global temporary table, like “CREATE TABLE #MyTable” or “CREATE TABLE ##MyGlobalTempTable“. In Postgres, the syntax requires the TEMP or TEMPORARY keyword instead, like “CREATE TEMP[ORARY] TABLE my_table“.

You also have to use the TEMP or TEMPORARY keyword when creating a temp table in Postgres with a SELECT INTO statement, such as SELECT column1, column2 INTO TEMP TABLE my_table FROM source_table. In SQL Server, that same command would look like SELECT column1, column2 INTO #MyTable FROM SourceTable.
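
Here's a quick side-by-side sketch of both engines (table and column names are made up):

-- SQL Server: the # prefix is what makes it a temp table
CREATE TABLE #MyTable (Id INT, Name VARCHAR(50));
SELECT Column1, Column2 INTO #MyTable2 FROM dbo.SourceTable;

-- Postgres: the TEMP/TEMPORARY keyword is required instead of a prefix
CREATE TEMP TABLE my_table (id INT, name TEXT);
SELECT column1, column2 INTO TEMP TABLE my_table2 FROM source_table;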

General differences between temp tables in each engine

Not only is the way you create temporary tables different between Postgres and SQL Server, but how the tables are actually stored on the database is completely different between the two as well.

In SQL Server, any temporary objects are stored in a separate database called tempdb. When creating a temporary table, you can specify whether it's localized to your current session only or whether it is a global temporary table that can be accessed from different sessions or query windows, as I mentioned above. To do the former, you specify the table name as #MyTable; to do the latter, you specify it as ##MyTable.

In Postgres, temporary tables are not stored in a separate database. Instead, they are put into their own system schemas whose names are based on numerical values (ex. pg_temp_1, pg_temp_2, etc.), so it’s harder to see the specific location where your temp table is being stored. You also are generally unable to access a temp table in one session from a different session, so there is no concept of a global temp table in Postgres.
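
If you're ever curious where your temp table landed, you can peek at the system catalogs; here's a hedged sketch (the pg_temp schema number will vary by session):

CREATE TEMP TABLE my_temp_demo (id INT);

SELECT n.nspname AS schema_name, c.relname AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname LIKE 'pg_temp%'
  AND c.relkind = 'r';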

One similarity of temp tables between the two engines, though, is that they are dropped/deleted in both as soon as the session in which they were created ends.

Getting the current UTC time

In SQL Server, there are two built-in functions that allow you to get the current UTC time, GETUTCDATE() and SYSUTCDATETIME(). The former returns the time in type DATETIME, and the latter returns the time in type DATETIME2 (more precise).

Postgres also has multiple functions and methods for getting the current UTC time, but they look quite different from the SQL Server options. While there are more than just these two, the most common ways to get the UTC time are the following:

  • SELECT NOW() AT TIME ZONE 'utc';
  • SELECT CURRENT_TIMESTAMP AT TIME ZONE 'utc';

Although the two options look different, they will return the same value. I’ve had some issues trying to use one versus the other in different scenarios in the SQL code I’ve written for Postgres, so it seems like there may be times where it’s better to use one over the other, but I’m just not able to differentiate those scenarios right now. There are also other options for getting the current UTC time, but I won’t cover those here.
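
For what it's worth, NOW() and CURRENT_TIMESTAMP are equivalent in Postgres (both report the current transaction's start time), which is why the two forms above return the same value. A quick sketch comparing the engines:

-- SQL Server
SELECT GETUTCDATE() AS UtcNowDatetime, SYSUTCDATETIME() AS UtcNowDatetime2;

-- Postgres: both expressions return the same value
SELECT NOW() AT TIME ZONE 'utc' AS utc_now,
       CURRENT_TIMESTAMP AT TIME ZONE 'utc' AS utc_now_again;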

Updating a table with a JOIN

One of the biggest syntax differences between T-SQL and pgSQL that has really gotten me confused so far is the formatting of an UPDATE statement that includes a JOIN to a different table. To me, the syntax of T-SQL makes a lot more logical sense and the pgSQL syntax doesn’t seem as intuitive. The T-SQL syntax is as follows:

UPDATE p
SET p.MyColumn = s.MyColumn
FROM person AS p
INNER JOIN staging AS s
	ON s.MyColumn2 = p.MyColumn2

Then the syntax for pgSQL is:

UPDATE person
SET my_column = staging.my_column
FROM staging
WHERE staging.my_column2 = person.my_column2;

To me, the pgSQL syntax doesn’t scream “joining to a different table”; it really looks like a simple update statement without a join. Yet the join is there. Another gotcha: the column on the left side of SET can't be qualified with the table name or an alias (which is probably why my aliased attempts kept failing), even though you can alias the tables themselves in the UPDATE and FROM clauses. Since I use aliases in every query I write, this keeps tripping me up. I wish I could change the standard for the pgSQL syntax, but obviously that isn’t within my power, so I’m going to have to learn to live with it.
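
Here's a hedged sketch of the aliased form that Postgres does accept:

UPDATE person AS p
SET my_column = s.my_column   -- the target column itself cannot be prefixed with "p."
FROM staging AS s
WHERE s.my_column2 = p.my_column2;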

Making a column nullable

Unlike the pgSQL syntax for updating a table using a join to a different table, I actually like the syntax for making a column nullable in pgSQL more than the T-SQL syntax. The pgSQL syntax seems much clearer and more straightforward for this action. The syntax in Postgres for making a NOT NULL column nullable is as follows: ALTER TABLE my_table ALTER COLUMN my_column DROP NOT NULL;

The T-SQL syntax for the same action is less clear, since you are basically redefining the column like you would if you were adding it, instead of specifying what you are changing. If you had a column that was set to NOT NULL in a table, you would write out the whole definition of the column again but write NULL instead of NOT NULL. For example: ALTER TABLE Mytable ALTER COLUMN MyColumn INT NULL;

I’m not fond of that T-SQL syntax because you can’t tell at a glance what is changing in the column definition unless you already knew what it was before running the statement; the exact same statement is also the syntax for changing the column's data type when the column is already nullable.
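
For quick reference, a side-by-side sketch of both directions (table and column names are made up):

-- Postgres: toggle nullability without restating the data type
ALTER TABLE my_table ALTER COLUMN my_column DROP NOT NULL;
ALTER TABLE my_table ALTER COLUMN my_column SET NOT NULL;

-- SQL Server: restate the full column definition with the new nullability
ALTER TABLE MyTable ALTER COLUMN MyColumn INT NULL;
ALTER TABLE MyTable ALTER COLUMN MyColumn INT NOT NULL;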

Data type for automatically updated binary numbers

In SQL Server, there is a data type called ROWVERSION (its synonym TIMESTAMP has been deprecated): an automatically updating binary value that is written to a record whenever that record is inserted or updated. My company currently uses this data type heavily to help identify changed data quickly for our ETLs. As we look at moving everything into the Postgres world, we have found that Postgres does not have any built-in functionality similar to the ROWVERSION column, at least not at the precision we would need it to be (you can read about Postgres system columns in the documentation). You can of course store binary data within a table, but there is no column data type that will automatically increment a binary value when data is changed with the same precision as SQL Server's ROWVERSION.

There do seem to be some ways people have replicated that functionality using custom triggers/functions that we are considering, but I haven’t looked into it that much personally so I cannot yet speak to how well that would work to replicate SQL Server’s ROWVERSION functionality. So we are either going to have to add these custom triggers/functions or find another reliable and fast way to identify changed data between source and target for our ETLs. Soon, I am hoping to do a blog post about the possible Change Data Capture (CDC) options for Postgres that we are considering.
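
To give a rough idea of the trigger-based approach, here's a sketch I put together (not something we've implemented or tested; all names are hypothetical, and it only approximates ROWVERSION by bumping a bigint from a shared sequence on every insert and update):

CREATE SEQUENCE row_version_seq;

CREATE TABLE my_table (
    id          INT PRIMARY KEY,
    some_value  TEXT,
    row_version BIGINT NOT NULL DEFAULT nextval('row_version_seq')
);

CREATE OR REPLACE FUNCTION bump_row_version()
RETURNS trigger AS $$
BEGIN
    NEW.row_version := nextval('row_version_seq');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_bump_row_version
BEFORE UPDATE ON my_table
FOR EACH ROW
EXECUTE FUNCTION bump_row_version();  -- Postgres 11+ syntax; older versions use EXECUTE PROCEDURE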

Conclusion

There are so many weird quirks and differences between SQL Server and Postgres that I am having to learn as I grow in the Postgres development space. While I have already listed so many of them across several posts now, I am sure there will be many more to come as I keep learning. And I will continue to compile a list of the differences and share them here on my blog as I find them.

Sources

  • https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-temporary-table/
  • https://www.postgresql.org/docs/current/datatype-datetime.html
  • https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-update-join/
  • https://www.postgresql.org/docs/7.3/sql-altertable.html

How to Get Public IPv4 DNS for AWS EC2 Instance

I have been trying to learn how to work with AWS Glue because it’s probably going to be a new ETL solution my organization uses as we migrate to Postgres in AWS. Part of learning how to use Glue is learning how to set up and use Postgres RDS instances so that I can move data between them with Glue.

Setting up the RDS instances was the easy part, since AWS makes that process go very smoothly. Even setting up the EC2 jump server to connect locally to my RDS instances seemed like it was easy as well, only a few options to select and then a new server was created for me.

The Problem

However, in my most recent attempt at creating all 3 of these servers (I have to regularly delete what I have while not using it to not incur additional charges), I kept running into an issue where my EC2 server was not being assigned an IPv4 Public DNS address, and without that value, I can’t connect to that server as a jump host on my local computer. That was a big problem for me.

I spent over a half hour trying to troubleshoot this problem, double-checking the VPC rules for DNS and everything I could think of, and none of it was working. I terminated and recreated the instance multiple times and that did not do the trick. Finally I found this Stack Overflow answer that was exactly what I needed, and the fix was super obvious but also hard to see at the same time.

The Solution was Simple

For some unknown reason, the default AWS chose when I was creating new instances was to set “Auto-assign public IP” to “Disabled”, and I didn’t catch it at first because that section of the instance-creation settings was also in a non-editable state by default. If you run into this same issue: when you get to the “Network Settings” part of the instance creation dialog and “Auto-assign public IP” is set to Disabled with seemingly no way to change it, click the edit button at the top right of that pane to unlock the default instance settings, then set “Auto-assign public IP” to Enabled so the instance gets a public IP address.

It’s that simple. I can’t believe it took me so long to figure out something so obvious! But that’s life in IT sometimes.

Extra Note

When you stop and then start your EC2 instance again, it will assign a new Public IPv4 DNS name to the instance. It took me longer than I would like to admit to figure this out. I kept having an issue each morning where my SSH tunnels to my RDS databases through this EC2 server would no longer work. After several weeks and trying many different things, I finally figured out that the Public IP address was changing each time I stopped my instance at the end of the work day and restarted it the following day, and that’s what was causing my tunnel to break.

More Postgres vs. SQL Server

Welcome to a coffee break post where I quickly write up something on my mind that can be written and read in less time than a coffee break takes.

As I’m getting further into my PostgreSQL adoption project at work, I am continuing to learn about so many more small differences between Postgres and SQL Server that I never would have expected. None of the differences I’ve found so far are profound, but they will pose challenges for all the developers we have (including myself) that have only worked with SQL Server.

Postgres does not have default constraints

That’s right, there is no way to make a default constraint on a Postgres table; instead you define a default value, which cannot have a name assigned to it. In SQL Server, you can define a default constraint in essentially the same way as you would define a unique or key constraint, and you can give that constraint a name. In Postgres, you simply specify that a column has a default value when adding that column to a table. In theory (I haven’t tested it yet), the generation of default values works exactly the same between the two engines; one just isn’t saved as its own named object.
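
A quick sketch of the difference (table and column names are made up):

-- SQL Server: the default is a named constraint you can reference and drop by name
ALTER TABLE dbo.MyTable
    ADD CONSTRAINT DF_MyTable_CreatedDate DEFAULT (GETDATE()) FOR CreatedDate;

-- Postgres: the default is just a property of the column, with no name of its own
ALTER TABLE my_table
    ALTER COLUMN created_date SET DEFAULT now();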

The index differences are amazing (and confusing)

I’m not going to lie, the prospect of figuring out when to use which kind of index in Postgres is daunting to me. Deciding which type of index to use in SQL Server is very straightforward, since you mostly just need to know whether the table already has a primary key/clustered index and go from there; you pretty much only have clustered and non-clustered indexes. Postgres, however, has a handful of different index types, and you need to know how the data will be queried in order to pick the best one. The default and usual best option seems to be a B-tree index, which is comforting and familiar, but one day I know I’ll likely have to figure out the other options as well.
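
To give a flavor of the options, here's a hedged sketch of a few of the Postgres index types (table and column names are hypothetical):

-- B-tree is the default and covers most equality and range lookups
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- GIN suits "contains" searches on jsonb, arrays, and full-text columns
CREATE INDEX idx_orders_attributes ON orders USING gin (attributes);

-- BRIN suits very large tables where the column tracks physical row order (e.g. timestamps)
CREATE INDEX idx_orders_created_at ON orders USING brin (created_at);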

The default length of object names is really short

Of these three new items I learned recently, I think this one is going to be the most annoying. The maximum name length for identifiers in Postgres is surprisingly low: 63 bytes, with anything longer getting truncated, compared to 128 characters for object names in SQL Server. I’m mostly concerned that we’re going to need to figure out how to rename many long object names as we migrate off SQL Server, and that could be tedious. But we’ll see how that goes. From what I’ve read, the limit comes from a compile-time constant (NAMEDATALEN), so raising it means building Postgres from source; that will need to be something else for me to research.

What is Liquibase?

Liquibase is a tool used to track and deploy changes to databases. It can connect with Git and other software, including CI/CD software, allowing your team to collaborate and track changes to your databases. There are several different software options similar to (and even better than) Liquibase, but what makes this tool unique is that it has both an open source version, which could benefit smaller organizations that don’t have as much cash to spend on database software, and a “Pro” version, which gives more features for those who need it and can pay the additional cost.

I’m currently reviewing this software and playing with it for a project at work, and I’ve found it a bit challenging to figure things out from the Liquibase websites and documentation. I’m summarizing my findings here not only for others who may read this in the future but also to help myself get everything organized and clarified in my mind. This will be a two-part series of posts, with the second providing a deep dive into how to use Liquibase with your database.

What’s in this post:

  • Why would you want to use this tool?
  • Features of open-source edition
  • Features of Pro edition
  • How to get 30-day free trial of Pro edition
  • Other options for similar tools

Why would you want to use this tool?

You want to be able to track and manage changes happening in your database

In the modern development process, all code, including database code, should be tracked using source control. Using source control enables you to have a complete history of what changed when, and who made that change. Having this level of tracking can be great for overall documentation of what you have in your systems, but it can also help immensely in troubleshooting bugs and other issues in your code to find when an issue started happening. It also enables you to quickly revert any bad or unintended changes without having to update all the code again manually. This is a well-known and well-used system in the software development realm but doesn’t seem to be quite as popular in the database realm, at least from my own experience. Liquibase can help organizations start on this journey.

You want to automatically deploy database changes to multiple environments

When you first start using Liquibase, you can generate a changelog file that acts as a baseline for recreating the database, with the file containing all of the SQL queries necessary to recreate all objects within the database. Then, as you make more changes, those will be added to one or more files that can be tracked with Git or other source control to help with deployments and change tracking of your system. If you are looking for a way to automatically deploy your database changes through pipelines, Liquibase should be able to help with that, although I haven’t gotten that far in my project so I can’t say for sure. But based on how the tool is set up, you should be able to use the software on a build and deployment server and have it execute the changelog scripts you create on databases in other environments. You can read more about the integration with CI/CD systems on the Liquibase website.
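
As an illustration of what gets tracked, a changelog can be a plain SQL file with special comments marking each changeset; here's a minimal hedged sketch (the author/id labels and object names are made up):

--liquibase formatted sql

--changeset jdoe:1
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    full_name   TEXT NOT NULL
);
--rollback DROP TABLE customer;

--changeset jdoe:2
ALTER TABLE customer ADD COLUMN created_at TIMESTAMP DEFAULT now();
--rollback ALTER TABLE customer DROP COLUMN created_at;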

You need a database change-tracking tool that works with multiple database engines

On the Liquibase website, they claim they can work with over 50 database engines, which is a huge claim to make and an accomplishment to have. There are very few other options for this type of tool that can claim to support more than just SQL Server, let alone 49 other database options. I am looking at this tool exactly for this reason because I need to be able to manage database changes on at least SQL Server and PostgreSQL going forward, and perhaps others if needed since it supports so many different options. At this point in my current tool exploration project, I have only played around with the tool and how it works with Postgres since that’s the future my department is looking at, but I’m sure it would be just as simple to set it up to work with a SQL Server database as it has been with Postgres.

Features of the Open-Source Option

I’m not going to go through and list every single one of the features of the open-source version of Liquibase because you can see that on their website. Instead, I will discuss the features I am interested in and why I think it’s interesting that those features are available completely for free.

Run Preconditions Before Executing SQL Changes

The concept of the preconditions interests me because it almost combines a custom system my department uses to validate data with the change tracking and deployment of changes. Although I haven’t had a chance to use the preconditions in my testing yet, it looks like a cool feature because it allows you to validate data using a SQL statement before the next change query is executed. For example, if you are adding an ALTER TABLE statement to your changelog file to drop or change a column, you can first add a precondition that will check to see what the data looks like in the column to ensure it doesn’t run if data exists in the column. With the precondition for SQL files, you specify an expectedResult value and then the query you want it to run, and if the results of the query don’t match the expectedResult value, it will fail the execution of the file and the changes won’t get deployed. This would be very useful for ensuring you’re only running code in the scenarios you want it to be run in.
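
In a SQL-formatted changelog, a precondition looks roughly like this (a hedged sketch; the table name and expected value are made up):

--changeset jdoe:3
--preconditions onFail:HALT onError:HALT
--precondition-sql-check expectedResult:0 SELECT COUNT(*) FROM customer WHERE legacy_code IS NOT NULL
ALTER TABLE customer DROP COLUMN legacy_code;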

Preview SQL Changes Before Running Them

I’ve already used this feature in my testing to see what exactly Liquibase is doing under the hood when I run certain commands and apply changes, and it seems useful to me. I, like a lot of database developers, am paranoid about knowing what exactly I am about to execute on a database, so Liquibase provides that capability with the command “update-sql” which will print out in the CLI all of the SQL queries that will be executed when you apply the existing changelog and changesets to your database, including the ones that the tool runs in the background to track the changes in the two log tables it uses.
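
Running it is a one-liner; something like the following, assuming your connection details are already set up in a liquibase.properties file:

liquibase update-sql --changelog-file=changelog.sql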

Automatically Detect & Script Changes for Non-programmable Objects

When I was first working with Liquibase for my proof-of-concept project, I thought that the command to automatically generate changelogs based on the current state of the database was only available in the Pro version. But I finally figured out that my assumption was incorrect: you can use the “generate-changelog” command even on the open-source version, but it will only generate queries for what I call non-programmable objects (everything except functions, procedures, etc.). If you are working with the open-source version of Liquibase and use the “generate-changelog” command, it will generate a new changelog (or overwrite an existing one if you specify an argument) containing SQL queries to recreate every object in your database that isn’t one of the programmable-type objects.

I was happy to find this command in the Liquibase arsenal because it’s a working method that my department uses regularly with our current database change management and source control tool. The normal workflow for a developer making a database change is to go into the database, make the change using SSMS, and then use our existing software to automatically detect the change and generate a migration script to track and implement that change upon deployment. This was a workflow that I was hoping not to lose as we change tools, and it looks like we won’t lose it if we switch to Liquibase. In my next post, I’ll cover the two different methods of creating and tracking changes with Liquibase, which each take a different angle of attack for where to make the change.

Features of the Pro Edition

I have been working with a 30-day free trial of the Pro edition of Liquibase to see what features it offers in that edition that we might need. If you would like to see a full list of the upgrades you can get with Liquibase Pro, visit their website to compare it against the open-source version.

What I found while working with the tool is that you do not need the Pro edition to use the command “generate-changelog” like I originally thought. If you would like to be able to automatically script out programmable objects like stored procedures and functions, you will need the Pro version for that. However, if you only want this tool to track changes to tables, foreign keys, primary keys, and other constraints (see the documentation for the entire list), then you would be fine to use the open-source edition of Liquibase.

Another big feature of the Pro edition that most people are probably interested in is that it is the only version of the software that you can integrate with CI/CD tools to do automatic deployments of the code you’re tracking with the tool. This feature would be a big one for my company since I doubt we are okay with going back to manual deployments of database code across dozens of databases in multiple environments.

The final reason you might choose the Pro edition over the open-source edition is that you can get a much greater level of support from the company with the Pro version (which makes sense). Pro comes with what they define as “standard” support which is through email. If you would like more advanced and involved support, such as 24-hour emergency help, you can add that to your subscription for an additional cost.

How to get a 30-Day Free Trial of Pro

If you would like to try out the Pro version of Liquibase to see if it meets your needs, they do offer a 30-day free trial through the website. You will need to go through the process of giving your email and other contact information, as well as setting up an appointment with one of their representatives to get the license key. In my experience, since my company is seriously considering this tool as an option and I can’t do the contact or negotiation for licenses (someone else handles that), I went through the meeting setup process but then contacted the representative and told them I would be unable to make the appointment, and still received the trial license through an autogenerated email. As of writing this, I still haven’t been bombarded by sales emails or calls from the company which I appreciate. It’s nice to be able to try out the full version of the tool without being harassed for it.

Other options for similar tools

While there are other very expensive options for database change management and tracking software available, Liquibase seems to be unique in that it offers a lot of useful features without requiring you to pay for them. They outcompete the other free options by a long shot since they can work with different database engines. It’s hard for them to compete with options like Red-Gate SQL Change Automation or Flyway since those tool suites are robust and expensive, but they are a viable option for people who don’t want to be stuck paying thousands of dollars per year for this type of software.

To see a more complete list of alternatives for this type of software, you can review the list that originally helped me in my search at DBMSTools.com.

Want to learn more details about how to work with Liquibase?

Next week, I will be posting about how specifically to set up and work with Liquibase since the startup documentation on their website was scattered and hard to move through as a beginner with the software. I thought I would make it easier for others to learn how to work with this tool by posting the notes I made while working with it.

Differences Between Postgres & SQL Server

A couple of weeks ago, one of my coworkers sent our group an article from Brent Ozar that they found interesting. The article is “Two Important Differences Between SQL Server and PostgreSQL”, which is a short and sweet discussion about two differences between the database engines that might trick or confuse a developer who’s learning PostgreSQL after working with SQL Server (someone exactly like me). After I read his post, I went down a rabbit hole of researching all of the little things left unsaid in the article that I hadn’t thought about yet since I’m only beginning my transition into Postgres. I recommend you go read that article, but here are the takeaways I learned from it as well as the extra information I learned from my additional googling.

CTEs are very similar but also different in Postgres vs. SQL Server

As Brent Ozar says in his post, CTEs are fairly similar between Microsoft SQL Server (MSSQL) and Postgres, but the main difference is in how the queries are processed by the engines which can affect how you should format your queries and their filters. If you don’t change your CTE writing style when switching to Postgres, you may end up with performance issues. This scares me a little simply because I’m sure this isn’t the only “gotcha” I’ll encounter while transitioning our systems to Postgres. However, I will always look at the whole project and each new challenge as a learning experience so that I don’t get overwhelmed by the mistakes I’m bound to make.

The one good thing I learned in my research about Postgres CTEs after reading Brent’s article was that I can continue to use Window Functions just the same with them, which is usually why I use CTEs with my current work anyway. I use the ROW_NUMBER function with CTEs quite often, so I was happy to see that the function also exists in Postgres.
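
For example, a pattern like this works the same way in both engines (hypothetical table and columns):

WITH ranked AS (
    SELECT emp_id,
           salary,
           ROW_NUMBER() OVER (ORDER BY salary DESC) AS rn
    FROM employee
)
SELECT emp_id, salary
FROM ranked
WHERE rn <= 10;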

I have taken simple query formatting for granted my entire career

After reading Brent’s post and the other subsequent articles to learn more about Postgres DO blocks, I came to the realization that I’ve really taken for granted the nice and simple query formatting, for both ad hoc and stored queries, that SQL Server and T-SQL provide. Even if I’m running a one-time query for myself to get data needed to develop a stored procedure, I use IF statements quite frequently and that is so simple to do with SQL Server. I only need to write “IF… BEGIN… END” and I have a conditional code block that I can run immediately to get the data I need.

Doing that in Postgres is a little more complicated, especially if I’m wanting these conditional statements to give me a result set to review since that isn’t possible with DO blocks in Postgres. In order to run conditional statements with Postgres, you need to use a DO block to let the query engine know that you’re about to run conditional logic. That itself is simple enough to adapt to, but there was one aspect of these code blocks that confused me for too long, which was the “dollar quote” symbols often seen in Postgres code examples. These were confusing because I wasn’t sure why they were needed or when to use them and everyone used to working with Postgres took that information for granted and never explained it with their examples.

In Postgres, when you define a function or DO block, the code within that block must be encapsulated by single quotes, which is different from SQL Server which doesn’t require any kind of quote encapsulation for the code contained in an IF statement. Most developers use the dollar quote styling method instead of single quotes because doing so prevents you from having to escape every single special character you may be using in your code block, such as single quotes or backslashes.

DO $$
BEGIN
    <code>
END $$;

Additionally, it is possible to specify tags between the beginning and ending double dollar signs (ex: DO $tag$ <code> $tag$) which can help you organize your code to make it easier to read. As far as I can tell, the tags do not provide any function besides styling code for easier reading.

Once I understood these dollar quotes better, I felt more confident about being able to write Postgres SQL code. But there were still a few other catches with the code that I wasn’t expecting, coming from MSSQL.

Postgres is very specific about when you can and can’t return data from your query/code

Of all the Postgres info that I have learned so far, I think this one is going to get me the most due to how I write queries on a day-to-day basis. As I said above, I write a lot of ad hoc queries throughout the day to view and validate data for many purposes. Postgres’ rules about where you can return data from are much different from MSSQL’s. In SQL Server, it seems like you can return/select data from anywhere your heart desires, including functions, procedures, conditional statements, and ad hoc queries. Postgres only allows you to return data from two of those options: functions and ad hoc queries (as long as they don’t include conditional logic).

If you have a chunk of code that you want to save to run again later in Postgres, and that code returns data with a SELECT statement or a RETURN statement, you must use a function and not a stored procedure. Stored procedures in Postgres are only meant to perform calculations or to do anything else besides return data. If you put a SELECT statement in a procedure in Postgres, it will NOT display that data like you would expect or like you are used to with SQL Server. This will affect my organization greatly when we move to Postgres because we have a lot of stored procedures whose sole purpose is to return specified data with a SELECT statement. All of those procedures will need to be converted to Postgres functions (which isn’t that big a deal, but it definitely needs to be kept in mind).
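
The conversion itself looks mostly mechanical; here's a hedged sketch of what one of those data-returning procedures might become as a function (names are hypothetical):

CREATE OR REPLACE FUNCTION get_active_employees()
RETURNS TABLE (emp_id INT, full_name TEXT) AS $$
    SELECT emp_id, full_name
    FROM employee
    WHERE is_active;
$$ LANGUAGE sql;

-- Called like a table, so the client still gets a result set back:
SELECT * FROM get_active_employees();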

You also are unable to return data from a DO block with conditional statements, as Brent Ozar mentioned in his post. If you want conditional logic that returns data for you to review, that will need to be done with a function. This one is a little crazy to me since it's the opposite of what's possible with SQL Server. I can run this statement without issue as an ad hoc query in SQL Server, but it wouldn't work in a Postgres DO block:

DECLARE @Today VARCHAR(10) = 'Friday';

IF (@Today = 'Friday')
BEGIN
	SELECT 1 AS TodayIsFriday
END
ELSE
BEGIN
	SELECT 0 AS TodayIsFriday
END
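
For comparison, here's roughly what you'd have to do in Postgres to get that result set back: wrap the conditional logic in a (hypothetical, throwaway) function and select from it:

CREATE OR REPLACE FUNCTION today_is_friday(p_today TEXT)
RETURNS INT AS $$
BEGIN
    IF p_today = 'Friday' THEN
        RETURN 1;
    ELSE
        RETURN 0;
    END IF;
END;
$$ LANGUAGE plpgsql;

SELECT today_is_friday('Friday') AS TodayIsFriday;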

This will take getting used to.

Postgres has the ability to understand other programming, scripting, and query languages

This is the most interesting new thing I learned about Postgres with this research. It is fascinating to me that this database engine allows you to easily write and execute code written in other languages and that it will work just as easily as normal SQL code. When you write a function in Postgres, you are able to specify what language you are writing the code in, which can be many different languages. If you don’t specify any, it will default to standard SQL.

Postgres’ standard distribution comes with four procedural language modules: PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python (the “PL” stands for “Procedural Language”; only PL/pgSQL is enabled by default, and the others are added with CREATE EXTENSION). Tcl and Perl aren’t as interesting to me personally, since I’ve never heard of the first and have never used the second, but the other two built-in language options intrigue me.

PL/pgSQL interests me because it’s the standard Postgres procedural language module, and it gives you a lot of coding features that plain SQL doesn’t provide, such as custom functions, procedures, complex computations, etc. These are the types of functionality that I assumed were normal for database engines since they’re built into T-SQL/SQL Server.

PL/Python also interests me since I’m learning Python at the moment, in addition to learning Postgres, and it seems like the most useful scripting language I’ve ever used. If you can integrate Python with the database engine so easily, I can see getting a lot of use out of that to pull and analyze data. But I haven’t yet used this functionality with Postgres so I can’t say for sure.
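
Here's a minimal hedged sketch of what that integration looks like (it assumes the PL/Python extension is actually installed on the server, which isn't the case by default, and the function is made up):

CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION py_add(a INT, b INT)
RETURNS INT AS $$
    # the body of the function is ordinary Python
    return a + b
$$ LANGUAGE plpython3u;

SELECT py_add(2, 3);  -- returns 5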

Conclusion

Overall, I am excited to learn these new things about Postgres because I am starting to feel more ready for our migration from SQL Server to PostgreSQL. As I’ve written in posts before, I am a natural procrastinator and had been procrastinating starting to learn Postgres until my coworker sent me the Brent Ozar article. I am super thankful that he sent that, because it led me down this wonderful rabbit hole of learning many differences between Postgres and SQL Server that I’m sure will benefit me going forward.

The below list is all of the resources I used while learning the above information. I thought the EnterpriseDB link was the best overall summary of different topics while the rest really get into the weeds of different Postgres features.

  • https://www.postgresql.org/docs/current/sql-createprocedure.html
  • https://www.enterprisedb.com/postgres-tutorials/everything-you-need-know-about-postgres-stored-procedures-and-functions
  • https://www.postgresql.org/docs/current/sql-do.html
  • https://stackoverflow.com/questions/12144284/what-are-used-for-in-pl-pgsql
  • https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING
  • https://www.geeksforgeeks.org/postgresql-dollar-quoted-string-constants/#
  • https://www.postgresql.org/docs/current/tutorial-window.html
  • https://www.geeksforgeeks.org/postgresql-row_number-function/
  • https://www.postgresql.org/docs/8.1/xplang.html
  • https://www.postgresql.org/docs/current/plpgsql-overview.html
  • https://www.postgresql.org/docs/current/sql-createfunction.html

Highlights of PASS Data Community Summit 2023

This year I was able to attend the PASS Data Community Summit for the first time, and it was also the first time I had ever attended a professional conference in my career. Not only did I learn more than I expected about new and existing databases and tools, but I was also able to meet many interesting new people that I never would have been able to connect with otherwise. Each of the people I chatted with at various points throughout the conference had a unique story to share and was more than happy to introduce themselves, share their experience, and offer me help going forward if I need it. Going in as someone who’s never been to such an event before, I was relieved to find that I was fully welcomed and not made to feel like an outsider.

While I really enjoyed meeting so many new people from all around the world (I can’t believe how far some people traveled!), the best part of the conference was learning about many new and useful topics. These are the most interesting highlight items I learned across the three days of the conference:

Red-Gate Test Data Manager

I think this was the most exciting thing I learned while at the conference because it would be the most beneficial for my job and company. Currently, my organization has a homegrown tool that pulls down a subset of test data (from our QA environment) to each developer’s local database. While this tool is useful and has been mostly sufficient for us since it was created, it’s becoming harder to maintain and less useful to developers over time. The main issue we have with our custom tool is that it’s set to only pull down 500,000 records for any given table, with no regard (for the most part) for getting useful data for testing. Often, when trying to test a complicated or totally new process, the data subset either isn’t enough records or the records aren’t what is needed for testing.

That is why the Test Data Manager (TDM) from Red-Gate is so interesting to me. It solves all of the problems we currently have with our custom tool plus more we hadn’t considered yet. It allows you to create a clone in less than a minute from production data (my organization’s tool currently takes a long time to pull down all of the data and it pulls from our QA environment which isn’t as good of a testing set). TDM also allows you to mask any Personal Identifiable Information (PII) while keeping that data relevant, related, and useful after it has been masked (our custom tool isn’t that smart). Additionally, you can select a premade subset of data for any given table (small or large) or you can specify your own custom type of subset (e.g. only pull customer data for 2022 customers), and the software will ensure the data stays related and meaningful as it creates the subset of data.

Unit testing for databases is possible

You might think this is obvious, but the fact that unit testing can be added to database code now is crazy and exciting to me. I figured that it would be possible to write unit tests for stored procedures since that’s just code, but there’s a whole world of possibilities for other ways to unit test databases using tools already created and perfected by others. The two main tools that were discussed in regards to database unit testing were tSQLt, an open-source testing framework, and Red-Gate’s SQL Test which utilizes that framework while also giving additional capabilities for your team. While I haven’t had a chance to try to use either of these tools yet, I look forward to exploring the world of unit testing. I would love to be able to have a way to automatically test database code changes, since currently, all testing of database-related code changes, whether it’s SQL or SSIS, must be done manually by someone else on my team (and testing resources have been scarce lately).
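
As a rough illustration of the idea (my own sketch, not something shown at the conference; it assumes tSQLt is installed and all object names are hypothetical):

EXEC tSQLt.NewTestClass 'CustomerTests';
GO

CREATE PROCEDURE CustomerTests.[test GetActiveCustomerCount returns zero for empty table]
AS
BEGIN
    -- Replace the real table with an empty fake so the test is isolated from real data
    EXEC tSQLt.FakeTable 'dbo.Customer';

    DECLARE @Actual INT;
    EXEC dbo.GetActiveCustomerCount @Count = @Actual OUTPUT;

    EXEC tSQLt.AssertEquals @Expected = 0, @Actual = @Actual;
END;
GO

EXEC tSQLt.Run 'CustomerTests';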

PostgreSQL (PGSQL) is similar but also really different from Microsoft SQL Server (MSSQL)

Over the course of the Summit, I went to many different sessions related to PostgreSQL, some of them more helpful than others. What I learned is that there are multiple different tools in the AWS platform that could be useful to others to help in their transition from MSSQL to PGSQL, but those tools might not be as useful to my company. We are considering a full transition to PGSQL in the coming years for cost savings, which means that we will want to completely move our current application systems from one engine to the other.

One of the AWS services I learned about that didn’t seem like it would be as helpful to us is BabelFish, which is a service that allows you to create an instance of PGSQL in AWS and still use your existing TSQL queries to interact with the database. This service seems like it would be very helpful for anyone who is looking to migrate a single database instance, or maybe up to a few, to PGSQL without having to rewrite their entire codebase, since it lets you keep working with your same application code. BabelFish probably won’t be helpful to my team in our migration efforts since we do want to do the full switch, so there would be no need to spin up BabelFish instances that we would later have to migrate fully anyway (especially since we’re looking at moving dozens of databases).

The other AWS services that were discussed for migration from MSSQL to PGSQL, Schema Conversion Tool (SCT) and Database Migration Service (DMS), seem like they would be closer to what my organization needs for our transition to Postgres. These tools allow you to convert most of your existing T-SQL codebase to Postgres without any work on your part. However, AWS doesn’t claim that the converted code will align with PGSQL best practices or be performant, so that could be a challenge we’ll have to deal with later. The first half of the migration process with these tools is SCT, which converts tables, schemas, functions, procedures, etc. from T-SQL to PL/pgSQL. The second half of the toolset, DMS, then helps with moving the data from one server to another. If I were going to guess how my organization will manage the transition from one engine to the other, I would say that we’ll likely use these two tools to get started, recognizing that they won’t get us 100% converted and we’ll have to do some manual conversions ourselves as well.

The last thing I learned about how Postgres differs from Microsoft SQL Server is that the indexing system is much more customizable and specific in PGSQL than in MSSQL, which means I will need to learn a lot about indexes in the former. MSSQL only has a few index types you can use (clustered, nonclustered, and columnstore), while PGSQL has six built-in kinds (B-tree, hash, GiST, SP-GiST, GIN, and BRIN) that are designed for specific use cases. While it may be a bit confusing to me, a career-long Microsoft database developer, to figure out the different types of Postgres indexes available when we start the transition, I’m going to focus on what an interesting learning opportunity it is. This will be the first large database engine change of my career, and learning a new system will make me a better developer and a more marketable job candidate in the future.

AI is being added to tools many of us use daily, to help us work more efficiently so we can do the more important and interesting work

This is one of my final takeaways from the conference, but I don’t think it will be quite as useful or relevant for my current role as the other things I listed above. A large part of the sessions at the Summit this year were focused on how AI is going to start changing how we work, and that we are going to need to adapt and learn how to use it to benefit ourselves instead of being left in the technological dust by refusing to use it. AI certainly isn’t going to be stealing our jobs in the next couple of years, but it will change how we work so we should prepare for that.

For myself, I plan on using AI tools currently available (mainly ChatGPT) to try to help me be more efficient in my role, probably with non-coding tasks such as writing emails and documentation. AI can help you work much more efficiently so you can spend most of your time on what is most important to you and your company instead of the mundane daily tasks that hog your time. That is the most exciting takeaway from the AI discussions I attended at PASS.

In the future, when other new AI tools, such as Red-Gate’s SQL Prompt+ (which was announced at the conference) are released, I also intend to play around with those to see how they might increase my work efficiency. The demo that the Red-Gate team gave for how SQL Prompt+ can be used is to highlight a query in a stored procedure and ask the tool for recommendations on how to improve the performance of the query. While I guarantee the tool is going to make a lot of bad suggestions that a human can easily tell won’t improve anything, I also think that it may provide a good starting point for investigating underperforming queries that you may not have thought of yourself. That is why I go back to my point of AI not taking our coding jobs in the near future but it becoming a great resource for making your job easier.

The concept of AI in general is a bit frightening (as a lover of sci-fi and horror books and movies), but it’s not likely to take over the world anytime soon. However, it may trickle into your workplace in the next few years so it’s better to confront change head-on and figure out how you can use these new tools to your advantage.

Summary

I hope that in the future, I’ll be able to attend more of the PASS Summit events since this year’s conference proved to be such a fun and relatively easy way to learn a lot of new information relevant to my career. Whether you’re newer in your career like me or an old hat in the trade, I really would recommend attending the next PASS Data Community Summit if you’re able to. You never know what you’ll learn that could improve your work life and make you a better developer.