Archives (page 2 of 3)

Updatable Views

Welcome to a coffee break post where I quickly write up something on my mind that can be written and read in less time than a coffee break takes.

Recently, while exploring possible options for converting our existing ETLs to working with Postgres, I found myself asking the question “can you update a table using a view that has been created on that table?” and the answer is Yes for both Postgres and SQL Server, which I learned after doing some research.

In Postgres, there is a concept of an updatable view, which is essentially a very simple view, that usually doesn’t include a join to a second (or third or more) table. There are other qualifications for what makes a view updatable, but it is generally the fact that the query creating the view is very simple. If a view meets these criteria, you can in fact update the base table using a query to the view.

For example, if you have a table called employee, then you create a view on top of that table which selects most but not all the columns from the base table, with the view being called v_employee, you can then run this query and it will update the data in employee.

UPDATE v_employee
SET full_name = 'George Smith'
WHERE emp_id = 15;

This concept is intriguing to me, and could also be really useful in updating our ETLs more quickly after we convert to Postgres, so I thought I would share this fun little fact with everyone.

Sources

Postgres vs. SQL Server, Part 3

If you haven’t seen it already, I had two previous posts listing differences I have found between Postgres and SQL Server while learning to develop in the Postgres environment. As I am continuing to learn, I am finding even more differences that I think are interesting, and they are going to be shared below.

What’s in this post

Creating a temporary table

The difference between SQL Server and Postgres for creating a temp table is small, but significant. Mostly because my company uses temp tables frequently in our stored procedure and ETL code, so as we migrate off of SQL Server, we’re going to have to do a lot of updating to move temp table statements to Postgres. The syntax for SQL Server is to use a hashtag/pound/number symbol in front of the table name, or to use a double-pound symbol to create a global temporary table, like “CREATE TABLE #MyTable” or “CREATE TABLE ##MyGlobalTempTable“. In Postgres, the syntax requires that you use the TEMP or TEMPORARY keyword to make the temp table, like “CREATE TEMP[ORARY] TABLE my_table“.

You also have to use the TEMP or TEMPORARY keyword when creating a temp table in Postgres by doing a SELECT INTO statement, such as SELECT column1, column2 INTO TEMP TABLE my_table. In SQL Server, that same command would look like SELECT column1, column2 INTO #MyTable.

General differences between temp tables in each engine

Not only is the way you create temporary tables different between Postgres and SQL Server, but how the tables are actually stored on the database is completely different between the two as well.

In SQL Server, any temporary objects are stored in a separate database called tempdb. When creating a temporary table, you can specify whether it’s localized to your current session only or whether it is a global temporary table that can be accessed from different sessions or query windows, as I mentioned above. To do the first, you specify the table name as #MyTable, then the do the latter, you specify the table name as ##MyTable.

In Postgres, temporary tables are not stored in a separate database. Instead, they are put into their own system schemas whose names are based on numerical values (ex. pg_temp_1, pg_temp_2, etc.), so it’s harder to see the specific location where your temp table is being stored. You also are generally unable to access a temp table in one session from a different session, so there is no concept of a global temp table in Postgres.

One similarity of temp tables between the two engines, though, is that they are dropped/deleted in both as soon as the session in which they were created ends.

Getting the current UTC time

In SQL Server, there are two built-in functions that allow you to get the current UTC time, GETUTCDATE() and SYSUTCDATETIME(). The former returns the time in type DATETIME, and the latter returns the time in type DATETIME2 (more precise).

Postgres also has multiple functions and methods of getting the current UTC time, but they look much different from each other than the SQL Server options look. While there are more than just these two options, the most common ways to get the UTC time are the following:

  • SELECT NOW() AT TIME ZONE 'utc';
  • SELECT CURRENT_TIMESTAMP AT TIME ZONE 'utc'

Although the two options look different, they will return the same value. I’ve had some issues trying to use one versus the other in different scenarios in the SQL code I’ve written for Postgres, so it seems like there may be times where it’s better to use one over the other, but I’m just not able to differentiate those scenarios right now. There are also other options for getting the current UTC time, but I won’t cover those here.

Updating a table with a JOIN

One of the biggest syntax differences between T-SQL and pgSQL that has really gotten me confused so far is the formatting of an UPDATE statement that includes a JOIN to a different table. To me, the syntax of T-SQL makes a lot more logical sense and the pgSQL syntax doesn’t seem as intuitive. The T-SQL syntax is as follows:

UPDATE p
SET p.MyColumn = s.MyColumn
FROM person AS p
INNER JOIN staging AS s
	ON s.MyColumn2 = p.MyColumn2

Then the syntax for pgSQL is:

UPDATE person
SET p.my_column = s.my_column
FROM staging
WHERE staging.my_column2 = person.my_column2

To me, the pgSQL syntax doesn’t scream “joining to a different table”; it really looks to me like a simple update statement without a join. Yet the join is there. Also, it doesn’t seem to let you use aliases for the query, which I also dislike since I use aliases for every query I write. I wish I could change the standard for the pgSQL syntax, but obviously that isn’t within my power so I’m going to have to learn to live with it.

Making a column nullable

Unlike the pgSQL syntax for updating a table using a join to a different table, I actually like the syntax of making a column nullable in pgSQL more than I like the T-SQL syntax. The pgSQL syntax seems much more clear and straightforward for this action. The syntax in Postgres for making a NOT NULL column nullable is as follows: ALTER TABLE my_table ALTER COLUMN my_column DROP NOT NULL;

The T-SQL syntax for the same action is less clear, since you are basically redefining the column like you would if you were adding it, instead of specifying what you are changing. If you had a column that was set to NOT NULL in a table, you would write out the whole definition of the column again but write NULL instead of NOT NULL. For example: ALTER TABLE Mytable ALTER COLUMN MyColumn INT NULL;

I’m not fond of that T-SQL syntax because you can’t tell at a glance what is changing with the column definition unless you already knew what it was before running the statement. Since that same statement would also be the same syntax for changing the data type for the column if the column was already set to be nullable.

Data type for automatically updated binary numbers

In SQL Server, there is a data type called TIMESTAMP or ROWVERSION that is an automatically updating binary value that is added to a record when that record is updated or inserted. (The TIMESTAMP synonym for the ROWVERSION data type has been deprecated.) My company currently uses this data type heavily to help identify changed data quickly for our ETLs. As we are looking at moving everything into the Postgres world, we have found that Postgres does not have any built-in functionality similar to the ROWVERSION column, at least not at the precision we would need it to be (you can read about system columns here). You of course can store binary data within a table, but there is no data type for columns that will automatically increment a binary value when data is changed with the same precision as SQL Server’s ROWVERSION.

There do seem to be some ways people have replicated that functionality using custom triggers/functions that we are considering, but I haven’t looked into it that much personally so I cannot yet speak to how well that would work to replicate SQL Server’s ROWVERSION functionality. So we are either going to have to add these custom triggers/functions or find another reliable and fast way to identify changed data between source and target for our ETLs. Soon, I am hoping to do a blog post about the possible Change Data Capture (CDC) options for Postgres that we are considering.

Conclusion

There are so many weird quirks and differences between SQL Server and Postgres that I am having to learn as I grow in the Postgres development space. While I have already listed so many of them across several posts now, I am sure there will be many more to come as I keep learning. And I will continue to compile a list of the differences and share them here on my blog as I find them.

Sources

  • https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-temporary-table/
  • https://www.postgresql.org/docs/current/datatype-datetime.html
  • https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-update-join/
  • https://www.postgresql.org/docs/7.3/sql-altertable.html

Getting Better at Keeping Work at Work

Despite already writing a similar post about this topic myself, I have still been struggling recently to keep my work at work, struggling with not thinking about it in my free time in the evenings and on weekends. Looking for some professional guidance from an outside perspective online, I found this great article from Harvard Business Review that I want to share with others.

Using Technology to Log Out for the Day

Steps 1 and 3 from the article come easily to me. While I don’t have specific hours I am required to be at work, I am great at keeping a regular work schedule for myself that I don’t normally deviate from. Once I log out at 5 PM each day, I don’t check Teams messages, and I don’t even have my work email on my phone so I can’t check that even if I wanted to. I manage this system using the Focus Time feature on my iPhone.

I have two different scheduled Focus Time settings for my phone: 1) Work and 2) Personal Time. I’ve set up the Work Focus Time to be active during my normal work hours and it only allows notifications from Teams, my two-factor authentication apps, and a couple of important family members. Then I’ve set up the Personal Time custom Focus Time to be the opposite: it runs from the time I get off work to the time I start work in the morning and blocks any notifications from Teams or other work-related applications.

This system has been working really well for me and I would recommend others try it out if they haven’t already, because then you never need to think about which notifications to turn off after work each day, or worse, just keep getting those notifications all evening and weekend long that take your mind back to work when it shouldn’t be there.

Struggling with “Mental Clarity” Around Work

Steps 2 and 4 from the HBR article are more challenging for me lately than I would like them to be. Step 2 is to get “mental clarity”, which essentially means that you should know each day what tasks you need to accomplish at work, by keeping a running to-do list or somewhere else to manage all your thoughts about work. I used to be great at this before the last couple months, when my job became less clear and began changing. I stopped keeping a physical notebook because I needed a new system to keep related notes related and physically close since I now have larger projects to manage rather than small tickets to work on daily or weekly. Once I stopped heavily utilizing my physical notebook, I stopped creating daily to-do lists for myself to accomplish.

Not having a go-to place to list out all the current and future things I will need to complete has led to the problem of me randomly remembering important things I need to do for work at inconvenient times, like when I’m trying to fall asleep. I’m sure many can relate to that. This mental clarity step reminds me of the book “Getting Things Done” by David Allen that I read a couple years ago. One of the main points of that book is you need to be constantly dumping thoughts out of your head onto paper (or anywhere you want as long as it’s consistent) so your brain can trust it doesn’t need to constantly remind you of important things at inappropriate times. If you know that you always write important tasks in the same location, you know that anything important is in that location and not just in your head. The Harvard Business Review article made me realize I need to get back to that organization method and start actually using the digital note page I made awhile ago that is supposed to act as my catch-all list of items on my plate now and in the future.

One final note on the mental clarity step of keeping work at work– I actually do still manage to complete an end-of-day wrap up each work day where I write out everything I accomplished throughout the day along with notes for what I need to do the next day and a review of the next day’s calendar. I occasionally skip days of this, so I would like to schedule time for this each day going forward so I’m more diligent about doing it.

The Need to Feel Accomplished

The final step to keeping work at work recommended in the article that I’ve been struggling with the most recently is step 4, which says that you need to get work done at work. It may seem crazy for someone to not get work done during their work day if they’re showing up on time and leaving usually later than they should, but it is entirely possible if you allow your entire schedule to be consumed by meetings with no or minimal time between them. My job has morphed into a collaboration-focused position where I suddenly need to have meetings upon meetings about everything related to the projects I am working on. This means that on an average day, I have multiple hours of meetings with colleagues, and somehow they all seem to leave 30-minute gaps between to tease me with the possibility of getting work done, but then I never actually get time to focus in those gaps.

This cycle of continual meetings and no long stretches of work time has made it really difficult for me to have any decent amount of time to focus on the deliverables I need to complete. While sure, you could theoretically get good work done in the 30 minutes between meetings, reality is much different when you need to work on a confusing or difficult task. Those types of work items are better suited for longer stretches of dedicated focus time, at least one hour but hopefully two or more. That is a rare occurrence in my current work life and I want to change that, and will do so by following the recommendations of this HBR article, and actually some of my own advice from a previous blog post. I need to schedule blocks of one or more hours each day to focus on work. My plan is to do that on Friday afternoons the week before or Monday mornings the week of, so I can block out work time for an entire week to get ahead of meetings that will inevitably be scheduled.

Summary

While the advice of the Harvard Business Review article isn’t entirely new to me, it did act as a great trigger to remind me of the valuable work skills I have neglected recently so I can revive them. Going forward, I am going to revitalize my use of my own calendar and digit notebook to see if that helps me feel more in control and less caught up in the whirlwind at work.

Give the HBR article a read and see if any of it speaks to you like it spoke to me. Is there something in your work-life balance situation that could be improved to reduce your overall stress?

Getting SQL Prompt to Prompt on RDS Servers

This may seem like a ridiculous thing to need to write about, making the Red-Gate tool SQL Prompt generate prompts like it should, but I have been having a weird issue with it over the past couple months and have finally learned the solution. So of course I thought I should share it!

What is SQL Prompt?

SQL Prompt is a tool made by Red-Gate that works as a much cleaner, nicer, and more useful autocomplete feature for SQL Server Management Studio (SSMS). It is a plugin you install to SSMS that then seems to magically work to help you write queries faster. Not only does this tool autocomplete databases, schemas, tables, and column names for you in your queries, but it also provides a lot of other useful tools like a Snippets Manager, which allows you to use default and custom snippets to write code faster (e.g. writing “sf” then pressing Tab will type out “SELECT * FROM” for you so all you need to type is the table name you want to select from).

Every developer in my organization uses this tool heavily in our day-to-day operations while writing any SQL scripts because it makes writing queries so much faster. So when my SQL Prompt seemed to stop working after an update, I was getting really frustrated because it meant I had to write all of my SQL queries manually again. And when all you do all day is write SQL, that adds up to a significant hindrance to your work speed.

My Problem

The problem I was having with SQL Prompt was that when I connected to any of our RDS database instances, the tool would no longer do any prompting of schemas, tables, or columns which was making my coding life so much harder. Oddly, the snippets manager portion of the tool was still working fine, so at least I wasn’t needing to type out the queries I normally use the snippets shortcuts for. Also oddly, Prompt would work perfectly fine connecting to other databases that weren’t on RDS instances, it was only happening for RDS databases.

I dealt with this issue for months on our production server, since I figured it was due to the security settings or something else I wouldn’t be able to fix, and I don’t access prod servers very frequently, so when I did use them and the Prompt wasn’t working, it wasn’t as bothersome. But after I had to completely reset my developer computer and reinstalled SQL Prompt, I started having this same issue for our lower environment databases, that I work with every day, so Prompt not prompting was suddenly a big deal.

The Solution

I created a support ticket with Red-Gate since I’ve always had good luck with their support services. This time wasn’t any different. Within a couple hours of creating the ticket, I had an email from a support rep asking me if I had tried checking the “Trust Server Certificate” check box on the Connection Properties tab while connecting to the RDS servers. No, I had not done that because I did not know that was an option before that day.

I disconnected from the RDS server then reconnected, making sure to check that box before clicking “Connect”, and now I had SQL Prompt back up and working, providing prompts of schemas, tables, and columns just like I want it to. Yay!

I also logged onto our prod server to see if doing the same thing there would fix that issue, and it fixed Prompt there as well. I am so excited to be able to not type out every detail of every SQL query again!

I love quick fixes like this.

Why CFTs Take so Long to Delete

Welcome to another coffee break post where I quickly write up something on my mind that can be written and read in less time than a coffee break takes.

Background

Recently, I went through an AWS workshop for Lake Formation, a data lake management tool in AWS, and that workshop had me create many different Cloud Formation Templates (CFTs) to spin up services to use in the workshop. After I finished that, I then had to go through my development AWS account for work and clean up everything that had been created so we stopped paying for these services I no longer needed.

While attempting to delete the many CFTs I had used, I saw one that was seemingly stuck in the DELETE_IN_PROGRESS state for almost 20 minutes. I did not realize it would take so long to delete one CFT and was getting worried that it was actually stuck. So I started searching online to see if this has happened to others as well.

Why does the delete take so long?

I found this Reddit post of someone reporting the same thing, and it linked to a very informative answer to a similar question on Stack Overflow. I would recommend you go and read that detailed answer there for the best understanding of why CFTs sometimes take forever to delete.

The simple answer is that is just how it is. My CFT in question had set up a lot of Virtual Private Clouds (VPCs), Elastic Compute Cloud (EC2) instances, Elastic Network Interfaces (ENIs) as well as other resources, and some of those items simply take awhile to delete.

Even though I can’t speed up the deletion process for these big CFTs, at least now I know that in the future, should I need to delete any other large CFTs from my AWS account, I can expect it might take longer than I would anticipate to complete.

How to Get Public IPv4 DNS for AWS EC2 Instance

I have been trying to learn how to work with AWS Glue because it’s probably going to be a new ETL solution my organization uses as we migrate to Postgres in AWS. Part of learning how to use Glue is learning how to set up and use Postgres RDS instances so that I can move data between them with Glue.

Setting up the RDS instances was the easy part, since AWS makes that process go very smoothly. Even setting up the EC2 jump server to connect locally to my RDS instances seemed like it was easy as well, only a few options to select and then a new server was created for me.

The Problem

However, in my most recent attempt at creating all 3 of these servers (I have to regularly delete what I have while not using it to not incur additional charges), I kept running into an issue where my EC2 server was not being assigned an IPv4 Public DNS address, and without that value, I can’t connect to that server as a jump host on my local computer. That was a big problem for me.

I spent over a half hour trying to troubleshoot this problem, double-checking the VPC rules for DNS and everything I could think of, and none of it was working. I terminated and recreated the instance multiple times and that did not do the trick. Finally I found this Stack Overflow answer that was exactly what I needed, and the fix was super obvious but also hard to see at the same time.

The Solution was Simple

For some unknown reason, the settings that AWS defaulted to when I was creating new instances was to set “Auto-assign public IP” to “Disabled”, and I didn’t catch it at first because that section of the instance creation settings was in a non-editable state by default as well. If you run into this same issue, when you get to the “Network Settings” part of your instance creation dialog and “Auto-assign public IP” is set as Disabled and it looks like there’s no way to change that, click the edit button at the top right of that pane to change the default instance settings. Then Enable the option to assign a public IP address to the instance.

It’s that simple. I can’t believe it took me so long to figure out something so obvious! But that’s life in IT sometimes.

Extra Note

When you stop and then start your EC2 instance again, it will assign a new Public IPv4 DNS name to the instance. It took me longer than I would like to admit to figure this out. I kept having an issue each morning where my SSH tunnels to my RDS databases through this EC2 server would no longer work. After several weeks and trying many different things, I finally figured out that the Public IP address was changing each time I stopped my instance at the end of the work day and restarted it the following day, and that’s what was causing my tunnel to break.

Using the WordPress Local Development Tool to Avoid JSON Errors

When I started my blog a couple months ago, I determined a posting schedule for myself of once a week on Tuesdays. If you are someone who notices details and patterns, you may have noticed that I went a few weeks without posting at the end of December and into January, but it wasn’t for lack of me trying. In this post I’m going to talk about the issues I faced that prevented me from posting and how I finally got a workaround for the problem that has finally allowed me to start posting again.

Background

While trying to create and edit my tutorial post about working with Liquibase, I kept running into the same vague JSON error when trying to save drafts, always at the same point. The error was: “Updating failed. The response is not a valid JSON response”.

The point at which it always failed was about midway through the post, either when I copied in text that contained double-quote characters or when I added the first image. I spent several weeks going back and forth with Flywheel (the web host I use) support, trying to figure out what was causing this JSON error when I tried to save a draft.

We tried many different things, including the normal troubleshooting steps of disabling plugins, reverting to the default WordPress theme, disabling all of my browser extensions, trying to work in an incognito window, and several other things. I read so many of the articles online discussing how to resolve this error, and none of those solutions worked. On the Flywheel side, they told me they were trying several different server changes, including extending timeout values and other firewall type changes. I even completely restored my website to default, but not even that worked. And then I had to rebuild everything on my website (except for posts and pages which I exported before restoring).

As of the time of writing this, I still don’t officially have a resolution from Flywheel for this error that keeps popping up and generating 403 errors on the backend (I figured that out myself by inspecting what happened when I clicked the “save draft” button). However, thanks to some kind folks on a WordPress forum, I finally figured out a viable workaround that I am actually enjoying using more than the online WordPress admin portal.

The Successful Workaround

The workaround that I am now using, even as I write this, is to use the development tool for WordPress called Local. This tool gives you a local development environment for WordPress that can seamlessly connect with Flywheel and other hosting providers. The setup for the tool was super easy, took me less than 5 minutes. After going through the setup, all I had to do was pull a copy of my site to local and then I have been able to locally edit my website through the tool.

Once you’re done making all your changes, you only need to select to “Push” to your production site, then select which changes you want to be pushed, then it’s quickly sent to your actual production website. It takes a few minutes for Flywheel to update with the changes, but then you can see your locally developed changes on your real site. All without the hassle of the ever-present JSON error when trying to save a draft.

I will be continuing to work with Flywheel to try to resolve the error on my production site, but at least for now, I am able to keep working and posting, back to my normal schedule.

Summary

If you are facing endless errors when trying to create posts for WordPress and you’re hosted with Flywheel, try using the Local tool to develop on your own local machine to see if that helps you in the same way it helped me.

UPDATE

We finally seem to have resolved my issue after going back and forth so many times with the Flywheel support team and continuing to escalate it through all their support levels. I’m not sure if this is what finally resolved the issue as I wasn’t provided any further details, but the last detailed message I received seemed to indicate that something was set up incorrectly with their Web Application Firewall (WAF) that was preventing me from editing. Before getting the issue fixed, I also found that I couldn’t edit the footer of my site because I was also getting 403 errors when trying to save those changes, and that along with my post creation issue has been resolved.

The moral of this update is that if you’re having this same issue and your site is hosted on Flywheel, keep pushing them to try more things on their end, don’t let them say that it must be something with your local environment. They were super great in continuing to try as long as I kept pushing and saying it still wasn’t working. It also seemed to help when I started showing them the exact errors I was getting with the Inspect tool in Chrome so they would have something to work with.

Do DML Statements Work in Liquibase Changesets?

After finishing the blog post last week about how to work with Liquibase, I decided to find the answer to one of the outstanding questions I had about the tool, which was whether or not it allows you to put DML statements in your changelogs and changesets. I couldn’t find any documentation anywhere online about putting DMLs in changesets, so I had to figure it out myself. Finding the answer to this was much easier than I thought it would be since all it involved was adding a DML statement to a changelog, running the Liquibase update statement, and then seeing what happened.

So do DMLs work in Liquibase changesets?

Yes, they do. To prove it, I opened the existing changelog file that I created for last week’s tutorial and I added a new changeset.

Screenshot of text editor containing Liquibase changeset with DML statement to insert into a table

Then I opened the command prompt for Liquibase and ran the normal update statement to get my database aligned with the changelog file.

Screenshot of Liquibase command window showing successful execution of DML changeset

The update statement completed successfully, which I truly was not expecting. Then I had to go into the database to see if that DML statement was actually executed on the DB or not, and it was!

Screenshot of PGAdmin window showing SELECT statement results containing record inserted by DML changeset

I was very excited to see that, because it meant that if my team decided to switch to this tool, we could continue deploying DML scripts alongside any DDL scripts they may be associated with.

Summary

Today’s post is short and sweet. I wanted to see if the Liquibase tool had a key feature I was looking for it to have but couldn’t find documentation about. I was thrilled to see that it does work with DMLs. Such a small but important feature.

How to Set Up and Use Liquibase, Part 2

In last week’s post, I covered the initial setup steps you must follow when starting to work with Liquibase. In this week’s post, I will be finishing up my tutorial of getting started with Liquibase. If you haven’t yet downloaded and set up Liquibase on your computer, please review that post before reading this one.

What’s in this post:

Create the baseline changelog file for your database

Using the command “generate-changelog” with the CLI for Liquibase, we can create a SQL file containing queries that will regenerate all objects in your database. What database objects get scripted into this files depends on which license you have for Liquibase. If you have the open-source version of the tool, it will script out all non-programmable objects like tables, keys, etc. If you want or need to script out all of your programmable objects such as procedures and functions (plus other items), you will need to have the Pro version of the tool.

Either way, the command for creating the script is exactly the same.

liquibase generate-changelog --changelog-file=mydatabase-changelog.sql --overwrite-output-file=true

Let’s break this command down. The first two words are simple, you’re calling Liquibase and specifying you want it to run the generate-changelog command. The next part is the “changelog-file” argument that allows you to specify the file you want to write the new changelog to. The next argument, “overwrite-output-file” tells the tool if you want to overwrite that specified file if it already exists. In this case, I specified true for that argument because I want the tool to overwrite the example changesets in the file it created upon project creation with the actual queries for my database. After running this command, you should get a success message like the following.

And if you open that specified file now, it should contain the actual scripts to generate all of the objects in your database, each change separated into its own changeset. Each generated changeset will be defined with the username of the person who generated the file, as well as the tracking/version number for the set.

Now you are ready to start doing normal development and changes to your database because you have baselined your project.

Adding and tracking ongoing database changes

There are two methods for adding/tracking database changes with this tool: 1) add your scripts to the changelog file as changesets, then “update” the database with those changes, or 2) make your changes within the IDE for your database (ex: PGAdmin) then use the “generate-changelog” command to identify and script those changes.

Method 1: Adding Scripts to Changelog File

Open your changelog file and add a new line. On that line, you are going to add the required comment that lets Liquibase know you are starting a new changeset. This line looks like “– changeset author:versionNumber”. Example: “– changeset elahren:1.1”. Then, add a line below your changeset comment and add the DDL script you would like to run on your database. Once you have added all the changes you would like to your changelog file, save and close the file, then open the Liquibase command prompt to execute those changes on your database.

If you would like to preview the changes Liquibase will run on your database, you can run the command “liquibase update-sql” which will show you all the SQL that will be executed, which will include your queries as well as queries Liquibase will run to track what you’re applying. In the below screenshot, the commands with a green square are the ones I included in my changesets, and the commands with a blue square are the ones that Liquibase will run to track the changes.

If the preview looks correct, you can then run the command “liquibase update” which will apply all the previously viewed SQL queries to your database. You can verify the changes have been successfully applied by opening your database in your normal IDE (e.g. PGAdmin) and confirm the changes.

Method 2: Make Changes in your IDE

The process for making the changes in your IDE and then tracking those changes in Liquibase is almost exactly the same as the process we used to create the initial changelog file when setting up the project. It is as easy as making whatever database changes you want in your IDE and then opening the Liquibase CLI and running the “generate-changelog” command with either a new file name if you want to put it in a new changelog file, or use the same file name with the “--overwrite-output-file=true” argument.

If you are going to use the first option, writing to a new changelog file, it seems like you will then need to edit the file after creating it to remove any of the queries you didn’t create in your latest changes (since the command will try to recreate all objects in your database).

I’m not sure if this is the recommended workflow for tracking database changes, but it was a feature my team was hoping to get from the database change tracking tools we’ve been investigating, so I found a way to make it happen with Liquibase. If you want or need to have a “database-first” approach to change tracking (making changes directly to the database and then generating files to track that), instead of a “migration-first” type approach (making migration/change scripts and then applying that to your database), it appears that is technically possible with this tool.

Structuring your changelogs according to best practices

You can set up and structure your changelogs in any way that you would like, it’s your project, but Liquibase does have some ideas to help you stay organized. There are two different organization methods they recommend: object-oriented and release-oriented.

Object-oriented means you will create a different changelog file for each object or type of object being tracked in your database (e.g. one file for stored procedure changes, one file for table changes, etc.). I personally don’t like the idea of this organization method since it would mean you could be updating many files each time you make database changes, like if you’re updating procedures, tables, indexes, and views all for one release. However, having all the object types separated could also be a benefit, depending on how you normally complete your work.

Release-oriented means you make a new file for each release you make for your software/database. This method seems more familiar to me personally since it’s similar to the concept of migration scripts in Red-Gate’s SQL Change Automation or Flyway tools, where you can combine multiple database changes you’re making at once into a single file and deploy it all at once. This process could also work for organizations that use a more structured delivery system and not continuous delivery/agile development. That way you could put all of your changes for the week, month, or whatever development length into one file to associate with one particular release.

Whichever method you choose should work well if you set it up properly, the decision of which option to choose only depends on how you work and how you prefer to track that work.

Outstanding Questions

The first outstanding question I have about this tool right now is can you put DML scripts in your changelogs? That is something supported by Red-Gate’s SQL Change Automation and Flyway tools, which is what I’m used to. So far, I haven’t been able to figure out if that’s possible with Liquibase. Being able to deploy DML changes alongside a regular deployment really simplifies the process of some DMLs that you may need to run in your production environment, because it makes sure they go out with the deployment they are related to. An example of this is if you are adding a lookup type table (i.e. AccountTypes) and need to add the few records into that table after it’s created. Normally, you would need to run such a DML script manually after your deployment has completed. But SCA and Flyway allow you to put the DML in a deployable script that will automatically insert that data after the table is created. That of course can come with its own challenges, but it’s something I’ve really enjoyed with Red-Gate SQL Change Automation so I want it to be possible with Liquibase.

The second outstanding question I have about Liquibase is whether or not it can work with a secrets manager for database user passwords. How I set up my test project locally required me to put the password for the database user for Liquibase to be saved in cleartext in the properties file, which is not safe. For my purposes, it was fine since it’s a dummy database that doesn’t have any real data in it. But for production purposes, there is no way we would ever save a database user password in cleartext in a file. I haven’t had the chance to research this question more yet, so I’m not sure if the tool would work with a secrets manager or not.

Summary

When I first started working with Liquibase I was pretty frustrated with it since it was a totally new-to-me tool with documentation I didn’t find intuitive. But I kept working with it because I wanted to make it work for my organization and then just found it interesting to learn more about. I now feel fairly confident working with the tool in the ways my organization would need.

For being a tool with a completely free-to-use version, it seems like it has a good amount of features that many developers might need and could use for tracking and deploying changes to their databases. I can’t honestly say that I would prefer using this tool to Red-Gate’s SQL Change Automation or Flyway tools, which I currently work with, since they have a better use interface and seem to have more intuitive script creation processes. But Liquibase does seem like a useful tool that I could get used to working with given enough time. It’s worth a try to work with if your organization is working with a limited tool budget.

How To Set Up and Use Liquibase, Part 1

In a recent post, I gave an overview of what Liquibase is and what features it offers as a bare minimum. This week and next, I am going to give the step-by-step instructions I followed myself to learn how to properly set up a Liquibase project after playing with it for several hours. I personally found the documentation offered by Liquibase a little confusing, so this post is essentially the notes I took while figuring out what I really needed to do to set up a demo project with the tool. The aim of this post is not to be an exhaustive tutorial of the software, since I am far from an expert of Liquibase. Let me know in the comments if you found any of this useful or interesting!

What’s in this post:

Download Liquibase and install it on your computer

I won’t provide a link to it here (because that would be sketchy), but you can find the tar or Windows installer download for the free (open-source) version of Liquibase on the Liquibase website under Editions & Pricing > Open Source. I used the Windows installer version since I am working on a Windows machine. After you download the installer, you can run it to install the tool. I did not change any of the setup options (there were very few). The installation and setup were both extremely fast.

After the installer runs, you should be able to see where it was downloaded, which for me, was under Program Files. Now you can start working from that downloaded folder for the program, or you can copy the entire folder to another directory so you can play around with it without the fear of breaking something and then having to reinstall everything to fix issues. I made a copy in another location and worked from that copy (which was suggested by one of the Liquibase tutorials).

Verify that Liquibase is properly installed on your computer

The next important step is to verify the status of the tool to make sure it installed correctly. To do this, you want to open a command prompt window and navigate to the directory of the liquibase folder that you want to work with, then run the command “liquibase status“.

As you can see in the above screenshot, although I had an error returned by the “status” command (since I haven’t setup a Liquibase project yet), the tool did run and work (as evidenced by the giant Liquibase printout). Download and installation of the tool was successful.

Create your first Liquibase project

Creating your first project is simpler than I originally thought it was. I knew that at a bare minimum, a file called “liquibase.properties” needed to be created, but I thought that I had to do that manually (mostly because I did skip a page or two in the documentation that I thought weren’t needed). Although it is possible to create that file manually and then manually enter the necessary values to create a project, the easiest way to setup a new project and all of its necessary files is to use the “init” command, “liquibase init project“.

That command will run you through the process of setting up a new project, including setting the necessary values for the liquibase.properties file. If you are fine with using all default settings for your project, specify “Y” when prompted, otherwise specify “C” which will allow you to customize the values used by the project so you can work with your own database.

What I specified for each of the setup prompts:

  • Use default project settings or custom? Custom, “C
  • Relative path for the new project directory? In the main Liquibase folder, not in a subfolder, “./” (I would likely change this for an actual development project if I was creating one for work)
  • Name for the base changelog file?mydatabase-changelog“, but you can set it to whatever makes the most sense to you. I definitely wouldn’t use that format for an actual work project if we move forward with this.
  • Changelog format?sql” since I am working with a normal SQL database
  • Name of default properties file?liquibase.properties“, the recommended name, but you could change it to something else
  • JDBC URL to the project database?jdbc:postgresql://localhost:5432/postgres“. This points to the database called “postgres” on my local machine, which is a PostgreSQL database so it uses that type of connection string that I specified. Other database engines will have their own connection string format.
  • Username to connect to database? “Postgres”, the default user for a new PostgreSQL database, which is what the project should use to interact with the database. You should change this to a better-defined user just for the Liquibase tool.
  • Password? The password associated with the specified user. For real-world purposes, I hope we’ll be able to use a secrets manager tool to update that file for deployment purposes so that we don’t have to specify that in clear text in the file.

New Project Files

After you go through all of the above steps, Liquibase will have created several new files for you. I am not sure what most of them are for at this point, but I think some of the other files besides the properties file that were created are used for CI/CD integration (flow files). The properties file I created for my project now looks like this after my project setup:

If you have a license key for the enterprise/Pro version of the software, you can scroll to the bottom of this file and uncomment the line “liquibase.licenseKey:” then add your license key after the colon on that line.

You can also review the default changelog file created by Liquibase if you specified that during project initialization. For me, the file contains some sample changesets that don’t relate to my actual database:

These are the tables in my database for the Liquibase project:

To set up this file to represent your database, we’ll need to use the “generate-changelog” command on the project, and I’ll walk through that in the next post.

Database changes made with project initialization

As you can see from the last screenshot above showing what tables I have in my database, there are two tables called databasechangelog and databasechangeloglock. Those two tables will be created in your project database when you connect Liquibase to the DB because that is how the tool keeps track of what changelogs and changesets have been applied to the database already. You can also prevent some changes from being executed when the changelog is applied, and those changes will be tracked with the “lock” table.

You are now ready to start making changes to your database and tracking those changes with Liquibase!

A quick note about changelogs and changesets

The changelog is the main file (or files) that contain the queries you use to interact with your project database. You can have just one or you can have a series of changelogs, each used for a different part of your database.

Each changelog file contains what Liquibase calls the changeset, which is a single unit of change for your database, like an ALTER TABLE statement or any other DDL statement you can run against a database. A changeset is identified in the changelog file with a comment line which will contain the change’s author and then a change number which can essentially be any number you would like to track the changes. If you use the above steps to have Liquibase create your first changelog file for you, it will create a randomly generated version number for each of the queries listed in the file. This number will determine in what order the changes will be applied.

Summary

In this post, I covered how you go about setting up your first Liquibase project and what each of commands and files related to that means. This tool is surprisingly simple to work with, as can be seen with the initial setup process. Next week, I will be covering what you do now that you have the tool successfully installed and set up on your computer.

What’s in the next post:

  • Create the baseline changelog file for your database
  • Adding and tracking ongoing database changes
    • Method 1: Adding scripts to changelog file
    • Method 2: Making changes in your IDE
  • Structuring your changelogs according to best practices
  • Outstanding questions
  • Summary