If you haven’t seen it already, I had two previous posts listing differences I have found between Postgres and SQL Server while learning to develop in the Postgres environment. As I am continuing to learn, I am finding even more differences that I think are interesting, and they are going to be shared below.
What’s in this post
Creating a temporary table
The difference between SQL Server and Postgres for creating a temp table is small, but significant. Mostly because my company uses temp tables frequently in our stored procedure and ETL code, so as we migrate off of SQL Server, we’re going to have to do a lot of updating to move temp table statements to Postgres. The syntax for SQL Server is to use a hashtag/pound/number symbol in front of the table name, or to use a double-pound symbol to create a global temporary table, like “CREATE TABLE #MyTable
” or “CREATE TABLE ##MyGlobalTempTable
“. In Postgres, the syntax requires that you use the TEMP
or TEMPORARY
keyword to make the temp table, like “CREATE TEMP[ORARY] TABLE my_table
“.
You also have to use the TEMP
or TEMPORARY
keyword when creating a temp table in Postgres by doing a SELECT INTO
statement, such as SELECT column1, column2 INTO TEMP TABLE my_table
. In SQL Server, that same command would look like SELECT column1, column2 INTO #MyTable
.
General differences between temp tables in each engine
Not only is the way you create temporary tables different between Postgres and SQL Server, but how the tables are actually stored on the database is completely different between the two as well.
In SQL Server, any temporary objects are stored in a separate database called tempdb
. When creating a temporary table, you can specify whether it’s localized to your current session only or whether it is a global temporary table that can be accessed from different sessions or query windows, as I mentioned above. To do the first, you specify the table name as #MyTable
, then the do the latter, you specify the table name as ##MyTable
.
In Postgres, temporary tables are not stored in a separate database. Instead, they are put into their own system schemas whose names are based on numerical values (ex. pg_temp_1
, pg_temp_2
, etc.), so it’s harder to see the specific location where your temp table is being stored. You also are generally unable to access a temp table in one session from a different session, so there is no concept of a global temp table in Postgres.
One similarity of temp tables between the two engines, though, is that they are dropped/deleted in both as soon as the session in which they were created ends.
Getting the current UTC time
In SQL Server, there are two built-in functions that allow you to get the current UTC time, GETUTCDATE()
and SYSUTCDATETIME()
. The former returns the time in type DATETIME
, and the latter returns the time in type DATETIME2
(more precise).
Postgres also has multiple functions and methods of getting the current UTC time, but they look much different from each other than the SQL Server options look. While there are more than just these two options, the most common ways to get the UTC time are the following:
SELECT NOW() AT TIME ZONE 'utc';
SELECT CURRENT_TIMESTAMP AT TIME ZONE 'utc'
Although the two options look different, they will return the same value. I’ve had some issues trying to use one versus the other in different scenarios in the SQL code I’ve written for Postgres, so it seems like there may be times where it’s better to use one over the other, but I’m just not able to differentiate those scenarios right now. There are also other options for getting the current UTC time, but I won’t cover those here.
Updating a table with a JOIN
One of the biggest syntax differences between T-SQL and pgSQL that has really gotten me confused so far is the formatting of an UPDATE
statement that includes a JOIN
to a different table. To me, the syntax of T-SQL makes a lot more logical sense and the pgSQL syntax doesn’t seem as intuitive. The T-SQL syntax is as follows:
UPDATE p
SET p.MyColumn = s.MyColumn
FROM person AS p
INNER JOIN staging AS s
ON s.MyColumn2 = p.MyColumn2
Then the syntax for pgSQL is:
UPDATE person
SET p.my_column = s.my_column
FROM staging
WHERE staging.my_column2 = person.my_column2
To me, the pgSQL syntax doesn’t scream “joining to a different table”; it really looks to me like a simple update statement without a join. Yet the join is there. Also, it doesn’t seem to let you use aliases for the query, which I also dislike since I use aliases for every query I write. I wish I could change the standard for the pgSQL syntax, but obviously that isn’t within my power so I’m going to have to learn to live with it.
Making a column nullable
Unlike the pgSQL syntax for updating a table using a join to a different table, I actually like the syntax of making a column nullable in pgSQL more than I like the T-SQL syntax. The pgSQL syntax seems much more clear and straightforward for this action. The syntax in Postgres for making a NOT NULL
column nullable is as follows: ALTER TABLE my_table ALTER COLUMN my_column DROP NOT NULL;
The T-SQL syntax for the same action is less clear, since you are basically redefining the column like you would if you were adding it, instead of specifying what you are changing. If you had a column that was set to NOT NULL
in a table, you would write out the whole definition of the column again but write NULL
instead of NOT NULL
. For example: ALTER TABLE Mytable ALTER COLUMN MyColumn INT NULL;
I’m not fond of that T-SQL syntax because you can’t tell at a glance what is changing with the column definition unless you already knew what it was before running the statement. Since that same statement would also be the same syntax for changing the data type for the column if the column was already set to be nullable.
Data type for automatically updated binary numbers
In SQL Server, there is a data type called TIMESTAMP
or ROWVERSION
that is an automatically updating binary value that is added to a record when that record is updated or inserted. (The TIMESTAMP
synonym for the ROWVERSION
data type has been deprecated.) My company currently uses this data type heavily to help identify changed data quickly for our ETLs. As we are looking at moving everything into the Postgres world, we have found that Postgres does not have any built-in functionality similar to the ROWVERSION
column, at least not at the precision we would need it to be (you can read about system columns here). You of course can store binary data within a table, but there is no data type for columns that will automatically increment a binary value when data is changed with the same precision as SQL Server’s ROWVERSION
.
There do seem to be some ways people have replicated that functionality using custom triggers/functions that we are considering, but I haven’t looked into it that much personally so I cannot yet speak to how well that would work to replicate SQL Server’s ROWVERSION
functionality. So we are either going to have to add these custom triggers/functions or find another reliable and fast way to identify changed data between source and target for our ETLs. Soon, I am hoping to do a blog post about the possible Change Data Capture (CDC) options for Postgres that we are considering.
Conclusion
There are so many weird quirks and differences between SQL Server and Postgres that I am having to learn as I grow in the Postgres development space. While I have already listed so many of them across several posts now, I am sure there will be many more to come as I keep learning. And I will continue to compile a list of the differences and share them here on my blog as I find them.
Sources
- https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-temporary-table/
- https://www.postgresql.org/docs/current/datatype-datetime.html
- https://www.postgresqltutorial.com/postgresql-tutorial/postgresql-update-join/
- https://www.postgresql.org/docs/7.3/sql-altertable.html