Tag: Python (page 1 of 1)

A Note on Python Fernet Encryption

In last week’s post, I wrote about how I use the Fernet functions of Python’s cryptography library to encrypt and decrypt my “.env” files for projects. After finishing the post, I realized I should research the security and methodology of the Fernet algorithms since I am using them. In my research, I found this StackOverflow question that had some good answers about the type of encryption Fernet is using. Check it out to learn a little more about what Fernet is using.

Is the Fernet cryptography module safe, and can I do AES encryption with that module?

Using an Encrypted Env File with Your Python Script to Secure Data

Have you ever created a Python script that needed to pass secrets to an API or something else in order to work, and wondered how you can keep those secrets… well, secret while still using them as needed? Every programmer should know that storing passwords and other secrets in plain text in a code file is bad practice, but sometimes it doesn’t feel like there is another option that is easy to use and doesn’t take a ton of extra time to implement. I am here today to tell you that there is another option that helps you to secure your secrets while still being able to use them whenever needed without much hassle. I was the first person to use this methodology for scripts in my current team, and it has now become a team standard to use this method for storing, retrieving, and securing secrets in code.

What’s In This Post

What is an “.env” file?

The “.env” file is a configuration file that you can use to store any configuration settings you need for your program, including which environment the script is operating in, as well as a place to store sensitive information such as API keys and IDs. In my own Python projects, I use the “.env” file to store ID values associated with the App Registrations I’m using, but I do not store actual secrets in the file. While it is theoretically safe to store secrets in the “.env” file as long as you encrypt that file, my team has decided we would prefer to pull secrets from an Azure Key Vault when needed instead of storing it in any text file, even if that file is encrypted.

An example of what a “.env” file might contain is:

ENVIROMENT=DEVELOPMENT
API_CLIENT_ID=1234-56789-101112123
API_TENANT_ID=9876-5432-103136546
STORAGE_ACCOUNT=blobFileStorageAccount

The values following the key names can be surrounded by quotation marks or not, it does not matter for this. I prefer to not put quotes around anything because I think it looks cleaner. But doing so or not does not affect how you interact with the values from this file.

There is a lot more information surrounding environment variables and how you can use them with your Python scripts than I can cover here. But if you would like to learn more, I felt like this article covered it well. Using .env Files for Environment Variables in Python Applications by Jake Witcher.

Python Cryptography Module

There is a Python library called “cryptography” that you can install and import into your projects, and this library aims to make the “recipe” layer of encryption easy for normal developers to use. This library also has a layer that it calls the “hazmat” layer, since the functions in that layer require more in-depth knowledge of how cryptography works. If you would like to read more about this library, you can check out the documentation here.

Since I do not claim to be an expert on cryptography in any sense, I chose to stick with the “recipe” layer of the cryptography library, using the standard Fernet cryptography functions. Starting to work with this symmetric encryption is extremely easy, it only requires the following steps.

  1. Create a new encryption key: key = Fernet.generate(key)
  2. Create a Fernet object using the previously created key, to encrypt and decrypt files: f = Fernet(key)
  3. After you have created that object, you can encrypt and decrypt whatever you would like at will. You can encrypt variables within your code, or you can encrypt whole files.

    How to Encrypt and Decrypt Files with Fernet

    For my own Python coding, I have made it a standard to always include a script called “EnvEncryption.py”, where I have two functions that I call as needed to encrypt and decrypt files. Since I created the file for my very first project, I’ve since just been able to copy and past the script into each new project I write so I don’t have to rewrite the same code over and over. Below are my two cryptography functions.

    Encryption Function

    from cryptography.fernet import Fernet, InvalidToken
    import sys
    
    def encrypt(filename, key):
    	try:
    		f = Fernet(key)
    		with open(filename, "rb") as file:
    			file_data = file.read()
    		encrypted_data = f.encrypt(file_data)
    		with open(filename, "wb") as file:
    			file.write(encrypted_data)
    	except InvalidToken:
    		errorMessage = "Token error has occurred while trying to encrypt the file \"" + filename + "'""
    		errorLine = sys.exc_info()[2].tb_lineno
    		print("Exception: " + errorType)
    		print("Error Message: " + errorMessage)
    		print("Error line: " + str(errorLine))

    What the above code is doing is first creating the Fernet encryption object with f = Fernet(key). After creating that object from the key that was passed to the function, it will read the data from the specified file, saving it into the variable file_data. After getting that data into the variable, that data is then passed through the Fernet encrypt function using the custom key/Fernet object, and is stored in the encrypted_data variable. Once the data has been encrypted in that variable, it is then written back to the same file, thus making the file encrypted.

    Decryption Function

    def decrypt(filename, key):
    	try:
    		f = Fernet(key)
    		with open(filename, "rb") as file:
    			encrypted_data = file.read()
    		decrypted_data = f.decrypt(encrypted_data)
    		with open(filename, "wb") as file:
    			file.write(decrypted_data)
    	except InvalidToken:
    		print("File not encrypted, cannot decrypt")

    My decryption function is almost the same as the encryption function, just in reverse. The function takes a file and encryption key as input, then generates the Fernet encryption object from the key. The next step is to read the encrypted data from the file into a variable, then pass that data into the Fernet decrypt function and save the results to the variable decrypted_data. The function then opens the file again and writes the newly decrypted data into the file.

    Creating the Key for the First Time

    As you probably noticed from my code above, both the encrypt and decrypt functions take a “key” variable as required input for encrypting and decrypting the file, yet I didn’t say where that was coming from.

    Each time I start a new project that will use this encryption and decryption method, I write a temporary script to generate a key for my functions to use, and then I save that key to an Azure Key Vault so I can access it later and pass the value into the script from a command line when the script is called. So if you were to run the above code as-is without generating your own key first, neither of the functions would work.

    Before you begin using the above scripts, you need to run the following code to generate a key for yourself to use for encryption and decryption with your project.

    Note: After you encrypt your file/data for the first time with the key you are about to generate, you MUST have the same key available to decrypt the data. If you lose the key, you will not be able to easily decrypt your file (unless you’re great at hacking :D). Make sure you save your key in a safe location for future reference after you encrypt your data.

    Note: Direct quote from the documentation: “Generates a fresh fernet key. Keep this some place safe! If you lose it you’ll no longer be able to decrypt messages; if anyone else gains access to it, they’ll be able to decrypt all of your messages, and they’ll also be able forge arbitrary messages that will be authenticated and decrypted.”

    key = Fernet.generate_key()
    print(key)

    That code snippet above will print something that looks like b'tWEVGCX-dvt9v-3MZTSehHMD2-2IpJChgit04QZaAy8='. The keys that are generated are of type byte, so when you print out to the console it lets you know the string is bytes by surrounding it with b''. To save that key somewhere safe, you don’t need the b'' part, just the characters within the quotes.

    Now that you have this key, you can use it with the encryption and decryption functions as needed.

    How to Access Secrets from Encrypted “.env” File

    Once you’ve successfully encrypted your .env file, how are you supposed to use the data you stored within it when you need it? The answer is that you will need to decrypt your file whenever you need to pull data from it, then immediately encrypt the file again after you’ve pulled what you need from it. You want to keep the file decrypted for as little time as possible to prevent any possible sniffing of the data that’s in it. I know a good hacker will be able to get the data if they’re determined, but this model is assuming that you’re not in a high-risk or high-security system. If you are in such a scenario, I strongly suggest that you use stronger safety systems than what I have presented here.

    An example of what this decryption/encryption format looks like is the following:

    ee.decrypt(envFile, encryptionKey)
    # Get the data from the file
    ee.encrypt(envFile, encryptionKey)

    For more information on how to extract key-value pairs from an .env file, please see next week’s post where I cover how I write my code to do that.

    Summary

    In this post, I covered all the details you need to know in order to start using an encrypted “.env” to store your environment/processing values that you want to keep more secure. The key to doing it well is to use the standard Python cryptography library and the Fernet functions. Make sure to always encrypt your file again as soon as possible after decrypting it and pulling the data you need from the file. Happy coding!

    How to Get Role Assignments for a Resource in Azure Using Python

    If you are in a role similar to one I am in now, where there is heavy oversight into certain data the company owns due to federal regulations, you may get assigned a similar project where you need to automatically retrieve various ownership and access information about one or more databases in your Azure cloud environment. What I’ve found through my own process of completing such a project is that the documentation Microsoft offers for its Python SDKs is abysmally low, and there is often nothing helpful being offered about the class you are trying to work with. But I have finally figured it out with my own determination, trial, and effort so I decided I would share it with everyone so that hopefully others don’t have to go through the same amount of effort to figure out something that should be fairly easy to accomplish.

    What’s In This Post

    The Task

    Get a list of the “role assignments” for a SQL Server resource in the Azure Portal, specifically the Owner assignments. According to this Microsoft article, the “Owner” role “grants full access to manage all resources, including the ability to assign roles in Azure RBAC.” Since that is a lot of freedom, our auditors want to make sure they know who has such access so they can determine if those people should have access.

    Necessary Python Modules

    There are two modules that you will need to download and install in order to programmatically generate the list of “Owner” assignments on a resource in Azure. The first is from azure.identity import ClientSecretCredential, which is how we are going to create credentials to use with the Authorization Management client. The second is from azure.mgmt.authorization import AuthorizationManagementClient, which is the module that will allow us to query the Authorization API that Microsoft provides, so that we can get that list of role assignments for a particular resource in the cloud environment.

    Required Permissions

    In order for you to be able to query this data we are looking for, you will need to create an app registration that has been given the appropriate permissions on either the subscription or resource group level, depending on the security requirements of your organization. Lowest access is always preferable, but giving the permission at the subscription level would then allow your app registration to perform this same action on any and all resources you want with just one permission assignment. The permission you need to assign to your app registration at either the Resource Group or Subscription level is “API Management Service Reader Role”, which can be assigned under “Access Control (IAM)” for the Subscription or Resource Group.

    Data Pieces You Will Need

    There are several pieces of information you will need to know and feed in to your Python script to get it to retrieve the role assignments for a given resource.

    • Tenant ID, Client ID, and Client Secret of the app registration you are using with your script
      • These three items are used to generate the credential needed for the Authorization API using the Azure Identity ClientSecretCredential
      • To keep these vital pieces of information secure, you can review my upcoming blog post about encrypting your IDs and keeping your secrets safe while still using them
    • Subscription ID for the subscription your resource is in, even if you are getting the role assignments at the resource group or resource level (not Subscription level)
    • Resource Group name for the group your resource is in
    • Resource Namespace. This item is a bit more confusing to achieve since there really isn’t any clear information about what it means, at least in my own research. (If you can find a Microsoft document talking about this, please let me know!) What I found through my own trial and error is that this value, at least if the resource you’re getting the owners for is a SQL Server resource, will be "Microsoft.Sql". I ultimately figured this out by looking at the full Resource ID for one of my SQL Server instances and seeing that value there. You can also use the Resource Explorer in the Azure portal and look in the “Providers” list to see that there are hundreds of different namespaces that can be used for getting the role assignments for any resource.
      • In this screenshot below, taken from the Properties page of my SQL Server resource, you can see the full ID for the resource, which includes "Microsoft.Sql/servers" which is the namespace/resource type for this resource
    Screenshot showing the full Resource ID for a SQL Server resource, which includes the Resource Namespace and the Resource Type
    • Resource Type. Similarly to getting the resource namespace, it’s hard to find the official value that should be used for Resource Type when querying the Authorization Management API. But I figured it out from the Resource ID like I did the namespace. The Resource Type can also be found under the Resource Explorer “Providers” list, but just looking at that did not clarify for me what value I should be passing in. I once again found out through trial and error what value would work for this parameter.
    • Resource Name. The name of the resource you want to get role assignments for. If you have a SQL Server called “prod_dta”, then you would pass that value in for this parameter.
    • Optional: The ID value for the role type you are looking for, which in my case, was the Owner role. This is a standard ID that is assigned to the Owner role across your organization (or maybe it’s the same for all of Microsoft…). See subsection below for how to find this ID value.

    Getting a Role ID Value

    In the Azure portal, navigate to any resource and then go to the “Access Control (IAM)” page:

    In the IAM page, click on the “Roles” tab then click on “View” for the Owner role (or whichever role you would like to get the ID for)

    In the pane that opens with the details of the role, you want to click on the “JSON” tab, which is where you will find the ID you need. The full ID of the role is longer than just the GUID we are looking for, so make sure to only copy out the GUID from the end of the string (which is what’s covered in gray on the screenshot below). Save this ID for later.

    The Code

    Once you have figured out and gathered all the required pieces of data needed to get role assignments for a resource in Azure, the code for getting that data is fairly simple. Below is all of my code which I used to get just the people with Owner role assignments from a resource.

    from azure.identity import ClientSecretCredential
    from azure.mgmt.authorization import AuthorizationManagementClient
    import json
    
    
    def create_auth_management_client(tenantID, clientID, clientSecret, subscriptionID):
        credentials = ClientSecretCredential(tenant_id=tenantID, client_id=clientID, client_secret=clientSecret)
        authMgmtClient = AuthorizationManagementClient(credential=credentials, subscription_id=subscriptionID)
    
    
    def list_resource_owners(authMgmtClient, serverName):
        resourceGroupName = "MyResourceGroup"
        resourceNamespace = "Microsoft.Sql"
        resourceType = "servers"
        resourceName = serverName
        ownerRoleDefID = "f865f414-b2bf-4993-8f94-51b055da4356" 
        # This is a random GUID value that I replaced with the actual GUID for the Owner role in our system. Since I'm not sure if those are unique to organizations or not, I have taken out the actual value for this example.
        
        permissions = authMgmtClient.role_assignments.list_for_resource(resource_group_name=resourceGroupName, resource_provider_namespace=resourceNamespace, resource_type=resourceType, resource_name=resourceName)
        principalList = []
    
        for item in permissions:
            stringItem = str(item)
            if ownerRoleDefID in stringItem: #If the current item is for an Owner role
                # Replace single quotes with double quotes in string
                replacedStringItem = stringItem.replace("'","\"")
    
                # Replace None with "None"
                replacedStringItem = replacedStringItem.replace("None","\"None\"")
    
                # Get index at which to cut off the string
                stopIndex = replacedStringItem.find(", \"principal_type\"")
    
                # Substring the string item to stop after getting the princiapl id
                substring = replacedStringItem[0:stopIndex]
    
                # Add closing bracket to end of the substring
                substring = substring + '}'
                
                # Load the item into a JSON-format object
                jsonItem = json.loads(substring)
    
                # Get just the principal ID value from the JSON object
                principalID = jsonItem['principal_id']
    
                # Add the principal ID to the principalList object
                principalList.append(principalID)
        return principalList

    Let’s break that down further

    from azure.identity import ClientSecretCredential
    from azure.mgmt.authorization import AuthorizationManagementClient
    import json

    The above code snippet shows the 3 libraries you need to import for this code to work.

    1. ClientSecretCredential from Azure.Identity: This module/library allows you to create a login credential for your Azure environment.
    2. AuthorizationManagementClient from azure.mgmt.authorization: This library is what gives you the modules, functions, and classes needed to interact with role assignments in Azure, which includes functions to create, update, and delete role assignments with your Python code. I only used the “get” functionality because I only wanted to retrieve assignments, not make new ones.
    3. JSON: I use this library to manipulate the data returned from the Auth Management API calls in a JSON format, which is easier to work with to retrieve the value associated with a particular key.
    def create_auth_management_client(tenantID, clientID, clientSecret, subscriptionID):
        credentials = ClientSecretCredential(tenant_id=tenantID, client_id=clientID, client_secret=clientSecret)
        authMgmtClient = AuthorizationManagementClient(credential=credentials, subscription_id=subscriptionID)

    This function, create_auth_management_client takes input of the Tenant ID, Client ID, and Client Secret of the app registration making the calls to Authorization Management, as well as the Subscription ID that the resource we want to get role assignments for is in, and uses the ClientSecretCredential function from the Microsoft library to generate such a credential. It then makes a client to interact with the Authorization Management API with Azure using that credential. The authMgmtClient object will be the object we call methods on to get the data we need.

    We then come to the definition of the function list_resource_owners, which is where I pull the role assignment data from Azure, then parse the data returned into a usable format and pull out just the role assignments that contain the Owner role ID value. The beginning of the function first sets important data values for the Authorization Management Client to use; you could pass these in to your own function if you want instead of hard-coding the values within the function.

        resourceGroupName = "MyResourceGroup"
        resourceNamespace = "Microsoft.Sql"
        resourceType = "servers"
        resourceName = serverName  # Passed in as parameter to function
        ownerRoleDefID = "f865f414-b2bf-4993-8f94-51b055da4356" # Replace with correct value from your environment

    After defining those important values, the code then finally makes a call to Authorization Management:

    permissions = authMgmtClient.role_assignments.list_for_resource(resource_group_name=resourceGroupName, resource_provider_namespace=resourceNamespace, resource_type=resourceType, resource_name=resourceName)

    If you were to then print(permissions), you would get a very long string of role assignments for the given resource (but permissions is an Authorization Management object, not a true string). The results look something like this (except it would contain the GUIDs of all the objects, not the all-zero GUIDs I redacted with):

    {'additional_properties': {}, 'id': '/providers/Microsoft.Management/managementGroups/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleAssignments/00000000-0000-0000-0000-000000000000', 'name': '00000000-0000-0000-0000-000000000000', 'type': 'Microsoft.Authorization/roleAssignments', 'scope': '/providers/Microsoft.Management/managementGroups/00000000-0000-0000-0000-000000000000', 'role_definition_id': '/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleDefinitions/00000000-0000-0000-0000-000000000000', 'principal_id': '00000000-0000-0000-0000-000000000000', 'principal_type': 'User', 'description': None, 'condition': "((!(ActionMatches{'Microsoft.Authorization/roleAssignments/write'})) OR (@Request[Microsoft.Authorization/roleAssignments:RoleDefinitionId] ForAnyOfAllValues:GuidNotEquals {00000000-0000-0000-0000-000000000000, 00000000-0000-0000-0000-000000000000, 00000000-0000-0000-0000-000000000000})) AND ((!(ActionMatches{'Microsoft.Authorization/roleAssignments/delete'})) OR (@Resource[Microsoft.Authorization/roleAssignments:RoleDefinitionId] ForAnyOfAllValues:GuidNotEquals {00000000-0000-0000-0000-000000000000, 00000000-0000-0000-0000-000000000000, 00000000-0000-0000-0000-000000000000}))", 'condition_version': '2.0', 'created_on': datetime.datetime(2024, 6, 14, 21, 30, 43, 130165, tzinfo=<isodate.tzinfo.Utc object at 0x0000020C0D71D490>), 'updated_on': datetime.datetime(2024, 6, 14, 21, 30, 43, 130165, tzinfo=<isodate.tzinfo.Utc object at 0x0000020C0D71D490>), 'created_by': '00000000-0000-0000-0000-000000000000', 'updated_by': '00000000-0000-0000-0000-000000000000', 'delegated_managed_identity_resource_id': None}

    You would have one of these sections of data for every single role assignment on the specified resource.

    An important note is that there are several different classes you can call a list_for_resource function on from the authMgmtClient, but the only one that gets this data for the entire resource, and not just for the role assignments that apply to the caller, you need to use the role_assignments class. Here is a link to the Microsoft documentation, it honestly wasn’t that useful to me, but may be useful to you.

    The final step of the processing of role assignments is to format it into something that is more useful, a JSON object, which is what the following section of code aims to do.

        principalList = []
    
        for item in permissions:
            stringItem = str(item)
            if ownerRoleDefID in stringItem: #If the current item is for an Owner role
                # Replace single quotes with double quotes in string
                replacedStringItem = stringItem.replace("'","\"")
    
                # Replace None with "None"
                replacedStringItem = replacedStringItem.replace("None","\"None\"")
    
                # Get index at which to cut off the string
                stopIndex = replacedStringItem.find(", \"principal_type\"")
    
                # Substring the string item to stop after getting the princiapl id
                substring = replacedStringItem[0:stopIndex]
    
                # Add closing bracket to end of the substring
                substring = substring + '}'
                
                # Load the item into a JSON-format object
                jsonItem = json.loads(substring)
    
                # Get just the principal ID value from the JSON object
                principalID = jsonItem['principal_id']
    
                # Add the principal ID to the principalList object
                principalList.append(principalID)
        return principalList

    What that above code is doing is that for every item in the returned permissions object (each line like the example I showed above), formatting is going to be applied.

    1. The current item will be converted to a true string object using str()
    2. If the current item contains the ID of the owner role (which we retrieved earlier), then we will keep processing it. If it doesn’t contain that ID, then the item will be ignored and processing started on the next item in the object.
    3. The first formatting change is to replace any single-quote characters with double-quote characters, which is what JSON expects. Example: 'name': '00000000-0000-0000-0000-000000000000' will change to "name": "00000000-0000-0000-0000-000000000000"
    4. The next formatting change is to replace any instances of None with "None", which also better fits JSON formatting.
    5. Because of weird formatting that happens later in each line of the permissions object, I opted to cut off the string after the final piece of data that I needed, which was the “principal_id” value. So the line stopIndex = replacedStringItem.find(", \"principal_type\"") is retrieving the character index at which the string “, principal_type” is found, so that I can cut off the string right before that value.
    6. The next step of formatting is to perform a string slice on the current item to cut it off at the designated point. That means the above example data will now look like {'additional_properties': {}, 'id': '/providers/Microsoft.Management/managementGroups/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleAssignments/00000000-0000-0000-0000-000000000000', 'name': '00000000-0000-0000-0000-000000000000', 'type': 'Microsoft.Authorization/roleAssignments', 'scope': '/providers/Microsoft.Management/managementGroups/00000000-0000-0000-0000-000000000000', 'role_definition_id': '/subscriptions/00000000-0000-0000-0000-000000000000/providers/Microsoft.Authorization/roleDefinitions/00000000-0000-0000-0000-000000000000', 'principal_id': '00000000-0000-0000-0000-000000000000'
    7. Then to return that string to standard JSON formatting, we are going to append a closing curly bracket to the string
    8. After getting the string formatted nicely so that it can be a JSON object, I use json.loads(substring) to create a true JSON object from that string
    9. Once the string has been turned into a JSON object, I use standard JSON string extraction formatting (principalID = jsonItem['principal_id']) to pull just the principal_id value out of the JSON object. If I had not gone this route, it would have been much more difficult to do string parsing to correctly get the value of the principal_id key.
    10. The final step of this function is to add that retrieved principal_id value to the principalList List and then return that List to the caller once all lines of the permissions object have been extracted, formatted, and the principal_id values retrieved.

    Summary

    There isn’t much documentation and there aren’t many references online about how to retrieve role assignment data for a resource from Azure, so I hope this tutorial has been helpful to you. My goal for this post was to make it so that others did not have to go through all the trial and error I had to go through to figure this out. I think that once you get an example of the data that the API is expecting you to pass in, it becomes much easier to query not only SQL Server resources but also any other type of resources in Azure. Let me know in the comments below if you found this tutorial helpful or if you have any remaining questions or misunderstandings on the topic that you would like help with.

    Taking a Break and Switching Things Up

    If you know me personally or follow me on LinkedIn, then you know that several weeks ago I made the decision to leave my friends and colleagues at Scentsy to take a new role at Boise Cascade. I made this decision for a lot of different reasons, but the main one being that I felt like I needed more of a challenge and wanted to get back into development work rather than focusing on project organization and management.

    I am just about through my third week in my new role and it has been a scary, tiring, interesting, and even a bit fun, experience. Starting any new job comes with some new stress as I adjust to the environment and coworkers, and while I’m not completely out of the adjustment phase yet, I am already becoming more comfortable in this role so thought I should come here and give an update.

    Previously, this blog has been heavily focused on cloud development with AWS since that is what I was developing in daily at Scentsy. But with my new role, I am drinking from a firehose to start learning cloud development in Azure, so my posts will be switching to focus on that platform instead. As of right now, I have no plans to do any personal development projects in AWS to be able to continue content in that space. I am really excited to be learning about Azure though, and am equally excited to start sharing what I learn about it here on my blog, since it seems like the Azure documentation is just as hard to understand, interpret and use as the AWS documentation is. So I will have plenty of my own learnings to share.

    In my new role, I am also jumping into the deep end with scripting in Python which I am loving (but of course having issues with just like any other platform), so I will be starting to share some of my learnings about that space as well. Plus, there’s always a possibility I’ll be learning some other new technology because of how diverse my new role is, so my content going forward might be a lot more diverse than it was in the past.

    It may take a few more weeks for me to get into the swing of things enough at my new job to feel like I have something worth posting about here, so I hope you’ll stick around and give the new content a read once it comes out. And as always, if there is anything in particular you would like me to write about, let me know in the comments and I will try to get to it!

    Differences Between Postgres & SQL Server

    A couple of weeks ago, one of my coworkers sent our group an article from Brent Ozar that they found interesting. The article is “Two Important Differences Between SQL Server and PostgreSQL”, which is a short and sweet discussion about two differences between the database engines that might trick or confuse a developer who’s learning PostgreSQL after working with SQL Server (someone exactly like me). After I read his post, I went down a rabbit hole of researching all of the little things left unsaid in the article that I hadn’t thought about yet since I’m only beginning my transition into Postgres. I recommend you go read that article, but here are the takeaways I learned from it as well as the extra information I learned from my additional googling.

    CTEs are very similar but also different in Postgres vs. SQL Server

    As Brent Ozar says in his post, CTEs are fairly similar between Microsoft SQL Server (MSSQL) and Postgres, but the main difference is in how the queries are processed by the engines which can affect how you should format your queries and their filters. If you don’t change your CTE writing style when switching to Postgres, you may end up with performance issues. This scares me a little simply because I’m sure this isn’t the only “gotcha” I’ll encounter while transitioning our systems to Postgres. However, I will always look at the whole project and each new challenge as a learning experience so that I don’t get overwhelmed by the mistakes I’m bound to make.

    The one good thing I learned in my research about Postgres CTEs after reading Brent’s article was that I can continue to use Window Functions just the same with them, which is usually why I use CTEs with my current work anyway. I use the ROW_NUMBER function with CTEs quite often, so I was happy to see that the function also exists in Postgres.

    I have taken simple query formatting for granted my entire career

    After reading Brent’s post and the other subsequent articles to learn more about Postgres DO blocks, I came to the realization that I’ve really taken for granted the nice and simple query formatting, for both ad hoc and stored queries, that SQL Server and T-SQL provide. Even if I’m running a one-time query for myself to get data needed to develop a stored procedure, I use IF statements quite frequently and that is so simple to do with SQL Server. I only need to write “IF… BEGIN… END” and I have a conditional code block that I can run immediately to get the data I need.

    Doing that in Postgres is a little more complicated, especially if I’m wanting these conditional statements to give me a result set to review since that isn’t possible with DO blocks in Postgres. In order to run conditional statements with Postgres, you need to use a DO block to let the query engine know that you’re about to run conditional logic. That itself is simple enough to adapt to, but there was one aspect of these code blocks that confused me for too long, which was the “dollar quote” symbols often seen in Postgres code examples. These were confusing because I wasn’t sure why they were needed or when to use them and everyone used to working with Postgres took that information for granted and never explained it with their examples.

    In Postgres, when you define a function or DO block, the code within that block must be encapsulated by single quotes, which is different from SQL Server which doesn’t require any kind of quote encapsulation for the code contained in an IF statement. Most developers use the dollar quote styling method instead of single quotes because doing so prevents you from having to escape every single special character you may be using in your code block, such as single quotes or backslashes.

    DO $$
    <code>
    END $$;

    Additionally, it is possible to specify tags between the beginning and ending double dollar signs (ex: DO $tag$ <code> $tag$) which can help you organize your code to make it easier to read. As far as I can tell, the tags do not provide any function besides styling code for easier reading.

    Once I understood these dollar quotes better, I felt more confident about being able to write Postgres SQL code. But there were still a few other catches with the code that I wasn’t expecting, coming from MSSQL.

    Postgres is very specific about when you can and can’t return data from your query/code

    Of all the Postgres info that I have learned so far, I think this one is going to get me the most due to how I write queries on a day-to-day basis. As I said above, I write a lot of ad hoc queries throughout the day to view and validate data for many purposes. Postgres’ rules about what you can return data from are much different than MSSQL. In SQL Server, it seems like you can return/select data from anywhere your heart desires, including functions, procedures, conditional statements, and ad hoc queries. Postgres only allows you to return data from two of those options, functions and ad hoc queries (as long as it doesn’t include conditional logic).

    If you have a chunk of code that you want to save to run again later in Postgres, and that code returns data with a SELECT statement or a RETURN statement, you must use a function and not a stored procedure. Stored procedures in Postgres are only meant to perform calculations or to do anything else besides return data. If you put a SELECT statement in a procedure in Postgres, it will NOT display that data like you would expect or like you are used to with SQL Server. This will affect my organization greatly when we move to Postgres because we have a lot of stored procedures whose sole purpose is to return specified data with a SELECT statement. All of those procedures will need to be converted to Postgres functions (which isn’t that big a deal, but it definitely needs to be kept in mind).

    You also are unable to return data from a DO block with conditional statements as Brent Ozar mentioned in his post. If you want to have conditional logic that returns data for you to review, that will need to be done with a function. This one is a little crazy to me since that is the opposite of what’s possible with SQL Server. I can run this statement without issue as an ad hoc query in SQL Server, but it wouldn’t work in Postgres DO block:

    DECLARE @Today VARCHAR(10) = 'Friday';
    
    IF (@Today = 'Friday')
    BEGIN
    	SELECT 1 AS TodayIsFriday
    END
    ELSE
    BEGIN
    	SELECT 0 AS TodayIsFriday
    END

    This will take getting used to.

    Postgres has the ability to understand other programming, scripting, and query languages

    This is the most interesting new thing I learned about Postgres with this research. It is fascinating to me that this database engine allows you to easily write and execute code written in other languages and that it will work just as easily as normal SQL code. When you write a function in Postgres, you are able to specify what language you are writing the code in, which can be many different languages. If you don’t specify any, it will default to standard SQL.

    Postgres comes with four different language modules installed, which are PL/pgSQL, PL/Tcl, PL/Perl, and PL/Python. The “PL” in those module names stands for “Procedural Language”. Tcl and Perl aren’t as interesting to me personally since I’ve never heard of the first and have never used the second, but the other two built-in language options intrigue me.

    PL/pgSQL interests me because that’s the standard Postgres SQL language module available which gives you a lot of coding features that standard SQL doesn’t provide, such as custom functions, procedures, complex computations, etc. The types of functionality that I assumed were normal for database engines since they’re built-in to T-SQL/SQL Server.

    PL/Python also interests me since I’m learning Python at the moment, in addition to learning Postgres, and it seems like the most useful scripting language I’ve ever used. If you can integrate Python with the database engine so easily, I can see getting a lot of use out of that to pull and analyze data. But I haven’t yet used this functionality with Postgres so I can’t say for sure.

    Conclusion

    Overall, I am excited to learn these new things about Postgres because I am starting to feel more ready for our migration from SQL Server to PostgreSQL. As I’ve written in posts before, I am a natural procrastinator and had been procrastinating starting to learn Postgres until my coworker sent me the Brent Ozar article. I am super thankful that he sent that, because it led me down this wonderful rabbit hole of learning many differences between Postgres and SQL Server that I’m sure will benefit me going forward.

    The below list is all of the resources I used while learning the above information. I thought the EnterpriseDB link was the best overall summary of different topics while the rest really get into the weeds of different Postgres features.

    • https://www.postgresql.org/docs/current/sql-createprocedure.html
    • https://www.enterprisedb.com/postgres-tutorials/everything-you-need-know-about-postgres-stored-procedures-and-functions
    • https://www.postgresql.org/docs/current/sql-do.html
    • https://stackoverflow.com/questions/12144284/what-are-used-for-in-pl-pgsql
    • https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING
    • https://www.geeksforgeeks.org/postgresql-dollar-quoted-string-constants/#
    • https://www.postgresql.org/docs/current/tutorial-window.html
    • https://www.geeksforgeeks.org/postgresql-row_number-function/
    • https://www.postgresql.org/docs/8.1/xplang.html
    • https://www.postgresql.org/docs/current/plpgsql-overview.html
    • https://www.postgresql.org/docs/current/sql-createfunction.html