Declaring Variables within SQL

Ensuring key variables are declared at the beginning of the SQL code can help to automate the re-use of the code.

James McNeill
Towards Data Science

--

Photo by Patrick Perkins on Unsplash

When beginning to work with SQL code many data scientists will make the code specific to the task that they want to achieve. With the output created users will then move forward to the next stage in the data journey, which could be a data visualization. Working to create a boilerplate code where hardcoded variable values are defined at the start of the code, ensures that no variables are potentially missed during future use.

Re-factoring the SQL code to ensure that automation tasks can be performed efficiently helps to test the code better. In turn, it also provides users with more confidence when sharing the code with other stakeholders. Many stakeholders may not be aware of the inner workings of the SQL query that has been shared, so the author must take care to clearly define all key items.

In this article, we show how the declare statement can be used with some basic variables. Many of these examples will show the potential that exists for creating more streamlined SQL code.

Dataset

For this piece of analysis, Greece’s Earthquakes dataset has been taken from Kaggle¹.

The SQL code shown in this article was created using an instance of Microsoft SQL Server Management Studio 18.

Getting started

When first working with the declare statement, users will notice the similarities that exist with other programming languages. Each variable can initially be declared with a certain data type, which ensures that the appropriate data is being keyed in by the user. Setting the new variable provides the first data input to be used within the SQL code that follows.

SQL code 1.1 Simple declare and set statements for Exploratory Data Analysis (EDA) (Image by author)

As the reader can see, within the declare statements we have created two date variables and one character variable. The first set statement for the start variable creates a user-defined date value. By including an adjustable date value the user can query the dataset quickly. However, with all user-defined values, there is the potential for a keying error. Therefore, care must be taken to ensure that an appropriate data type value is provided. Keeping a variable in the code provides flexibility to the end user.

For the second date variable stop, we are using an internal keyword variable that supplies today's date as a value. Applying this system generated variable allows the SQL code to work without any adjustment by the user. We can see with the SELECT statement that the function DATEADD is being supplied with this second date variable. What the DATEADD function is creating is the month value that is three months prior to today's date.

Creating a variable value this way ensures that the most recent three months of data are excluded from the query results. At times developers will incorporate this action into their data analysis as the most recent time series data can include some volatility in the results. The meaning of this volatility can relate to the data not being fully validated by the team that has created it. As all of the data sources may not have provided fully verified inputs at the stage of initial data analysis. If this is the case then there can be time allotted to the data source providers to validate their data provided. Should any adjustments be required then they can be supplied for appropriate production of management information reports.

SQL output 1.1 results show the initial rows using the date filter (Image by author)

Including a declared variable that uses today's date and adjusts this within the code to take values from three months previous as an end date, highlights how this query can be scheduled to create outputs automatically. With the results that have been output, we are able to share an appropriate snapshot of the data.

COUNTER

Many times when working with SQL code we want to understand how far into the process we are. Making use of a COUNTER can provide an overview of the most recent stage. Tracking metrics with the COUNTER helps to ensure that we are always aware of the progress made, especially if the program has a number of iterations to perform.

SQL code 1.2 Basic counter to review the stages of the code that have been completed (Image by author)

Creating a counter variable as an integer data type allows the user to set the variable value in the format that suits the task they are aiming to complete. By taking an integer value of zero the user could iterate until a task has completed five steps. However, if the user prefers to review the counter in descending order then a higher value can be set such as five and then this depreciates through each iteration. With the updates to the counter value shown above, there are two different methods available. The first method shows how the counter variable can be updated by adding a value of one to the counter value. With the second method, a shortcut is taken to update the counter variable. At each iteration, the value would increase with each method. These results can be seen below.

SQL output 1.2 Each iteration of the counter can be printed to the result log (Image by author)

WHILE LOOP

Similar to the counter in the previous section, another option to use is the while loop. As seen in other programming languages a while loop provides an endpoint for the loop to work towards. At each iteration of the loop, the function checks to understand if the loop has reached the final iteration value. While the final iteration has not been reached then the program will continue to perform the task requested.

SQL code 1.3 creation of the counter variable using a while loop (Image by author)

The same steps are followed when declaring the counter variable as seen before. A variable is declared as an integer data type. Once again the variable is set to an initial value. Where the counter variable would require continuous duplication of code to update the variable value using the previous counter method. A while loop has optimized the code, to only include the updating of the counter variable in one line. Two keywords are included to BEGIN and END the loop section of the code.

SQL output 1.3 results of working with the while loop to iterate the counter (Image by author)

The demonstration with the while loop results in the first and last counter values being presented.

DATEFORMAT

Another option that can be adjusted at the beginning of the SQL code is to decide which date format is being used. There will be numerous date format options available to a developer, we will not cover these here. For this example, we are highlighting how the set statement can be used to provide the date format that we require.

SQL code 1.4 working with the date format keyword (Image by author)

The code aims to check and see if the date1 variable that was declared is viewed by the SQL interpreter as a SQL date. By returning the result of 1 in the output below, we are able to confirm that this is the case. Using different date formats will allow a developer to change the output date type to the local expectations.

SQL output 1.4 results for the date assessment (Image by author)

The code above that is underscored with red shows that the SQL interpreter is not aware of this value being created yet. As the go statement prior to this section of the code restarted the interpreter then the previous knowledge of this declared variable is lost. To use the declared variable again, then we need to comment out the previous go statement. Alternatively, we can also create the declared variable again and provide a different value. Being aware of these clauses with the SQL interpreter ensures that no potential errors emerge when they are not supposed to happen. Including the go keyword can help to separate code into different sections and could be useful depending on your work requirements.

LANGUAGE

So far we have worked with declaring common variables to work with data. Another option available is to adjust the language that is used in the background. We may be required to use this functionality when working with dates. In the example code, we are showing the difference between the two languages. Creating different variable names by using the language option helps to use build-in data.

SQL code 1.5 adjusting date results by using setting the language option (Image by author)

The code has been kept very similar when using each of the language options. This helps to show the different outputs that are being created automatically without having to change large amounts of code. We also see how the go keyword has been used to create two code blocks. Including different initial date values also highlights how many date formats can be worked with.

SQL output 1.5 differences between the two languages (Image by author)

Having the ability to maintain code syntax helps to make a developer's life much easier. With the code automatically producing the relevant character values for the month name and weekday, we can see how many options are available to produce management information.

These examples only scratch the surface of what is available when using the declare and set statement attributes. By being able to amend a few internal settings within a query a developer has more freedom to use similar code to produce different results. It is always the dream to not have to continuously create lots of similar code.

Conclusion

We have worked through declaring a number of variables in this article. Making use of a number of built-in methods allows the code to be automated. It is this automation that can enable the reproduction of regular reports by setting up a scheduling process flow. When working with dates there are a number of date formats available to use. Taking care to understand the appropriate date helps to ensure that the data is filtered correctly. By using a counter or while loop, we are able to maintain a record of the current iteration of the code. Adjusting the language that is used by the SQL interpreter shows how the same code can be used to produce results that are appropriate for the language required.

Leave your comments and thanks very much for reading!

--

--