3 Ways to Transform your SAS Data Workflow with SAS-SQL

Use SAS? Harness SQL to streamline your work.

Jimmy Luong
Towards Data Science

--

Photo by Roma Ryabchenko on Unsplash
  1. Introduction
  2. Benefits of using SQL in SAS
  3. Example dataset
  4. Example #1: Joining datasets
  5. Example #2: Adding values to a new column
  6. Example #3: Selecting unique values in a column
  7. Conclusion

1. Introduction

As someone who had to quickly pick up SAS in a new workplace, I was happy to see that I could use my existing SQL knowledge to complement my company’s analytics team who primarily uses SAS. Generally, I’ve found that most of what you can do in other SQL platforms such as MySQL and PostgreSQL are applicable, although the syntax may differ.

After some troubleshooting with our existing SAS code, I was able to show my co-workers a few convenient ways that SQL could improve our workflows. I demonstrate a few of these methods below using three examples.

2. A few benefits of using SQL in SAS

  1. Readability— SQL is readable and intuitive to learn.
  2. SQL can perform multiple functions such as manipulating, summarizing and sorting in one step.
  3. If you already know SQL and are new to SAS, you can immediately start contributing to your team.
  4. Combining SQL and SAS data steps can lead to streamlined workflows.
  5. SQL is an ANSI-standard language used by most databases.

3. Our example dataset

The two tables we use in the examples are derived from a Patagonia clothing dataset created by Beth Morrison. Note that for these examples I use SAS Studio although the same syntax can be used on other SAS platforms such as SAS Viya or SAS Enterprise.

The “clothing_category” table. Image by author.
The “clothing_list” table. Image by the author.

4. Example #1: Joining Datasets

Suppose you want to find out which clothing items in the “clothing_list” table are appropriate for either the land or water. In this case, you’ll want to join the “clothing_category” table to the “clothing_list” table using the common column “item_category”.

SAS: The equivalent of joining or merging tables together in SAS can potentially be a multi-step process. Please refer to the SAS documentation here.

SQL: Joining tables with SQL-SAS is straightforward and can save you time. Note that “JOIN” and “LEFT JOIN” are the same in SQL-SAS.

PROC SQL;
SELECT *
FROM clothing_category
JOIN clothing_list
ON
clothing_category.item_category = clothing_list.item_category;
QUIT;
The resulting output table from the above join. You can see now that the “item_category” from the “clothing_list” table is matched with the“Land” or “Water” value from the “clothing_category” table. Image by the author.

5. Example #2: Adding Values to a New Column

Suppose you want to assign values to a new column based on the values of another column. Here, we will add a column with a value that will specify whether each item in the “clothing_list” costs under $100 or over $100.

SAS: In SAS, you can run a DATA step, create a new column “price_category” using the LENGTH function, and then use an “IF-THEN-ELSE” statement to assign values to the rows in the column.

DATA work.output_table;
SET work.clothing_list;
LENGTH price_category $50.;
IF item_price < 100 THEN price_category = 'Less than $100';
ELSE IF item_price > 100 THEN price_category = 'More than $100';
ELSE price_category = 'Not available';
RUN;

SQL: Instead of running a DATA STEP using SAS with multiple IF statements, you can use CASE WHEN statements in SQL-SAS. This also gives you the flexibility to aggregate (GROUP BY), order (ORDER BY), and more in a single step.

PROC SQL;
SELECT item_category, item_name, item_price,
(CASE
WHEN item_price < 100 THEN 'Less than $100'
WHEN item_price > 100 THEN 'More than $100'
ELSE 'Not available'
END) AS price_category
FROM work.clothing_list
;
QUIT;

Running either of the above code blocks will output the following table:

Note the “price_category” column. Image by the author.

6. Example #3: Selecting Unique Values in a Column

Suppose you want to find unique values in a column, such as the number of unique item prices for the clothing in your store. Here is where SELECT DISTINCT in SAS-SQL shines.

SAS: In SAS, you may find unique values using PROC SORT:

PROC SORT DATA = work.clothing_list
OUT = work.unique_values
(KEEP = item_price)
DUPOUT = work.duplicate_table
(KEEP = item_price)
NODUPKEY
;
BY item_price; /*this is the column of interest*/
RUN;

What’s great about SAS, in this case, is that the “DUPOUT” option outputs a table of the extra duplicate values in the table “work.duplicate_table” in addition to the output table “work.unique_values” which contains a list of the unique item prices. You also have the option to output entire matching rows.

“NODUPKEY” tells SAS to only keep the first occurrence for each unique value of the column listed in the BY statement in the output table.

Duplicate item prices in the WORK.DUPLICATE_TABLE. Image by Author.
Unique item prices in the WORK.UNIQUE_VALUES table. You can see here that “75” and 429" are also in this table. Image by the author.

SQL: In SQL, simply write a few lines as below to get the same output as the table with unique values above. Note: This will NOT output a table with the extra duplicated rows.

PROC SQL;
SELECT DISTINCT item_price
FROM clothing_list;
QUIT;

7. Conclusion

I hope this helps you learn more about the potential of using SQL in SAS! In practice, I find that using both SAS data steps and SQL code blocks work best for my work in data preparation.

How are you using SQL in SAS to improve your workflow? Want to know more about how SQL-SAS can save you time and simplify your code? Let me know in the comments below!

— Jimmy

--

--

Harvard Strategic Data Project, Data Fellow | Ex-project manager and civil engineer