Interview Questions – BASE SAS Programming – How SAS works – Program Data Vector




Since this website is now getting reasonable traffic, our team has decided to start posting specific interview questions that we feel may matter to our readers. In that spirit, here is the first really important interview question for SAS.

Interviewers very often ask data analytics professionals a pet question: how does SAS work? It may seem like a simple question on the face of it, but the answer is somewhat involved (though easy to understand), and even seasoned professionals often goof it up.

To answer, we will first detail the concept behind the question, which should present the answer itself by the end. As soon as SAS detects the keyword ‘data’, it treats the step that follows as an implicit loop that runs once per record. A DATA step is processed in two stages: the compile stage and the execution stage.

The compile stage is triggered when SAS recognizes a DATA step. The following steps are performed. SAS checks the syntax of the statements in the DATA step. When the step reads raw data, SAS also sets aside an area of memory called the input buffer, which holds one raw record at a time. SAS then determines the names, data types and lengths of all the variables, for example those listed in the INPUT statement. This information becomes the descriptor portion of the data set that is to be created. Please note that no data is actually read during the compile stage.
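The descriptor portion can be inspected for any existing data set. The sketch below is a minimal illustration, assuming the sample table sashelp.class that ships with SAS is available in your installation; any other data set name can be substituted.

```sas
/* Sketch: list the descriptor portion of a data set.                 */
/* Assumption: the sample table sashelp.class exists in your session. */
proc contents data=sashelp.class;
run;
```

The report lists each variable with its type and length, which is the information SAS works out during the compile stage.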

Another important concept here is the PDV, or Program Data Vector. The Program Data Vector is an area of memory where SAS builds one observation at a time. It holds every variable in the DATA step, along with its name, data type and length. Any variables that are calculated during the execution of the DATA step are also included in the PDV, as are the automatic variables _N_ and _ERROR_, which are not written to the output data set. The PDV can be viewed as a table with two rows. The top row stores the information about the variables to be imported or computed in the DATA step, and the second row stores the actual values of a single row of data. At this point the compile stage is complete, i.e. all the housekeeping needed to start reading data from the source is done.

The first step in the execution stage is to reset the values in the PDV to missing: a ‘.’ for numeric variables and a blank for character variables. Next, the first line of data is read into the input buffer. From the input buffer, the data is then copied to the PDV. At this time, any computed fields are calculated, and when the end of the step is reached the entire row of data is output to the data set named in the DATA statement. SAS then returns to the top of the DATA step, and the process repeats as a loop for each record in the source until no records remain.
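One way to watch this loop is to print the PDV to the log on every pass. The following is a minimal sketch, not a definitive recipe: the data set name pdv_demo, the variables and the values are all hypothetical and chosen only for illustration.

```sas
/* Sketch: trace the PDV through each iteration of the DATA step. */
/* The data set name, variables and values are hypothetical.      */
data pdv_demo;
    put 'Before INPUT: ' _all_;    /* PDV at the top of the iteration      */
    input name $ score;            /* input buffer -> PDV                  */
    bonus = score * 0.10;          /* computed variable, also in the PDV   */
    put 'After compute: ' _all_;   /* PDV just before the implicit output  */
    datalines;
Asha 80
Ben 65
;
run;
```

In the log, the "Before INPUT" lines should show name as blank and score and bonus as missing, while the automatic variable _N_ counts the iterations. The "After compute" lines show the row as it is about to be written to the data set.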

We hope that this explanation provides some clarity about the inner workings of the SAS tool. This is an important concept for understanding future, more advanced concepts, and we hope you now feel confident about this question in any interview. Relevant reference books are listed in the sidebar, should you want to purchase them for further study. Please sign up for our newsletter so that we can keep you posted on the latest activity on our website and YouTube channel.




BASE SAS Programming – Exploring the SAS University Edition




Hi all, welcome to another post in the “BASE SAS Tutorials”. By now, you should be set up to start working with the SAS University Edition. However, there is one thing that has not been discussed up to this point: the layout of the software itself. In future posts, there will be references to different sections of the tool as we execute sample SAS programs. In this post, we will give a quick walkthrough of the important interface elements in the SAS University Edition software. Let’s dive right into it.

Refer to the screenshot below:

The interface is neatly divided into two sections. The first section is the left pane, where users can look through the data sets and libraries that are created as SAS scripts are executed. These elements are seen in the “Libraries” section. For our purposes, this is the most useful section in the left pane. More generally, the left pane is a quick way to access default locations for existing data sets, sample libraries and sample SAS programs. These can be used as a reference while designing our own scripts as and when needed.

The second section is on the right side. This section is of much more interest at this stage, and it is important to get familiar with its major elements. There are 4 tabs in this portion, as seen in the screenshot above.

  1. Program – This is where we will be writing our SAS programs. All the data and proc steps are scripted in this section. Pay close attention to the small dark blue icons right below where it says ‘Code’. These are commands for executing and saving SAS programs and for some other rudimentary operations that can be performed while writing scripts. The most important ones are the run icon (the small running man) and the save options.
  2. Log – The second tab is the log. One reason SAS is used so widely is that it provides a log for every script that is executed, detailing all the operations performed on the raw data sets received from clients. Logs not only help SAS programmers troubleshoot, but also work as an audit document detailing the transformation steps performed.
  3. Results – This tab simply displays the reports generated by the Proc steps in the program. In this instance, the program simply prints out the contents of the data set ‘sample’. The tab is especially useful when more complex reports are required, as these cannot be displayed in a data set.
  4. Output data – The last tab, the ‘Output Data’ tab, shows the data set created by the data step in the latest execution of the SAS program. In this case, the details of the table ‘sample’ are shown. The details include the data types of all the columns and the contents of the table itself.
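To see all four tabs in action, a small program along the following lines can be pasted into the Program tab and run. This is only a sketch: the data set name ‘sample’ comes from the description above, while the columns and values are hypothetical stand-ins, since the original program is not reproduced here.

```sas
/* Hypothetical program: the columns and values are illustrative only. */
data sample;                 /* the new table appears under Output Data */
    input id name $ amount;
    datalines;
1 Asha 120
2 Ben 95
3 Chen 143
;
run;                         /* the Log tab reports what was created    */

proc print data=sample;      /* the report appears in the Results tab   */
run;
```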

See screenshots below to understand the functions of the tabs as listed above:

SAS Logs – See below how the details of the data imported into the new data set are shown in the SAS log.

Results Tab – This tab simply shows the output of the ‘proc print’ step in the program.

Output Data Tab – The screenshot shows the details of the data set created in the data step.

We recommend playing around with this interface as we go along with the sample SAS programs. It is important to get familiar with these terms and the layout so that working in the tool is smooth and efficient. Please stay tuned for future posts. Relevant reference books are listed in the sidebar, should you want to purchase them for further study. Please sign up for our newsletter so that we can keep you posted on the latest activity on our website and YouTube channel.

 




Categories SAS

BASE SAS Programming – Your First SAS Program




Welcome to the latest post in the ‘BASE SAS – Tutorial’ series. So far we have discussed the broad, generic details of SAS programming. From this post onwards, we will take concrete steps to actually learn SAS programming with each post. In the previous post, we gave detailed instructions for setting up the SAS University Edition program. We hope you are set and ready to follow along. As a reference, we recommend that you purchase a copy of “SAS By Example” by Ron Cody. It is a great resource with a ton of practice examples and detailed explanations of all the important concepts.

Let us start with a leap instead of baby steps. We think that, to keep the reader interested, it is better to learn scripting by studying actual scripts than by starting with theory. So let’s look at a sample SAS program.

Even in such a simple program, there are various things to highlight, so pay attention to the points below:

  1. There are 2 clear sections in the SAS program in the screenshot. The first is the data step, on line number 3, and the other is the proc (procedure) step, on line number 19.
  2. Data Step – The data step section of the program is where the data transformation steps like importing data, data cleansing, the creation of new fields etc. are scripted. All changes to existing or new data sets are done in this section of SAS programs.
    1. Note the Input statement. This is one of the most versatile statements in SAS. It allows for a variety of customizations (through various options, which will be discussed in future posts) to import data files in different formats. The Input statement lists all the fields that are present in the raw data being imported into the data set.
    2. The Datalines statement marks the start of data lines typed directly into the program. It is particularly useful for demonstrating small SAS programs. In larger programs, the ‘Infile’ statement is used instead to point to an external raw data file.
  3. Proc Step – The proc step invokes one of the built-in SAS procedures, which are used to prepare reports from the data sets transformed in the data step section of SAS programs.
  4. Please note that each statement in a SAS program ends with a ‘;’ character. Leaving it out is a mistake that even seasoned professionals make while scripting in SAS, and many errors can be traced back to missing semicolons. Special attention must be paid here.
  5. The “Run” statement is used at the end of each Data or Proc step in SAS programs. It indicates the end of the section to the software. Each section can also be run individually by simply selecting the desired section before running. This is where a ‘Run’ statement is very handy.
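For readers without access to the screenshot, the following is a minimal sketch of a program with the same shape: one data step using Input and Datalines, followed by one proc step. Only the data set name ‘data1’ and its size (8 observations, 5 variables) come from this post. The variable names and values are hypothetical, and the line numbers will not match those quoted above.

```sas
/* Sketch of a program with the shape described above.              */
/* Variable names and values are hypothetical, not the original's.  */
data data1;
    input id name $ dept $ age salary;   /* 5 variables              */
    datalines;
1 Asha Sales 34 52000
2 Ben Audit 41 61000
3 Chen Sales 29 48000
4 Dana IT 37 70000
5 Eli Audit 45 66000
6 Fay IT 31 58000
7 Gus Sales 52 75000
8 Hana IT 26 45000
;
run;

/* Proc step: print the data set as a simple report. */
proc print data=data1;
run;
```

Note that every statement ends with a semicolon, and that each step is closed with its own Run statement.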

So what happens when this script is run? We expect to see a new table named ‘data1’ with 8 observations and 5 variables. If the program runs correctly, the log will look as shown in the screenshot below:

We will be looking at the log for every SAS script that we discuss in this series. It is important to be able to read and understand the SAS log. It provides a lot of useful information, especially when a program contains errors. The resulting data set looks as shown below:

The left side of the data set view shows all the columns with their respective data types, and the right side displays the observations from the table itself.

The other output we get is from the proc step. As discussed, this section creates a report. Output from Proc steps can be seen in the ‘Results’ tab. The result from our sample SAS program is shown in the screenshot below:

 

There is a lot of information to unpack in even the simplest of SAS programs. It is important to keep the terms mentioned in this post on your radar so that you can recognize them in future posts and follow along clearly.

We certainly hope this post was helpful. In the next post, we will look at the relevant sections of the SAS University Edition so that you can comfortably navigate the interface when you try out sample SAS programs. We strongly urge you to practice along, ask any questions in the comment section and subscribe to our newsletter for the latest posts. Relevant reference books are listed in the sidebar, should you want to purchase them for further study.




BASE SAS Programming – How to Set up SAS University Edition




Hello, people. We hope you are feeling well and up to the task of learning SAS. With this post, we start our journey of learning SAS programming. This ‘BASE SAS Tutorials’ series should be particularly useful for beginner data analytics professionals. It should also serve as a solid guide for seasoned professionals in the industry who want to review some of the important concepts. So without any further delay, let us get started with setting up the SAS University Edition software on your local machine.

First, visit the SAS website to get the SAS University Edition link. The page should appear as shown in the screenshot below:

 

This page details the procedure you can follow to install the SAS University Edition software on any of the major operating systems: Windows, OSX (now macOS) and Linux. It provides the minimum system requirements for each platform, so please read through it carefully before attempting to install the required software.

There are two main programs that need to be downloaded from this page. The first is the virtualization software, for which the page above provides a direct link. The second is the SAS University Edition vApp. The virtualization software installs like any other executable file. Once it is installed, the steps below need to be followed to set up the virtual machine and start SAS University Edition:

  1. Import Appliance – At this point, the vApp file needs to be imported into the virtualization software. Going to File -> Import Appliance should yield a screen as shown in the screenshot below:

 

  2. Setting up the machine – After importing the vApp file, create a folder on your desktop named ‘SASUniversityEdition’ with a subfolder ‘myFolder’. Then come back to the ‘VirtualBox’ window. You should now have an entry on the left indicating that the vApp file has been imported.

Go on to set up the machine by going to ‘Machine’ -> ‘Settings’. In the next window, click on the ‘Shared Folders’ button and then on the add folder button on the right side of the window.

In the window that pops up, provide the path to the folder created on the desktop and select the options ‘Auto Mount’ and ‘Make permanent’. The virtual machine is now set up. Let’s move on to opening SAS University Edition itself.

The SAS University Edition software is ultimately opened in your local browser. To do so, first click on Machine -> Start -> Normal Start. This opens another window with the SAS logo in white on a blue backdrop, which then leads to this final screen.

Copy the address ‘http://localhost:10080’ into your browser. This will open the window shown below:

On this screen, click ‘Start SAS Studio’ and you will be taken to the SAS University Edition window. It should look like the screenshot below:

You are now ready to start writing your SAS scripts. We request you to follow along with the posts, which will include sample SAS scripts or SAS programs for you to test out at your end. If you have any doubts, please leave a comment and we will get back to you at the earliest. Please subscribe to our newsletter for the latest updates.





SAS Programming – Introduction

Hello, fellow data analysts. With this post, we are beginning another important journey: learning a new and very important skill in the field of data analytics. This series of posts will be tagged under the category for SAS tutorials. SAS is an extremely versatile and robust tool for data mining and data analysis. The tool is used in most Fortune 500 companies around the globe for its powerful features and reliability. In this series of posts, we will try to cover most of the topics that would enable our readers to start preparing for their BASE SAS certification, which is a highly valuable credential to possess.

Before we start exploring the concepts in detail, we will list the broad topics that will be covered in this series of posts.

  1. How the SAS tool works – The SAS black box is a pet question asked by employers the world over because, for some reason, even seasoned professionals struggle to articulate the answer. In any case, interviews aside, understanding how the underlying processing works is crucial to understanding the important concepts.
  2. Reading in data files – To analyze data, it needs to be read in first. SAS can read in almost any type of data file, ranging from delimited text files to DB2 database tables. Being able to work with all such files using scripts is as useful as it is critical.
  3. Functions – Functions are useful for a whole lot of reasons in SAS, not just for the creation of new fields but, more importantly, for data cleansing. SAS has a huge variety of functions and function variants to facilitate complex data transformations. There are functions to deal with numeric, character and date values. SAS itself stores only two data types, numeric and character, with dates held as numeric values, as will be discussed later.
  4. Structured Query Language – This is by far the most awesome feature we came across in SAS when we first got our hands on it. Programmers can include SQL scripts within SAS programs. Folks who are new to SAS find it easier to carry on working even with a limited knowledge of SAS initially. In some cases, SQL scripting can also improve the performance of the overall SAS script. We will illustrate instances where it is beneficial to include SQL scripts within SAS scripts, without going into too much detail on SQL scripting fundamentals in this series.
  5. SAS Procedures – Although this aspect will be discussed in more detail later in this series, its importance needs to be highlighted so that it is on your radar while preparing for the SAS certification. SAS procedures are powerful built-in routines used for creating and printing analytics on data sets prepared in SAS programs.
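As a small taste of the last two topics, the sketch below computes the same summary twice, once with SQL embedded in a SAS program and once with a built-in procedure. It assumes the sample table sashelp.class that ships with SAS, with its columns sex and height; the output column names students and avg_height are our own illustrative choices.

```sas
/* Sketch: one summary, written two ways.                             */
/* Assumption: the sample table sashelp.class exists in your session. */
proc sql;
    select sex,
           count(*)    as students,
           avg(height) as avg_height
    from sashelp.class
    group by sex;
quit;

/* The same figures from a built-in procedure. */
proc means data=sashelp.class n mean;
    class sex;
    var height;
run;
```

Note that a PROC SQL block is ended with Quit rather than Run.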

Keep these points in mind as you go through future posts. Rest assured there is a lot more to BASE SAS than just these broad headings. There are some very powerful techniques and concepts you can look forward to.

Each post will explain a concept with illustrative images and videos wherever possible, to make it as easy to follow along as possible. You can follow our YouTube channel for tutorials on other tools like Audit Command Language and MS Excel.

 

 

 
