Seeing as we are getting reasonable traffic for our efforts on this website, our team has decided to start posting specific interview questions that we feel may be of importance to our readers. In that spirit, here is the first really important interview question for SAS.
Interviewers very often ask data analytics professionals a pet question: how does SAS work? It may seem like a simple question on the face of it, but the answer is somewhat involved (though easy to understand), and a surprising number of seasoned professionals goof it up.
To answer, we will first detail the concept behind this question, which should hopefully present the answer itself by the end. Think of SAS as a black box that springs into action as soon as it detects the keyword ‘data’. A SAS program then works in two stages, the compile stage and the execution stage, and the execution stage is essentially a loop.
The compile stage is triggered when SAS recognizes a DATA step. The following steps are performed. First, space is created to store the SAS data set that is to be created. Next, when raw data is being read, space is created in memory to hold a single record of that data at a time. This space in memory is called the input buffer. SAS then reviews the syntax of the DATA step and determines the names, data types and lengths of all the variables, for example those listed in the INPUT statement. This information is called the descriptor portion of the data set. Please note that no data is actually read during the compile stage.
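As a rough analogy only (this is Python, not SAS, and not how SAS is implemented internally), the compile stage can be pictured as a pass over the variable list that builds the descriptor without touching any data. The `build_descriptor` helper and the simplified specification string are our own assumptions for illustration; the length of 8 mirrors the SAS default for list input.

```python
# Hypothetical helper: mimics how a descriptor could be derived from a
# simplified INPUT-style specification such as "name $ age salary".
# A "$" after a name marks a character variable; all others are numeric.
def build_descriptor(input_spec):
    descriptor = []
    tokens = input_spec.split()
    i = 0
    while i < len(tokens):
        name = tokens[i]
        is_char = i + 1 < len(tokens) and tokens[i + 1] == "$"
        descriptor.append({
            "name": name,
            "type": "char" if is_char else "num",
            "length": 8,  # assumed default length for this sketch
        })
        i += 2 if is_char else 1  # skip the "$" marker when present
    return descriptor

# No data records are involved here: only names, types and lengths.
descriptor = build_descriptor("name $ age salary")
for variable in descriptor:
    print(variable)
```

The point to notice is that the function never sees a data record, which matches the rule that nothing is read during the compile stage.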
Another important concept here is that of the PDV, or Program Data Vector. The Program Data Vector is essentially a place in memory where the information about the variables in the data set is stored, i.e. variable name, data type and length. Any variables that are calculated during the execution of the DATA step are also included in the PDV. The PDV can be viewed as a table with two rows. The top row stores the information about the variables to be imported or computed in the DATA step, and the second row is meant to store the actual values of a single row of data. At this point the compile stage is complete, i.e. all the housekeeping needed to start reading data from the source has been done.
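To make the two-row picture concrete, here is a minimal Python sketch (an analogy, not SAS internals). The variable names and the `make_pdv` helper are hypothetical. The real PDV also carries automatic variables such as _N_ and _ERROR_, which are left out here for brevity.

```python
# Hypothetical variable list: two variables read from the source and one
# computed variable, each with a type and a length.
VARIABLES = [
    ("name", "char", 8),
    ("salary", "num", 8),
    ("bonus", "num", 8),   # computed during the step, but still in the PDV
]

def make_pdv(variables):
    """Build a two-row view: attributes on top, current values below."""
    return {
        "attributes": {name: (vtype, length) for name, vtype, length in variables},
        "values": {name: None for name, _, _ in variables},  # nothing read yet
    }

pdv = make_pdv(VARIABLES)
print(pdv["attributes"])  # top row: what each variable is
print(pdv["values"])      # second row: room for one row of data
```

Only one row of values exists at any time, which is why the PDV is reused for every record rather than growing with the data.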
The first step in the execution stage is to reset the values in the PDV to missing, i.e. a period (‘.’) for numeric variables and a blank for character variables. Next, the first line of data is read into the input buffer. From the input buffer, the data is then copied to the PDV. At this time, any computed fields are calculated, and then the entire row of data is output to the data set specified in the DATA step. This process carries on as a loop, once for each record in the input data.
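The execution loop described above can be simulated in a few lines of Python. This is a sketch of the idea under our own assumptions, not SAS code: the variable names, the space-separated records and the bonus calculation are all made up for illustration, and `None` stands in for the numeric missing value.

```python
INPUT_VARS = [("name", "char"), ("salary", "num")]   # read from raw data
COMPUTED_VARS = [("bonus", "num")]                   # calculated in the step

def run_data_step(raw_lines):
    dataset = []
    for line in raw_lines:                 # one iteration per input record
        # 1. Reset the PDV values to missing (blank for char, None for num).
        pdv = {name: ("" if vtype == "char" else None)
               for name, vtype in INPUT_VARS + COMPUTED_VARS}
        # 2. Read the next record into the input buffer.
        input_buffer = line
        # 3. Copy the fields from the input buffer into the PDV.
        for (name, vtype), field in zip(INPUT_VARS, input_buffer.split()):
            pdv[name] = field if vtype == "char" else float(field)
        # 4. Calculate any computed fields.
        pdv["bonus"] = pdv["salary"] / 10
        # 5. Output the whole row to the data set.
        dataset.append(dict(pdv))
    return dataset

rows = run_data_step(["Ann 50000", "Bob 42000"])
for row in rows:
    print(row)
```

Each pass through the loop starts from a clean PDV, so a value from one record cannot leak into the next unless it is deliberately retained.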
We hope that this explanation provides some clarity about the inner workings of the SAS tool and leaves you confident about this question in any interview. This is an important concept for understanding future, more advanced concepts. Please find relevant reference books in the sidebar, should you want to purchase them for further study. Please sign up for our newsletter so that we may keep you posted on the latest activity on our website and YouTube channel.