BASE SAS Programming – Understanding SAS Formats




In the previous post we discussed the use of  SAS Informats. It is a useful method to read in the data files appropriately from a variety of formatted sources of raw data files. However, when opting for this method, we end up looking at the raw data in the tables as raw values. For instance, date fields read in by this method are displayed as numeric values and it is somewhat troublesome to understand the actual values as present originally in the raw data. This is where the  SAS formats come in. SAS formats enable us to format the imported data in a way where it would seem presentable and more easily readable.

Let us consider the SAS program as example:

data test;
input  @1 Name $5. @7 Balance 4.  @12 DOB mmddyy10.   @23 Gender $1.   ;
format DOB date9. Balance 4.2;

datalines;
Jimmy 45.5 12/11/1999 M
Julie 23.2 1/14/2011 F
Tom 20 1/1/2001 M
;
run;

In the above SAS program, we are importing the date field ‘DOB’ using  SAS Informats. This field is now being read in as numeric information with respect to the SAS system date. When we apply the SAS format statement in the data step, we can change the way the numeric values are displayed. Check out the output table from this SAS program below:

SAS Program:

SAS Log:

Output Tab:

As it can be seen in the screenshot of the output tab, we are able to display the date values in a format, which is different from the SAS informat that was used to import from the raw data lines. See below the screenshot, where the dates have been formatted using the  SAS format mmddyy8.

Just SAS provides the flexibility to import raw data using a variety of SAS Informats, there are many options to format the data points as well. As you keep progressing further you will get familiar with the most commonly used  SAS Informats & Formats. It is important to practice on actual data sets to get familiar with these concepts as soon as possible. There are a significant number of questions in the BASE SAS certification examination from this topic.

You can find relevant reference books in the sidebar of this post, if you should want to purchase the same for further studying. BASE SAS Certification guide and the Little SAS book are extremely useful for preparation and also act as a wonderful reference for SAS concepts.

Please sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.




Categories SAS

BASE SAS Programming – Understanding SAS Informats




Up to this point, we have discussed the various scenarios to understand how to input data into SAS. In this post of the “BASE SAS Tutorial” series, we would take a more detailed look into how SAS implements the use of the Informats with the Input statement.

We would of course refer you to a book like ‘SAS by Example‘ by Ron Cody to get the formal understanding.  However, we will aim to provide a more workable understanding. Let us consider the datalines as listed below:


In these data lines, one way to read in all the fields is to simply import all of the fields as character values. However, informats allow us to store the values as they are intended. For instance, if we store the date values in these data lines as a character data type, it would be unavailable to leverage any date specific calculations that we may wish to perform. We would end up converting it to a numeric/date format and then applying the desired functions.

Let us consider the outputs from both approaches i.e., importing the date field as a character and using informats.

SAS Program/Output when Imported as character:

data test2;
input  @1 Name $5. @7 Balance 4.  @12 DOB $10.   @23 Gender $1.   ;

datalines;
Jimmy 45.5 12/11/1999 M
Julie 23.2 1/14/2011  F
Tom   20    1/1/2001  M
;
run;

Output/Log:

SAS Program/Output when Imported with Informats:

data test1;
input  @1 Name $5. @7 Balance 4.  @12 mmddyy10.  @23 Gender $1.   ;

datalines;
Jimmy 45.5 12/11/1999 M
Julie 23.2 1/14/2011  F
Tom   20    1/1/2001  M
;
run;

As it can be seen in the above outputs from the two SAS programs, the SAS informats enable allow the storage of the values based on the formats present in the raw formats. However, it is worth noting that although SAS informats  allow for data points to be stored correctly, it is rather inconvenient to view them as the numeric values. It is not always clear as to what the actual date values are. These are addressed by the SAS format statements. These will be discussed in later posts. Keep in mind, SAS Informats  are available for many standard layouts for dates and numeric values (to address decimals, currencies etc.). For instance, amounts may have currency formats or the comma formats depending the region where the data is extracted. SAS Informats are very useful to capture such layouts accurately.

We certainly hope, this was useful in clarifying the understanding. Some of the most tricky questions in the BASE SAS certification are picked from this topic. We recommend, please keep practicing and working hard.

You can find relevant reference books in the sidebar of this post, if you should want to purchase the same for further studying. BASE SAS Certification guide and the Little SAS book are extremely useful for preparation and also act as a wonderful reference for SAS concepts.

Please sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.




Categories SAS

BASE SAS Programming – Formatted Column Input




The previous post in the “BASE SAS Tutorial” series, started the discussion about the fixed column inputs. We had discussed, via a sample SAS program, how to import data in fixed columns by providing the range and the data type in the input statement. In this post, we would be discussion how to import the same data, as a formatted input.  Let us consider the sample SAS program as shown below:

data test;
input  @1 Subj $3.   @4 DOB mmddyy10.   @14 Gender $1.   @15 Balance 7.;

datalines;
00110/21/1955M 1145
00211/18/2001F 18722
00305/07/1944M 123.45
00407/25/1945F -12345
;
run;

In this program, just like in the earlier post, we are looking to import the datalines. The key difference here, can be seen in the way the Input statement is written for the same data set. The input statement uses the “@” character (used before the field name) to indicate the position of the field in the data line. For instance, the field “DOB” is starting at the position number 4 in all the data lines. The next noteworthy difference in the Input statement is the way the length of the field (along with the indication of the data type) is specified after the field name is listed. Lastly, the major change here is the use of informats in the Input statement. We would discuss the informats in more detail later on, but for now, it would be enough to consider them as providing an indication to SAS as to what format the data type is provided in the input and to store it accordingly. For instance, the field “DOB” has dates in the format of “MM/DD/YYYY”. This can be read in as a character value but it would not be useful, if we wanted to use any date specific functions on this field. Therefore, it is critical to use the appropriate informat to import the data values correctly. SAS has built standard informats which we will be discussing in later posts.

Let us consider, the results from the above sample SAS program.

Log

Output Tab:

Please pay attention to the output tab from the above sample SAS program. We have used informats to read in the data. Note how the date values are stored as numbers instead of displaying as dates are expected to be displayed. This is because SAS stores dates as numbers. The starting date in the SAS system is taken to be at 1 Jan 1960. All dates are stored in reference to this date value, which is considered as 0. The negative values in the DOB column in the output tab represents the number of days prior to the SAS system date.

At this point we hope that you, the readers are comfortable with the basic data import methods in BASE SAS. These the core concepts on which, we will be building on in future posts. It is also important to understand these from the point of view of preparing for the BASE SAS certification. So keep practicing.

You can find relevant reference books in the sidebar of this post, if you should want to purchase the same for further studying. BASE SAS Certification guide and the Little SAS book are extremely useful for preparation and also act as a wonderful reference for SAS concepts.

Please sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.




Categories SAS

BASE SAS Programming – Fixed Column Input




Previously we discussed how to import data using the list input method. In this post  in the ‘BASE SAS tutorial” series, we would be discussing how to import data using the fixed columns method. There are two approaches within the fixed column method. The first is the Column Input. Under this approach, we provide the range of the fields as present in the raw data. Consider the below SAS program as an example.

data test;
input Subj        $ 1-3
DOB       $ 4-13
Gender  $ 14
Balance  15-21;

datalines;
00110/21/1955M 1145
00211/18/2001F 18722
00305/07/1944M 123.45
00407/25/1945F -12345
;
run;

We can the see the datalines are formatted such that all individual fields are within fixed lenghts without any overlap. In such cases, the input can written as in the SAS program above, where the range of the fields is specified along with the indication of whether or not the field is a character value. So in our example, we can see that the field DOB is being read as a character field ranging from the 4 character (from the left) to a width of 9 characters up to the 13th. All remaining fields are also defined in a similar fashion.

See below for the resulting logs and outputs from the above SAS program in the SAS University Edition interface:

LOG:

Output Tab:

 

It should be noted, that the raw data may not always be present in fixed width columns without overlap. These cases may be quite rare and may only be applicable to smaller projects. It is an important concept, nonetheless. It shows of the versatility of the Input Statement.

Please find relevant reference books in the sidebar of this post, if you should want to purchase the same for further studying. BASE SAS Certification guide and the Little SAS book are extremely useful for preparation and also act as a wonderful reference for SAS concepts.

Please sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.




Categories SAS

BASE SAS Programming – Working with CSV Files – DSD Option




Hello people. Welcome to another post in the “BASE SAS Tutorial” series. We are making some quick progress in the last few posts. So far we have discussed how SAS reads text files as list inputs in the form of delimited files ( space delimited, or otherwise). In this post, we will look more closely at the a very useful option that is used with the infile statement. This is the DSD option.

Let us quickly first lay down the uses of the DSD option before we see its practical application.

  1. It can change any default delimiter from the space to a comma. This can be useful because very often the data files are received in the .csv format. So instead of using the DLM option, we can simply use the DSD as it also provides other benefits.
  2. If there are two consecutive delimiters, it considers the same as a missing value. The data doesn’t need to have specified missing values. This is very useful as we can imagine, how often we get data files where there are missing values. The DSD option helps to resolve this problem quite easily.
  3. Lastly, any double quotes around the character fields are stripped.

These are very useful capabilities for only three words.

Let us consider the data as shown below:

Please note the the missing value on the third record and the double quote qualifiers for the Gender column. See below the sample SAS Program to import the above sample data using the DSD option.

data sample1;
infile ‘/folders/myshortcuts/myFolder/Sample data – CSV.txt’ dsd;
input gender $
Age
Height
Weight;
run;

See below the results as they appear in the log and the output folder:

Log Tab:

Output Data Tab:

Play around with the DSD option and see how it responds to changes in the raw data file with each import. Please remember to review the log after running every sample script even if the results are all good. Any good SAS programmer must be in the habit of being able to read the SAS log. It is quite important.

You can find relevant reference books in the sidebar, if you should want to purchase the same for further studying. Please sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.




Categories SAS

BASE SAS Programming – Data Import




Welcome to another post in the ‘BASE SAS Tutorial’ series. SAS is a very robust tool, capable of reading data from a variety of sources. In the next few posts, we will be discussing the various ways that data files are taken as an input into SAS. Let us start with text files or more specifically delimited text files.

So firstly, SAS considers spaces as the default value for the delimiter in a data file. Let us consider the piece of code below:

data sample;
infile ‘/folders/myshortcuts/myFolder/Sample data – blank delimiter.txt’;
input gender $ Age Height Weight;
run;

Screenshot of the data file being imported in the program:

The code provided above should work as it is in your SAS University software as long as there is a data file in the ‘myfolder’ location created during set up for SAS University Edition.

Let us quickly review the above information. The code is attempting to read a sample text file, which is space delimited. The file has 4 columns namely, Gender, Age, Height & Weight. Please note how the character variables are indicated in the input statement. Also please note, the infile statement. This statement is used to read in data into SAS programs. Users can provide options with this statement in order to change the delimiter depending on the file being imported.

Take time to review this the concepts shared in the post. You can create a sample data file and the use the program to test out your observations. Please try it out and share your experience in the comments below. Please find relevant reference books in the sidebar, if you should want to purchase the same for further studying. Please sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.




Categories SAS

BASE SAS Programming – SAS Libraries




Many concepts seem intertwined when we study SAS. It is important to be patient and keep pushing through. Fundamental concepts are the key to understanding more advanced concepts. Welcome back fellow readers. We hope you have been able to follow along up to this point without much trouble.

In this post we will be looking into SAS Libraries. There are two types of libraries in SAS. There are permanent and temporary libraries. Temporary libraries are store data set only for the current session. Permanent libraries store data sets to a permanent location on the local system. Consider the screenshot below:

All the default libraries in SAS University Edition are listed in the left pane under the libraries section. Notice the ‘Work’ library. This is the default library created for any temporary data sets created in the current session. As seen in the SAS program written in the right pane, the temporary data set ‘sample’ is created in the ‘work’ library. These data sets won’t exist after the session is closed.

To understand how SAS creates the permanent libraries in SAS, let us consider the screenshots below:

In the above screenshot we can see the Libname statement being used to create a permanent library. Simultaneously consider the the screenshot below where the log indicates the successful creation of the permanent library ‘Test1’. Also notice, the changes in the left pane of SAS University interface. There is a permanent library ‘Test1’ with a data set ‘sample’ saved. As you can see the log indicates a successful run of the program, so the expected changes are now updated.

We hope this post provides some clarity on SAS libraries. You can now try to now start trying to create your own permanent libraries. Please note, if you are using any other  software to run SAS programs, similar interface elements should be present where you may locate the relevant local paths to libraries and based on how the project folders are created. For instance, the layout for BASE SAS tool would be somewhat similar but the path for the libraries would be more kin to an actual system path instead of the one shown in the sample SAS program.

Keep practicing and leave comments in case you face any challenges. Please find relevant reference books in the sidebar, if you should want to purchase the same for further studying. Please sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.




Categories SAS

BASE SAS Programming – SAS Data Types

Welcome readers. We hope the “BASE SAS Tutorial’ series is working out for you all. So far we have tried to distill all the important elements that would be needed for first timers to get started with BASE SAS programs. If there is any thought, please let us know. From this post onward we will be discussing more direct topics related to the SAS programming so you can start practicing your own programs.

In this post let us consider the most obvious data types encountered in SAS. These are character, numeric and date formats. Just like Audit Command Language, these are the most commonly used data types in SAS. These three data types essentially capture the three major questions to be addressed in any data source i.e. what, when and how many. After all, any data source is a record of any event or commodity at a given point in time. Any analysis is based on answering these questions when data is analyzed.

Let us consider the sample SAS program in the screenshot below:

Consider the scope of the above program. The above SAS program is attempting to read a data source (data lines in this case) of two lines with the information (name, age, and DOB). The three main data types are clearly illustrated here. Take a look at the input statement in the SAS program and observe that the three different variables are input in different ways. The character variable ‘Name’ is indicated as so by the ‘$’ character. the numeric field age does not require any character. The date value has a small twist. SAS stores data only as numbers or characters.  As a matter of fact, all dates are stored as numeric values with the reference to the 1st January 1960 as the starting date. However, the date values are imported with the ‘informat’ (which would be discussed in more detail in future posts) indicating to SAS, that this field is a date. Further, in order to display the date appropriately in the dataset, the format statement (to discuss in more detail in future posts) is used. The output would be as shown in the screenshot below:

A look at the ‘Output Data’ tab shows how the data is stored in the dataset. Look at the section on the left pane where the data types are listed. The different symbols demonstrate the different datasets.

A few points to remember when importing a data set and deciding on the data types for the data values in the raw data sets. First of all, one must always aim to import in all the data fields as a character so that all values are captured so that a proper analysis may be one of the data quality overall. Secondly, there are functions for each data type (will be discussed in future posts).

Please find relevant reference books in the sidebar, if you should want to purchase the same for further studying. Please sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.

Categories SAS