Welcome, readers. We hope the ‘BASE SAS Tutorial’ series is working out for you all. So far we have tried to distill the important elements that first-timers need to get started with BASE SAS programs. If you have any thoughts or feedback, please let us know. From this post onward we will discuss topics more directly related to SAS programming, so that you can start practicing with your own programs.
In this post let us consider the most obvious kinds of data encountered in SAS: character, numeric and date values. Just as in Audit Command Language, these are the most commonly used data types in SAS. Between them they capture the three major questions to be addressed in any data source, i.e. what (character), how many (numeric) and when (date). After all, any data source is a record of some event or commodity at a given point in time, and any analysis of that data comes down to answering these questions.
Let us consider the sample SAS program in the screenshot below:
Consider the scope of the above program. It reads a data source (data lines in this case) of two lines, each holding a name, an age and a date of birth (DOB). The three main data types are clearly illustrated here. Take a look at the input statement and observe that the three variables are read in different ways. The character variable ‘Name’ is marked as such by the ‘$’ character. The numeric field ‘Age’ does not require any marker. The date value has a small twist. SAS stores data only as numbers or characters. In fact, all dates are stored as numeric values, counted as the number of days from 1 January 1960. The date values are therefore read with an ‘informat’ (to be discussed in more detail in future posts), which tells SAS that this field is a date. Further, in order to display the date appropriately in the dataset, the format statement (also to be discussed in more detail in future posts) is used. The output would be as shown in the screenshot below:
A look at the ‘Output Data’ tab shows how the data is stored in the dataset. Look at the left pane, where the variables are listed with their data types. The different symbols indicate the different data types.
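For readers who want to type along, a program of the kind described above might look like the sketch below. The dataset name, the names, ages and dates in the data lines, and the extra variable RawDOB are made up for illustration and are not taken from the screenshot. The colon in front of the informat is one common way of reading a date from space-separated data lines; the original program may do this differently.

```sas
/* Hypothetical example: all values in the data lines are invented */
data people;
    /* '$' marks Name as character; Age needs no marker;          */
    /* ':date9.' is the informat telling SAS that DOB is a date   */
    input Name $ Age DOB :date9.;

    /* Copy of DOB without a format, to show the stored number */
    RawDOB = DOB;

    /* Display DOB as a readable date instead of a day count */
    format DOB date9.;

    datalines;
John 35 15MAR1985
Mary 28 02JAN1960
;
run;

/* Mary's RawDOB is 1, i.e. one day after 1 January 1960 */
proc print data=people;
run;
```

Comparing the DOB and RawDOB columns in the printed output shows that the format changes only how the date is displayed, not the number that is stored.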
There are a few points to remember when importing a data set and deciding on the data types for the values in the raw data. First of all, one should aim to import all the data fields as character initially, so that every value is captured and a proper analysis of the overall data quality can be done. Secondly, there are functions specific to each data type (these will be discussed in future posts).
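As a rough illustration of the first point, the sketch below reads every field as character and only then attempts the conversion to numeric and date values, flagging anything that does not convert. The dataset names, variable names and data lines are hypothetical, and the INPUT function with the ‘??’ modifier is just one possible way to do the conversion quietly.

```sas
/* Hypothetical example: all values in the data lines are invented */
data people_raw;
    /* Read every field as character so that no value is lost */
    input Name $ AgeText $ DOBText :$9.;
    datalines;
John 35 15MAR1985
Mary n/a 31FEB1990
;
run;

data people_checked;
    set people_raw;

    /* '??' suppresses the log messages when a value cannot be converted; */
    /* the result is simply a missing value                               */
    Age = input(AgeText, ?? 8.);
    DOB = input(DOBText, ?? date9.);
    format DOB date9.;

    /* Flag rows where a non-blank text value failed to convert */
    BadAge = (missing(Age) and not missing(AgeText));
    BadDOB = (missing(DOB) and not missing(DOBText));
run;

proc print data=people_checked;
run;
```

In this made-up data the second row is flagged on both counts, because ‘n/a’ is not a number and 31 February is not a valid date, while the original text values remain available for inspection.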
Please find relevant reference books in the sidebar, should you wish to purchase them for further study. Please sign up for our newsletter so that we can keep you posted on the latest activity on our website and YouTube channel.