Audit Command Language – Data Issues

Welcome to the ‘Audit Command Language Tutorial for Beginners’. At this stage, we have discussed how to import delimited files and excel files into ACL projects to perform analytics. The next step is to perform checks to ensure that the data is imported correctly. There are two most common kinds of issues that are seen while importing data into any tool.

  1. Data Spill: This type of data issue causes the data to spill over into the next columns. This generally happens because there may be extra delimiters in some field (generally, this would happen in fields which are descriptions). In the screenshot below, the third line has an extra delimiter.
  2. Data Split: The second most common data issue is the splitting of data lines. This issues causes the data lines to be split across 2 or more lines in the raw file. The same would be reflected in the imported table in ACL. The screenshot below illustrates such an issue. The 6 line in the file has a new line character which is causing the data lines to be shifted to the next line.

Simply having the knowledge of such issues is not enough if these can’t be identified. After all, it is impossible to locate such issues in large files when working on a live project. The fastest way to identify these issues after import are as follows:

  1. Classify Command: In most kinds of data files, certain data columns have a certain set of unique values or similar looking values. For example, a data column like ‘Payment_Mode’ would only have a few possible values like ‘Card’, ‘Cash’ etc. Using the classify command on such a column, should have such values only. If there are any other values, go to the data lines with these values and verify those in the raw data file.
  2. Summarize Command: The summarize command works like the classify command to provide unique values in a column. However, along with the unique values in the column, the summarize command also provides information of the total line counts for each of the unique value and even subtotals for any numeric fields in the data. This information can be further used to locate any problem lines.

You can refer to the ACL Audit Command Language Help for more details on the syntax for these commands. There would be dedicated posts and videos detailing the use of the above statements.

There are some other techniques that you may leverage to identify such issues. There are some helpful tips in the videos below:

Data Spills

Data Split

There are a number of issues that can occur when importing data files. Now that you are aware of the most issues and how to locate the same, you can investigate your data imports for accuracy. This is probably the most tedious exercise, but also the most important, as incorrectly imported data would affect the outcome of the analytics performed.

Please keep practicing and feel free to reach out to us with your valuable feedback and comments. Please go to the website to review ACL script examples and ACL script commands sign up for our newsletter, so that we may keep you posted on the latest activity on our website and Youtube channel.