TFL programming in R versus SAS
Companies have been creating TFLs (tables, figures, and listings) in SAS for decades, but recently we've seen a growing intention to do this work in R. This trend will likely continue, so it is prudent to understand the differences between the two languages; knowing R is just the first step. Creating TFLs is typically a static activity that does not harness R's greatest strengths, such as cutting-edge algorithms, but it is one of the steps toward a more balanced set of languages in the analysis ecosystem. TFL programming is a fairly simple first step on the roadmap to increasing the use of R in clinical trial analysis and reporting.
Commercial Software vs. Open Source
The SAS language is governed by SAS Institute and therefore has a formal support structure in place, but that structure tends to be slower to adopt new functionality. R is open source and is not controlled by a single entity, so it can adapt and update quickly in a changing environment.
If new features are needed, whether for statistical analyses, graphics, or output and file types, members of the R user community can add or update functions. The drawback is that these contributors have 'day jobs': updating a function may be low on their list of priorities, may take longer than expected, and may not align with the needs of the users who want it.
Creating TFLs in General
Our general experience is that creating TFLs in R is possible, but not always intuitive or straightforward. It also does not require cutting-edge packages; the work can be reduced to a fairly small set of them. As always with tables and figures, preparing the data is often the key step. Producing precisely formatted tables in R is challenging because of the nature and usability of the available packages, especially compared with the tried and trusted PROC REPORT in SAS. On the other hand, R's vector graphics offer some advantages for figures. There are a few other notable differences, some of which we cover below.
Sorting
When creating TFLs, it's important to understand how the data will be sorted, for two reasons. First, the sort determines the order in which the data are presented in the output. Second, it determines which record is selected when selection is based on position in a list (e.g. taking the first or the last record).
SAS and R handle missing values differently when sorting. In SAS, PROC SORT places missing values before populated values for both character and numeric variables (a missing numeric value sorts lower than any number, including negative numbers).
In R, the order function places NA values last by default (na.last = TRUE), for both numeric and character columns. However, a missing character value in SAS is a blank, and when it reaches R as an empty string ("") rather than NA, it sorts before populated values. In practice, therefore, missing character values often appear first in R while missing numeric values appear last.
Let us assume there is a dataset (data frame) called anl_sort with a numeric variable (numvar) and a character variable (charvar).
Numeric Variable Sorting
SAS: proc sort data=anl_sort; by numvar; run;
R: anl_sort <- anl_sort[order(anl_sort$numvar), ]
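The numeric case can be checked directly. This is a minimal sketch using an invented four-row anl_sort (the values are illustrative only): with R's default the NA ends up at the bottom, and setting na.last = FALSE moves it to the top, which matches the PROC SORT ordering. drop = FALSE keeps the one-column result a data frame.

```r
# Hypothetical anl_sort with one missing numeric value (illustrative only).
anl_sort <- data.frame(numvar = c(3, NA, 1, 2))

# R default (na.last = TRUE): the missing value is sorted last.
r_default <- anl_sort[order(anl_sort$numvar), , drop = FALSE]
r_default$numvar  # 1 2 3 NA

# na.last = FALSE: the missing value is sorted first, as PROC SORT does.
sas_like <- anl_sort[order(anl_sort$numvar, na.last = FALSE), , drop = FALSE]
sas_like$numvar   # NA 1 2 3
```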
Character Variable Sorting
SAS: proc sort data=anl_sort; by charvar; run;
R: anl_sort <- anl_sort[order(anl_sort$charvar), ]
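For character columns, the result depends on how the missing values are stored. In the minimal sketch below, anl_sort is an invented example: a true NA sorts last, while an empty string sorts first. The sas_like_sort() helper is hypothetical and not part of any package. It handles a single sort key, and it assumes that both NA and blank strings should count as missing, as they would in SAS.

```r
# Hypothetical anl_sort: one true NA and one empty string (illustrative).
anl_sort <- data.frame(
  charvar = c("b", NA, "a", ""),
  stringsAsFactors = FALSE
)

# R default: "" precedes any non-empty string, and NA is placed last.
anl_sort$charvar[order(anl_sort$charvar)]  # "" "a" "b" NA

# Hypothetical helper: sort by one column with missing values first, as
# PROC SORT does, treating NA and all-blank strings alike as missing.
sas_like_sort <- function(df, col) {
  key <- df[[col]]
  if (is.character(key)) {
    key[!is.na(key) & trimws(key) == ""] <- NA
  }
  # order() leaves tied rows in their original order, similar to the
  # default EQUALS behaviour of PROC SORT.
  df[order(key, na.last = FALSE), , drop = FALSE]
}

sas_like_sort(anl_sort, "charvar")$charvar  # NA "" "a" "b"
```

Converting blanks to NA only inside the sort key leaves the data itself unchanged, so the empty string is still present in the sorted output.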
Users must be aware of these differences when trying to reproduce the same results in SAS and R. They matter most when a record is selected based on its position in the sorted data, for example when taking the first or last record within a group.
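The following minimal sketch shows how position-based selection can diverge. It uses an invented dataset, adlb, with hypothetical subject IDs and values. The data are sorted by subject and value, and then the first record per subject is kept with !duplicated(), roughly what FIRST.usubjid does in a SAS DATA step. With R's default ordering this returns the lowest observed value. With a PROC SORT-style ordering it returns the missing value.

```r
# Hypothetical lab data: subject 01 has one missing result.
adlb <- data.frame(
  usubjid = c("01", "01", "01", "02", "02"),
  aval    = c(5.2, NA, 4.8, 6.1, 5.9),
  stringsAsFactors = FALSE
)

# R default: NA sorts last within each subject.
r_sorted <- adlb[order(adlb$usubjid, adlb$aval), ]
# PROC SORT-like: missing values sort first within each subject.
sas_sorted <- adlb[order(adlb$usubjid, adlb$aval, na.last = FALSE), ]

# Keep the first record per subject after sorting.
r_first   <- r_sorted[!duplicated(r_sorted$usubjid), ]
sas_first <- sas_sorted[!duplicated(sas_sorted$usubjid), ]

r_first$aval    # 4.8 5.9
sas_first$aval  # NA  5.9
```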
Rounding
SAS and R use different rounding rules, which can produce different results.
Within SAS (for example, the ROUND function), a value whose dropped portion is exactly '5' is rounded away from 0: positive values round up and negative values round down, so 2.5 becomes 3 and -2.5 becomes -3. This is the method taught in school and presumably what a customer would expect. However, it could be considered biased: a trailing '5' is exactly halfway between the two rounding candidates, yet it is always moved away from 0.
R's round function uses the IEC 60559 rule: if a value falls exactly halfway (a trailing '5'), it is rounded to the even digit, so 1.5 rounds to 2 and 2.5 also rounds to 2. Because many decimal values cannot be stored exactly in binary, the result also depends on how the number is represented internally, and it may differ between operating systems. This method addresses the bias of the conventional rule, but it could be considered non-transparent unless the user is clearly told which method was used.
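If R output has to reproduce this half-away-from-zero behaviour, one option is a small helper function. The round_half_away() function below is hypothetical and only a minimal sketch, and it assumes ordinary double inputs. It adds a tiny tolerance, sqrt(.Machine$double.eps), so that a value such as 1.005, which is stored slightly below 1.005, still rounds the way its printed value suggests. SAS handles representation issues in its own way, so edge cases should be checked against actual SAS output rather than assumed to match. The janitor package's round_half_up() takes a similar approach.

```r
# Hypothetical helper: round half away from zero, as described for SAS.
# The small tolerance compensates for binary representation error, so a
# value stored just below a tie (e.g. 1.005 held as 1.00499999...) still
# rounds up in magnitude.
round_half_away <- function(x, digits = 0) {
  scale <- 10^digits
  z <- abs(x) * scale + 0.5 + sqrt(.Machine$double.eps)
  sign(x) * trunc(z) / scale
}

round_half_away(c(0.5, 1.5, 2.5, -2.5))  # 1 2 3 -3
round_half_away(1.005, digits = 2)       # 1.01
```

There is a trade-off in the tolerance. A value that genuinely lies within about 1.5e-8 below a tie (after scaling) is also rounded away from zero.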
This method is documented here: https://www.rdocumentation.org/packages/base/versions/3.5.2/topics/Round
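The round-half-to-even rule can be seen directly in base R. In this minimal sketch, the comparison values in away_rule are not SAS output. They follow the half-away-from-zero rule described above for SAS. The final line prints the value actually stored for 0.15, which explains why a number that looks like a tie may not be one.

```r
x <- c(0.5, 1.5, 2.5, 3.5, -2.5)

# round() follows IEC 60559 "round half to even": exact ties go to the
# nearest even integer.
r_result <- round(x)
print(r_result)  # 0 2 2 4 -2

# Results under the half-away-from-zero rule described for SAS.
away_rule <- c(1, 2, 3, 4, -3)
print(x[r_result != away_rule])  # 0.5 2.5 -2.5

# Many decimals are not stored exactly: this shows the binary value
# actually held for 0.15, which is slightly below 0.15.
sprintf("%.20f", 0.15)
```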
This paper and partnership demonstrate the capability to use R in everyday analysis, tapping into your workforce talent and redressing the balance of languages in your clinical trials analysis and reporting. As more companies and groups use R to create TFLs, it's important to start the process knowing the differences between R and SAS, since SAS has been the industry standard. If you have any questions, d-wise is here to guide you as a partner. As part of our Acceleration Strategy services, we have helped multiple sponsors identify the best way to build TFLs in R, with an emphasis on the needs of the business. As an implementation partner, we are also here to help you mitigate any version or system issues if they arise.
If you would like to find out more about how we made the transition from SAS to R for building TFLs, reach out to us!
*This article summarizes the groundbreaking 2020 PHUSE US Connect paper and presentation by Amol Waykar, d-wise Sr. Technology Consultant, and Andrew Miskell, Data Standards Leader at Eli Lilly. The paper compares critical differences that are essential when submitting TFLs created in R to regulatory bodies.