Basic R Programming

DEVENDER PALSA
May 14, 2021

1. What is R?
R is a programming language and a free, open-source software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing.

2. What are the main features of R?
It is a widely used programming language for data manipulation, statistical computing, and graphics.
It is open source, so we can install R for free.
Its syntax is approachable enough that even non-programmers can learn to program in R.
It has data structures and operators similar to those in C, C++, Java, and Python.
It comes with built-in packages, and add-on packages make reporting the results of an analysis easy.
It has excellent visualization and graphical capabilities.
It is supported by a large user community.
We can connect to many types of databases.
Implementations of almost all common statistical algorithms are available.
It has data handling, data mining, text mining, big data, and machine learning capabilities.

3. Setting up/Installing R
Download both R and RStudio using the links below.
Download R
Download the RStudio IDE
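Once both are installed, one quick way to confirm that R works is to run a couple of base R commands in the console. This is a minimal check; the exact version string and session details depend on the release and platform you installed.

```r
# Print the version of R that is running, e.g. "R version 4.x.x (...)"
R.version.string

# Show the R version, platform, and packages attached to this session
sessionInfo()
```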

4. RStudio
RStudio is a free, open-source integrated development environment (IDE) for R.
RStudio provides a user-friendly interface for working with R. Plain R is a command-line interface, which can make coding slower for beginners.
RStudio also offers keyboard shortcuts and point-and-click menus for common tasks.
Every command typed in RStudio is submitted to R, and the output is fetched and displayed in RStudio.

5. R Environment
There are three main windows in the R environment:
Console
Workspace
Output
A. R Console: This is where we type and submit commands, and most output is shown in the console itself. The Up and Down arrow keys recall previous commands, and we can type a partial command and press the Tab key for autocomplete suggestions.
B. Workspace:
During an R session, all user-defined objects are stored in a temporary working memory called the workspace. These objects last only for that session unless we save the workspace.
We enter commands interactively at the R prompt, and the Up and Down arrow keys scroll through the command history.
C. Output: Shows graphs, tables, and console output.
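The workspace can be inspected and saved with base R functions. The sketch below uses ls(), save.image(), and load() to show that saved objects survive being removed; the temporary file path is only for illustration, and in practice you would choose a path such as "my_work.RData" (a hypothetical file name).

```r
# Create two objects in the workspace
p <- 9
k <- 2 + 7

# ls() lists the objects currently stored in the workspace
ls()

# Save the whole workspace to a file so it outlives the session
workspace_file <- tempfile(fileext = ".RData")
save.image(file = workspace_file)

# Simulate a fresh session by removing the objects, then restore them
rm(p, k)
load(workspace_file)
p + k   # 18
```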

Practice:
Load the built-in datasets (the datasets package is attached by default, so this line is optional):
library(datasets)

Example: Check the first and last observations in the iris dataset, then summarize it.
head(iris)     # first 6 rows
tail(iris)     # last 6 rows
summary(iris)  # summary statistics for each column
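A few more base R functions give a quick picture of a dataset's shape and column types. This is a small sketch using the same built-in iris data; the n argument of head() and tail() controls how many rows are shown.

```r
# Number of rows and columns: 150 observations of 5 variables
dim(iris)

# Compact overview of each column's type and first values
str(iris)

# head() and tail() show 6 rows by default; n changes that
head(iris, n = 3)
tail(iris, n = 3)
```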

Create an Object:
“<-”, “->”, or “=” are used to assign a value to an object.
p<-9
k<-2+7
a<-2*45
a<-sqrt(117)
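The three assignment operators store the same value; only the direction or style differs. The object names p1, p2, and p3 below are illustrative, and the last lines show that assigning to an existing name such as a simply overwrites it.

```r
# All three forms store the value 9
p1 <- 9      # leftward assignment (the most common style)
9 -> p2      # rightward assignment
p3 = 9       # also works at the top level

identical(p1, p2)  # TRUE
identical(p2, p3)  # TRUE

# Objects can be overwritten: a first holds 90, then sqrt(117)
a <- 2 * 45
a <- sqrt(117)
a
```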

6. Working with R
An object can be created on the fly using “<-”, “->”, or “=”, and objects can be removed from the environment using rm().

dp = "Devender Palsa"
dp

x <- rnorm(100, mean = 20, sd = 5)   # 100 random draws from a normal distribution
x
mean(x)
m <- mean(x)
m
std <- sd(x)   # standard deviation of x
std

# Removing objects
rm(dp, x, std)
dp    # Error: object 'dp' not found
x     # Error: object 'x' not found
std   # Error: object 'std' not found
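Typing a removed object's name raises an error, so exists() is a gentler way to check whether an object is still in the workspace. The sketch below also shows rm(list = ls()), which clears every object at once; use it with care, because nothing it removes can be recovered unless the workspace was saved.

```r
# Recreate the objects from the example above
dp <- "Devender Palsa"
x <- rnorm(100, mean = 20, sd = 5)
std <- sd(x)

# exists() returns TRUE or FALSE instead of raising an error
exists("dp")   # TRUE

rm(dp, x, std)
exists("dp")   # FALSE

# Caution: rm(list = ls()) removes EVERY object in the current workspace
m <- mean(rnorm(10))
rm(list = ls())
ls()           # character(0)
```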

7. R packages:
Packages are collections of functions, compiled code, and sample data sets, and are stored in a directory called “library” in the R environment. By default, a set of about 30 packages is installed along with base R. We can add other packages when we need them for a specific purpose. For example, if we are working with data frames then we will probably use dplyr, or data.

There are now more than 16,000 R packages available for download.
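Base R can show where the package library lives and which packages are already installed. This is a sketch; the paths, and the exact list of installed packages, differ from one installation to another.

```r
# Folder(s) where packages are installed (the "library")
.libPaths()

# Names of the base packages that ship with R
base_pkgs <- rownames(installed.packages(priority = "base"))
base_pkgs

# Packages and environments attached to the current session
search()
```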

Install Package using GUI:
Select the ‘Packages’ menu, then select ‘Install R Packages’. A list of available packages is displayed; select one and click ‘OK’ to install it. The installed package is then attached to the current R session with the library() function.
To use a function from a package that is not part of base R, we first need to install the package that contains it.

Install Package from CRAN:
install.packages("<the package's name>")
library("<the package's name>")
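A common pattern is to install a package only when it is missing and then attach it. The helper install_if_missing() below is hypothetical (it is not part of base R); it relies only on the base functions requireNamespace(), install.packages(), and library().

```r
# Hypothetical helper: install a package only if it is not already
# installed, attach it, and return TRUE if it is now attached.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
  invisible(pkg %in% .packages())
}

# "tools" ships with base R, so nothing is downloaded here
install_if_missing("tools")
```

Passing character.only = TRUE lets library() accept the package name stored in a variable rather than a literal name.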

Install Package from GitHub:
We can use the install_github() function from the remotes package, which itself must be installed from CRAN first.
install.packages("remotes")
remotes::install_github("githubaccountname/packagename")

Note: First download and install a package, then load it with library(). An installed package's functions can be used only after the package is attached, or by prefixing the function with the package name, as in remotes::install_github().

Practice: Installing packages
Q. Draw a 3D scatter plot of three random vectors x, y, z of size 1000.
Hint: Use the rnorm() function to create these vectors and the scatterplot3d package to plot them.

Answer:
#Installing package
install.packages("scatterplot3d")
#Load a package
library(scatterplot3d)

# Creating random vectors x, y, z
x <- rnorm(1000,mean=15,sd=3)
y <- rnorm(1000,mean=20,sd=5)
z <- rnorm(1000,mean=25,sd=8)
x
y
z

#plot
scatterplot3d(x,y,z)
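Because rnorm() draws random numbers, x, y, and z (and therefore the plot) change on every run. A minimal sketch of making the draws reproducible with set.seed(); the seed value 123 is arbitrary.

```r
# Fix the random number generator's starting point
set.seed(123)
x1 <- rnorm(1000, mean = 15, sd = 3)

# Re-seeding with the same value reproduces exactly the same draws
set.seed(123)
x2 <- rnorm(1000, mean = 15, sd = 3)
identical(x1, x2)  # TRUE

# The sample mean is close to, but not exactly, the requested mean of 15
mean(x1)
```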

8. List of some useful R packages:
There are many useful R packages written by R’s active user community. Some of them are listed below.

A. Data handling Packages:
To Load Data:
RODBC, DBI, RMySQL, RPostgreSQL, RSQLite, googleAuthR, cloudyR project, downloader, XLConnect, xlsx, foreign, and haven.
To Manipulate Data:
dplyr, data.table, parallel, purrr, tidyr, plyr, reshape2, stringr, zoo, and lubridate.

B. Data visualization packages:
ggplot2, plotly, shiny, ggvis, rgl, dygraphs, htmlwidgets and rcdimple.

C. Data Analysis/ Modelling Packages:
survival, car, caret, mgcv, randomForest, forecast, nnet, lme4/nlme, multcomp, vcd, glmnet, and e1071.

D. Reporting Packages:
R Markdown, knitr, shiny, xtable, atable, ClinReport, R3port, greport, hreport, gt, DT, formattable, flextable, reactable, huxtable, officer, and reporter.

9. Good Coding Practices(GCP) and Naming Convention in R

1. Must start with a letter (A–Z or a–z) or a period (.) that is not followed by a digit (e.g. ‘.devender’, not ‘.9devender’).
2. Can contain letters, digits (0–9), periods (.), and underscores (_).
3. Use underscores (_) to separate words within a name.
4. Use names that are concise and meaningful (e.g. virus <- ‘covid19’, not v <- ‘covid19’).
5. R is case-sensitive: Data is different from data.
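The naming rules above can be checked with base R's make.names(), which turns any string into a syntactically valid name. A short sketch; the object names are the article's own examples.

```r
# make.names() converts strings into syntactically valid R names
make.names(".devender")       # ".devender" (already valid)
make.names(".9devender")      # "X.9devender" (a period followed by a digit is not allowed)
make.names("devender palsa")  # "devender.palsa" (the space becomes a period)
make.names("devender_palsa")  # "devender_palsa" (underscores are allowed)

# R is case-sensitive: these are two different objects
Data <- "upper"
data <- "lower"
Data == data  # FALSE
```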

Implement a consistent coding convention. There are five common naming conventions to choose from:
1. underscore_separated: e.g. devender_palsa
2. period.separated: e.g. devender.palsa
3. all lowercase: e.g. devenderpalsa
4. lowerCamelCase: e.g. devenderPalsa
5. UpperCamelCase: e.g. DevenderPalsa

It is better to choose one naming convention and stick to it. As a best practice, we use the underscore_separated naming convention because:
a. all-lowercase names are difficult to read, especially for non-native English speakers.
b. period.separated names are confusing for users of Python and other languages in which dots are meaningful.
c. UpperCamelCase is ugly and requires excessive use of the Shift key.

How to organize the code within each file?
1. Start each file with a comment (“#” starts a comment) saying who wrote it and when, what it contains, and how it fits into the larger program.
2. Load all required packages.
3. Source all required files with source().
4. Start coding; break the code into separate files (generally under 2000–3000 lines each).
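The four steps above can be sketched as a small script skeleton. The file names iris_summary.R, helper_functions.R, and main.R are hypothetical, and the source() line is commented out so the sketch runs on its own.

```r
# ---------------------------------------------------------------
# File:    iris_summary.R  (hypothetical file name)
# Author:  <your name>
# Written: <date>
# Purpose: Summarize sepal length by species in the built-in iris
#          dataset; sourced by a hypothetical main.R script.
# ---------------------------------------------------------------

# 1. Load all required packages
library(datasets)

# 2. Source all required files (hypothetical helper file,
#    commented out so this sketch runs on its own)
# source("helper_functions.R")

# 3. Start coding, using underscore_separated names
mean_sepal_length <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
print(mean_sepal_length)
```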

Two R packages can help you apply R coding style best practices:
1. styler: Restyles selected text, files, or entire projects interactively, and includes an RStudio add-in (the easiest way to restyle existing code). It can be installed with: install.packages("styler")

2. lintr: Performs automated checks to confirm that your code conforms to the style guide. It can be installed with: install.packages("lintr")
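As a small illustration, styler's style_text() restyles a string of R code. The sketch is guarded with requireNamespace() so it simply does nothing if styler has not been installed yet; the input string is an arbitrary example.

```r
# Only runs if styler has been installed with install.packages("styler")
if (requireNamespace("styler", quietly = TRUE)) {
  # style_text() restyles R code given as a character string
  styled <- styler::style_text("my_var<-c(1,2,3)")
  print(styled)
}
```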

Happy Learning !!! :)


