A Quick Introduction to the R Ecosystem

What is R?

R is a dynamically typed, interpreted programming language geared toward statistics and data analysis, yet it is also quite capable as an expressive general-purpose language. Common applications include:

Predictive modeling.
Machine learning.
Extract-Transform-Load.
Data visualization.
Dashboards.
Spreadsheet replacement.
Report generation.
Interactive documents.
Web scraping.
Reactive websites.

R has a freely available, feature-rich IDE, RStudio, developed by the company of the same name. A wide variety of packages are available for installation from the Comprehensive R Archive Network (CRAN), rOpenSci, Bioconductor, Neuroconductor, and various code repositories accessible via the devtools package.

Why R?

Demand for data science and data engineering is heating up, making it vital to learn one or more programming languages well suited to data processing. Aside from R, popular languages in this domain include Python, Scala, F#, and Julia. So why, personal preferences aside, would one choose R?

R is especially appealing for applications in its particular domain thanks to strong support from RStudio and a highly specialized community of professionals and scientists. At the time of writing, CRAN has 17,464 packages available. The examples below are just a small sample of the interesting and unique features that R and its ecosystem provide.

Instant feedback

R offers a REPL, or read-eval-print loop, for tinkering with code snippets and getting immediate feedback. The RStudio IDE has a pane for keeping track of variables, and it also has a spreadsheet view for tabular data. Most packages are well documented, and running the command ?objectName displays the documentation for that object.

Vector operations

R allows defining basic sequences in a few different ways:
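
As a minimal sketch using only base R (the particular values are illustrative assumptions), the colon operator, seq, and c each produce a vector:

```r
# colon operator: consecutive integers
a = 1:5                          # 1 2 3 4 5

# seq with an explicit step size
b = seq(0, 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00

# seq with a fixed number of evenly spaced points
d = seq(10, 20, length.out = 3)  # 10 15 20

# c combines individual values into a vector
e = c(2, 3, 5, 7)
```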

Consider this Fibonacci function:

fib = function(n) {
  # base cases: fib(0) and fib(1) are both 1
  if (n < 2) {
    1
  } else {
    # recursive case: sum of the two preceding values
    fib(n - 1) + fib(n - 2)
  }
}

We can invoke the function as one might expect:
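
For instance, a single call might look like the sketch below. The fib definition is repeated in compact form only so that the snippet runs on its own, and the argument 10 is an arbitrary choice:

```r
# same logic as the fib function above, written on one line
fib = function(n) if (n < 2) 1 else fib(n - 1) + fib(n - 2)

fib(10)
# [1] 89
```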

...or with sapply and a vector of values:
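
A sketch of the sapply form, again repeating the compact fib definition so that it is self-contained; the range 0:10 is an arbitrary choice:

```r
fib = function(n) if (n < 2) 1 else fib(n - 1) + fib(n - 2)

# apply fib to each element of 0:10 and collect the results in a vector
fibs = sapply(0:10, fib)
fibs
# [1]  1  1  2  3  5  8 13 21 34 55 89
```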

Many sorts of vector operations are supported. We can take the previous result, multiply every value by 2, and then add element-by-element to another sequence:
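
As a small sketch of that idea, using a shorter run of Fibonacci values and 1:5 as the other sequence (both are illustrative choices):

```r
fib = function(n) if (n < 2) 1 else fib(n - 1) + fib(n - 2)

fibs = sapply(1:5, fib)   # 1 2 3 5 8
doubled = fibs * 2        # every value multiplied by 2: 2 4 6 10 16
result = doubled + 1:5    # element-by-element addition: 3 6 9 14 21
result
```

No explicit loop is needed: arithmetic operators work on whole vectors at once.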

Pipe

The pipe operator %>% is available in the magrittr package. A pipe passes the value on its left as the first argument to the function on its right. The following expressions are equivalent:

library(magrittr)  # provides %>%
sum(3, 4, 5)
3 %>% sum(4, 5)
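
Pipes become more useful when several calls are chained. A brief sketch, assuming magrittr is installed; the values are arbitrary:

```r
library(magrittr)  # provides %>%

# nested call: read from the inside out
nested = sum(sqrt(c(1, 4, 9)))

# piped call: read from left to right
piped = c(1, 4, 9) %>% sqrt() %>% sum()

identical(nested, piped)
# [1] TRUE
```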

Non-standard evaluation

R supports metaprogramming and non-standard evaluation. These features allow R code to inspect and manipulate the abstract syntax tree (AST) of an expression before it is evaluated. One of the primary uses of this is capturing the unevaluated expressions passed as arguments to a function, which allows for the following elegant call to the subset function:
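
A minimal sketch of such a call, using the built-in mtcars data set. The threshold and column choices are illustrative assumptions, and show_expr is a hypothetical helper included only to show how an argument's expression can be captured:

```r
# mpg, cyl, and hp are not variables in our environment; subset captures
# the expressions and evaluates them inside the data frame
efficient = subset(mtcars, mpg > 30, select = c(mpg, cyl, hp))
efficient

# hypothetical helper: substitute() captures the unevaluated argument,
# and deparse() turns it into a string
show_expr = function(x) deparse(substitute(x))
show_expr(a + b)
# [1] "a + b"
```

Note that a and b never need to exist, because the expression is captured rather than evaluated.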

Data frames

Data frames are not unique to R, but they are a built-in data type that has been around since R's predecessor language, S. A data frame is similar to a spreadsheet or a matrix, and manipulating the data stored in one is surprisingly straightforward with helper functions from the dplyr package. Let's take a look at a data frame called mtcars:
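
mtcars ships with R in the datasets package, so a quick look needs no extra packages. A small sketch of some common ways to inspect it:

```r
# first few rows; each row is a car, each column a measurement
head(mtcars)

# number of rows and columns
dim(mtcars)
# [1] 32 11

# column names and types
str(mtcars)
```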

Suppose we want to find the top 5 cars having the highest horsepower-per-displacement ratio (a common bragging point for auto manufacturers), and the cars must have at least 200 horsepower. It's as easy as this:
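
A sketch of one way to write this with dplyr. Two details are assumptions: the car names live in the row names of mtcars, so they are first moved into a column called car (using rownames_to_column from the tibble package, which is installed alongside dplyr), and the result is stored in a variable named top_cars:

```r
library(dplyr)

top_cars = mtcars %>%
  tibble::rownames_to_column("car") %>%   # keep the car names as a column
  select(car, hp, disp) %>%               # only the fields we care about
  filter(hp >= 200) %>%                   # at least 200 horsepower
  mutate(hp_per_disp = hp / disp) %>%     # horsepower per unit of displacement
  arrange(-hp_per_disp) %>%               # descending order
  top_n(5, hp_per_disp)                   # five highest ratios

top_cars
```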

So how did that work?

select limits the data frame to just the fields we care about
filter limits the results to horsepower greater than or equal to 200
mutate creates a new column called hp_per_disp and sets its value to be horsepower divided by displacement
arrange sorts the rows by hp_per_disp, and the negative sign indicates descending order
top_n limits the data frame to the top rows

Formulas

The tilde operator ~ is used to define formulas for statistical modeling. The following code block would be read as "z modeled by x and y and the interaction between x and y":

z ~ x + y + x:y
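
As a sketch of how such a formula is typically used, it can be handed to a modeling function such as lm. The choice of mtcars columns here (mpg, wt, hp) is purely illustrative:

```r
# "mpg modeled by wt and hp and the interaction between wt and hp"
f = mpg ~ wt + hp + wt:hp

# a formula is an ordinary R object that can be stored and inspected
class(f)      # "formula"
all.vars(f)   # "mpg" "wt" "hp"

# fit a linear model using the formula
fit = lm(f, data = mtcars)
names(coef(fit))   # "(Intercept)" "wt" "hp" "wt:hp"

# wt * hp is shorthand for wt + hp + wt:hp
fit2 = lm(mpg ~ wt * hp, data = mtcars)
```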

Here we can see how easy it is to make a visualization displaying the relationship between miles-per-gallon, cylinder count, and displacement:

library(ggplot2)  # provides the mpg data set
m = model.frame(hwy ~ cyl + displ, data = mpg)
plot(m)

In case you haven't seen a scatterplot matrix like this before: each tile plots one pair of variables, with the row giving the y-axis variable and the column giving the x-axis variable. For example, the top-right tile is a graph of highway mpg by displacement, and the tile below that is cylinders by displacement.

Purrr

The purrr package takes R's functional programming to another level by utilizing many of the powerful features we've just covered. Here's just a very small example where purrr's keep, walk, and reduce functions are used to print the primes from 1 to 1000 and then return their sum:

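library(purrr)    # keep, walk, reduce, and the %>% pipe
library(numbers)  # assumption: isPrime is taken from the numbers package

1:1000 %>%
  keep(isPrime) %>%
  walk(print) %>%
  reduce(`+`)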

Rather than seeing everything printed to the console at the end of the computation, we could modify the code a bit to watch the isPrime function generate results in real time:

1:100000 %>%
  walk(~ ifelse(
    isPrime(.),
    print(.),
    NA
  ))

Shiny

Shiny is the premier web framework for R, enabling interactive graphics and components. Let's dive in by creating the skeleton for a shiny app:

# import the shiny web framework  
library(shiny)

# construct the UI  
ui = fluidPage(
   # displays text
   textOutput('message')  
)

# create the server logic  
server = function(input, output) {

   # provides text to the UI
   output$message = renderText({
       # random number function
       runif(1)
   })

}

# run the app
shinyApp(ui = ui, server = server)

The application is divided into the user interface (running in the browser) and the application logic (running server-side). After we install shiny and run this code, RStudio will open our browser to localhost and greet us with a random number. The textOutput function in the UI displays text that is provided by the message renderer (output$message) in the server.

What if we wanted the displayed number to change when the user clicks a button? First, let's add a button to the UI:

ui = fluidPage(
   # the button's id is "randomize_button", and the button's text is "Randomize"
   actionButton('randomize_button', 'Randomize'),
   textOutput('message')  
)

We should add an event listener to do something when the button is clicked. The way to do this in shiny is with the observeEvent function:

server = function(input, output) {
   # listens for events from the randomize_button
   observeEvent(input$randomize_button, {} )
...

This observer doesn't do anything yet. We can have it modify what's called a reactive value: a value that, when changed, automatically causes any dependent functions to re-evaluate. In this case, we want a reactive value to hold our random number.

server = function(input, output) {
   # creates a reactiveVal containing a random number  
   randomNumber = reactiveVal(runif(1))
...
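
To get a feel for how a reactiveVal behaves before wiring it into the app, it can be tried at the console. This is only a sketch: the name counter is hypothetical, and isolate is used because reactive values can normally only be read inside a reactive context such as a renderer or an observer:

```r
library(shiny)

# create a reactive value with an initial value
counter = reactiveVal(0)

# calling it with no arguments reads the value
isolate(counter())
# [1] 0

# calling it with an argument sets a new value
counter(5)
isolate(counter())
# [1] 5
```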

The observer should modify randomNumber, and the text renderer should read from it. Putting it all together we have this:

server = function(input, output) {
   randomNumber = reactiveVal(runif(1))

   observeEvent(input$randomize_button, {
       # assign a new value to randomNumber
       randomNumber(runif(1))
   })
   
   output$message = renderText({
       # read the value from randomNumber
       randomNumber()  
   })
}

Looking good so far! Just to show how multiple components can be updated using these building blocks, here's what it would look like if we generated a random data frame and used the plotly library to generate a 3D mesh:

library(shiny)
library(plotly)

# construct the UI
ui = fluidPage(
   fluidRow(
       # display a button and table in the left-most column
       column(
           width = 3,
           actionButton('randomize_button', 'Randomize'),
           tableOutput('table')
       ),
       # display a plot in the next column
       column(
           width = 9,
           plotlyOutput('plot')
       )
   )
)

# pure functions
randomDataFrame = function(num_rows, num_cols) {
   data.frame(replicate(num_cols, runif(num_rows)))
}

# create the server logic
server = function(input, output) {
   
   # define initial data for our table and plot
   randomData = reactiveVal(randomDataFrame(100, 3))
   
   # when the randomize button is clicked, modify the randomData variable
   observeEvent(input$randomize_button, {
       randomData(randomDataFrame(100, 3))
   })
   
   # populate the table with the randomData variable
   output$table = renderTable({
       randomData()
   })
   
   # populate the plot with a mesh plot of the randomData  
   output$plot = renderPlotly({
       plot_ly(
           randomData(),
           x = ~X1,
           y = ~X2,
           z = ~X3,
           type = "mesh3d",
           intensity = ~X3,
           colors = colorRamp(rainbow(5))
       )
   })
}

# run the app
shinyApp(ui = ui, server = server)

Other noteworthy packages

swirl
renv
tidyverse
wakefield
lubridate
plumber
htmlwidgets
shinytest
testthat
rmarkdown
bookdown
furrr

Deployment options

Rocker

Rocker provides a Debian-based Docker image with R installed and ready to use. Rocker also has alternative images configured to include the Tidyverse or Shiny.

shinyapps.io

shinyapps.io is a cloud hosting environment offering push-button deployment of Shiny apps from the RStudio IDE. There's even a free tier.

Shiny Server

Shiny Server Open Source is a freely offered hosting solution for Shiny apps, interactive markdown documents, and visualizations.

RStudio Connect

Connect is an enterprise solution for hosting all sorts of R applications. Connect has push-button deployment via the RStudio IDE, as well as version rollbacks. Applications have configurable permissions and scaling.

ShinyProxy

ShinyProxy is a service for hosting containerized Shiny apps, offering scaling and self-healing. LDAP authentication is also available. ShinyProxy may be seen as a free and open-source alternative to RStudio Connect.

Closing thoughts

I hope I've been able to pique your interest; there is much more to explore within R than I have described here. While R has plenty of quirks, it sets itself apart from other languages with exciting and unique features. Whether you need statistical modeling or capable data visualization, or have an interest in functional programming and metaprogramming, give R a try for a sure way to learn some new approaches.